Akshually, while training models requires (at the moment) massive parallelization and consequently stacks of A100s, inference can be distributed pretty well (see petals for example). A pirate ‘ChatGPT’ network of people sharing consumer graphics cards could probably indeed work if the data was sourced. It bears thinking about. It really does.