Here is my take. When people use the "stochastic parrots" phrase, they often use it as an explanation of what is happening. But in many cases, I don't think they appreciate that: (1) good explanations must be testable models; (2) different explanations exist at different levels of abstraction; and (3) having one useful level of explanation does not mean that other levels of explanation are not accurate or useful.
Sure, next-word prediction is indeed the base optimization objective for LLMs. But that doesn't prevent the resulting models from demonstrating behavior that corresponds to measurable levels of intelligence, such as problem-solving in particular domains! Nor does it prevent fine-tuning from modifying an LLM's behavior considerably.
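To make "the base optimizer" concrete, here is a minimal illustrative sketch (a toy, not any real model's code) of what the next-word-prediction objective actually is: minimize the cross-entropy between the model's predicted distribution over the vocabulary and the token that actually came next. Everything here — the toy vocabulary, the probabilities, the function name — is hypothetical.

```python
import math

def next_token_loss(predicted_probs, actual_next_token):
    """Cross-entropy loss for a single prediction step:
    the negative log-probability the model assigned to the true next token."""
    return -math.log(predicted_probs[actual_next_token])

# Hypothetical model output after seeing the context "the cat":
probs = {"sat": 0.6, "ran": 0.3, "quantum": 0.1}

loss_likely = next_token_loss(probs, "sat")       # small loss: model expected this
loss_unlikely = next_token_loss(probs, "quantum") # large loss: model was surprised
```

Training pushes this loss down across enormous amounts of text; the point of the argument above is that nothing about this objective forbids the model from building rich internal structure in order to minimize it.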
One might say, e.g., "LLMs only learn to predict the next word." The word "only" is misleading. Yes, models learn to predict the next word, and they build a lot of internal structure to help them do that. That structure enables capabilities far beyond merely parroting text. This is a narrow claim, but it is enough to do serious damage to the casual wielder of the "stochastic parrots" phrase. (To be clear, I'm not making any claims about consciousness or human-anchored notions of intelligence.)