It's the Chinese Room argument all over again. People hear "predicting the next token" and all they can imagine is some sort of statistical database lookup. This ignores the fact that when you have a huge corpus of data with incredibly complex internal correlations, and all that data also happens to correlate with some unknown external thing, it's almost certain that a powerful learner will end up modeling that external thing if and when doing so yields a quantum leap in prediction performance! A model that includes the external-thing hypothesis will almost certainly be simpler, ceteris paribus, than a model that doesn't.
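
Here's a toy sketch of that last point (my own illustration, not anyone's actual system; the generator, its parameters, and both predictors are hypothetical). Tokens are emitted by a hidden two-state process. One predictor memorizes surface bigram statistics; the other explicitly tracks the hidden state with a Bayes filter. The filter "cheats" by being handed the true parameters, because the question here isn't learning, it's which hypothesis predicts better once you have it:

```python
import math
import random

random.seed(0)

# Hidden binary state that flips rarely and emits noisy tokens.
# (Parameters chosen arbitrarily for the demo.)
STAY, EMIT = 0.95, 0.9          # P(state persists), P(token == state)

def generate(n):
    state, tokens = 0, []
    for _ in range(n):
        if random.random() > STAY:
            state = 1 - state
        tokens.append(state if random.random() < EMIT else 1 - state)
    return tokens

train, test = generate(50_000), generate(10_000)

# Surface model: bigram counts over tokens, no notion of a hidden state.
counts = [[1, 1], [1, 1]]        # Laplace smoothing
for prev, nxt in zip(train, train[1:]):
    counts[prev][nxt] += 1

def bigram_logloss(tokens):
    loss = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        row = counts[prev]
        loss -= math.log2(row[nxt] / (row[0] + row[1]))
    return loss / (len(tokens) - 1)

# Latent model: a two-state Bayes filter tracking P(hidden state == 1).
def filter_logloss(tokens):
    belief, loss = 0.5, 0.0
    for tok in tokens:
        # Predictive probability of the next token under the current belief.
        p1 = belief * EMIT + (1 - belief) * (1 - EMIT)
        loss -= math.log2(p1 if tok == 1 else 1 - p1)
        # Condition on the observed token (Bayes update)...
        like1 = EMIT if tok == 1 else 1 - EMIT
        belief = belief * like1 / (belief * like1 + (1 - belief) * (1 - like1))
        # ...then propagate the belief through the (rare) state flip.
        belief = belief * STAY + (1 - belief) * (1 - STAY)
    return loss / len(tokens)

print(f"bigram log-loss: {bigram_logloss(test):.4f} bits/token")
print(f"latent log-loss: {filter_logloss(test):.4f} bits/token")
```

The bigram table and the latent model have the same number of free parameters, yet the filter gets a lower log-loss on held-out tokens because it models the generator rather than its shadow on the token statistics. An n-gram model could close the gap by lengthening its context, but only at the cost of exponentially many parameters, which is exactly the "simpler model" half of the claim.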