My apologies for the much-delayed reply; I have recently found myself with little spare time to post adequate responses. Your critiques are very interesting to ponder, so thank you for posting them. I did want to respond to this one, though.
I believe all of my counterarguments center on my current viewpoint: given the rapid rate of progress on the engineering side, it is no longer very informative for deep learning theory to ask what is possible, and it is more interesting to try to outline hard limitations. This puts deep learning in stark contrast with classical statistics, where the boundaries are very clear in a way they are not for deep learning.
I want to stress that at present, nearly every conjectured limitation of deep learning over the last several decades has fallen. This includes many back-of-the-napkin, "clearly obvious" arguments, so I'm wary of them now. I think the skepticism all along has been fueled in response to hype cycles, so we must be careful not to make the same mistakes. There is far too much empirical evidence against the precise arguments that these models have no underlying understanding, so it seems the debate must now be carried on with imprecise ones.
It is true that scaling along a single axis suggests that continued improvements require additional compute growing as a high-degree polynomial (not exponentially). But the progress over the last few years has come from discovering new axes to scale along, each of which further reduces the error rate and improves performance, and there are still many potential axes left untapped. What is significant about scaling, to me, is not how much additional compute is required, but that the currently predicted floor is very, very low, far lower than anything we have ever seen, and reaching it does not require any more data than we currently have. That should be cause for concern until we find a better lower bound.
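To make the "high polynomial, not exponential" point concrete, here is a minimal sketch assuming a single-axis power law of the form L(C) = L_inf + a * C^(-alpha). The exponent and constants are illustrative assumptions on my part, not fitted values from any particular model or paper.

```python
# Minimal sketch of the single-axis scaling argument.
# The functional form and constants are illustrative assumptions.

def loss(compute, a=1.0, alpha=0.05, l_inf=1.7):
    """Total loss as a function of training compute under an assumed power law."""
    return l_inf + a * compute ** (-alpha)

def compute_multiplier_to_halve_gap(alpha=0.05):
    """Factor by which compute must grow to halve the reducible loss (L - L_inf)."""
    # Solve a * (k * C) ** (-alpha) == 0.5 * a * C ** (-alpha)  =>  k = 2 ** (1 / alpha)
    return 2 ** (1 / alpha)

if __name__ == "__main__":
    # With alpha = 0.05, halving the remaining gap costs about 2**20 (~1e6x) more
    # compute: polynomial in 1/error with degree 1/alpha, steep but not exponential.
    print(compute_multiplier_to_halve_gap(alpha=0.05))
```

The point of the sketch is only that, under this assumed form, the cost of improvement along one axis is a steep polynomial; the compounding gains come from finding new axes rather than from brute force on a single one.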
> We all know that these new LLMs aren't dramatic improvements off their previous versions
No, I don't agree. It may seem that way to many, but to some the differences are stark. Our perception of performance is nonlinear and person-dependent, so major differences can be imperceptible to most. The vast majority of attempts to build more regular metrics, on benchmarks that are not already saturated, show that LLM development is not slowing down by any stretch. I'm not saying that LLMs will "go to the moon", but I don't have anything concrete to say they cannot either.
> We have the math to show that it can be impossible to distinguish two explanations through data processing alone.
Actually, this is a really great point, but I think it highlights the limitations of benchmarks and the need for capacity-based, compression-based, or other alternative data-independent metrics. With those in hand, it can become possible to distinguish two explanations (see the sketch below). This could be a fruitful line of inquiry.
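As one illustration of what such a comparison might look like, here is a minimal MDL-style sketch: two explanations that fit the data identically can still be separated by the cost of describing the explanation itself. The function, sizes, and likelihood values are hypothetical placeholders, not measurements of any real model.

```python
# Minimal MDL-style sketch: compare two explanations by total description length,
# i.e. bits to state the model plus bits to encode the data given the model.
import math

def description_length_bits(model_size_bytes, neg_log_likelihood_nats):
    """Two-part code length in bits for a model and the data it explains."""
    model_bits = 8 * model_size_bytes
    data_bits = neg_log_likelihood_nats / math.log(2)  # convert nats to bits
    return model_bits + data_bits

if __name__ == "__main__":
    # Hypothetical explanations A and B with identical fit (same data term)
    # but very different capacity (model term); the totals still differ.
    same_fit_nats = 1.2e6
    print("A:", description_length_bits(5_000, same_fit_nats))
    print("B:", description_length_bits(500_000, same_fit_nats))
```

The design choice here is simply that the model term is data-independent, which is exactly what lets it break ties that no amount of data processing alone can.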