I'm saying that the kind of changes you propose aren't made by anyone, and might generally not be worth making. Because "better RLVR" is an easier and better pathway to actual cross-domain performance gains.
If you could stabilize the kind of mess you want to make, you could put that effort into better RL objectives and get more return.
The mainstream LLM crowd aren't making these sorts of major changes yet, although some like DeepMind (the OG pushers of RL for AGI!) do acknowledge that a few more "transformer level" breakthroughs are necessary to reach what THEY are calling AGI, and others like LeCun are calling for more animal-like architectures.
Anyways, regardless of who is currently trying to move beyond LLMs or not, it should be pretty obvious what the problems are with trying to apply RL more generally, and what that would result in if successful, if that were the only change you made.
LLMs still have room to get better, but they will forever be LLMs, not brains, unless someone puts in the work to make that happen.
You started this thread talking about "AGI", without defining what you meant by that, and are now instead talking about "cross-domain performance gains". This is exactly why it makes no sense to talk about AGI without defining what you mean by it, since I think we are talking about completely different things.
The claim I make is that LLMs can be AGI complete with pretty much zero architectural work. And none of the "brain-like" architectures are actually much better at "being a brain" than LLMs are - the issue isn't "the architecture is wrong", it's "we don't know how to train for this kind of thing".
"Fundamental limitations" aren't actually fundamental. Want more learning than what "in-context" gives you? Teach the usual "CLI agent" LLM to make its own LoRAs and there goes that. So far, this isn't a bottleneck so pressing that you'd want to resolve it by force.
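For anyone unfamiliar with the term: a LoRA is a low-rank adapter, where the base weight W stays frozen and only two small matrices A and B are trained, giving an effective weight W + (alpha / r) * B @ A. Below is a minimal sketch of just that update in plain Python. The function names, dimensions, and the `alpha` value are illustrative assumptions, and it says nothing about how an agent would actually train or manage adapters.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: d_out x d_in frozen base weight
    a: r x d_in trainable matrix
    b: d_out x r trainable matrix
    """
    r = len(a)
    delta = matmul(b, a)  # d_out x d_in low-rank update
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

# Hypothetical sizes, chosen only for illustration.
d_out, d_in, rank = 8, 8, 2
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]      # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]    # trainable
b = [[0.0] * rank for _ in range(d_out)]                                # trainable, zero init

full_params = d_out * d_in            # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)   # parameters touched by the adapter

w_eff = lora_effective_weight(w, a, b, alpha=4.0)
```

With B initialised to zero the adapter starts as a no-op, so `w_eff` equals `w` until training moves B. The adapter here has `rank * (d_in + d_out)` trainable parameters versus `d_out * d_in` for the full matrix, and that gap grows quickly at realistic layer sizes.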
LeCun is a laughing stock nowadays; he didn't get kicked out of Meta for no reason.
You keep using the term "AGI" without defining what you mean by it, other than implicitly defining it as "whatever can be achieved without changing the Transformer architecture". That makes your "claim" just a definitional tautology, which is fine, but it does mean you are talking about something different than what I am talking about, which is also fine.
> And none of the "brain-like" architectures are actually much better at "being a brain" than LLMs are
I've no idea what projects you are referring to.
It would certainly be bizarre if the Transformer architecture, never designed to be a brain, turns out to be the best brain we can come up with, and equal to real brains which have many more moving parts, each evolved over millions of years to fill a need and improve capability.
Maybe you are smarter than Demis Hassabis, and the DeepMind team, and all their work towards AGI (their version, not yours) will be a waste of effort. Why not send him a note: "hey, dumbass, Transformers are all you need!"?
It would certainly be bizarre if the 8086 architecture, never designed to be the foundation of all home, office and server computation, was the best CPU architecture ever made.
And it isn't. It's merely good enough.
That's what LLMs are. A "good enough" AI architecture.
By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that. They have limitations, but not the kind that can't be worked around with things like sharply applied tool use. Which LLMs can be trained for, and are.
So far, all the weirdo architectures that try to replace transformers, or put brain-inspired features into transformers, have failed to live up to the promise. Which sure hints that the bottleneck isn't architectural at all.
I'm not aware of any architectures that have tried to put "brain-inspired" features into Transformers, or much attempt to modify them at all for that matter.
The architectural Transformer tweaks that we've seen are:
- Various versions of attention for greater efficiency
- MoE vs dense for greater efficiency
- Mamba (SSM) + transformer hybrid for greater efficiency
None of these are even trying to fundamentally change what the Transformer is doing.
Yeah, the x86 architecture is certainly a bit of a mess, but as you say good enough, as long as what you want to do is run good old fashioned symbolic computer programs. However, if you want to run these new-fangled neural nets, then you'd be better off with a GPU or TPU.
> By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that.
I think DeepMind are right here, and you're wrong, but let's wait another year or two and see, eh?