That depends on how you define AGI - it's a meaningless term to use since everyo...

ACCount37 · 2026-01-02T11:23:55 1767353035

> which to be useful would require the "LLM" to then be autonomous and act in some (real/virtual) world in order to learn.

You described modern RLVR for tasks like coding. Plug an LLM into a virtual env with a task. Drill it based on task completion. Force it to get better at problem-solving.

It's still an autoregressive next token prediction engine. 100% LLM, zero architectural changes. We just moved it past pure imitation learning and towards something else.

HarHarVeryFunny · 2026-01-02T13:34:01 1767360841

Yes, if all you did was replace current pre/mid/post training with a new (elusive holy grail) runtime continual learning algorithm, then it would definitely still just be a language model. You seem to be talking about it having TWO runtime continual learning algorithms, next-token and long-horizon RL, but of course RL is part of what we're calling an LLM.

It's not obvious if you just did this without changing the learning objective from self-prediction (auto-regressive) to external prediction whether you'd actually gain much capability though. Auto-regressive training is what makes LLMs imitators - always trying to do same as before.

In fact, if you did just let a continual learner autonomously loose in some virtual environment, why would you expect it do do anything different, other than continual learning from whatever it was exposed to in the environment, from putting a current LLM in a loop, together with tool use as a way to expose it to new data? An imitative (auto-regressive) LLM doesn't have any drive to do anything new - if you just keep feeding it's own output back in as an input, then it's basically just a dynamical system that will eventually settle down into some attractor states representing the closure of the patterns it has learnt and is generating.

If you want the model to behave in a more human/animal-like self-motivated agentic fashion, then I think the focus has to be on learning how to act to control and take advantage of the semi-predictable environment, which is going to be based on having predicting the environment as the learning objective (vs auto-regressive), plus some innate drives (curiosity, boredom, etc) to bias behavior to maximize learning and creative discovery.

Continual learning also isn't going to magically solve the RL reward problem (how do you define and measure RL rewards in the general, non-math/programming, case?). In fact post-training is a very human-curated affair since humans have identified math and programming as tasks where this works and have created these problem-specific rewards. If you wanted the model to discover it's own rewards at runtime, as part of your new runtime RL algorithm perhaps, then you'd have to figure how to bake that into the architecture.

ACCount37 · 2026-01-02T18:07:57 1767377277

No. There are no architectural changes and no "second runtime learning algorithm". There's just the good old in-context learning that all LLMs get from pre-training. RLVR is a training stage that pressures the LLM to take advantage of it on real tasks.

"Runtime continual learning algorithm" is an elusive target of questionable desirability - given that we already have in-context learning, and "get better at SFT and RLVR lmao" is relatively simple to pull off and gives kickass gains in the here and now.

I see no reason why "behave in a more human/animal-like self-motivated agentic fashion" can't be obtained from more RLVR, if that's what you want to train your LLMs for.

HarHarVeryFunny · 2026-01-03T00:37:09 1767400629

I'm not sure what you are saying. There are LLMs as exist today, and there are any number of changes one could propose to make to them.

The less you change, the more they stay the same. If you just add "more" RLVR (perhaps for a new domain - maybe chemistry vs math or programming?), then all you will get is an LLM that is better at acing chemistry reasoning benchmarks.

ACCount37 · 2026-01-03T07:46:54 1767426414

I'm saying that the kind of changes you propose aren't made by anyone, and might generally not be worth making. Because "better RLVR" is an easier and better pathway to actual cross-domain performance gains.

If you could stabilize the kind of mess you want to make, you could put that effort into better RL objectives and get more return.

HarHarVeryFunny · 2026-01-03T14:39:19 1767451159

The mainstream LLM crowd aren't making these sorts of major changes yet, although some like DeepMind (the OG pushers of RL for AGI!) do acknowledge that a few more "transformer level" breakthoughs are necessary to reach what THEY are calling AGI, and others like LeCun are calling for more animal-like architectures.

Anyways, regardless of who is currently trying to move beyond LLMs or not, it should be pretty obvious what the problems are with trying to apply RL more generally, and what that would result in if successful, if that were the only change you made.

LLMs still have room to get better, but they will forever be LLMs, not brains, unless someone puts in the work to make that happen.

You started this thread talking about "AGI", without defining what you meant by that, and are now instead talking about "cross-domain performance gains". This is exactly why it makes no sense to talk about AGI without defining what you mean by it, since I think we talking about completely different things.

ACCount37 · 2026-01-03T15:43:34 1767455014

The claim I make is that LLMs can be AGI complete with pretty much zero architectural work. And none of the "brain-like" architectures are actually much better at "being a brain" than LLMs are - the issue isn't "the architecture is wrong", it's "we don't know how to train for this kind of thing".

"Fundamental limitations" aren't actually fundamental. If you want more learning than what "in-context" gives you? Teach the usual "CLI agent" LLM to make its own LoRAs and there goes that. So far, this isn't a bottleneck so pressing you'd want to resolve it by force.

LeCun is laughing stock nowadays, he didn't get kicked out of Meta for no reason.

HarHarVeryFunny · 2026-01-03T18:22:05 1767464525

You keep using the term "AGI" without defining what you mean by it, other than implicity defining it as "whatever can be achieved without changing the Transformer architecture", which makes your "claim" just a definitional tautology, which is fine, but it does mean you are talking about something different than what I am talking about, which is also fine.

> And none of the "brain-like" architectures are actually much better at "being a brain" than LLMs are

I've no idea what projects you are referring to.

It would certainly be bizarre if the Transformer architecture, never designed to be a brain, turns out to be the best brain we can come up with, and equal to real brains which have many more moving parts, each evolved over millions of years to fill a need and improve capability.

Maybe you are smarter than Demis Hassabis, and the DeepMind team, and all their work towards AGI (their version, not yours) will be a waste of effort. Why not send him a note "hey, dumbass, Transfomers are all you need!" ?

ACCount37 · 2026-01-04T06:38:30 1767508710

It would be certainly be bizarre if the 8086 architecture, never designed to be a foundation of all home, office and server computation, was the best CPU architecture ever made.

And it isn't. It's merely good enough.

That's what LLMs are. A "good enough" AI architecture.

By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that. They have limitations, but not the kind that can't be worked around with things like sharply applied tool use. Which LLMs can be trained for, and are.

So far, all the weirdo architectures that try to replace transformers, or put brain-inspired features into transformers, have failed to live up to the promise. Which sure hints that the bottleneck isn't architectural at all.

HarHarVeryFunny · 2026-01-04T17:14:04 1767546844

I'm not aware of any architectures that have tried to put "brain-inspired" features into Transformers, or much attempt to modify them at all for that matter.

The architectural Transformer tweaks that we've seen are:

- Various versions of attention for greater efficiency

- MOE vs dense for greater efficiency

- Mamba (SSM) + transformer hybrid for greater efficiency

None of these are even trying to fundamentally change what the Transformer is doing.

Yeah, the x86 architecture is certainly a bit of a mess, but as you say good enough, as long as what you want to do is run good old fashioned symbolic computer programs. However, if you want to run these new-fangled neural nets, then you'd be better off with a GPU or TPU.

> By "AGI", I mean the good old "human equivalence" proxy. An AI that can accomplish any intellectual task that can be accomplished by a human. LLMs are probably sufficient for that.

I think DeepMind are right here, and you're wrong, but let's wait another year or two and see, eh?