"True AGI" is often used to mean "human-like intelligence, but faster, more consistent and of greater depth". In that case, concluding that embodied agents are the way forward is fairly trivial. We've known for a long time that the development of a human brain is a function of its sensory inputs. Why would this be any different for an artificial intelligence, especially one designed to mimic or interface with a human brain?
That's not the right question to ask. You can construct all sorts of hypotheticals or alternative answers, but all of that is meaningless until someone actually builds it.