
If you want a robot that can fold your laundry, clean your room and cook, you need a lot more than cheap hardware. You need an autonomous agent (i.e. "an AI") that can guide the hardware to accomplish the task.

We're still very far from that, and in practice you certainly can't do it with ALOHA, despite what the videos may seem to show. For each of the few discrete tasks you see in the videos, the robot arms have to be trained by demonstration (via teleoperation), and the end result is a system that can only copy the operator's actions with very little variation.

You can check this in the Mobile ALOHA paper on arXiv (https://arxiv.org/abs/2401.02117), where page 6 shows the six tasks the system has been trained to perform and the tolerances in the initial setup. For example, in the shrimp-cooking task, the initial position of the robot can vary by 10 cm and the position of the implements by 2 cm. If everything is not set up just so, the task will fail.

What all this means is that if you could assemble this "cheap" system, you'd then have to train it with a few hundred demonstrations to fold your laundry. Maybe it could do it, probably not, and if you moved the washing machine or got a new one, you'd have to train it all over again.
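To make the point concrete, here's a minimal sketch of what "training by demonstration" boils down to in its simplest form: behavior cloning, i.e. regressing the teleoperator's actions from observations. All the names, dimensions and the dataset below are made up, and Mobile ALOHA's actual method is a more sophisticated imitation-learning recipe (action chunking), but the principle is the same: the policy learns to reproduce the operator's demonstrations, nothing more.

    # Hypothetical behavior-cloning sketch (not Mobile ALOHA's actual code).
    import torch
    import torch.nn as nn

    class Policy(nn.Module):
        def __init__(self, obs_dim=512, act_dim=14):  # e.g. two 7-DoF arms (assumption)
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, act_dim),
            )

        def forward(self, obs):
            return self.net(obs)

    def train(policy, demos, epochs=100, lr=1e-4):
        """demos: iterable of (observation, operator_action) tensor pairs
        collected via teleoperation."""
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        for _ in range(epochs):
            for obs, act in demos:
                # The only learning signal is "do what the operator did".
                loss = nn.functional.mse_loss(policy(obs), act)
                opt.zero_grad()
                loss.backward()
                opt.step()

Nothing in that loss term knows anything about laundry or washing machines; it only knows how far the policy's output is from what the human did in a very similar situation, which is why the setup tolerances are so tight.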

As to robots cleaning up your room and cooking, those are currently in the realm of science fiction, unless you're a zen ascetic living in an empty room and happy to eat beans on toast every day. Beans from a can, that is. You'll have to initialise the task by opening the can yourself, obviously. You have a toaster, right?



> If you want a robot that can fold your laundry, clean your room and cook, you need a lot more than cheap hardware. You need an autonomous agent (i.e. "an AI") that can guide the hardware to accomplish the task.

Yes, that's my point. Cheap hardware is far harder to control than expensive hardware, so if Google actually developed some AI that can do high-precision tasks on "wobbly", off-the-shelf hardware, that would be the breakthrough.

I agree that extensive training for each individual device would be prohibitive, but that feels like a problem that could be solved with more development: with many machine learning tasks, we started by training an individual model for each specific use case and environment. Today we're able to build generalized models which are trained once and can be deployed in a wide variety of environments. I don't see why this shouldn't be possible for a vision-based robot controller either.

Managing the actual high-level task is easy as soon as you can do all the low-level tasks: converting a recipe into a machine-readable format, dividing it into a tree of tasks and subtasks, etc. The hard parts are actually cutting the vegetables, de-boning the meat, and so on. The kind of complex movement planning that requires doesn't exist yet. But this project looks as if it's a step in exactly that direction.
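To illustrate the "easy" part: here's a toy sketch of a recipe decomposed into a task tree. The structure and the example steps are invented, but the point is that the leaves of the tree are exactly the low-level motor skills (cutting, de-boning) that no controller can do reliably yet.

    # Toy task-tree sketch; names and steps are hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class Task:
        name: str
        subtasks: list["Task"] = field(default_factory=list)

        def leaves(self):
            """Yield the low-level actions a robot would actually have to execute."""
            if not self.subtasks:
                yield self.name
            else:
                for sub in self.subtasks:
                    yield from sub.leaves()

    recipe = Task("make stir-fry", [
        Task("prepare ingredients", [
            Task("cut the vegetables"),
            Task("de-bone the meat"),
        ]),
        Task("cook", [
            Task("heat the pan"),
            Task("add ingredients and stir"),
        ]),
    ])

    print(list(recipe.leaves()))
    # ['cut the vegetables', 'de-bone the meat', 'heat the pan', 'add ingredients and stir']

Building and traversing that tree is trivial; executing any single leaf with a real arm in a cluttered kitchen is the open problem.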




