You have to think of this as an entire system. The arm is necessary but not sufficient. An "arm" could be as simple as small servos and popsicle sticks [0]. In the case of ALOHA, below is an outline of the basic components.
* arms (aka follower arms)
- effector (i.e. gripper)
- sensors (e.g. cameras, depth sensors; the specced unit is the Intel RealSense D405)
- gravity compensation (so the relatively delicate servos aren't overloaded)
* controller
- runs Robot Operating System (ROS [1]) plus other software (e.g. arm and gripper interfaces [2])
- runs ALOHA model in inference to tell ROS what to do based on task and sensor input
- trains ALOHA models using arm motion encoder and ACT: Action Chunking with Transformers [4]
* leader arms
- motion encoders (essentially an arm in reverse: a human moves it to teleoperate the follower arm, and the encoded motions become model training data)
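To make the data flow in that outline concrete, here is a rough Python sketch of the two loops the controller runs: recording demonstrations while a human drives the leader arm, and executing a trained policy in chunks of actions, which is the basic idea behind ACT. Every name here (`record_demo`, `run_chunked`, the callables passed in) is hypothetical and stands in for real ROS and camera interfaces; this is not the ALOHA codebase, and it leaves out the temporal ensembling that the ACT paper also describes.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

Observation = List[float]  # stand-in for camera frames plus joint state
Action = List[float]       # stand-in for follower-arm joint targets


def record_demo(read_leader: Callable[[], Action],
                read_sensors: Callable[[], Observation],
                command_follower: Callable[[Action], None],
                steps: int) -> List[Tuple[Observation, Action]]:
    """Teleoperation: mirror the leader onto the follower, log (obs, action)."""
    demo = []
    for _ in range(steps):
        obs = read_sensors()
        action = read_leader()     # leader joint positions become the label
        command_follower(action)   # follower arm copies the leader
        demo.append((obs, action))
    return demo


def run_chunked(policy: Callable[[Observation], Sequence[Action]],
                read_sensors: Callable[[], Observation],
                command_follower: Callable[[Action], None],
                steps: int) -> int:
    """Inference: ask the policy for a chunk of actions, play it, re-query.

    Returns how many times the policy was queried.
    """
    queue: Deque[Action] = deque()
    queries = 0
    for _ in range(steps):
        if not queue:
            chunk = policy(read_sensors())
            if not chunk:
                raise ValueError("policy returned an empty action chunk")
            queue.extend(chunk)
            queries += 1
        command_follower(queue.popleft())
    return queries
```

With a policy that returns 4 actions per call, 10 control steps need only 3 policy queries; the point of chunking is that the model is consulted less often than the arm is commanded.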
The system at this point is "research grade", which means it is both expensive (custom, high-end materials and units) and not very user friendly: you need to know a lot to build and run it. See the build instructions [5].
Paper: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation: https://arxiv.org/abs/2401.02117
Video set: https://mobile-aloha.github.io/
Tutorial: https://docs.google.com/document/d/1_3yhWjodSNNYlpxkRCPIlvIA...
Kits for sale: https://www.trossenrobotics.com/aloha-kits