RoboCat: Google DeepMind's innovative leap into AI-powered robotics

The self-improving AI agent RoboCat can operate various robotic arms, learn new tasks from relatively few demonstrations, and improve its capabilities using data it generates itself.

Robotic interactions (photo credit: Thinkstock/Imagebank)

In a significant advance for robotics, Google DeepMind has introduced a new AI agent named RoboCat. The agent is designed to learn a variety of tasks across different robotic arms, and it can generate its own training data to refine its technique, a crucial step toward the creation of general-purpose robots.

RoboCat, a Transformer model with a VQ-GAN image encoder, was released in June 2023. It is intended primarily for research into learning a wide variety of manipulation tasks, either from expert demonstrations or across multiple real robot embodiments.

The primary intended users are Google DeepMind researchers, and it's not intended for commercial or production use.

RoboCat's learning speed and self-improvement process

RoboCat's standout feature is how quickly it learns. It can pick up a new task from as few as 100 demonstrations because it draws on a large and diverse dataset. This reduces the need for human-supervised training and could accelerate the pace of robotics research.

RoboCat's training follows a five-step self-improvement cycle. It begins with collecting 100 to 1,000 demonstrations of a new task or robot, using a robotic arm controlled by a human. These demonstrations are used to fine-tune RoboCat, creating a specialized spin-off agent. The spin-off agent then practices the new task or arm an average of 10,000 times, generating more training data.

 Artificial Intelligence illustrative. (credit: Wikimedia Commons)

The demonstration data and self-generated data are then incorporated into RoboCat’s existing training dataset, and a new version of RoboCat is trained on the updated dataset.
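The cycle can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DeepMind's implementation: every function is a hypothetical stub, a trajectory is reduced to a task name plus its data source, and no model is actually trained. Only the order of the steps and the counts (100 demonstrations at the low end of the range, about 10,000 practice episodes) come from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trajectory:
    task: str
    source: str  # "teleop" = human demonstration, "self" = generated by a spin-off agent


@dataclass
class Agent:
    version: int
    dataset: List[Trajectory] = field(default_factory=list)


def collect_demonstrations(task: str, n: int) -> List[Trajectory]:
    # Step 1 (hypothetical stub): a human teleoperates the arm n times.
    return [Trajectory(task, "teleop") for _ in range(n)]


def fine_tune(agent: Agent, demos: List[Trajectory]) -> Agent:
    # Step 2 (stub): derive a spin-off agent specialized to the new task or arm.
    return Agent(version=agent.version, dataset=list(demos))


def practice(spin_off: Agent, task: str, episodes: int) -> List[Trajectory]:
    # Step 3 (stub): the spin-off attempts the task repeatedly, producing self-generated data.
    return [Trajectory(task, "self") for _ in range(episodes)]


def train_new_version(agent: Agent, new_data: List[Trajectory]) -> Agent:
    # Steps 4-5: merge the new data into the existing dataset, then train a new generalist.
    return Agent(version=agent.version + 1, dataset=agent.dataset + new_data)


def self_improvement_cycle(agent: Agent, task: str,
                           n_demos: int = 100, n_practice: int = 10_000) -> Agent:
    demos = collect_demonstrations(task, n_demos)
    spin_off = fine_tune(agent, demos)
    self_data = practice(spin_off, task, n_practice)
    return train_new_version(agent, demos + self_data)


robocat = Agent(version=1)
robocat = self_improvement_cycle(robocat, "lift_gear")  # "lift_gear" is a hypothetical task name
print(robocat.version, len(robocat.dataset))  # prints: 2 10100
```

Each pass through the cycle grows the shared dataset, which is the mechanism behind the claim that the next RoboCat version benefits from every task the previous one practised.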

Diverse training data and tasks for RoboCat

This process lets RoboCat learn from a wide range of tasks and types of training data. It has been trained on millions of trajectories from both real and simulated robotic arms, including data produced by reinforcement learning (RL) agents, by human teleoperation (Teleop), and by RoboCat itself. As a result, it can handle a variety of tasks involving different objects and task variations.

These tasks include stacking RGB objects, building towers and pyramids from RGB objects, and lifting NIST-i gears, among others. Vision-based data representing the tasks RoboCat would be trained to perform was collected from four different types of robots and many robotic arms.
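One way to picture this mixed dataset is to tag every trajectory with its embodiment, task, data source, and whether it came from simulation. The sketch below is purely illustrative: the class names, embodiment identifiers, and the four example records are hypothetical and do not reflect RoboCat's actual data format or statistics.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    RL = "reinforcement learning agent"
    TELEOP = "human teleoperation"
    SELF = "RoboCat itself"


@dataclass(frozen=True)
class TrajectoryMeta:
    embodiment: str   # hypothetical robot/arm identifier
    task: str
    source: Source
    simulated: bool   # True for simulated arms, False for real hardware


# Illustrative records only; not real RoboCat dataset entries.
records = [
    TrajectoryMeta("sim_arm_a", "stack_rgb", Source.RL, True),
    TrajectoryMeta("sim_arm_a", "build_tower_rgb", Source.RL, True),
    TrajectoryMeta("real_arm_b", "lift_nist_gear", Source.TELEOP, False),
    TrajectoryMeta("real_arm_b", "lift_nist_gear", Source.SELF, False),
]

# Summaries a researcher might inspect before training on the mixture.
by_source = Counter(r.source for r in records)
real_fraction = sum(not r.simulated for r in records) / len(records)
tasks = sorted({r.task for r in records})
print(by_source[Source.RL], real_fraction, tasks)
```

Keeping provenance on each record makes it easy to check how much of the mixture comes from simulation versus real hardware, or from humans versus agents.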

RoboCat also adapts quickly to new robotic arms. For example, after observing 1,000 human-controlled demonstrations, RoboCat could successfully direct a new arm with a three-fingered gripper and twice as many controllable inputs, picking up gears with an 86% success rate.

RoboCat's progress in learning new tasks 

Moreover, the more new tasks RoboCat learns, the better it becomes at learning additional ones. The initial version of RoboCat achieved a 36% success rate on previously unseen tasks after learning from 500 demonstrations per task. The latest version, trained on a more diverse set of tasks, more than doubled this success rate on the same tasks, to above 72%.

RoboCat's performance was evaluated on a range of tasks, such as inserting objects into and removing them from a bowl, and lifting large gears. These evaluations were conducted in both simulated and real-world environments, and the results were compared with the performance of human teleoperators.

During training, RoboCat receives several kinds of observations that describe the robot's position and grip: joint angles, tool center point (TCP) position, gripper joint angle, and gripper grasp status. The specific observations available depend on the robot and the objects being used.
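As a rough illustration, these observations might be packed into a single proprioceptive vector before being passed to a model. The class and field names, the seven-joint arm, and the example values are assumptions for this sketch; the optional gripper fields reflect the point that the available observations differ between robots.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proprioception:
    joint_angles: List[float]                    # one entry per arm joint (count varies by robot)
    tcp_position: List[float]                    # tool center point (x, y, z)
    gripper_joint_angle: Optional[float] = None  # absent on robots that do not report it
    gripper_grasp: Optional[bool] = None         # whether the gripper reports a grasped object

    def to_vector(self) -> List[float]:
        """Concatenate whichever observations this robot provides into one flat vector."""
        vec = list(self.joint_angles) + list(self.tcp_position)
        if self.gripper_joint_angle is not None:
            vec.append(self.gripper_joint_angle)
        if self.gripper_grasp is not None:
            vec.append(1.0 if self.gripper_grasp else 0.0)
        return vec


# Hypothetical 7-joint arm with a gripper that reports both gripper observations.
obs = Proprioception(
    joint_angles=[0.0, -0.5, 0.0, 1.2, 0.0, 0.8, 0.0],
    tcp_position=[0.45, 0.0, 0.30],
    gripper_joint_angle=0.02,
    gripper_grasp=False,
)
vec = obs.to_vector()

# A different hypothetical robot that exposes no gripper observations.
bare = Proprioception(joint_angles=[0.1] * 7, tcp_position=[0.4, 0.1, 0.2]).to_vector()
print(len(vec), len(bare))  # prints: 12 10
```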

During RoboCat's development, the researchers compared its VQ-GAN tokenizer with the patch ResNet tokenizer used in Gato. The patch ResNet tokenizer performed better on tasks seen during training but worse on tasks that were held out from training.
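To make the distinction concrete, the NumPy sketch below shows two ingredients in simplified form: splitting an image into fixed-size, non-overlapping patches (as Gato's tokenizer does before embedding each patch as a continuous vector) and vector quantization, the step that lets a VQ-GAN turn image features into discrete token ids by nearest-codebook lookup. As an assumption for brevity, raw flattened patches stand in for the learned VQ-GAN encoder output, and the codebook is random rather than trained; the 16-pixel patch size and 512-entry codebook are illustrative choices.

```python
import numpy as np


def image_to_patches(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles, flattened in raster order."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by the patch size"
    tiles = img.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)        # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)   # (num_patches, patch * patch * C)


def vector_quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each feature vector, the index of its nearest codebook entry (L2 distance)."""
    # Squared distances via ||f||^2 - 2 f.c + ||c||^2, shape (N, K), without an (N, K, D) temporary.
    d2 = ((features ** 2).sum(axis=1)[:, None]
          - 2.0 * features @ codebook.T
          + (codebook ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)


rng = np.random.default_rng(0)
img = rng.random((64, 64, 3)).astype(np.float32)
patches = image_to_patches(img, patch=16)        # 16 patches, each 16*16*3 = 768 values
codebook = rng.random((512, patches.shape[1]))   # hypothetical 512-entry codebook
tokens = vector_quantize(patches, codebook)      # 16 discrete token ids in [0, 512)
print(patches.shape, tokens.shape)
```

The patch path keeps each patch as a continuous vector, while the quantization path compresses it to one of a fixed set of codes; the comparison above concerns how these two tokenizer designs generalized, not this toy code.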

It is important to note that RoboCat is an early research model and has not been evaluated for deployment or safety outside research environments. As its capabilities expand, potential ethical and safety risks will need to be carefully addressed, so caution is warranted before using RoboCat outside research settings. Nonetheless, RoboCat represents a significant milestone for robotics and AI, bringing us closer to a future in which robots are an integral part of everyday life.