RoboCat is Google’s robot that understands tasks from photos
A new demo shows off RoboCat, Google DeepMind’s latest AI program, which can recognize tasks and control robots to complete them from just a photo.
Google DeepMind is in the midst of trying to create the next big thing in the generative AI scene. The Google subsidiary is also working on multiple other projects, though, and the latest to be shown off is RoboCat.
The software is an AI agent designed to control robotic appendages, with tasks set for it through photos. This is possible thanks to the extensive training RoboCat has received, which lets it recognize the tasks researchers set for it.
According to RoboCat’s researchers, it “learns much faster than other state-of-the-art models”. The biggest feat is that it can pick up a task from as few as 100 demonstrations, because it pulls from a “large and diverse data set.”
Examples include basic sorting of some fruit, as well as putting discs in the right places. While it sounds rudimentary right now, the demo points out that RoboCat is able to pick up the task after a few seconds from just a low-resolution camera shot.
Originally, however, RoboCat couldn’t reach more than a 36% success rate on tasks it hadn’t been pre-taught. Through trial and error, Google DeepMind was able to have it self-train and double that score.
Google’s RoboCat works through obstacles
RoboCat is also able to continue with its tasks while the camera’s view is obstructed by hands. The idea here is that even if something goes wrong, it should be able to work around the incident without any hindrance.
Google DeepMind has also tested it on two different robotic arms. The Panda 7-DoF and Sawyer 5-DoF are functionally similar but have different levels of reach and freedom.
DoF, or degrees of freedom, indicates how many independent ways an arm can move, typically one per joint. The same term is used in VR, where it describes how many kinds of movement a headset can track.
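As a rough illustration of what degrees of freedom mean for an arm, the sketch below models a flat, two-dimensional arm in which every joint adds one degree of freedom. It is a simplification for explanation only: the function names and link lengths are made up for this example and are not taken from RoboCat, the Panda, or the Sawyer, and real arms move in three dimensions.

```python
import math


def degrees_of_freedom(joint_angles):
    """In this toy model, each joint angle is one degree of freedom."""
    return len(joint_angles)


def end_effector_position(joint_angles, link_lengths):
    """Return the (x, y) position of the arm's tip for a planar arm.

    Each joint rotates the rest of the arm by its angle (in radians),
    and each link then extends the arm along the current heading.
    """
    if len(joint_angles) != len(link_lengths):
        raise ValueError("need exactly one joint angle per link")
    x, y, heading = 0.0, 0.0, 0.0
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


# Hypothetical link lengths in metres (assumption, not real hardware specs).
seven_joint_links = [0.1] * 7
five_joint_links = [0.1] * 5

# A controller has to output one value per joint, so the 7-joint arm
# needs a longer command than the 5-joint arm for the same kind of task.
command_for_seven = [0.0] * 7
command_for_five = [0.0] * 5

tip_seven = end_effector_position(command_for_seven, seven_joint_links)
tip_five = end_effector_position(command_for_five, five_joint_links)
```

In this toy setup, the two arms take commands of different lengths, which hints at why a single agent controlling both has to adapt to each arm rather than replay the same motions.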
RoboCat is able to adapt after having more freedom with the Panda arm and then being forced to use the Sawyer one. While the goals are similar, the task can be much harder with two degrees of freedom stripped out of the hardware.