Google's new robot AI can fold delicate origami, close zipper bags without damage
Briefly

Google DeepMind has unveiled two new AI models, Gemini Robotics and Gemini Robotics-ER, aimed at significantly improving how robots interact with and navigate the physical world. Both build on the Gemini 2.0 architecture: Gemini Robotics adds 'vision-language-action' capabilities, letting robots recognize objects visually and carry out tasks given as natural-language commands, while Gemini Robotics-ER emphasizes spatial understanding to improve control in existing robotic systems. Together, these advancements represent a leap towards the elusive goal of embodied AI, which seeks to enable robots to perform general tasks reliably in varied environments.
Google DeepMind's new AI models, Gemini Robotics and Gemini Robotics-ER, enhance the ability of robots to navigate and interact with their environments safely and effectively.
The Gemini Robotics model's 'vision-language-action' abilities let robots interpret visual input and follow natural-language commands, translating them directly into physical actions.
Gemini Robotics-ER focuses on embodied reasoning and enhanced spatial understanding, and is designed for roboticists to integrate into their existing robot control systems.
Google's advancements mark a key step towards embodied AI, the goal of building robots that can perform general tasks reliably in the physical world.
Read at Ars Technica