DeepMind Releases Gemini Robotics-ER 1.5 for Embodied Reasoning
Briefly

"Gemini Robotics-ER 1.5 is designed for tasks that require spatial reasoning, multi-step planning, and execution in physical environments. It can output precise 2D points grounded in object size, weight, and affordances, supporting commands such as point at any object you can pick up. Developers can adjust a thinking budget to balance response latency with reasoning accuracy."
"The model includes safeguards against unsafe or physically infeasible plans, with checks on payload limits and workspace constraints. While it does not directly control robot actuators, it can call external tools such as vision-language-action (VLA) models or user-defined functions to execute commands. Gemini Robotics is built as a dual-model system, combining this reasoning model with a VLA counterpart to allow robots of different configurations to share higher-level reasoning abilities. Partners like Apptronik and more than 60 testers are currently working with the system."
"Compared to other large models applied to robotics, such as the Nvidia VLA, Gemini Robotics-ER emphasizes controllable reasoning depth and safety mechanisms. While previous systems focused on direct perception-to-action mapping, Gemini introduces a separation between reasoning and execution, which could make it easier to adapt across different hardware platforms."
"This general purpose approach will be transformational for robotics. Obviously the big robotic companies would"
Gemini Robotics-ER 1.5 is an embodied reasoning model for robotic tasks requiring spatial reasoning, multi-step planning, and physical execution. The model outputs precise 2D points grounded in object size, weight, and affordances, enabling commands like 'point at any object you can pick up.' Developers can tune a thinking budget to trade off response latency and reasoning accuracy. Built-in safeguards check payload limits and workspace constraints to prevent unsafe or infeasible plans. The model does not directly control actuators but can invoke external tools such as vision-language-action models or user-defined functions to carry out commands. The system pairs the reasoning model with a VLA counterpart to share higher-level reasoning across different robot configurations. Partners including Apptronik and over 60 testers are evaluating the preview.
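Because the reasoning model does not drive actuators itself, the hand-off to a VLA policy or robot API would happen through tool calls. The sketch below uses the SDK's function-calling support with a hypothetical move_gripper_to executor standing in for the real execution layer; the function, its signature, and the model ID are illustrative assumptions.

```python
# Hedged sketch of the reasoning/execution split: the model plans, an external
# tool executes. The executor below is a placeholder, not a real robot API.
from google import genai
from google.genai import types

def move_gripper_to(x: float, y: float, z: float) -> str:
    """Hypothetical executor hook: forward a target pose to a low-level
    controller (e.g., a VLA policy) and report the outcome."""
    print(f"executing move to ({x:.3f}, {y:.3f}, {z:.3f})")
    return "reached target"

client = genai.Client()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model ID
    contents="Plan how to move the gripper above the red block and call the "
             "available tool to execute each step.",
    config=types.GenerateContentConfig(
        # The SDK's automatic function calling invokes the Python callable
        # whenever the model emits a matching tool call.
        tools=[move_gripper_to],
    ),
)
print(response.text)
```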
Read at InfoQ