Google DeepMind Announces Robotics Foundation Model Gemini Robotics On-Device
Briefly

Gemini Robotics On-Device is a vision-language-action foundation model that runs locally on robot hardware, enabling low-latency inference while requiring only a small number of demonstrations for fine-tuning. The model lets robots follow natural language instructions and reason visually about their surroundings. Although trained on dual-armed Aloha robots, it has been successfully adapted to multiple other robotic platforms. DeepMind aims to make robotics models more accessible to developers through a dedicated SDK, accelerating innovation in the field. Benchmarks covering safety mechanisms and visual reasoning capabilities were released alongside the model.
Gemini Robotics On-Device is designed for low-latency tasks on robot hardware, utilizing vision to follow instructions and reason about its environment.
The model can be fine-tuned with as few as 50 demonstrations, making it accessible for specific applications and adaptable to various hardware platforms.
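Fine-tuning from roughly 50 demonstrations amounts to supervised imitation: each demonstration pairs an observation with the action the operator took, and the policy is adjusted to reproduce those actions. The sketch below is purely illustrative, assuming nothing about the Gemini Robotics SDK; it uses a toy linear policy and a generic behavior-cloning loop to show the shape of the idea.

```python
# Hypothetical sketch: a generic behavior-cloning loop over 50 demonstrations.
# None of these names come from the Gemini Robotics SDK.
import random

random.seed(0)

# Each demonstration pairs an observation vector with a target action (scalar).
# Toy "expert" behavior for illustration: action = 2*obs[0] - obs[1].
demos = []
for _ in range(50):
    obs = [random.uniform(-1, 1), random.uniform(-1, 1)]
    action = 2 * obs[0] - obs[1]
    demos.append((obs, action))

# Tiny linear policy fine-tuned by stochastic gradient descent on squared error.
w = [0.0, 0.0]
lr = 0.1
for epoch in range(200):
    for obs, target in demos:
        pred = w[0] * obs[0] + w[1] * obs[1]
        err = pred - target
        w[0] -= lr * err * obs[0]
        w[1] -= lr * err * obs[1]

print([round(x, 2) for x in w])  # weights converge toward the expert's [2.0, -1.0]
```

In a real vision-language-action setting the observation would be camera frames plus an instruction and the policy a large pretrained network, but the fine-tuning signal is the same: match the demonstrated actions.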
Read at InfoQ