
"Imagine that you want a robot to sort a pile of laundry into whites and colors. Gemini Robotics-ER 1.5 would process the request along with images of the physical environment (a pile of clothing). This AI can also call tools like Google search to gather more data. The ER model then generates natural language instructions, specific steps that the robot should follow to complete the given task."
"Gemini Robotics 1.5 (the action model) takes these instructions from the ER model and generates robot actions while using visual input to guide its movements. But it also goes through its own thinking process to consider how to approach each step. "There are all these kinds of intuitive thoughts that help [a person] guide this task, but robots don't have this intuition," said DeepMind's Kanishka Rao."
"The DeepMind team tests Gemini robotics with a few different machines, like the two-armed Aloha 2 and the humanoid Apollo. In the past, AI researchers had to create customized models for each robot, but that's no longer necessary. DeepMind says that Gemini Robotics 1.5 can learn across different embodiments, transferring skills learned from Aloha 2's grippers to the more intricate hands on Apollo with no specialized tuning."
Gemini Robotics-ER 1.5 processes a task request together with images of the environment and can call external tools such as Google Search to gather additional data. From these inputs, the ER model generates step-by-step natural-language instructions for the robot, for example to sort laundry into whites and colors. Gemini Robotics 1.5, the action model, converts those instructions into visually guided robot actions and runs its own thinking process to decide how to approach each step before acting. Both models are built on Gemini foundation models and fine-tuned with physical-operation data, which enables more complex multi-stage tasks and skill transfer across embodiments such as Aloha 2 and Apollo. The ER model is being released to developers via Google AI Studio, while the action model remains restricted.
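The division of labor between the two models can be pictured with a toy sketch. This is not DeepMind's code or API: `plan`, `act`, `run`, and `Step` are hypothetical names, the "scene" is a list of text labels rather than camera images, and the sorting rule is a hard-coded stand-in for what the ER model would reason out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One natural-language instruction produced by the planner."""
    text: str


def plan(task: str, scene: List[str]) -> List[Step]:
    """Hypothetical stand-in for the ER (reasoning) model.

    A real planner would condition on the task text and on images;
    this toy ignores `task` and sorts item labels with a fixed rule.
    """
    steps = []
    for item in scene:
        bin_name = "whites" if item.startswith("white") else "colors"
        steps.append(Step(f"Place the {item} in the {bin_name} bin"))
    return steps


def act(step: Step, observe: Callable[[], str]) -> str:
    """Hypothetical stand-in for the action model.

    It takes a fresh observation before each step, mirroring the idea
    that visual input guides every movement, and returns a log line
    instead of motor commands.
    """
    observation = observe()
    return f"[{observation}] executing: {step.text}"


def run(task: str, scene: List[str], observe: Callable[[], str]) -> List[str]:
    """Planner output is handed to the executor one step at a time."""
    return [act(step, observe) for step in plan(task, scene)]


if __name__ == "__main__":
    for line in run("Sort the laundry", ["white shirt", "red sock"],
                    lambda: "camera frame"):
        print(line)
```

The only point of the sketch is the hand-off: one component turns a request into ordered instructions, and a second component executes each instruction while re-reading its sensors.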
Read at Ars Technica