"EMMA is research that demonstrates the power and relevance of multimodal models for autonomous driving," said Drago Anguelov, VP and Head of Research at Waymo. "We are excited to continue exploring how multimodal methods and components can contribute towards building an even more generalizable and adaptable driving stack."
Waymo emphasizes that EMMA draws on the real-world knowledge of the Gemini language model on which it is built, which lets the vehicle operate directly from sensor data in real-time driving situations.
Using an end-to-end learning approach, EMMA takes raw camera inputs and textual data and generates driving outputs such as planner trajectories and perception objects.
EMMA works in a unified language space, representing non-sensor inputs and outputs as natural language text, with the aim of making the most of the knowledge encoded in Gemini.
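To make the idea of representing non-sensor inputs and outputs as text concrete, here is a minimal Python sketch. It is not Waymo's format or code: the function names, the prompt layout, and the two-decimal waypoint encoding are all assumptions chosen for illustration. It shows how a driving command and past waypoints could be written as plain text for a language model, and how a text reply could be parsed back into numbers.

```python
from typing import List, Tuple

# Hypothetical type: an (x, y) position in metres relative to the vehicle.
Waypoint = Tuple[float, float]


def waypoints_to_text(waypoints: List[Waypoint]) -> str:
    """Serialize waypoints as plain text, two decimals per coordinate."""
    return "; ".join(f"({x:.2f}, {y:.2f})" for x, y in waypoints)


def text_to_waypoints(text: str) -> List[Waypoint]:
    """Parse text in the format produced by waypoints_to_text."""
    waypoints: List[Waypoint] = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing separator
        x_str, y_str = chunk.strip("()").split(",")
        waypoints.append((float(x_str), float(y_str)))
    return waypoints


def build_prompt(command: str, history: List[Waypoint]) -> str:
    """Assemble an illustrative text prompt (layout is an assumption)."""
    return (
        f"Command: {command}\n"
        f"Past waypoints: {waypoints_to_text(history)}\n"
        "Future waypoints:"
    )
```

In this sketch the camera frames would be passed to the model separately. Only the non-sensor parts (the command, the vehicle's history, and the predicted trajectory) travel as text, so one text interface can carry several kinds of task output.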