After LLMs and agents, the next AI frontier: video language models

"Tesla's viral videos show its Optimus humanoid robot serving drinks to guests - a glimpse of AI in the real world that a new AI innovation, called world models, is expected to make more reliable. (For one, humanoid robots will do a better job in navigating and serving people their custom drinks.) World models - which some refer to as video language models - are the new frontier in AI, following in the footsteps of the iconic ChatGPT and more recently, AI agents."

"World models are designed to help robots understand the physical world around them, allowing them to track, identify and memorize objects. On top of that, just like humans planning their future, world models allow robots to determine what comes next - and plan their actions accordingly. 'If you think about how generative AI started..., the difference with world models is that it needs to know what is actually possible,' said TJ Galda, Nvidia's senior director of product management for Cosmos, a world model."

World models are AI systems that simulate and represent the physical environment, enabling machines to track, identify and memorize objects and to predict future states. They let robots plan actions by assessing what is physically possible and anticipating next steps, which should improve performance on tasks such as navigation and serving custom drinks. World models extend generative AI from digital outputs to real-world outcomes, supporting safety features for autonomous vehicles and virtual training for factory floors. Human experiences and real-world observations can be incorporated into world models to enable more effective human-AI collaboration. Industry estimates project widespread humanoid robot deployment by 2050, which would increase demand for reliable world models.

#world-models #robotics #generative-ai #simulation

Read at Computerworld


