This AI Model Can Intuit How the Physical World Works
Briefly

"Here's a test for infants: Show them a glass of water on a desk. Hide it behind a wooden board. Now move the board toward the glass. If the board keeps going past the glass, as if it weren't there, are they surprised? Many 6-month-olds are, and by a year, almost all children have an intuitive notion of an object's permanence, learned through observation. Now some artificial intelligence models do too."
"Researchers have developed an AI system that learns about the world via videos and demonstrates a notion of "surprise" when presented with information that goes against the knowledge it has gleaned. The model, created by Meta and called Video Joint Embedding Predictive Architecture (V-JEPA), does not make any assumptions about the physics of the world contained in the videos. Nonetheless, it can begin to make sense of how the world works."
"As the engineers who build self-driving cars know, it can be hard to get an AI system to reliably make sense of what it sees. Most systems designed to "understand" videos in order to either classify their content ("a person playing tennis," for example) or identify the contours of an object (say, a car up ahead) work in what's called "pixel space." The model essentially treats every pixel in a video as equal in importance."
V-JEPA is an AI system that learns about physical properties and expectations by analyzing ordinary videos. The system forms internal predictive models and signals "surprise" when observations contradict its learned expectations. V-JEPA makes no explicit physics assumptions yet begins to infer object permanence and physical interactions from video data. Pixel-space video models treat every pixel equally and can be distracted by irrelevant details such as moving leaves, limiting abstraction. V-JEPA seeks higher-level abstractions that focus on relevant objects and dynamics rather than raw pixel patterns, enabling more robust scene understanding and anticipation.
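The pixel-space limitation described above can be illustrated with a toy example. Everything here is a hypothetical sketch, not V-JEPA's actual architecture: the "frames," the noise standing in for rustling leaves, and the crude column-mean "encoder" are all invented for illustration. The point it demonstrates is only the general one from the summary: unpredictable pixel-level detail puts a floor under a pixel-space prediction error, while a coarser representation averages that detail away.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": a bright bar that shifts one column per frame, over a
# noisy background standing in for irrelevant detail (rustling leaves).
H = W = 16
NOISE = 0.5  # background pixels vary uniformly in [0, NOISE)

def clean_frame(t):
    f = np.zeros((H, W))
    f[4:8, (2 + t) % W] = 1.0  # the moving object
    return f

f1 = clean_frame(1) + NOISE * rng.random((H, W))  # observed next frame

# A predictor that perfectly tracks the object and knows the average
# background level -- the best any model can do about unpredictable noise.
prediction = clean_frame(1) + NOISE / 2

# Pixel-space loss: every pixel counts equally, so the unpredictable
# background variation alone keeps the error high.
pixel_error = np.mean((f1 - prediction) ** 2)

# Embedding-space loss: a crude stand-in "encoder" (column means) that
# summarizes each frame coarsely; per-pixel noise averages out, so the
# same prediction scores far better in this more abstract space.
def encode(f):
    return f.mean(axis=0)

embed_error = np.mean((encode(f1) - encode(prediction)) ** 2)

print(f"pixel-space error:     {pixel_error:.4f}")
print(f"embedding-space error: {embed_error:.4f}")
```

Running this, the embedding-space error comes out far below the pixel-space error even though the underlying prediction is identical, which is the intuition behind predicting in a learned representation rather than in raw pixels.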
Read at WIRED