A recent study by Hao AI Lab suggests that Super Mario Bros. is a more rigorous test for AI than Pokémon. Researchers tested various AI models: Anthropic's Claude 3.7 excelled, while notable models like Google's Gemini 1.5 Pro and OpenAI's GPT-4o struggled. The study used a customized emulator and the GamingAgent framework, which required the models to generate gameplay strategies in real time. The results indicated that reasoning models tend to lag because of their deliberative nature, which hinders their performance in fast-paced gaming environments. The findings also raise questions about how well AI gaming ability correlates with broader technological progress.
The game forced each model to learn to plan complex maneuvers and develop gameplay strategies, showcasing the AI’s adaptability in dynamic environments.
Interestingly, the lab found that reasoning models performed worse than non-reasoning models despite being generally stronger on most benchmarks.
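One way to picture why deliberation hurts in a real-time game is to count how many frames pass while a model is still deciding. The sketch below is only an illustration and is not taken from the study or from GamingAgent: the function names, the 60 frames-per-second rate, and the latency figures are all assumptions chosen for the example.

```python
# Illustrative sketch only: names, frame rate, and latencies are assumptions,
# not values reported by Hao AI Lab or part of the GamingAgent framework.

def frames_elapsed(decision_latency_s: float, fps: int = 60) -> int:
    """Number of game frames that pass while the model is deciding."""
    return round(decision_latency_s * fps)


def reacts_in_time(decision_latency_s: float, frames_until_hazard: int, fps: int = 60) -> bool:
    """True if the decision arrives before the hazard reaches the player."""
    return frames_elapsed(decision_latency_s, fps) <= frames_until_hazard


if __name__ == "__main__":
    # Hypothetical hazard that reaches the player 45 frames from now.
    hazard_in_frames = 45
    for label, latency in [("fast model", 0.5), ("slow, deliberative model", 3.0)]:
        outcome = "in time" if reacts_in_time(latency, hazard_in_frames) else "too late"
        print(f"{label}: {frames_elapsed(latency)} frames elapsed, {outcome}")
```

Under these assumed numbers, a model that answers in half a second still has frames to spare, while one that deliberates for several seconds responds long after the hazard has arrived, however good its chosen move would have been.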