People are using Super Mario to benchmark AI now

from TechCrunch 4 months ago

A recent study by Hao AI Lab suggests that Super Mario Bros. is a more rigorous test for AI than Pokémon. Researchers tested several AI models: Anthropic's Claude 3.7 excelled, while notable models such as Google's Gemini 1.5 Pro and OpenAI's GPT-4o struggled. The study used a customized emulator and the GamingAgent framework, which required the models to generate gameplay strategies in real time. The results indicated that reasoning models tend to lag because their deliberation takes time, which hurts them in fast-paced games. The findings also raise questions about how well gaming ability correlates with broader technological progress.

The game forced each model to learn to plan complex maneuvers and develop gameplay strategies, showcasing the AI's adaptability in dynamic environments.

Interestingly, the lab found that reasoning models performed worse than non-reasoning models despite being generally stronger on most benchmarks.

Read at TechCrunch

#ai-benchmarking #super-mario-bros #gaming-agent #reasoning-models #ai-research

