"I thought this sounded super fun and potentially interesting from a model capabilities point of view," Calcraft told TechCrunch in an interview. "So I sat indoors on a cloudy Saturday and got it done."
"The idea is to have a benchmark that's un-gameable," Calcraft said. "A benchmark that can't be beaten by memorizing specific answers or simple patterns that have been seen before during training."
"I believe Minecraft tests the models on resourcefulness and gives them more agency," he told TechCrunch. "It's not nearly as restricted and saturated as [other] benchmarks."
The idea of using games to benchmark AI dates back decades: Mathematician Claude Shannon argued in 1949 that games like chess were a worthy challenge for "intelligent" software.