
"To answer that question, our resident RPG-enthusiast Ram Iyer put together a set of five general questions about Baldur's Gate, which we ran against xAI and the three major models in a kind of quasi-benchmark that I've decided to call BaldurBench. In the interest of journalistic transparency, I've made all the chat transcripts public, so you can see them here: Grok, ChatGPT, Claude, and Gemini."
"Of course, you can imagine the frustration of any respected and experienced engineer who shows up to work thinking he'll be tackling fundamental problems of knowledge and machine intelligence, only to be sidetracked into helping a 54-year-old man beat his video game. But the anecdote raises an even more pressing question: Did Musk end up getting the gaming skills he wanted?"
Different AI labs prioritize different user groups, with xAI placing particular emphasis on video-game walkthroughs. A model release at xAI was delayed because leadership was dissatisfied with how the chatbot handled detailed Baldur's Gate questions, and senior engineers were reassigned from other projects to improve those responses. An informal benchmark dubbed BaldurBench tested xAI's Grok and three other major models (ChatGPT, Claude, and Gemini) on five general Baldur's Gate questions, and the full chat transcripts were published. Grok performed well, offering useful, well-informed answers rich in gamer jargon, theorycraft, and tabular presentation, though its answers are most useful to readers already familiar with gaming terms.
Read at TechCrunch