Study accuses LM Arena of helping top AI labs game its benchmark | TechCrunch
Briefly

A recent study from researchers at Cohere, Stanford, MIT, and Ai2 criticizes LM Arena, the organization behind the Chatbot Arena benchmark. The paper alleges that LM Arena gave a select group of companies, including Meta, OpenAI, Google, and Amazon, an unfair advantage by letting them test models privately and withhold lower scores from public view. According to the researchers, this practice undermines the integrity of the Chatbot Arena leaderboard, which is supposed to be impartial. In one cited example, Meta tested 27 model variants ahead of a launch but publicly disclosed only the score of a single top-ranking model.
"Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others."
Votes accumulated over time contribute to a model's score and, consequently, its placement on the Chatbot Arena leaderboard. While many commercial actors participate in Chatbot Arena, LM Arena has long maintained that its benchmark is impartial and fair.
Read at TechCrunch