Study accuses LM Arena of helping top AI labs game its benchmark | TechCrunch
Briefly

A recent study from researchers at Cohere, Stanford, MIT, and Ai2 criticizes LM Arena, the organization behind the Chatbot Arena benchmark. The paper alleges that LM Arena gave a select group of companies, including Meta, OpenAI, Google, and Amazon, an unfair advantage by letting them test models privately and withhold lower scores from public view. According to the researchers, this practice undermines the integrity of the Chatbot Arena leaderboard, which is supposed to be impartial. In one cited example, Meta tested 27 model variants ahead of a launch but publicly disclosed only the score of a single top-ranking model.
"Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others."
Votes accumulated over time contribute to a model's score and, consequently, its placement on the Chatbot Arena leaderboard. While many commercial actors participate in Chatbot Arena, LM Arena has long maintained that its benchmark is impartial and fair.
Read at TechCrunch