"Current benchmarks fail to adequately address the needs of state-of-the-art [models], particularly in evaluating user preferences. Thus, there is an urgent necessity for an open, live evaluation platform based on human preference that can more accurately mirror real-world usage."
"Chatbot Arena has become something of an industry obsession. Posts about updates to its model leaderboards garner hundreds of views and reshares... Millions of people have visited the organization's website in the last year alone."