#elo-style-rankings

Ars Technica
5 months ago
Data science

Turing test on steroids: Chatbot Arena crowdsources ratings for 45 AI models

The Large Model Systems Organization (LMSys) has created Chatbot Arena, a platform for comparing large language models (LLMs) based on blind pairwise ratings.
Users can enter prompts and compare side-by-side responses from two randomly selected models.
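
To make the "Elo-style" tag concrete, here is a minimal sketch of how one blind pairwise vote could update two models' ratings under the textbook Elo formula. The function names, the K-factor of 32, and the starting rating of 1000 are illustrative assumptions, not details from the article, and the leaderboard's actual computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, outcome_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one pairwise vote.

    outcome_a is 1.0 if the user preferred model A, 0.0 if they
    preferred model B, and 0.5 for a tie. k is an assumed step size.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (outcome_a - exp_a)
    new_b = rating_b + k * ((1.0 - outcome_a) - exp_b)
    return new_a, new_b


# Two hypothetical models start level; a user prefers model A's response.
model_a, model_b = update_elo(1000.0, 1000.0, outcome_a=1.0)
print(model_a, model_b)
```

In this formulation each vote moves the two ratings by equal and opposite amounts, so a win over a higher-rated model shifts the ratings more than a win over a lower-rated one.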