Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark | TechCrunch
Briefly

Meta's use of an unreleased, experimental version of Llama 4 Maverick to achieve high scores on LM Arena caused significant backlash and led to changes in the benchmark's scoring policies. The unmodified Llama 4 Maverick ranks poorly against established models such as OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet. The experimental model was tailored for conversational output to perform well on the benchmark, raising concerns about the reliability and ethics of benchmarking practices. Meta has since released an open-source version and encourages developers to explore its capabilities further.
The incident prompted LM Arena's maintainers to apologize, change their policies, and score the unmodified, vanilla Maverick.
Meta's experimental Maverick, Llama-4-Maverick-03-26-Experimental, was 'optimized for conversationality', the company explained in a chart published last Saturday.
Many of the rival models it ranks below are months old, indicating the unmodified version of Maverick is not especially competitive.
Meta says it experiments with 'all types of custom variants' of its models.
Read at TechCrunch