A Meta executive has denied rumors that the company trained its AI models, Llama 4 Maverick and Llama 4 Scout, on test sets to artificially inflate benchmark performance. Ahmad Al-Dahle, VP of generative AI at Meta, stated the claims are false. The rumors surfaced after reports of poor performance on certain benchmarks and the use of an unreleased model version. Al-Dahle acknowledged some variance in model quality across cloud-hosting environments and reaffirmed Meta's commitment to refining its AI offerings as public access rolls out.
The rumor that Meta trained its new AI models on test sets to inflate benchmark scores is simply not true, according to Ahmad Al-Dahle, VP of generative AI.
Al-Dahle acknowledged that some users are seeing mixed quality from Maverick and Scout across different cloud providers, attributing the inconsistency to ongoing adjustments as the models are deployed.