Leading AI makers at odds over how to measure "responsible" AI
Briefly

"AI models behave very differently for different purposes," Nestor Maslej, editor of the 2024 AI Index from Stanford University's HAI, told Axios.
Developers' appetite for responsibility testing varies widely: some, like Meta, benchmark their models against multiple tests, while others, like Mistral with its 7B model, report no such benchmarks at all.
Current benchmarks focus on specific areas, such as assessing honesty in answers (TruthfulQA) or detecting toxic and hateful output (RealToxicityPrompts, ToxiGen).
"There's a clear lack of standardization, but we don't know why," HAI's Maslej told Axios, adding that cherry-picking benchmarks could be one reason for the variability.
Read at Axios