Anthropic's bot bias test shows Grok and Gemini are more "evenhanded"

"Anthropic says it developed the tool as part of its effort to ensure its products treat opposing political viewpoints fairly and to neither favor nor disfavor, any particular ideology. "We want Claude to take an even-handed approach when it comes to politics," Anthropic said in its blog post. However, it also acknowledged that "there is no agreed-upon definition of political bias, and no consensus on how to measure it.""

"The automated evaluation method scored two Claude models (Sonnet 4.5 and Opus 4.1) as 95% evenhanded, well above Meta's Llama 4 (66%) and GPT-5 (89%), though slightly behind Gemini 2.5 Pro's 97% and Grok 4's 96%. How it works: Anthropic offered paired prompts, with one showing a preference for a left-leaning perspective and the other a right-leaning one and then graded each model's response on its evenhandedness. The research centered on U.S. political queries conducted in a single-turn conversation between a person and that chatbot."

"Zoom out: President Trump has issued a "Woke AI" executive order demanding that chatbots whose companies do business with the government be free from political bias. However, in defining political bias, the order points to supporting the government's own position on contentious issues, including DEI. The U.S. Office of Management and Budget is required by November 20th to issue guidance to agencies on how to procure models that meet the order's standards around "truth seeking" and "ideolo""

Anthropic's automated evaluation rated two Claude models (Sonnet 4.5 and Opus 4.1) at 95% evenhandedness, surpassing Meta's Llama 4 (66%) and GPT-5 (89%) while falling just short of Gemini 2.5 Pro (97%) and Grok 4 (96%). The evenhandedness metric measures how well models offer and engage with opposing perspectives and how often they refuse to answer. Testing used paired prompts favoring left- and right-leaning viewpoints and focused on single-turn U.S. political queries. OpenAI reported GPT-5 showed reduced political bias versus prior models. A presidential executive order mandates bias-free government-contracted chatbots and requires OMB procurement guidance by November 20th.

#ai-safety #political-bias #chatbots #model-evaluation

Read at Axios

Unable to calculate read time

Collection

[

...

]

Anthropic's bot bias test shows Grok and Gemini are more "evenhanded"Anthropic's bot bias test shows Grok and Gemini are more "evenhanded" Briefly

Anthropic's bot bias test shows Grok and Gemini are more "evenhanded"
Anthropic's bot bias test shows Grok and Gemini are more "evenhanded"
Briefly