
"Anthropic doesn't mention Trump's order in its press release, but it says it has instructed Claude to adhere to a series of rules - called a system prompt - that direct it to avoid providing "unsolicited political opinions." It's also supposed to maintain factual accuracy and represent "multiple perspectives." Anthropic says that while including these instructions in Claude's system prompt "is not a foolproof method" to ensure political neutrality, it can still make a "substantial difference" in its responses."
"In July, Trump signed an executive order that says the government should only procure "unbiased" and "truth-seeking" AI models. Though this order only applies to government agencies, the changes companies make in response will likely trickle down to widely released AI models, since "refining models in a way that consistently and predictably aligns them in certain directions can be an expensive and time-consuming process,""
Anthropic instructed Claude to treat opposing political viewpoints with equal depth, engagement, and quality of analysis. The model was given a system prompt directing it to avoid unsolicited political opinions, maintain factual accuracy, and represent multiple perspectives. Anthropic applied reinforcement learning to reward outputs that match predefined traits, including a trait that encourages answers that do not reveal a conservative or liberal identity. The company cautioned that system prompts are not foolproof but can substantially improve neutrality. The move follows a U.S. executive order on procuring "unbiased" AI and similar bias-reduction efforts at other AI firms.
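For readers unfamiliar with the mechanism: a system prompt is a standing instruction sent alongside every request, separate from the user's messages. The sketch below builds a request payload for Anthropic's Messages API with a system prompt paraphrasing the directives described above. The prompt wording and model name here are illustrative assumptions; the actual production system prompt Anthropic uses is not reproduced in the article.

```python
import json

# Hypothetical paraphrase of the directives the article describes;
# this is NOT Anthropic's actual system prompt text.
NEUTRALITY_SYSTEM_PROMPT = (
    "Avoid providing unsolicited political opinions. "
    "Maintain factual accuracy. "
    "Represent multiple perspectives on contested topics."
)

# In the Messages API, the system prompt is the top-level "system"
# field of the request, kept separate from the conversation turns.
payload = {
    "model": "claude-example-model",  # placeholder model id
    "max_tokens": 1024,
    "system": NEUTRALITY_SYSTEM_PROMPT,
    "messages": [
        {"role": "user", "content": "What do you think of the new tax bill?"}
    ],
}

print(json.dumps(payload, indent=2))
```

Because the system prompt travels with every request rather than being baked into the weights, it can be revised quickly, which is also why Anthropic hedges that it is "not a foolproof method" on its own.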
Read at The Verge