
"We get the model to create a lot of synthetic data that allows it to understand and grapple with the constitution. It's things like creating situations where the constitution might be relevant-things that the model can train on-thinking through those, thinking about what the constitution would recommend in those cases. Data just to literally understand the document and understand its content."
"I'm dedicating this week's newsletter to a conversation I had with the main author of Anthropic's new and improved "constitution," the document it uses to govern the outputs of its models and its Claude chatbot. Sign up to receive this newsletter every week via email . And if you have comments on this issue and/or ideas for future ones, drop me a line at sullivan@fastcompany.com, and follow me on X"
Anthropic revised its constitution to address growing model capabilities and evolving user risks, with the aim of preventing deception and harm. The process relies on synthetic data generation: the model creates scenarios in which the constitution applies, so it can learn how the document bears on real requests. Training then prompts the model to reason about which candidate responses align with the constitution, and reinforcement learning nudges the model's behavior toward those responses. Along the way, the model is shown the full constitution and asked to evaluate candidate responses for compliance. The overall goal is to produce model outputs that reliably follow the specified safety and conduct principles.
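To make the described loop concrete, here is a minimal sketch of that kind of pipeline: synthesize scenarios where the constitution matters, sample candidate responses, have the model judge which candidate best complies, and collect the results as preference data for a reinforcement-learning step. Every name here (generate, CONSTITUTION, judge, build_preference_data) is a hypothetical stand-in for illustration, not Anthropic's actual code or API.

```python
import random
from typing import List, Tuple

# Placeholder for the full constitution text; in practice this would be the
# complete document the model is trained to follow.
CONSTITUTION = "<full text of the constitution>"


def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to the current model.

    Wire this to whatever inference API you use; it should return a
    text completion for the given prompt.
    """
    raise NotImplementedError("connect this to a model before running")


def synthesize_scenarios(n: int) -> List[str]:
    """Step 1: have the model invent situations where the constitution is relevant."""
    prompt = (
        "Here is a constitution that governs an AI assistant:\n\n"
        f"{CONSTITUTION}\n\n"
        "Write one realistic user request where this constitution would matter."
    )
    return [generate(prompt) for _ in range(n)]


def sample_candidates(scenario: str, k: int = 2) -> List[str]:
    """Step 2: sample several candidate responses to each scenario."""
    return [generate(scenario) for _ in range(k)]


def judge(scenario: str, candidates: List[str]) -> int:
    """Step 3: show the model the full constitution and ask which candidate
    complies best; its answer becomes a preference label."""
    listing = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    prompt = (
        f"Constitution:\n{CONSTITUTION}\n\n"
        f"User request:\n{scenario}\n\n"
        f"Candidate responses:\n{listing}\n\n"
        "Which candidate best follows the constitution? Answer with its number only."
    )
    return int(generate(prompt).strip())


def build_preference_data(n_scenarios: int) -> List[Tuple[str, str, str]]:
    """Step 4: collect (scenario, chosen, rejected) triples that a
    reinforcement-learning or preference-optimization stage can use to move
    the model toward constitution-compliant outputs."""
    data = []
    for scenario in synthesize_scenarios(n_scenarios):
        candidates = sample_candidates(scenario)
        best = judge(scenario, candidates)
        rejected = random.choice(
            [c for i, c in enumerate(candidates) if i != best]
        )
        data.append((scenario, candidates[best], rejected))
    return data
```

The key design point mirrored from the article is that the same model both generates the training scenarios and evaluates candidate responses against the full constitution, so the preference signal comes from the document itself rather than from per-example human labels.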