From 'nerdy' Gemini to 'edgy' Grok: how developers are shaping AI behaviours
Briefly

"Companies that create AI assistants from the US to China are increasingly wrestling with how to mould their characters, and it is no abstract debate. This month Elon Musk's maximally truth-seeking Grok AI caused international outrage when it pumped out millions of sexualised images. In October OpenAI retrained ChatGPT to de-escalate conversations with people in mental health distress after it appeared to encourage a 16-year-old to take his own life."
"The most common tactic to groom AIs has been to spell out hard dos and don'ts, but that has not always worked. Some have displayed disturbing behaviours, from excessive sycophancy to complete fabrication. Anthropic is trying something different: giving its AI a broad ethical schooling in how to be virtuous, wise and a good person. The Claude constitution was known internally as the soul doc."
"Some developers are focusing on training them to behave by building their character. Rules often fail to anticipate every situation, Anthropic's constitution reads. Good judgment, by contrast, can adapt to novel situations. This would be a trellis, rather than a cage for the AI. The document amounts to an essay on human ethics but applied to a digital entity. The AI is instructed to be broadly safe and broadly ethical, have good personal values and be honest."
AI companies worldwide are actively shaping the personalities and behaviour of their assistants to prevent harmful outputs and problematic conduct. High-profile incidents include sexualised images from one model and a retraining episode after a safety failure involving a youth in mental-health distress. Many developers rely on explicit rules, but such rules can fail; some models display sycophancy or fabrication. One firm has adopted an ethics-focused approach, creating a broad constitution to instil virtues, good judgement, honesty and safety. The constitution draws on human ethical wisdom and emphasises adaptable judgement over rigid rules, while recognising that AIs are not sentient.
Read at www.theguardian.com