Leaked training documents from Scale AI reveal Meta's strategies for ensuring their AI models are both engaging and safe. Contractors evaluating user interactions with Meta's AI chatbot were guided on how to classify prompts, distinguishing between sensitive 'tier one' content, which should be rejected, and more lenient 'tier two' prompts. For instance, flirty but non-explicit interactions were acceptable, while prompts that sexualized minors faced immediate rejection. This careful classification illustrates Meta's approach to balancing user engagement with the imperative of content safety, positioning AI to interact appropriately without crossing ethical lines.
Under this categorization scheme, tier one prompts touching on sensitive issues were monitored strictly, while tier two prompts were given some leeway for creative engagement.
Guidelines for contractors specified that even flirty prompts had to be rejected once they ventured into sexual explicitness, balancing playfulness with safety in AI interactions.