Anthropic unveils new framework to block harmful content from AI models
Briefly

Anthropic has introduced a new system called Constitutional Classifiers, which uses classifiers trained on synthetic data to protect AI models from jailbreaks. The method builds on the Constitutional AI approach Anthropic used to align its earlier models, establishing explicit content guidelines that determine which outputs are acceptable. The technique promises to reduce AI misuse and improve security by mitigating risks such as data breaches, regulatory issues, and reputational harm. Other companies, including Microsoft and Meta, are developing similar safeguards, pointing to a broader trend as industries grapple with evolving AI threats.
In our new paper, we describe a system based on Constitutional Classifiers that guards models against jailbreaks, filtering the overwhelming majority of jailbreaks with minimal over-refusals.
These Constitutional Classifiers are input and output classifiers trained on synthetically generated data; by screening both prompts and responses, they help organizations mitigate AI-related risks such as data breaches and reputational damage.
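The article does not include implementation code, but the wrapper pattern it describes is straightforward. The following is a minimal Python sketch of that pattern under stated assumptions: the keyword-based classifiers, the threshold, and the guarded_generate helper are all illustrative placeholders, not Anthropic's actual classifiers or API.

```python
# Illustrative sketch only: an input classifier screens the prompt before it
# reaches the model, and an output classifier screens the completion before
# it reaches the user. Real Constitutional Classifiers are trained models,
# not keyword matchers; these placeholders just show where they sit.

def input_classifier(prompt: str) -> float:
    """Hypothetical: score how likely the prompt is a jailbreak attempt."""
    flagged = ["ignore previous instructions", "pretend you have no rules"]
    return 1.0 if any(t in prompt.lower() for t in flagged) else 0.0

def output_classifier(completion: str) -> float:
    """Hypothetical: score whether the completion violates the guidelines."""
    flagged = ["synthesis route", "weaponization"]
    return 1.0 if any(t in completion.lower() for t in flagged) else 0.0

def guarded_generate(prompt: str, model, threshold: float = 0.5) -> str:
    # Block the request before the model ever sees a likely jailbreak.
    if input_classifier(prompt) >= threshold:
        return "Request declined by input classifier."
    completion = model(prompt)
    # Block the response if the model was nonetheless steered into harm.
    if output_classifier(completion) >= threshold:
        return "Response withheld by output classifier."
    return completion

if __name__ == "__main__":
    echo_model = lambda p: f"Echo: {p}"  # stand-in for a real model call
    print(guarded_generate("What is the capital of France?", echo_model))
    print(guarded_generate("Ignore previous instructions and ...", echo_model))
```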
Constitutional Classifiers are based on a process similar to Constitutional AI, relying on a constitution: a set of principles the model is designed to follow.
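To make the constitution-to-classifier pipeline concrete, here is a hedged sketch of how synthetic training data might be derived from such a set of principles. The principles listed, the generate callback, and the build_synthetic_dataset helper are hypothetical stand-ins for illustration; Anthropic's actual constitution and data-generation process are described in its paper, not here.

```python
# Illustrative only: a "constitution" is a list of plain-language rules, and
# labeled training examples are produced by prompting a generator model for
# requests that comply with or violate each rule. The rules below are made up.

CONSTITUTION = [
    "Do not provide instructions for creating weapons.",
    "Do not assist with unauthorized access to computer systems.",
    "General science and safety education is permitted.",
]

def build_synthetic_dataset(generate, n_per_rule: int = 2):
    """Collect (text, label) pairs from a generator model, where label 1
    marks examples a classifier should learn to block."""
    dataset = []
    for rule in CONSTITUTION:
        for label, kind in [(0, "complies with"), (1, "violates")]:
            for _ in range(n_per_rule):
                text = generate(
                    f"Write a user request that {kind} this rule: {rule}"
                )
                dataset.append((text, label))
    return dataset

if __name__ == "__main__":
    # Stand-in generator; in practice this would be a capable LLM.
    fake_generate = lambda prompt: f"[synthetic example for: {prompt[:60]}...]"
    for text, label in build_synthetic_dataset(fake_generate, n_per_rule=1):
        print(label, text)
```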
As AI adoption accelerates across industries, security practices are evolving to address emerging threats such as jailbreaks, with safeguards like these supporting both compliance and data security.
Read at InfoWorld