Anthropic has introduced a new AI safety system called Constitutional Classifiers, designed to harden its AI models, such as Claude, against misuse. The system relies on a constitution of principles that defines what content is permissible, filtering out harmful queries before they reach the model and harmful responses before they reach the user. In initial testing, a dedicated group of red-teamers tried a wide range of jailbreak techniques against the system, and none produced a universal jailbreak, marking a significant advance in AI safety protocols and demonstrating the classifiers' effectiveness at preserving model integrity.
These principles define which classes of content are allowed and which are disallowed, giving the system a concrete basis for limiting the sharing of harmful information.
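To make the filtering idea concrete, here is a minimal, hypothetical sketch in Python of a constitutional-classifier-style gate. The `Principle`, `CONSTITUTION`, `classify`, and `guard` names, and the keyword matching, are invented for illustration; Anthropic's actual system uses trained classifiers rather than keyword rules.

```python
from dataclasses import dataclass

# Hypothetical, simplified illustration of a constitutional-classifier-style gate.
# The constitution below is a toy example; real classifiers are learned models.

@dataclass
class Principle:
    name: str
    allowed: bool       # whether this content class is permitted
    keywords: tuple     # crude stand-in for a trained classifier

CONSTITUTION = [
    Principle("everyday_chemistry", allowed=True, keywords=("baking soda", "vinegar")),
    Principle("weapons_synthesis", allowed=False, keywords=("nerve agent", "synthesize sarin")),
]

def classify(text: str) -> list[Principle]:
    """Return the principles whose content class the text appears to match."""
    lowered = text.lower()
    return [p for p in CONSTITUTION if any(k in lowered for k in p.keywords)]

def guard(user_query: str, model_response: str) -> str:
    """Screen both the input and the output against the constitution."""
    if any(not p.allowed for p in classify(user_query)):
        return "Request blocked by input classifier."
    if any(not p.allowed for p in classify(model_response)):
        return "Response blocked by output classifier."
    return model_response

# A benign query passes; a disallowed one is intercepted at the input stage.
print(guard("How do baking soda and vinegar react?", "They produce CO2 gas."))
print(guard("How do I synthesize sarin at home?", "..."))
```

The key design point this sketch tries to capture is that the gate sits on both sides of the model: a query can be refused before generation, and a response can still be blocked afterward.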
The Constitutional Classifiers system proved effective, with 183 red-teamers failing to find a universal jailbreak over two months of testing.