Exclusive: Anthropic wants to pay hackers to find model flaws
Briefly

Bug bounty programs help companies find flaws they might otherwise have missed and give hackers an incentive to report their findings rather than exploit them in malicious attacks.
Anthropic is expanding its bug bounty program to test for universal jailbreak attacks, focusing on repeatable model flaws with broad negative consequences, offering rewards up to $15,000.
Current AI models are known to be jailbreakable to some extent; Anthropic's initiative aligns with White House commitments on AI system safety and third-party vulnerability reporting.
Experienced security researchers can apply for Anthropic's bug bounty program by August 16, targeting specific model flaws with the potential for wide-ranging harmful outcomes.
Read at Axios