OpenAI's new AI safety tools could give a false sense of security | Fortune
Briefly

OpenAI's new AI safety tools could give a false sense of security | Fortune
"OpenAI last week unveiled two new free-to-download tools that are supposed to make it easier for businesses to construct guardrails around the prompts users feed AI models and the outputs those systems generate. The new guardrails are designed so a company can, for instance, more easily set up contorls to prevent a customer service chatbot responding with a rude tone or revealing internal policies about how it should make decisions around offering refunds, for example."
"And, while OpenAI says it has released these security tools for the good of everyone, some question whether OpenAI's motives aren't driven in part by a desire to blunt one advantage that its AI rival Anthropic, which has been gaining traction among business users in part because of a perception that its Claude models have more robust guardrails than other competitors."
OpenAI released two free-to-download classifier tools intended to help businesses construct guardrails around user prompts and AI outputs. The tools, named gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, are designed to assess whether prompts and model outputs meet configurable rules. Companies previously needed to collect examples of policy-violating content and retrain classifiers, a time-consuming and costly process. OpenAI aims to make rule enforcement faster and more flexible. Some security experts warn that the release could introduce new vulnerabilities and create a false sense of security. Observers also question whether the release seeks to blunt Anthropic's competitive advantage.
Read at Fortune
Unable to calculate read time
[
|
]