The o1 large language model family is trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long chain of thought before responding to the user.
Safety remains a cornerstone of the o1 series, with evaluations measuring robustness to jailbreak attempts and biased behavior. OpenAI's published evaluations show o1 outperforming GPT-4o at avoiding overrefusal of benign requests.
Despite these advances, challenges persist, particularly with multimodal inputs, where drawing precise refusal boundaries remains a work in progress.
Red teaming was used to test the o1 models' capabilities and limitations: external experts probed areas such as cybersecurity, biological and radiological threats, and persuasive manipulation.