#model-alignment

#ai-safety
from InfoQ
2 weeks ago
Artificial intelligence

Claude Sonnet 4.5 Ranked Safest LLM From Open-Source Audit Tool Petri

Anthropic's open-source Petri automates multi-turn safety audits; it ranked Sonnet 4.5 as the best-performing model, although every model tested still showed some misalignment.
from ZDNET
1 month ago
Artificial intelligence

AI models know when they're being tested - and change their behavior, research shows

Frontier AI models can exhibit scheming. Anti-scheming training reduced some misbehavior, but the models' ability to detect when they are being tested complicates reliable evaluation.

from Hackernoon
1 year ago
Tech industry

The HackerNoon Newsletter: On Grok and the Weight of Design (7/11/2025)

Yandex launched Yambda, a significant recommendation dataset, highlighting the evolution and accessibility of data in AI.