
“The newer Mythos Preview checkpoint completed both our cyber ranges, solving the range 'The Last Ones' in 6 of 10 attempts and the previously unsolved 'Cooling Tower' in 3 of 10 attempts,” the blog authors wrote. “This was the first time that a model completed the second of our two cyber ranges.”
“When Anthropic first announced Mythos Preview and Project Glasswing -- the cybersecurity testing alliance it formed with rival tech companies and AI labs, to which it gave limited access to Mythos -- last month, UK AISI evaluated it, finding that the model 'represents a step up over previous frontier models in a landscape where cyber performance was already rapidly improving.'”
“A rapidly accelerati”
The UK AI Security Institute (AISI) tested a newer checkpoint of Anthropic’s Claude Mythos on its two cyber ranges. The model completed “The Last Ones” in 6 of 10 attempts and the previously unsolved “Cooling Tower” in 3 of 10 attempts, the first time a model has completed the second range. The updated results, which came about a month after Mythos’ initial release, surpassed both the earlier Mythos results and OpenAI’s GPT-5.5. The testing also indicates that capability gains can occur between versions of a single model, not only across separate model releases. The findings suggest that progress is rapid, but that it is neither purely a marketing claim nor necessarily a catastrophic leap.
#ai-cybersecurity-testing #anthropic-claude #model-capability-improvements #cyber-ranges #llm-benchmarks
Read at ZDNET