CrowdStrike has teamed up with Meta to launch a new open-source suite of benchmarks to test the performance of AI models within an organization's security operations center (SOC). Dubbed CyberSOCEval, the suite is designed to help businesses sift through a growing mountain of AI-powered cybersecurity tools and home in on the one best suited to their needs. "Without clear benchmarks, it's difficult to know which systems, use cases, and performance standards deliver a true AI advantage against real-world attacks," CrowdStrike wrote in a press release.
Last month, at the 33rd annual DEF CON, the world's largest hacker convention in Las Vegas, Anthropic researcher Keane Lucas took the stage. A former U.S. Air Force captain with a Ph.D. in electrical and computer engineering from Carnegie Mellon, Lucas wasn't there to unveil flashy cybersecurity exploits. Instead, he showed how Claude, Anthropic's family of large language models, has quietly outperformed many human competitors in hacking contests, the kind used to train and test cybersecurity skills in a safe, legal environment.
Project Ire is an AI agent capable of reverse engineering software files to investigate whether they're malicious and analyze their origins, even if they don't match any previously cataloged threats. Powered by a combination of large language models (LLMs) and specialized cybersecurity analysis tools, the agent is intended to automate classification and ease cybersecurity analysts' workload. In recent tests, Project Ire was exposed to known samples from a database hackers have used for living-off-the-land attacks, alongside harmless Windows drivers.