Anthropic has a 2-hour engineering take-home test. It says its new Claude 4.5 model outscored every human who took it.
Briefly

"On Monday, the company introduced Claude Opus 4.5 and described it as its most advanced AI model to date, and said that the new model "scored higher than any human candidate ever" on "a notoriously difficult take-home exam" that the company gives prospective engineering candidates. In a blog post on Monday, Anthropic said that the two-hour take-home test is designed to assess technical ability and judgment under time pressure, and though it doesn't reflect all skills an engineer needs to possess,"
"In its methodology, the company said that this result came from giving the model several chances to solve each problem and then picking its best answer. There is not much publicly known information regarding what the engineering test consists of. A 2024 interview review published on Glassdoor said that the test has four levels and asks prospective candidates to implement a specific system and add functionalities to it."
Claude Opus 4.5 is Anthropic's newest AI model, and it achieved a higher score than any human candidate on the company's two-hour engineering take-home exam. The exam is intended to assess technical ability and judgment under time pressure, though it does not capture every skill an engineer needs. Anthropic evaluated the model by giving it multiple attempts per problem and selecting the best answer. Public information about the exam is limited: a 2024 Glassdoor review describes a four-level test that requires implementing a system and adding functionality to it, but it is unclear whether that is the same exam. The release also adds improvements to document and spreadsheet generation.
Read at Business Insider