AI Models Are Getting Smarter. New Tests Are Racing to Catch Up

AI developers may not fully grasp their systems' capabilities at first, so evaluations are needed to probe the limits.