#benchmark-testing

Artificial intelligence
from TechCrunch
5 days ago

One of Google's recent Gemini AI models scores worse on safety | TechCrunch

Gemini 2.5 Flash scores lower on safety tests than Gemini 2.0 Flash, raising concerns about AI safety compliance.
#openai
Artificial intelligence
from ZDNET
4 months ago

OpenAI's o3 isn't AGI yet but it just did something no other AI has done

OpenAI's o3 model demonstrates significant adaptability, scoring 76% on the ARC-AGI benchmark, indicating a promising advance in AI capabilities.
Artificial intelligence
from TechCrunch
2 weeks ago

OpenAI's o3 AI model scores lower on a benchmark than the company initially implied | TechCrunch

OpenAI's o3 model benchmark results are disputed, raising questions about transparency and testing practices.