OpenAI's latest AI models, o3 and o4-mini, are noted for their advanced capabilities but hallucinate more than previous versions. Despite being touted for improvements in coding and mathematical tasks, the models fabricate information more frequently in testing: on the PersonQA benchmark, o3 exhibited a 33% hallucination rate, a significant increase over older models. OpenAI acknowledges that further research is needed to understand this anomaly, underscoring how difficult the hallucination problem remains.
Despite their advances, OpenAI's new o3 and o4-mini models exhibit higher hallucination rates, raising concerns about the reliability of AI in generating factual information.
Internal testing showed that o3 hallucinated on 33% of PersonQA questions, significantly more than previous models, illustrating the difficulty of reducing hallucinations.
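To make the benchmark figure concrete, here is a minimal sketch of how a per-question hallucination rate like the 33% figure might be computed. The GradedAnswer structure and grading criteria are illustrative assumptions, not OpenAI's actual PersonQA methodology.

```python
# Hypothetical sketch: computing a per-question hallucination rate.
# The data structure and grading here are illustrative assumptions,
# not OpenAI's actual PersonQA evaluation code.

from dataclasses import dataclass


@dataclass
class GradedAnswer:
    question: str
    model_answer: str
    contains_hallucination: bool  # judged against known facts


def hallucination_rate(graded: list[GradedAnswer]) -> float:
    """Fraction of questions whose answer contained a fabricated claim."""
    if not graded:
        return 0.0
    return sum(a.contains_hallucination for a in graded) / len(graded)


if __name__ == "__main__":
    sample = [
        GradedAnswer("Where was Ada Lovelace born?", "London", False),
        GradedAnswer("What year was she born?", "1820", True),
        GradedAnswer("What did she work on?", "The Analytical Engine", False),
    ]
    # 1 hallucinated answer out of 3 questions -> ~33%
    print(f"Hallucination rate: {hallucination_rate(sample):.0%}")
```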