Is GPT-5 really worse than GPT-4o? Ars puts them to the test.
Briefly

The rollout of OpenAI's GPT-5 model drew significant user complaints about its tone, creativity, and increased inaccuracies. In response to the backlash, OpenAI reinstated the previous GPT-4o model as an option for users. Testing both models with a series of prompts revealed distinct differences in their responses: GPT-5 offered acceptable dad jokes, while GPT-4o produced a mix of original and nonsensical ones. The evaluation highlighted the subjective nature of assessing AI outputs and the varying effectiveness of different models.
Users complained about the new GPT-5 model's tone, creativity, and increased inaccuracies, prompting OpenAI to reintroduce GPT-4o as an alternative.
Comparing GPT-5 and GPT-4o across a variety of prompts reveals noticeable differences in style and substance, though such evaluations remain inherently subjective.
In tests, GPT-5 provided acceptable dad jokes that fit the form well, while GPT-4o offered a mix of good and puzzling jokes, indicating varied creative output.
Both models performed unevenly across prompts, underscoring the ongoing debate about the effectiveness and reliability of newer AI models.
Read at Ars Technica