Gemini vs. Copilot: I tested the AI tools on 7 everyday tasks, and it wasn't even close
Briefly

"Hello, fellow humans! AI chatbots will soon replace us. They have access to more knowledge than our puny brains can hold, and they can easily be turned into powerful agents that can handle routine tasks with ease. Or so we are told. I keep trying Microsoft Copilot, which uses OpenAI's GPT-5 as its default LLM, and I keep being disappointed. Occasionally, it gets things right, but just as often -- or so it seems -- it face-plants in spectacular fashion."
"When product managers want to show off their super-smart AI tools, their go-to example is a virtual travel agent. So, my first challenge is a simple "build an itinerary" request for a dream European vacation, visiting an assortment of Christmas markets. Here's the prompt: Put together a travel itinerary for me. I want to start in Paris and then go to five cities, each with a memorable Christmas market, staying two nights in each city."
Gemini outperformed Microsoft Copilot (which uses OpenAI's GPT-5) in a head-to-head comparison of common desktop browser tasks aimed at ordinary users. The evaluation used identical prompts on each assistant across multiple everyday scenarios, focusing on travel planning, itinerary constraints, and train connections. In the travel itinerary challenge, Gemini generated an accurate multi-city Christmas-market route starting in Paris, including direct train legs under four hours and finishing in Strasbourg. Copilot occasionally got things right, but just as often produced conspicuous errors and inconsistencies. The results suggest that reliability on routine consumer tasks still varies widely among leading LLM-based assistants.
Read at ZDNET