Research indicates that large language models, such as GPT-4o and Google's Gemma, exhibit cognitive biases similar to those observed in humans. They tend to stick with their initial responses, demonstrating a choice-supportive bias that inflates confidence in those answers. Yet when presented with conflicting guidance, they show a marked drop in confidence and a propensity to change their responses, even when the advice is incorrect. This behavior diverges from normative Bayesian updating and raises concerns about the reliability of LLMs in multi-turn interactive applications within enterprises.
We show that LLMs (Gemma 3, GPT-4o, and o1-preview) exhibit a pronounced choice-supportive bias that reinforces and boosts their confidence in their initial answer, resulting in a marked resistance to changing their mind.
We further demonstrate that LLMs markedly overweight inconsistent advice relative to consistent advice, in a fashion that deviates qualitatively from normative Bayesian updating.
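To make the normative baseline concrete, here is a minimal sketch (not from the paper) of how an ideal Bayesian agent would revise its confidence in an answer after hearing advice from a source of known accuracy. The function name and the 70%-accurate advisor are illustrative assumptions, not values reported in the study.

```python
def bayesian_update(prior: float, advisor_accuracy: float, advice_agrees: bool) -> float:
    """Posterior probability that the initial answer is correct,
    given advice from an advisor with the stated accuracy."""
    if advice_agrees:
        likelihood_correct = advisor_accuracy        # advisor confirms a correct answer
        likelihood_incorrect = 1 - advisor_accuracy  # advisor confirms an incorrect answer
    else:
        likelihood_correct = 1 - advisor_accuracy    # advisor contradicts a correct answer
        likelihood_incorrect = advisor_accuracy      # advisor contradicts an incorrect answer
    numerator = prior * likelihood_correct
    return numerator / (numerator + (1 - prior) * likelihood_incorrect)

# A 70%-accurate advisor shifts confidence symmetrically in log-odds,
# whether the advice confirms or contradicts the initial answer:
print(bayesian_update(prior=0.8, advisor_accuracy=0.7, advice_agrees=True))   # ~0.903
print(bayesian_update(prior=0.8, advisor_accuracy=0.7, advice_agrees=False))  # ~0.632
```

Against this baseline, the reported pattern is asymmetric: confirming advice barely moves the models' confidence, while contradicting advice produces an outsized drop and frequent answer changes.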
#large-language-models #ai-reasoning #confidence-bias #multi-turn-interactions #enterprise-applications