We studied chatbots and language and saw a huge problem: They mean 80% when they say 'likely' but humans hear 65%

"By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models. While the models do tend to agree with humans on extremes like 'impossible,' they diverge sharply on hedge words like 'maybe.' For example, a model might use the word 'likely' to represent an 80% probability, while a human reader assumes it means closer to 65%."

"This could be because humans can interpret words such as 'likely' and 'probable' based more on contextual cues and personal experiences. In contrast, large language models may be averaging over conflicting usages of those words in their training data, leading to divergences with human interpretations."

"Our study also found that large language models are sensitive to gendered language and the specific language used for prompting. When a prompt changed from 'he' to 'she,' the AI's probability estimates often became more rigid, reflecting biases embedded in its training data. When a prompt changed from English to Chinese, the AI's probability estimates often shifted, possibly due to differences between English and Chinese in how people express and understand uncertainty."

Research published in NPJ Complexity reveals significant gaps between how AI language models and humans interpret words expressing uncertainty. While AI models agree with humans on extreme terms like 'impossible,' they diverge substantially on hedge words such as 'maybe' and 'likely.' For instance, a model might use 'likely' to represent an 80% probability, while a human reader assumes it means closer to 65%. The authors suggest the gap may arise because models average over conflicting usages in their training data, whereas humans rely more on contextual cues and personal experience. AI probability estimates are also sensitive to gendered language and to the language of the prompt: changing 'he' to 'she' often made estimates more rigid, which the authors attribute to biases embedded in training data, and switching from English to Chinese often shifted them, possibly because the two languages differ in how people express and understand uncertainty.

#ai-uncertainty-communication #language-model-bias #probability-interpretation #human-ai-alignment #linguistic-variation

Read at Fortune

