
"Verbosity is known behavior of LLMs - they are prone to 'word salad' responses that make them harder to use and decrease their reliability," the researchers wrote in a summary [PDF].
"The Open Data Institute (ODI) tested 11 large language models (LLMs) on more than 22,000 questions, comparing their responses to answers based on material from the official GOV.UK website."
"The researchers saw models attempting to answer almost every question asked, regardless of whether or not they were capable of doing so accurately."
The Open Data Institute tested 11 large language models on more than 22,000 government-service questions, comparing their outputs to official GOV.UK material and rating them on verbosity, accuracy, and refusal rate. The models frequently produced verbose responses that buried key facts or went beyond what the authoritative guidance said, and forcing them to be concise often lowered their accuracy. Models also combined material from multiple sources, which sometimes introduced errors. Reported problems include specific LLMs giving incorrect advice on eligibility or processes, and a general tendency to answer nearly every question rather than decline when unsure. The researchers recommend informing users of these risks and signposting them to authoritative GOV.UK guidance.
Read at The Register