
"There is a clear gap between the theoretical medical knowledge of large language models (LLMs) and their practical usefulness for patients, according not a new study from the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences at the University of Oxford. The research, conducted in collaboration with MLCommons and other institutions, involved 1,298 people in the UK."
"In the study, one group was asked to use LLMs such as GPT-4o, Llama 3, and Command R to assess health symptoms and suggest courses of action, while a control group relied on their usual methods, such as search engines or their own knowledge. The results showed that the group using generative AI (genAI) tools performed no better than the control group in assessing the urgency of a condition."
A UK study with 1,298 participants compared large language models (LLMs), including GPT-4o, Llama 3, and Command R, against participants' usual methods, such as search engines or personal knowledge, for assessing health symptoms. One group used LLMs to assess symptoms and recommend actions, while a control group relied on its customary approach. The LLM-assisted group did not outperform the control group in judging urgency and performed worse at correctly identifying medical conditions. The findings point to a gap between LLMs' theoretical medical knowledge and their practical usefulness for patient symptom assessment.
Read at Computerworld