"The main takeaway from this study is that LLMs, while impressive, still lack the depth of understanding required for advanced history. They're great for basic facts, but when it comes to more nuanced, PhD-level historical inquiry, they're not yet up to the task," said Maria del Rio-Chanona, one of the paper's co-authors and an associate professor of computer science at University College London.
"The researchers shared sample historical questions with TechCrunch that LLMs got wrong. For example, GPT-4 Turbo was asked whether scale armor was present during a specific time period in ancient Egypt. The LLM said yes, but the technology only appeared in Egypt 1,500 years later."
"Del Rio-Chanona told TechCrunch that it's likely because LLMs tend to extrapolate from historical data that is very prominent, finding it difficult to retrieve more obscure historical knowledge."