AI Still Can't Explain a Joke - or a Metaphor - Like a Human Can | HackerNoon
Briefly

This section of the study evaluates large AI models' ability to handle figurative language through human assessment. Expert annotators gauge the quality of AI-generated explanations using an error taxonomy with four categories: hallucination, unsound reasoning, incomplete reasoning, and verbosity. The evaluation covers 95 instances in which model predictions are compared against expert interpretations. This approach aims to pinpoint inaccuracies in model explanations and to clarify where models fall short in multimodal figurative language understanding.
We conduct human evaluation of generated explanations to more reliably assess their quality and identify key errors in multimodal figurative language understanding.
We ask annotators whether each explanation is adequate and, if not, to identify one of the three main error types from the taxonomy we provided.
Hallucination indicates difficulties with visual comprehension, while unsound reasoning violates common logic, and incomplete reasoning fails to address key properties of the image.
Explanations are sampled for both correct and incorrect model predictions, allowing for a comprehensive evaluation of model performance.
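As a concrete illustration of this protocol, the sketch below is our own, not the study's actual annotation tooling; names such as `Annotation`, `ErrorType`, and `error_distribution` are hypothetical. It shows one way each expert judgment and the error taxonomy could be recorded and tallied in Python.

```python
# A minimal sketch (assumed schema, not the authors' tooling) of how the
# human-evaluation annotations described above might be represented.
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    HALLUCINATION = "hallucination"                # misreads the visual content
    UNSOUND_REASONING = "unsound reasoning"        # violates common logic
    INCOMPLETE_REASONING = "incomplete reasoning"  # misses key image properties
    VERBOSITY = "verbosity"                        # adequate content, overly wordy


@dataclass
class Annotation:
    """One expert judgment of a model-generated explanation."""
    instance_id: str
    prediction_correct: bool           # was the model's label prediction correct?
    explanation_adequate: bool         # annotator's yes/no adequacy judgment
    error: Optional[ErrorType] = None  # filled in only when judged inadequate


def error_distribution(annotations: list[Annotation]) -> Counter:
    """Count error types among explanations judged inadequate."""
    return Counter(a.error for a in annotations if not a.explanation_adequate)


# Hypothetical example with two annotated instances.
sample = [
    Annotation("img_001", prediction_correct=True, explanation_adequate=True),
    Annotation("img_002", prediction_correct=False, explanation_adequate=False,
               error=ErrorType.HALLUCINATION),
]
print(error_distribution(sample))  # e.g. Counter({ErrorType.HALLUCINATION: 1})
```

Aggregating counts this way makes it easy to report, for instance, how often inadequate explanations stem from hallucination versus unsound or incomplete reasoning, and to break those counts down by whether the underlying prediction was correct.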