I tested 12 LLMs, 10 local models and 2 cloud-based, at generating alt-text for images on my blog, where 9,000 images lack descriptions. Although I have diligently added alt-text to new images over the last five years, many older ones still have no descriptions. The cloud models, GPT-4 and Claude 3.5 Sonnet, were the most accurate. Local models such as the Llama variants and MiniCPM-V performed well overall but sometimes lacked detail. Determined to clear this backlog, I plan to share my findings in future posts.
Alt-text is crucial for visually impaired users; despite five years of adding descriptions to new images as I publish them, 9,000 images on my blog still lack it.
Cloud models set the benchmark for alt-text generation: GPT-4 and Claude 3.5 Sonnet both earned an A grade for accuracy.
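To give a sense of the workflow, here is a minimal sketch of asking a cloud vision model for alt-text. It assumes the official openai Python package; the model name, prompt wording, and file path are illustrative placeholders, not my exact setup:

```python
import base64

from openai import OpenAI  # assumes the official openai package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_alt_text(image_path: str) -> str:
    """Ask a vision-capable model for a one-sentence alt-text description."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write concise alt-text (one sentence) for this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()


print(generate_alt_text("photos/example.jpg"))  # hypothetical path
```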
Local models such as the Llama variants and MiniCPM-V were reliable but earned a B grade, occasionally missing important details in their descriptions.
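A comparable sketch for a local model, assuming a running Ollama server with MiniCPM-V pulled and the ollama Python client installed; again, the model tag, prompt, and path are placeholders:

```python
import ollama  # assumes the ollama Python client and a local Ollama server


def generate_alt_text_local(image_path: str) -> str:
    """Generate alt-text with a locally hosted vision model."""
    response = ollama.chat(
        model="minicpm-v",  # placeholder tag; any local vision model works
        messages=[{
            "role": "user",
            "content": "Write concise alt-text (one sentence) for this image.",
            "images": [image_path],  # the client base64-encodes the file
        }],
    )
    return response["message"]["content"].strip()


print(generate_alt_text_local("photos/example.jpg"))  # hypothetical path
```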
Since many site owners face similar alt-text backlogs, I plan to share my findings on effective AI alt-text generation in future blog posts.