I tested 12 LLMs, 10 local models and 2 cloud-based, at generating alt-text for images on my blog, where 9,000 images lack descriptions. Although I have diligently added alt-text to new images over the last five years, many older ones still have no descriptions. The cloud models, GPT-4 and Claude 3.5 Sonnet, were the most accurate. Local models such as the Llama variants and MiniCPM-V performed well overall but sometimes lacked detail. Determined to clear this backlog, I plan to share my findings in future posts.
Alt-text is crucial for visually impaired users; despite five years of adding descriptions to new images as I publish them, 9,000 images on my blog still lack it.
Cloud models set the benchmark for alt-text generation: GPT-4 and Claude 3.5 Sonnet both earned an A grade for accuracy.
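To give a sense of the workflow, here is a minimal sketch of asking a cloud vision model for alt-text. It assumes the official openai Python package; the model name, prompt wording, and file path are illustrative placeholders, not my exact setup:

```python
import base64

from openai import OpenAI  # assumes the official openai package is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_alt_text(image_path: str) -> str:
    """Ask a vision-capable model for a one-sentence alt-text description."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write concise alt-text (one sentence) for this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()


print(generate_alt_text("photos/example.jpg"))  # hypothetical path
```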
Local models such as the Llama variants and MiniCPM-V were reliable but earned a B grade, occasionally missing important details in their descriptions.
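A comparable sketch for a local model, assuming a running Ollama server with MiniCPM-V pulled and the ollama Python client installed; again, the model tag, prompt, and path are placeholders:

```python
import ollama  # assumes the ollama Python client and a local Ollama server


def generate_alt_text_local(image_path: str) -> str:
    """Generate alt-text with a locally hosted vision model."""
    response = ollama.chat(
        model="minicpm-v",  # placeholder tag; any local vision model works
        messages=[{
            "role": "user",
            "content": "Write concise alt-text (one sentence) for this image.",
            "images": [image_path],  # the client base64-encodes the file
        }],
    )
    return response["message"]["content"].strip()


print(generate_alt_text_local("photos/example.jpg"))  # hypothetical path
```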
Since many site owners face similar alt-text backlogs, I plan to share my findings on effective AI alt-text generation in future blog posts.