The Science Behind Many-Shot Learning: Testing AI Across 10 Different Vision Domains | HackerNoon
Briefly

This article investigates how increasing the number of demonstrating examples affects the performance of two advanced multimodal foundation models, GPT-4o and Gemini 1.5 Pro, under an in-context learning (ICL) framework. Several models and datasets are benchmarked to observe performance trends across vision domains. The study also runs ablation tests on the impact of query batching and reveals substantial improvements over zero-shot performance. The results indicate that many-shot ICL can efficiently enhance model capabilities on tasks such as medical question answering.
We benchmark their performance using standard performance metrics as well as an ICL data efficiency metric on 10 datasets spanning several vision domains and image classification tasks.
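One plausible way to operationalize an ICL data efficiency metric is the slope of a least-squares fit of task performance against the log of the number of demonstrating examples: a steeper slope means the model converts additional demos into accuracy gains more efficiently. This is a hedged sketch for illustration; the study's exact metric definition is not reproduced here.

```python
import math

def icl_data_efficiency(shots, scores):
    """Slope of a least-squares fit of score vs. log(number of shots).

    Assumed formulation: performance tends to improve roughly
    log-linearly with the number of in-context examples, so the
    regression slope serves as a data-efficiency summary.
    """
    xs = [math.log(n) for n in shots]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Example: accuracy rising from 0.50 to 0.70 as demos go 1 -> 100.
slope = icl_data_efficiency([1, 10, 100], [0.50, 0.60, 0.70])
print(round(slope, 3))  # -> 0.043
```

Comparing this slope across models (or across datasets for one model) gives a single number for how well each setup exploits additional demonstrating examples.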
The specific endpoint for GPT-4o is "gpt-4o-2024-05-13", for GPT-4(V)-Turbo is "gpt-4-turbo-2024-04-09", and for Gemini 1.5 Pro is "gemini-1.5-pro-preview-0409".
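A many-shot ICL request to such endpoints can be sketched as a chat-style message list: many (image, label) demonstration pairs followed by a batch of test queries answered in a single call. The helper below is hypothetical and for illustration only; the study's exact prompt format, field names, and batching scheme are assumptions, and the message schema follows the common OpenAI-style chat format.

```python
def build_many_shot_prompt(demos, queries):
    """Assemble a message list for many-shot ICL with query batching.

    `demos` is a list of (image_url, label) demonstration pairs;
    `queries` is a list of test-image URLs folded into one user turn,
    so a single request covers the whole batch.
    """
    messages = [{
        "role": "system",
        "content": "Classify each image. Answer with one label per query.",
    }]
    # Many-shot demonstrations: each demo is a user image plus the
    # assistant's ground-truth label.
    for image_url, label in demos:
        messages.append({"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": image_url}}]})
        messages.append({"role": "assistant", "content": label})
    # Query batching: all test images share one final user turn.
    batch = [{"type": "image_url", "image_url": {"url": u}} for u in queries]
    messages.append({"role": "user", "content": batch})
    return messages

msgs = build_many_shot_prompt(
    demos=[("https://example.com/cat1.jpg", "cat"),
           ("https://example.com/dog1.jpg", "dog")],
    queries=["https://example.com/q1.jpg", "https://example.com/q2.jpg"])
print(len(msgs))  # -> 6: system + 2 demo pairs + 1 batched query turn
```

Batching queries this way amortizes the (long) demonstration context over many test images per request, which is why the ablation on query batching matters for cost as well as accuracy.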