AI can successfully summarize sports events and interviews but struggles with visual context.
Gemini relies on the audio track to understand video content, so it misses visual details.
Performance is inconsistent across video types, revealing both strengths and weaknesses.