#multimodal-language-model

[ follow ]
The Verge
1 month ago
Artificial intelligence

Microsoft brings out a small language model that can look at pictures

Microsoft introduced Phi-3-vision, a multimodal language model focusing on both text and image comprehension, specifically for mobile devices. [ more ]
GSMArena.com
1 month ago
Mobile UX

GPT-4o released with improved text, audio and vision capabilities

GPT-4o by OpenAI offers enhanced text, voice, and image capabilities for improved user interaction and responsiveness. [ more ]
The Drum
4 months ago
Artificial intelligence

Weekly AI recap: Google Bard becomes Gemini, Meta ramps up AI labeling efforts

Google has rebranded its AI-powered chatbot Bard to Gemini.
Gemini is a multimodal large language model that can produce content in various formats. [ more ]
[ Load more ]