#multimodal-models

from InfoQ
3 days ago

Qwen Team Open Sources State-of-the-Art Image Model Qwen-Image

Qwen-Image is an open-source image foundation model that excels at text-to-image and text-image-to-image tasks and achieves leading benchmark performance.
from HackerNoon
1 year ago

What 34 Vision-Language Models Reveal About Multimodal Generalization

We delved into the five pretraining datasets of 34 multimodal vision-language models, analyzing the distribution and composition of concepts within, generating over 300GB of data artifacts that we publicly release.
Artificial intelligence
from HackerNoon
1 year ago

Analyzing the Impact of Pretraining Frequency on Zero-Shot Performance in Multimodal Models

Pretraining concept frequency is predictive of zero-shot performance across various multimodal models.
Data science
from HackerNoon
1 year ago

The Science Behind Many-Shot Learning: Testing AI Across 10 Different Vision Domains

Increasing the number of demonstrating examples significantly enhances the performance of multimodal foundation models like GPT-4o and Gemini 1.5 Pro.
#in-context-learning
from HackerNoon
1 year ago
Online learning

Why Thousands of Examples Beat Dozens Every Time

Scaling ICL from few-shot to many-shot improves performance significantly in multimodal foundation models.
from HackerNoon
1 year ago
Data science

Scientists Just Found a Way to Skip AI Training Entirely. Here's How

Many-shot ICL enhances multimodal foundation model performance across datasets, reducing latency and inference costs while allowing practical adaptation to new tasks.
Artificial intelligence
from InfoQ
3 months ago

Gemma 3n Available for On-Device Inference Alongside RAG and Function Calling Libraries

Gemma 3n is a multimodal AI model enhancing enterprise efficiency through mobile device utilization.
from Techzine Global
3 months ago

GPT-5 aims to end AI model overgrowth at OpenAI

OpenAI plans to consolidate AI models into a single seamless model with the release of GPT-5.
User frustration with current AI model diversity motivates the development of GPT-5.
Artificial intelligence
from ZDNET
3 months ago

Multimodal AI poses new safety risks, creates CSEM and weapons info

Multimodal AI enhances LLMs but increases their vulnerability to novel attacks.
New research indicates significant safety risks with multimodal models, exposing them to dangerous outputs.