#vision-language-models

Artificial intelligence
from Computerworld
5 days ago

Microsoft researchers develop new tech for video AI agents

Microsoft is developing MindJourney, a video-AI framework that explores 3D spaces using world models, VLMs, video generation, and reasoning to predict surroundings and movement.
Philosophy
from The Register
2 weeks ago

Vision AI models see optical illusions when none exist

Vision-language models such as GPT-5 misinterpret simple images as complex optical illusions, reflecting a form of cognitive bias similar to that seen in humans.
Artificial intelligence
from HackerNoon
1 year ago

Researchers Push Vision-Language Models to Grapple with Metaphors, Idioms, and Sarcasm

The V-FLUTE dataset assesses how well vision-language models understand figurative language.
Artificial intelligence
from HackerNoon
1 year ago

Can AI Understand a Joke? New Dataset Tests Bots on Metaphors, Sarcasm, and Humor

Large AI models struggle with figurative language because its meaning is implicit rather than literal.
#idefics2
Bootstrapping
from HackerNoon
55 years ago

The Artistry Behind Efficient AI Conversations

The cross-attention architecture outperforms fully autoregressive models on vision-language tasks, despite its higher computational cost.
#machine-learning
Artificial intelligence
from PyImageSearch
3 months ago

Content Moderation via Zero-Shot Learning with Qwen 2.5

Digital platforms face complex content moderation challenges as the volume of user-generated content grows.
Qwen 2.5 models can enhance content moderation through advanced multimodal understanding.