Chameleon uses a single token-based representation for both text and images, setting it apart from models that rely on separate modality-specific encoders. End-to-end training on mixed sequences of text and image tokens produced outputs that human judges preferred in evaluation trials.
Meta's Chameleon models (7B and 34B parameters), pre-trained on roughly four trillion mixed-modal tokens and fine-tuned for alignment, achieved state-of-the-art results in visual question answering and image captioning. This fusion approach enables joint reasoning over text and images, opening new possibilities for multimodal interaction.
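To make the "single token-based representation" concrete, here is a minimal sketch of the general idea: discrete image codes (e.g. from a vector-quantized image tokenizer) are offset into the same vocabulary as text tokens, so a mixed-modal document becomes one flat sequence over a shared embedding table. All names and sizes below are illustrative assumptions, not Chameleon's actual values or code.

```python
# Illustrative sketch only: a shared vocabulary covering both text tokens
# and discrete image tokens. Vocabulary sizes are hypothetical.
TEXT_VOCAB_SIZE = 65536        # assumed text vocabulary size
IMAGE_CODEBOOK_SIZE = 8192     # assumed image codebook size

def image_token(code: int) -> int:
    """Map a discrete image code into the shared vocabulary by
    offsetting it past the text-token id range."""
    assert 0 <= code < IMAGE_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def build_sequence(text_ids, image_codes):
    """Interleave text token ids with offset image token ids,
    bracketed by begin/end-of-image sentinels (ids chosen
    arbitrarily for this sketch)."""
    boi = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # begin-of-image marker
    eoi = boi + 1                                # end-of-image marker
    return (list(text_ids)
            + [boi]
            + [image_token(c) for c in image_codes]
            + [eoi])

# Every element of the result is an index into one shared embedding
# table, so a single transformer can model the whole sequence.
seq = build_sequence([12, 345, 678], [0, 8191])
```

Because every position is just an integer id in one vocabulary, the same autoregressive transformer predicts text and image tokens alike; no separate image encoder or decoder branch is needed at training time.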
#chameleon #mixed-modal-ai #end-to-end-training #state-of-the-art-performance #multimodal-interaction