Apple Open-Sources Multimodal AI Model 4M-21
Briefly

Researchers at Apple and EPFL have unveiled 4M-21, an open-source AI model that supports 21 input and output modalities and delivers strong performance across a range of vision benchmarks.
4M-21's out-of-the-box performance on numerous vision benchmarks shows that a single model can be trained on a wide range of modalities without sacrificing effectiveness.
By integrating text, pixel data, and various forms of metadata, 4M-21 enables new multimodal interactions, such as cross-modal retrieval and flexible generation, all handled by a single model.
Supporting 21 modalities compared to its predecessor's seven, 4M-21 marks a major leap in capability and underscores Apple's commitment to advancing multimodal AI.
Read at InfoQ