DeepSeek has unveiled Janus-Pro, an updated multimodal model that improves on token-based training, data scaling, and text-to-image generation performance. Janus-Pro uses distinct visual encoding pathways for understanding and for generation, aiming to resolve stability issues while retaining a unified transformer architecture. It outperforms existing unified models and some task-specific counterparts, particularly on the GenEval and DPG-Bench benchmarks. The largest version, Janus-Pro-7B, is trained with enhanced techniques and synthetic aesthetic data, and DeepSeek claims it outperforms OpenAI’s DALL-E 3.
DeepSeek's Janus-Pro advances multimodal AI by improving both image understanding and text-to-image generation through better training strategies and model design.
Janus-Pro separates the visual encoding pathways for understanding and generation within a unified transformer architecture, addressing stability issues while remaining competitive with task-specific models.
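
The idea of separate visual encoding pathways feeding one shared model can be illustrated with a toy sketch. This is not DeepSeek's code: every name here (`understanding_encode`, `generation_encode`, `shared_backbone`, `run`) is hypothetical, the "encoders" are trivial stand-ins, and the "backbone" is a mean-pool rather than a transformer. It only shows the routing: each task picks its own visual encoder, and both send their output through the same downstream component.

```python
from typing import Callable, Dict, List

Vector = List[float]

def understanding_encode(pixels: List[float]) -> List[Vector]:
    # Hypothetical stand-in for a continuous feature extractor:
    # each pixel becomes a 2-d feature vector.
    return [[p, 1.0 - p] for p in pixels]

def generation_encode(pixels: List[float]) -> List[Vector]:
    # Hypothetical stand-in for a discrete tokenizer: each pixel is
    # quantized to one of four codes, then looked up in a codebook.
    codebook = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    return [codebook[min(int(p * 4), 3)] for p in pixels]

def shared_backbone(sequence: List[Vector]) -> Vector:
    # Stand-in for the unified transformer: mean-pools the sequence.
    dim = len(sequence[0])
    return [sum(v[i] for v in sequence) / len(sequence) for i in range(dim)]

# One visual pathway per task; the backbone is shared by both.
PATHWAYS: Dict[str, Callable[[List[float]], List[Vector]]] = {
    "understanding": understanding_encode,
    "generation": generation_encode,
}

def run(task: str, pixels: List[float], text: List[Vector]) -> Vector:
    visual = PATHWAYS[task](pixels)        # task-specific encoding
    return shared_backbone(text + visual)  # shared processing
```

Calling `run("understanding", ...)` and `run("generation", ...)` on the same input yields different visual features but reuses the same backbone, which is the structural point the summary describes.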