In evaluating Chameleon, we focus on tasks that require text generation conditioned on images, in particular image captioning and visual question answering, and we group the results by task specificity.
Our work identifies 12 aspects that are important in real-world deployments of text-to-image generation models, including alignment, quality, aesthetics, reasoning, bias, and efficiency.