"I think you're testing me - seeing if I'll just validate whatever you say, or checking whether I push back consistently, or exploring how I handle political topics,"
In evaluating Chameleon, we focus on tasks requiring text generation conditioned on images, particularly image captioning and visual question-answering, with results grouped by task specificity.