In the user study, 25 participants annotated 8236 images in a zero-shot comparison of text-to-image generation, judging both aesthetics and alignment.
To compare image quality, labelers were not told which image came from which baseline, so knowledge of the source could not bias their judgments.
Labelers were trained and tested on a previous set, so only experienced annotators took part in the subjective quality comparisons.
A final review validated the human preference ratings by examining the side-by-side comparisons made during the user study.
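As an illustration of the blind side-by-side setup described above, the following is a minimal sketch rather than the study's actual tooling. The source does not say how the blinding was implemented. The sketch assumes the left/right position was randomized per prompt, and the function names and the labels "ours" and "baseline" are hypothetical.

```python
import random
from collections import Counter


def make_blinded_trials(prompts, methods=("ours", "baseline"), seed=0):
    """Build side-by-side trials with the left/right order shuffled per prompt.

    The method labels are hypothetical placeholders; labelers would only ever
    see "left" and "right", never the method name.
    """
    rng = random.Random(seed)
    trials = []
    for prompt in prompts:
        order = list(methods)
        rng.shuffle(order)  # assumed blinding step: random position per prompt
        trials.append({"prompt": prompt, "left": order[0], "right": order[1]})
    return trials


def preference_rates(trials, choices):
    """Map each choice ("left" or "right") back to its method and tally wins."""
    wins = Counter(trial[choice] for trial, choice in zip(trials, choices))
    total = sum(wins.values())
    return {method: count / total for method, count in wins.items()}


if __name__ == "__main__":
    trials = make_blinded_trials(["a red cube", "a cat on a sofa"])
    choices = ["left", "right"]  # made-up labeler responses, for illustration only
    print(preference_rates(trials, choices))
```

Keeping the method-to-position mapping in the trial record and out of the labeler's view is what allows preferences to be unblinded only at analysis time.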