Text-to-image models learn more efficiently with fake data
Briefly

"With solely synthetic images, the representations learned by StableRep surpass the performance of representations learned by SimCLR and CLIP using the same set of text prompts and corresponding real images, on large scale datasets."
"When we further add language supervision, StableRep trained with 20 million synthetic images achieves better accuracy than CLIP trained with 50 million real images," the paper continues.
Read at The Register