Meta has developed Emu Video, a new text-to-video model that can generate high-quality videos from text, images, or a combination of both.
Emu Video uses a factorized approach: it first generates an image conditioned on a text prompt, then generates a video conditioned on both the text and the generated image.
Respondents preferred the model over Meta's previous generative video project, Make-A-Video: 96% preferred it on quality and 85% on faithfulness to the text prompt.
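
The factorized approach described above can be pictured as a two-stage pipeline. The sketch below is only an illustration of that control flow, not Meta's implementation: the names `Image`, `Video`, `generate_image`, `generate_video`, and `text_to_video` are hypothetical, and both stages are placeholders that return dummy data instead of running a real model. The optional `image` argument is an assumption about how a user-supplied image might replace the first stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    """Hypothetical stand-in for a generated or user-supplied image."""
    prompt: str  # text the image was conditioned on


@dataclass
class Video:
    """Hypothetical stand-in for a generated video."""
    prompt: str
    first_image: Image
    frames: List[str]


def generate_image(prompt: str) -> Image:
    # Stage 1 (placeholder): an image conditioned on the text prompt.
    return Image(prompt=prompt)


def generate_video(prompt: str, image: Image, num_frames: int = 4) -> Video:
    # Stage 2 (placeholder): a video conditioned on BOTH the text and the image.
    frames = [f"frame {i} for '{prompt}'" for i in range(num_frames)]
    return Video(prompt=prompt, first_image=image, frames=frames)


def text_to_video(prompt: str,
                  image: Optional[Image] = None,
                  num_frames: int = 4) -> Video:
    # Assumption: if the caller supplies an image, stage 1 is skipped.
    if image is None:
        image = generate_image(prompt)
    return generate_video(prompt, image, num_frames)
```

Calling `text_to_video("a dog surfing")` runs both stages in order, while passing `image=...` reuses the supplied image and runs only the second stage. Splitting the problem this way means the second stage always has a concrete image to condition on, rather than having to produce video from text alone.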