Using Large Language Models for Zero-Shot Video Generation: A VideoPoet Case Study | HackerNoon
Briefly

"VideoPoet leverages a decoder-only transformer architecture that is adept at processing multiple modalities, allowing it to synthesize high-quality videos from diverse conditioning signals, such as images and text."
"Through extensive experimentation, we benchmark VideoPoet against existing state-of-the-art models, demonstrating its superior performance in generating coherent and contextually relevant video content from various input types."
Read at HackerNoon