AI models now generate audio from videos using encoded representations and diffusion models.
Quality of audio output depends on video quality, with challenges like lip sync.
DeepMind's new model complements existing video generation models, enhancing audiovisual capabilities.