Google's Gemini Omni can generate 'anything from any input,' starting with video - Engadget

Gemini Omni is a new Google model that creates content from any input, starting with video. Gemini Omni Flash is rolling out to the Gemini app, Google Flow, and YouTube Shorts. The model accepts images, audio, video, and text as input and generates high-quality videos grounded in real-world knowledge. Videos can be edited through natural conversation, with each instruction building on prior edits to keep characters and other elements consistent. The model can transform a filmed video into new scenarios by changing the action, adding characters or objects, or altering the environment, angle, style, and specific details. It also better understands physical forces, which makes scenes more realistic, and it can generate explainers from short prompts. At launch, audio output supports only voice references, and users can create digital avatars using their own voice.

"Google called Gemini Omni 'the next step' up from Nano Banana and, presumably, its current video generator, Veo 3.1. It lets you 'combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge,' according to the tech giant. You can then edit those videos through natural conversation, with each instruction building on the last to keep characters and other elements consistent."

"Where Veo 3.1 was limited to video creations via prompts and images, Gemini Omni will accept a wider range of inputs and do a lot more. For instance, you can shoot a video, then just ask Omni to change what's happening. 'Your video becomes a starting point for something you never could have filmed yourself,' Google explained."

"'Edit the action, add in new characters or objects, or transform a moment into something unexpected. Change the environment, angle, style or even specific details.' Omni also better understands physical forces like gravity, kinetic energy and fluid dynamics, so that scenes will be more realistic. It marries that with 'Gemini's knowledge of history, science and cultural context, bridging the gap from photorealism to meaningful storytelling.'"

"The app can supposedly create compelling explainers from short prompts to generate visuals that break down more complex ideas. However, it will only support voice references for audio output to start. If you want to generate videos where you're the star, Omni lets you use your own voice to create a digital avatar that looks and sounds like you."

#ai-video-generation #multimodal-input #conversational-editing #digital-avatars #realistic-simulation

Read at Engadget
