A new framework enables the simultaneous generation of 3D whole-body motions and singing vocals from textual lyrics. It is trained on the RapVerse dataset, which pairs synchronous rap vocals, lyrics, and 3D motions, and it uses autoregressive transformers to generate audio and motion coherently. A current limitation is the focus on rap music, though the framework should adapt to other genres given suitable datasets. Future work targets multi-performer audio and motion generation for applications such as virtual live bands.
This work presents a framework that generates 3D whole-body motions and singing vocals directly from textual lyrics, producing coherent and synchronized output.
The RapVerse dataset combines synchronous rap vocals, lyrics, and 3D body motions, which makes it possible to train autoregressive transformers for joint motion and audio generation.
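To make the autoregressive setup concrete, below is a minimal sketch of text-conditioned joint generation, assuming lyrics, vocal audio, and motion have each been tokenized into discrete IDs (e.g., via a text tokenizer, an audio codec, and a motion VQ-VAE) and concatenated into one sequence modeled by a single decoder-only transformer. All names here (`JointMotionVocalLM`, `generate`, the shared vocabulary size) are hypothetical illustrations, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class JointMotionVocalLM(nn.Module):
    """Decoder-only transformer over a shared lyric/vocal/motion token vocabulary."""
    def __init__(self, vocab_size=2048, d_model=512, n_layers=8, n_heads=8, max_len=4096):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        # ids: (batch, seq) of lyric + vocal + motion token IDs in one stream
        b, t = ids.shape
        x = self.tok(ids) + self.pos(torch.arange(t, device=ids.device))
        # Causal mask so each position attends only to earlier tokens
        mask = torch.triu(torch.full((t, t), float("-inf"), device=ids.device), diagonal=1)
        h = self.blocks(x, mask=mask)
        return self.head(h)  # next-token logits over the shared vocabulary

@torch.no_grad()
def generate(model, lyric_ids, n_new=256, temperature=1.0):
    """Autoregressively extend lyric tokens with vocal/motion tokens."""
    ids = lyric_ids
    for _ in range(n_new):
        logits = model(ids)[:, -1] / temperature
        nxt = torch.multinomial(logits.softmax(-1), num_samples=1)
        ids = torch.cat([ids, nxt], dim=1)
    return ids
```

Because vocal and motion tokens are sampled from one causal model conditioned on the same lyric prefix, the two modalities stay temporally aligned by construction; how the modality streams are interleaved or separated within the sequence is a design choice the sketch leaves open.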
[Figure: Collection]