A Multimodal Dataset for Synthesizing Rap Vocals and 3D Motion

"The Rap-Vocal subset consists of 108.44 hours of high-quality English rap vocals with aligned lyrics, obtained through data crawling using Spotdl and Spotipy."

"Using Spleeter, we separate the rap vocals from the background music and normalize their loudness, so that singing vocals can be synthesized from clean data."

"For the Rap-Motion subset, which has 26.8 hours of data, we divided the overall collection pipeline into stages for clarity and efficiency of research."

"The RapVerse dataset is structured into two primary subsets, Rap-Vocal for vocal data and Rap-Motion for motion data, each tailored to various research needs."

RapVerse is a rap music dataset that pairs synchronized singing vocals and textual lyrics with full-body human motion. It has two subsets, Rap-Vocal and Rap-Motion. Rap-Vocal offers 108.44 hours of high-quality English rap vocals with aligned lyrics, collected by crawling with Spotdl and Spotipy. To keep the data clean, the vocals are separated from the background music with Spleeter and then loudness-normalized. Rap-Motion comprises 26.8 hours of motion data, collected through a multi-stage pipeline. The dataset is intended to support research on synthesizing rap vocals and 3D motion.
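The summary does not say how loudness is normalized, so the following is only a minimal sketch of one common approach: scaling a clip so that its RMS level hits a fixed target. The function names `rms_dbfs` and `normalize_rms` and the -20 dBFS target are hypothetical choices for illustration, not details from the RapVerse pipeline.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level in dBFS of float samples in the range [-1.0, 1.0]."""
    if not samples:
        raise ValueError("samples must be non-empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square)


def normalize_rms(samples, target_dbfs=-20.0):
    """Scale samples so their RMS level matches target_dbfs (assumed target).

    Values are clipped to [-1.0, 1.0] after scaling, so a very high target
    can leave the result slightly below the requested level.
    """
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # nothing to scale in a silent clip
    gain = 10.0 ** ((target_dbfs - current) / 20.0)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


if __name__ == "__main__":
    # A quiet synthetic sine wave stands in for a separated vocal track.
    quiet = [0.05 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
    louder = normalize_rms(quiet)
    print(f"before: {rms_dbfs(quiet):.2f} dBFS, after: {rms_dbfs(louder):.2f} dBFS")
```

Applying the same target to every separated vocal track would bring clips from different songs to a comparable level. Perceptual loudness measures exist as well, and the original pipeline may use one of those instead.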

#rap-music #dataset #vocal-data #motion-data #music-research

Read at Hackernoon


