The data set Common Voice has created over the past seven years is one of the most useful resources for people wanting to build voice AI.
Most data used to train models is extracted from the English-language internet, so it reflects Anglo-American culture and limits voice AI's inclusivity.
The volunteer-led Common Voice initiative aims to diversify the training data for voice AI and has collected over 31,000 hours of voice data.
As the AI boom continues, the importance of democratizing voice data to reflect global languages and cultures becomes increasingly evident.