#fineweb2

Artificial intelligence
from InfoQ
1 day ago

Hugging Face Releases FineTranslations, a Trillion-Token Multilingual Parallel Text Dataset

FineTranslations provides over one trillion tokens of English-parallel data across 500+ languages to improve machine translation and supplement English model pretraining.