Harvard and Google to release 1 million public-domain books as AI training dataset | TechCrunch
Briefly

The new dataset, derived from Google Books, aims to provide an accessible, legal resource for AI training, with backing from Microsoft and OpenAI.
Harvard's Institutional Data Initiative (IDI) plans to release 1 million public-domain books as a resource for research labs and AI startups alike.
Executive director Greg Leppert emphasizes that the dataset is meant to 'level the playing field' in AI development by providing a vast, legal training resource.
The release of this dataset continues Harvard's initiative to create a 'trusted conduit for legal data for AI,' underscoring the importance of accessibility.