Wikipedia offers AI developers a training dataset to maybe get scraper bots off its back
Briefly

Wikipedia is facing challenges related to AI crawlers that are adversely affecting its server performance, leading to increased costs and reduced accessibility for users. To combat this, the Wikimedia Foundation has collaborated with Kaggle to release a beta dataset tailored for machine learning purposes. This dataset, containing abstracts, infobox data, and more, is structured to facilitate AI training but notably omits references, which may complicate attribution. By making this dataset available, Wikimedia aims to manage the load on its servers while enabling developers to utilize Wikipedia's data effectively.
Wikipedia is offering a structured dataset to AI developers, hoping to mitigate the impact of bots on its servers and improve user experience.
The Wikimedia Foundation, in collaboration with Kaggle, has released a structured dataset for AI training, aimed at alleviating server strain caused by bots.
This dataset includes key information structured for machine learning, but lacks references, raising concerns about attribution for the sourced information.
Wikimedia aims to balance providing useful data for AI development with addressing the performance issues that bot activity causes on the Wikipedia site.
Read at Engadget