Models All The Way Down

from Knowingmachines 11 months ago

In December, researchers from Stanford's Internet Observatory identified more than 1,000 images categorized as Child Sexual Abuse Material (CSAM) in one of the most influential AI training sets of the moment: LAION-5B.
Knowingmachines, https://knowingmachines.org/models-all-the-way

If your full-time, eight-hours-a-day, five-days-a-week job were to look at each image in the dataset for just one second, it would take you 781 years.
Knowingmachines, https://knowingmachines.org/models-all-the-way
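
The 781-year figure can be roughly reproduced with a short calculation. The sketch below is not taken from the article: it assumes a dataset size of about 5.85 billion images (the commonly cited size of LAION-5B) and 52 working weeks per year, and the function name years_to_view is made up for illustration.

```python
# Rough check of the viewing-time estimate.
# Assumptions (not stated in the quote): ~5.85 billion images,
# 52 working weeks per year with no holidays.
SECONDS_PER_HOUR = 3600


def years_to_view(images, seconds_per_image=1, hours_per_day=8,
                  days_per_week=5, weeks_per_year=52):
    """Return the working years needed to look at every image once."""
    working_seconds_per_year = (
        hours_per_day * SECONDS_PER_HOUR * days_per_week * weeks_per_year
    )
    return images * seconds_per_image / working_seconds_per_year


if __name__ == "__main__":
    estimate = years_to_view(5_850_000_000)
    print(f"{estimate:.2f} working years")
```

Under these assumptions the result rounds to 781 years, which matches the quoted figure; a different image count or fewer working weeks per year would shift it.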

Common Crawl is a corpus of web data that comes from a monthly crawl of the web. It contains data for more than 3 billion websites.
Knowingmachines, https://knowingmachines.org/models-all-the-way

Pinterest generates the captions on its pages from the ALT tags, so users learned to write them before they 'pinned' their images.
Knowingmachines, https://knowingmachines.org/models-all-the-way
