In a recent court admission, Google revealed that it can still use data from publishers who opted out of AI training, thanks to a workaround involving its search organization. DeepMind lets publishers exclude their data from training, but the search division can still train on that same data to enhance its AI products for search, specifically the Gemini chatbot once it sits inside the search org. This approach lets Google keep a large volume of AI training data that publishers believed they had withheld, raising ethical concerns about data usage without consent.
"Once you take the Gemini and put it inside the search org, the search org has the ability to train on the data that publishers had opted out of training, correct?" "Correct - for use in search." - Eli Collins
"An internal document from 2024 cited by Aguilar showed that Google had collected a total of 160 billion tokens - short units of text - in AI training data... those 80 billion tokens are still being used to train AI at Google, just not at DeepMind itself."
"There is one way to opt-out of having your website trawled by an AI: by opting out of being indexed in Google's search engine entirely. That's a death sentence for any website."
"Google implies this is merely a consequence of how the widely used 'robots.txt' file works, which instructs web crawlers..."