Attackers prompted Gemini over 100,000 times while trying to clone it, Google says
Briefly

"On Thursday, Google announced that "commercially motivated" actors have attempted to clone knowledge from its Gemini AI chatbot by simply prompting it. One adversarial session reportedly prompted the model more than 100,000 times across various non-English languages, collecting responses ostensibly to train a cheaper copycat. Google published the findings in what amounts to a quarterly self-assessment of threats to its own products that frames the company as the victim and the hero, which is not unusual in these self-authored assessments."
"Google is also no stranger to the copycat practice. In 2023, The Information reported that Google's Bard team had been accused of using ChatGPT outputs from ShareGPT, a public site where users share chatbot conversations, to help train its own chatbot. Senior Google AI researcher Jacob Devlin, who created the influential BERT language model, warned leadership that this violated OpenAI's terms of service, then resigned and joined OpenAI. Google denied the claim but reportedly stopped using the data."
Google calls the activity "model extraction" and treats it as intellectual property theft, despite Gemini itself being trained on internet-scraped material. The company's terms of service forbid extracting model outputs. Google believes the attackers are private firms and researchers around the world but declined to name suspects. The industry calls the broader practice "distillation": training a new model on another model's outputs to recreate its capabilities without replicating the full training cost.
Read at Ars Technica