
"Training AI models requires huge quantities of data, which model-makers have acquired by scraping the internet without first asking for permission and by allegedly knowingly downloading copyrighted books. Those practices have seen model makers sued in many copyright cases, and also raised eyebrows at regulators who wonder whether AI companies can comply with the General Data Protection Regulation right to erasure (often called the right to be forgotten) and the California Consumer Privacy Act right to delete."
"To address this particular problem, Guler along with her colleagues professor Amit Roy-Chowdhury, Ümit Yiğit Başaran, a doctoral student studying electrical and computer engineering, and Sk Miraj Ahmed, a researcher at Brookhaven National Laboratory developed a new, computationally efficient approach called source free unlearning, which critically doesn't require access to the original training data to statistically guarantee the removal of undesired information from a model."
Massive data collection for AI training has included scraped web content and allegedly downloaded copyrighted books, prompting lawsuits and regulatory concern over deletion rights under the GDPR and the CCPA. Fully retraining a model to remove legally risky data would be extremely costly and time-consuming given the GPU resources required. More efficient unlearning methods aim to excise specific information without crippling the model, but many rely on access to the original training dataset, which may not have been preserved. A source-free unlearning approach can statistically guarantee removal of undesired information without the original source data, reducing both the computational and the logistical burden.
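The article doesn't spell out the researchers' algorithm, but the general shape of approximate machine unlearning can be sketched. The toy Python below is not the UC Riverside method; every name, step size, and noise scale is an illustrative assumption. It nudges a trained regression model to raise its loss on a "forget" set using only the model weights and the forget data (the source-free property), then adds Gaussian noise, the usual ingredient behind statistical indistinguishability guarantees.

```python
"""Toy sketch of source-free approximate unlearning (illustrative only)."""
import numpy as np

rng = np.random.default_rng(0)

def mse_grad(w, X, y):
    # Gradient of the mean squared error 0.5 * mean((Xw - y)^2) w.r.t. w.
    return X.T @ (X @ w - y) / len(y)

def unlearn(w, X_forget, y_forget, step=0.5, noise_scale=0.01):
    # One gradient-ascent "forgetting" step: move the weights in the
    # direction that *increases* loss on the forget set. Only the trained
    # weights and the forget data are used; the original training set is
    # never touched (the source-free property).
    w_new = w + step * mse_grad(w, X_forget, y_forget)
    # Calibrated Gaussian noise is the standard device for making the
    # unlearned model statistically hard to distinguish from one
    # retrained from scratch without the forget data.
    return w_new + rng.normal(scale=noise_scale, size=w.shape)

# Demo on synthetic data (all values hypothetical).
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)

# "Train" by ordinary least squares.
w_trained, *_ = np.linalg.lstsq(X, y, rcond=None)

# Suppose the first 20 rows must be erased.
X_f, y_f = X[:20], y[:20]
w_unlearned = unlearn(w_trained, X_f, y_f)

print("forget-set loss before:", 0.5 * np.mean((X_f @ w_trained - y_f) ** 2))
print("forget-set loss after: ", 0.5 * np.mean((X_f @ w_unlearned - y_f) ** 2))
```

Real certified-unlearning methods derive the update and the noise scale from the model's curvature to obtain a provable guarantee; the constants above are placeholders chosen only to make the demo run.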
Read at The Register