AI firms must play fair when they use academic data in training
Briefly

Researchers are concerned that their intellectual property is being used without restraint to train commercial large language models, and they stress the urgent need for clear usage boundaries.
There is an ongoing debate about whether scraping academic papers for LLM training constitutes copyright infringement or is permitted under exemptions in existing law.
Because large language models rely heavily on data from scientific papers, the need for creators to receive credit and for training datasets to be disclosed in detail has come to the forefront.
The legal ambiguity around using articles and research papers for AI training raises significant intellectual property questions that affect both researchers and tech firms.
Read at Nature