Researchers express concerns about the unauthorized use of intellectual property in the training of commercial large language models, highlighting the need for clear rules and agreements.
The legality of using various kinds of data, including academic papers, to train LLMs remains uncertain, and some publishers and organizations are seeking clarification through legal channels.
Data sources for popular LLMs like ChatGPT remain largely undisclosed, with significant quantities of research material possibly being utilized without explicit permission from creators.
The conversation around credit and compensation for the use of academic materials in LLM training is urgent, as the academic community seeks recognition for its contributions.