#ai-training-data

[ follow ]
openai
WIRED
1 week ago
Artificial intelligence

OpenAI Offers an Olive Branch to Artists Wary of Feeding AI Algorithms

OpenAI announced Media Manager tool by 2025, allowing creators to control how their work is used in AI training. [ more ]
www.nytimes.com
1 month ago
NYC politics

Como los gigantes tecnologicos toman atajos para obtener datos para la IA

OpenAI developed Whisper tool to transcribe YouTube videos for more AI training data. [ more ]
Entrepreneur
1 month ago
Artificial intelligence

OpenAI May Have Used YouTube Videos for AI Training | Entrepreneur

AI training data can be sourced from YouTube video transcriptions and other platforms.
OpenAI used a tool called Whisper to transcribe YouTube videos for training AI models. [ more ]
Futurism
1 month ago
Artificial intelligence

OpenAI Secretly Trained GPT-4 With More Than a Million Hours of Transcribed YouTube Videos

OpenAI's latest text-to-video generator, Sora, may have been trained using publicly available and licensed data, including transcribed YouTube videos.
AI companies like OpenAI and Google are using murky and potentially copyright-infringing data to train their models, leading to lawsuits and accusations of misattributing practices. [ more ]
ReadWrite
1 month ago
Artificial intelligence

OpenAI and Google accused of using YouTube transcripts for AI

Tech giants like OpenAI and Google transcribing YouTube videos for AI training.
Depletion of training data leading to exploration of transcribing diverse sources for AI models. [ more ]
PYMNTS.com
1 month ago
Artificial intelligence

OpenAI and Microsoft Sued by Authors Over AI Training, Again

Authors accusing OpenAI and Microsoft of copyright infringement for using their works in AI training.
Legal battle highlights the growing trend of authors suing tech companies for alleged misuse of their works in AI programs. [ more ]
moreopenai
copyright-infringement
ZDNET
2 months ago
Artificial intelligence

Generative AI adoption will slow because of this one reason, according to Gartner

Generative AI enables professionals to focus on more important tasks by delegating menial work.
Generative AI's reliance on internet data for training poses copyright infringement risks, leading to defensive spending. [ more ]
Futurism
2 months ago
Artificial intelligence

Microsoft Mocks NYT's AI Lawsuit As "Doomsday Futurology"

The New York Times filed a lawsuit against Microsoft and OpenAI over the use of news articles for AI training. Microsoft and OpenAI responded, claiming the lawsuit is without merit and stressing the transformative nature of using content for language models. [ more ]
ComputerWeekly.com
3 months ago
Artificial intelligence

GenAI tools 'could not exist' if firms are made to pay copyright | Computer Weekly

Using copyrighted content in AI training data is claimed to be fair use
Music publishers are demanding damages from Anthropic for copyright infringement [ more ]
morecopyright-infringement
Futurism
1 month ago
Artificial intelligence

Synthetic Data, Explained: Why AI Trained on AI Is The Next Big Thing (and Problem)

Synthetic data is viewed as a potential solution to the shortage of AI training data.
Challenges exist in creating quality synthetic data, with current attempts leading to AI model issues. [ more ]
Verdict
1 month ago
Artificial intelligence

BBC in talks to sell archive to tech companies as AI training data

BBC is considering selling access to its content archive for AI training data to diversify revenue streams.
BBC aims to use AI models like GenAI for production applications such as aiding journalists in writing and sourcing stories. [ more ]
TechCrunch
2 months ago
Artificial intelligence

Are OpenAI's deals with publishers edging out the competition? | TechCrunch

OpenAI partners with French and Spanish news publishers like Le Monde and Prisa Media to bring their content to ChatGPT users.
The Information reported that OpenAI is offering publishers between $1 million and $5 million a year for access to archives to train its AI models. [ more ]
The Verge
2 months ago
Artificial intelligence

Google cut a deal with Reddit for AI training data

Google is partnering with Reddit to access AI training data efficiently.
The collaboration allows Google to utilize Reddit's data API for real-time content and improve search results. [ more ]
Ars Technica
2 months ago
Artificial intelligence

Reddit sells training data to unnamed AI company ahead of IPO

Reddit signed $60 million AI training deal for future IPO
Tech firms are entering licensing deals for AI training data [ more ]
Iapp
3 months ago
Artificial intelligence

How to protect your privacy when using a chatbot

Chatbot companies have different policies for storing and using user conversations to train AI.
Privacy professionals advise against sharing sensitive information with chatbots to minimize the risk of hacks or misuse. [ more ]
The Bootstrapped Founder
1 month ago
Bootstrapping

Michael Taylor - Prompt Engineering for Fun & Profit

Generative AI could revolutionize databases and developer roles.
AI systems are constantly being trained by our interactions, raising questions of control and collaboration with machines. [ more ]
[ Load more ]