#ai-training-data

TechCrunch
1 day ago
Artificial intelligence

AI training data has a price tag that only Big Tech can afford

Training data, more than design or architecture, is the key to sophisticated AI systems.
#openai
Engadget
1 week ago
Artificial intelligence

OpenAI will reportedly pay $250 million to put News Corp's journalism in ChatGPT

OpenAI and News Corp reached a multi-year deal for ChatGPT to train on News Corp's publications.
Futurism
1 week ago
Artificial intelligence

The New ChatGPT Has a Huge Problem in Chinese

Pollution in the Chinese-language data behind OpenAI's chatbot compromised its outputs.
WIRED
3 weeks ago
Artificial intelligence

OpenAI Offers an Olive Branch to Artists Wary of Feeding AI Algorithms

OpenAI announced a Media Manager tool, planned for 2025, that will let creators control how their work is used in AI training.
www.nytimes.com
1 month ago
NYC politics

How the tech giants take shortcuts to obtain data for AI

OpenAI developed the Whisper tool to transcribe YouTube videos for more AI training data.
Entrepreneur
1 month ago
Artificial intelligence

OpenAI May Have Used YouTube Videos for AI Training

AI training data can be sourced from YouTube video transcriptions and other platforms.
OpenAI used a tool called Whisper to transcribe YouTube videos for training AI models.
Futurism
1 month ago
Artificial intelligence

OpenAI Secretly Trained GPT-4 With More Than a Million Hours of Transcribed YouTube Videos

OpenAI's latest text-to-video generator, Sora, may have been trained using publicly available and licensed data, including transcribed YouTube videos.
AI companies like OpenAI and Google are using murky and potentially copyright-infringing data to train their models, leading to lawsuits and accusations of misattribution.
Futurism
1 month ago
Artificial intelligence

Synthetic Data, Explained: Why AI Trained on AI Is The Next Big Thing (and Problem)

Synthetic data is viewed as a potential solution to the shortage of AI training data.
Challenges exist in creating quality synthetic data, with current attempts leading to AI model issues.
#generative-ai
The Bootstrapped Founder
1 month ago
Bootstrapping

Michael Taylor - Prompt Engineering for Fun & Profit

Generative AI could revolutionize databases and developer roles.
AI systems are constantly being trained by our interactions, raising questions of control and collaboration with machines.
ZDNET
2 months ago
Artificial intelligence

Generative AI adoption will slow because of this one reason, according to Gartner

Generative AI enables professionals to focus on more important tasks by delegating menial work.
Generative AI's reliance on internet data for training poses copyright infringement risks, leading to defensive spending.
Verdict
2 months ago
Artificial intelligence

BBC in talks to sell archive to tech companies as AI training data

The BBC is considering selling access to its content archive as AI training data to diversify its revenue streams.
The BBC aims to use generative AI (GenAI) models for production applications such as helping journalists write and source stories.
#copyright-infringement
Futurism
2 months ago
Artificial intelligence

Microsoft Mocks NYT's AI Lawsuit As "Doomsday Futurology"

The New York Times filed a lawsuit against Microsoft and OpenAI over the use of news articles for AI training. Microsoft and OpenAI responded, claiming the lawsuit is without merit and stressing the transformative nature of using content for language models.
ComputerWeekly.com
4 months ago
Artificial intelligence

GenAI tools 'could not exist' if firms are made to pay copyright

Using copyrighted content in AI training data is claimed to be fair use.
Music publishers are demanding damages from Anthropic for copyright infringement.
#reddit
The Verge
3 months ago
Artificial intelligence

Google cut a deal with Reddit for AI training data

Google is partnering with Reddit to access AI training data efficiently.
The collaboration allows Google to utilize Reddit's data API for real-time content and improve search results.
Ars Technica
3 months ago
Artificial intelligence

Reddit sells training data to unnamed AI company ahead of IPO

Reddit signed a $60 million AI training data deal ahead of its IPO.
Tech firms are entering licensing deals for AI training data.
Iapp
4 months ago
Artificial intelligence

How to protect your privacy when using a chatbot

Chatbot companies have different policies for storing and using user conversations to train AI.
Privacy professionals advise against sharing sensitive information with chatbots to minimize the risk of hacks or misuse.