"We presented several privacy-preserving options to The Times, including targeted searches over the sample ( e.g., to search for chats that might include text from a New York Times article so they only receive the conversations relevant to their claims), as well as high-level data classifying how ChatGPT was used in the sample. These were rejected by The Times,"
"Immediate production of the output log sample is essential to stay on track for the February 26, 2026, discovery deadline. OpenAI's proposal to run searches on this small subset of its model outputs on Plaintiffs' behalf is as inefficient as it is inadequate to allow Plaintiffs to fairly analyze how "real world" users interact with a core product at the center of this litigation."
"Plaintiffs cannot reasonably conduct expert analyses about how OpenAI's models function in its core consumer-facing product, how retrieval augmented generation ("RAG") functions to deliver news content, how consumers interact with that product, and the frequency of hallucinations without access to the model outputs themselves."
"protected under legal hold, meaning it can't be accessed or used for purposes other than meeting legal obligations,"
OpenAI preserved a random sample of 20 million ChatGPT conversations spanning December 2022 to November 2024, excluding business-customer chats. It offered The Times privacy-preserving alternatives, including targeted searches over the sample and high-level classifications of how ChatGPT was used, which The Times rejected. The chats remain in a secure system under legal hold, restricted to uses that meet legal obligations, and OpenAI says it will oppose any public release. In its filing, The New York Times says OpenAI refuses to produce the model outputs themselves and argues that immediate production is essential to meet the discovery deadline and to enable expert analysis of model behavior, retrieval-augmented generation, user interaction, and hallucination frequency.
Read at Ars Technica