OpenAI v. New York Times: When Your ChatGPT Logs Become Evidence
How a copyright lawsuit turned into a 20-million-chat discovery order—and what it means for AI privacy
What began as a copyright dispute over training data has morphed into a landmark test of how far civil discovery can reach into AI chat histories—and what happens when a platform’s privacy promises collide with a federal court order.
The developments so far:
- A judge orders OpenAI to stop deleting ChatGPT logs, indefinitely preserving all output data across the Free, Plus, Pro, and Team tiers.
- The court then orders OpenAI to produce 20 million de-identified ChatGPT chat logs to the New York Times and other plaintiffs.
- OpenAI files for reconsideration, arguing the order is a privacy disaster and a fishing expedition affecting millions of uninvolved users.

The open questions:
- Can a federal court order override explicit user deletion requests and privacy promises?
- Are 20 million "de-identified" chats truly anonymous, or can they be re-identified using context clues?
- How wide is the gap between "your chats are private" messaging and the legal reality of court-ordered disclosure?

One point is already settled: courts treat AI chats as discoverable business records, not privileged communications like attorney-client conversations.
To comply, OpenAI had to:
- Suspend normal deletion practices for months
- Preserve chats users explicitly deleted
- Temporarily suspend EU users’ “right to erasure” under GDPR Article 17
- Prepare to hand over millions of conversations to opposing counsel under protective order
This case establishes that AI chat logs are subject to standard civil discovery rules at unprecedented scale. Even with protective orders and de-identification, millions of user conversations became litigation evidence—a new risk profile for any AI platform or heavy AI user.
What started as a fight over training data is now a live test of three things:
- how far civil discovery can reach into AI chat histories,
- how much protection “anonymization” and protective orders really provide, and
- what it means when a platform’s privacy promises collide with its own Terms of Use and a federal court order.
Why this case became about chat logs at all 🧠💬
The underlying suit is straightforward enough: The New York Times alleges that OpenAI and Microsoft used millions of Times articles without permission to train GPT models, and that the models can regurgitate Times content in infringing ways. Judge Sidney Stein largely let the case proceed in April 2025, rejecting most of OpenAI’s motion to dismiss and finding the Times had plausibly alleged copyright infringement. (Reuters)
Once the case entered discovery, plaintiffs pushed for output evidence – not just training data – to show how the models behave in the wild. That’s where user chats come in:
- plaintiffs argued that real-world prompts and outputs might show systematic “regurgitation” of Times content,
- OpenAI argued the request is massively overbroad, turns millions of uninvolved users into collateral damage, and conflicts with its privacy and deletion practices.
From there, you get two critical orders: a preservation order and a production order.
The preservation order: “keep everything” 🔒
On May 13, 2025, Magistrate Judge Ona Wang ordered OpenAI to “preserve and segregate all output log data that would otherwise be deleted on a going-forward basis until further order of the Court.” (cdn.arstechnica.net)
Plain English:
- OpenAI had to stop deleting ChatGPT output logs under its normal policies,
- including chats users had explicitly deleted or that privacy laws would otherwise require OpenAI to erase,
- for essentially the entire active user base across Free, Plus, Pro, and Team tiers (enterprise / zero-data-retention customers were carved out).
OpenAI publicly objected, calling the order a privacy “overreach” that “abandons long-standing privacy norms” and asking the district judge to vacate it.
A few key points for your analysis:
| 🧩 Preservation Order Feature | 🧐 Why it matters |
|---|---|
| Indefinite retention of all output logs | Directly conflicts with OpenAI’s prior commitment to delete chats (including “deleted” ones) after ~30 days. (The Verge) |
| Includes data covered by “right to erasure” | OpenAI explicitly told EU users it was temporarily suspending erasure rights under GDPR Art. 17(3)(b) because of the court order. (Reddit) |
| Exempts Enterprise / zero-retention deals | Confirms that what you negotiated in your B2B paper can materially change your exposure when your vendor gets sued. |
By October 22, 2025, OpenAI reported that its obligation to retain all consumer content indefinitely under that particular order ended on September 26, 2025 – but only after months of extraordinary retention. (OpenAI)
The preservation battle set the stage for the next, sharper fight: production.
The 20-million-chat production order 🧾➡️🕵️
In November 2025, Judge Wang went a step further: she ordered OpenAI to produce 20 million de-identified ChatGPT chat logs to the Times and other “News Plaintiffs.” (Ars Technica)
The timeline, in short:
- Oct 30, 2025: plaintiffs ask for a sample of 20M consumer logs;
- Nov 7, 2025: Wang grants the request, requiring production of the 20M de-identified logs by Nov 14, under an existing protective order;
- Nov 12, 2025: OpenAI files a letter and motion for reconsideration, arguing the order is a fishing expedition and a privacy disaster;
- Nov 13, 2025: Wang denies a stay (at least initially); the deadline stands. (PPC Land)
OpenAI’s position:
- 99.99% of the 20M logs are irrelevant to whether ChatGPT regurgitates Times content, based on plaintiffs’ own earlier concessions about prevalence;
- even with de-identification, log content is “deeply personal” and in some cases security-sensitive;
- the court didn’t wrestle sufficiently with proportionality, privacy, or the adequacy of anonymization. (Reuters)
The Times’ position:
- everything is anonymized;
- all data is covered by a strict protective order and security protocols;
- this is routine large-scale sampling to test OpenAI’s assertions about how often its outputs track Times content. (Business Insider)
Judge Wang’s order explicitly relies on those safeguards: production is de-identified and subject to a protective order with access controls, logging, and limitations on use. (Ars Technica)
So, procedurally, you now have:
| ⚖️ Discovery Step | 📌 Status as of late Nov 2025 |
|---|---|
| Mass preservation of logs | Ordered in May 2025; later limited, but long enough to change OpenAI’s retention behavior for months. (Nelson Mullins Riley & Scarborough LLP) |
| Production of 20M logs | Ordered Nov 7; OpenAI is seeking reconsideration and reversal, arguing privacy and proportionality. (Reuters) |
Whatever happens on reconsideration, the precedent is already out there: a federal court ordered an AI provider to pull 20 million user conversations and hand them to an opposing party under a protective order.
That is the new discovery risk profile.
How “anonymous” are 20 million chats, really? 🕶️
From the court’s and the Times’ perspective:
- logs are de-identified (OpenAI scrubs PII, passwords, other obvious sensitive fields), and
- a protective order restricts use to the litigation, limits who can see the data, and requires a secure environment. (OpenAI)
From privacy and cybersecurity commentators’ perspective, this is a lot squishier:
- modern re-identification research routinely shows that “anonymous” text can be tied back to individuals by combining unique facts, writing style, and external data;
- 20M conversations practically guarantee linkable quirks (unique events, dates, locations) that can point back to real people;
- the more parties and experts holding copies, the larger the attack surface for a very juicy dataset. (National Law Review)
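To see why commentators are skeptical, consider a toy, entirely hypothetical example (nothing here comes from the actual production) of how quasi-identifiers inside a "de-identified" chat can be intersected with outside data:

```python
# Hypothetical illustration of linkage-based re-identification. The chat text,
# patterns, and scenario are invented; the point is that scrubbing names is not
# the same thing as anonymity.
import re

deidentified_chat = (
    "I'm the only pediatric cardiologist in Bozeman and my clinic's lease "
    "renewal is due March 3rd. Draft a letter to the landlord."
)

# Quasi-identifiers survive PII scrubbing: a profession, a town, a specific date.
quasi_identifiers = {
    "profession": re.search(r"pediatric \w+", deidentified_chat),
    "location": re.search(r"in ([A-Z][a-z]+)", deidentified_chat),
    "date": re.search(r"[A-Z]\w+ \d{1,2}(st|nd|rd|th)", deidentified_chat),
}

# An adversary with outside data (licensing boards, local news, LinkedIn) only
# needs to intersect these facts; uniqueness, not a name, identifies the author.
print({label: m.group(0) for label, m in quasi_identifiers.items() if m})
```

None of the surviving facts is identifying on its own; the combination is, and a 20-million-conversation corpus is full of such combinations.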
OpenAI has leaned heavily on this line of argument in its public messaging, saying the order:
- “disregards long-standing privacy protections” and
- “breaks with common-sense security practices” by forcing disclosure of highly personal chats of people unrelated to the Times. (OpenAI)
The Times, for its part, has publicly countered that:
- OpenAI’s own Terms already allow use of chats for training and legal purposes;
- the logs they’re asking for are anonymized and locked down; and
- accusing them of trying to “invade user privacy” is fear-mongering given the protective order. (Business Insider)
That clash is exactly where OpenAI's own Terms of Use come in.
Where OpenAI’s own Terms of Use and privacy promises fit in 📜🔍
OpenAI markets ChatGPT as privacy-respecting: users can delete chats, opt out of training, and (for some tiers) enjoy zero data retention. (OpenAI)
But its Terms of Use and privacy language also include the familiar back door every SaaS product has:
“We may preserve or disclose your information if we believe it is reasonably necessary to comply with a law, regulation, legal process, or governmental request.” (Reddit)
So, on paper:
- OpenAI promises deletion and limited retention unless a legal process (like this lawsuit) says otherwise;
- it explicitly reserves the right to preserve and disclose user data to comply with court orders.
The discovery orders are precisely that “legal process.” From a pure contract perspective, OpenAI is doing exactly what almost every privacy policy in the industry says: we’ll delete, except when the law says we can’t.
The friction is in the gap between messaging and reality:
| 🎭 Promise / Perception | 🧱 Litigation Reality |
|---|---|
| “Delete means delete; your chats are gone in 30 days.” (The Verge) | “Delete means we mark them for deletion — unless a federal judge orders indefinite preservation and mass production.” (cdn.arstechnica.net) |
| “We fight for your privacy against overbroad requests.” (OpenAI) | “We lost on preservation; we’ve lost (so far) on producing 20M logs; protective orders stand in for actual secrecy.” (Ars Technica) |
| “Your chats are private, like talking to a lawyer or a doctor.” (Altman’s ‘AI privilege’ rhetoric) (TechRadar) | Conversations with AI are not privileged; courts treat them as discoverable business records subject to subpoenas and broad civil discovery. (National Law Review) |
Legally, the ToS gives OpenAI room to comply with court orders. Reputationally, those same orders highlight that:
- “right to deletion” is conditional,
- “privacy” is bounded by discovery, and
- AI chats are not special in the way attorney–client or doctor–patient communications are.
You can already see plaintiffs and regulators using OpenAI’s marketing language against it in parallel privacy investigations (e.g., Italy’s €15M fine over data-use and transparency issues). (Reuters)
ToS, discovery, and privilege: a few takeaways for practitioners ⚙️
If you’re advising companies that either run AI tools or use them heavily, this case is a ready-made playbook.
If a judge ordered 20 million of your AI logs…
This walkthrough traces what actually happens when a court order collides with your vendor's "we delete your chats" messaging, and why your prompts are now treated like email, Slack, and server logs for discovery.
Employees, customers, and founders feed everything into chat.
You route day-to-day work, experiments, and even sensitive questions through a consumer or “business” AI interface. Typical contents include:
- Draft contracts, demand letters, and negotiation plans
- HR scenarios and performance notes about named employees
- Internal financials, pricing strategies, and confidential road maps
- Customer data, bespoke edge cases, and production screenshots
Prompts and outputs sit in centralized log storage.
Behind the chat UI, prompts and outputs become structured log records with timestamps, session IDs, and internal user identifiers. The vendor’s boilerplate says:
- Logs are kept for debugging, abuse detection, and product analytics.
- Unless you are on an enterprise/zero-retention plan, they can be used for training and model improvement.
- There is a scheduled deletion window (for example, 30 days or similar).
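To make that concrete, here is a minimal sketch of what a vendor-side chat log record could look like. The field names, the delete-flag behavior, and the 30-day window are illustrative assumptions, not OpenAI's actual schema:

```python
# Illustrative sketch of a vendor-side chat log record. Field names and the
# 30-day window are assumptions for illustration, not any real vendor's schema.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class ChatLogRecord:
    session_id: str               # groups a multi-turn conversation
    user_id: str                  # internal identifier, not a display name
    created_at: datetime
    prompt: str                   # everything the user typed, verbatim
    output: str                   # everything the model answered, verbatim
    deleted_by_user: bool = False # "delete" flips a flag; purging happens later
    purge_after: Optional[datetime] = None

    def mark_deleted(self, retention_days: int = 30) -> None:
        # Normal lifecycle: a "deleted" chat is queued for purge, not erased.
        self.deleted_by_user = True
        self.purge_after = datetime.now(timezone.utc) + timedelta(days=retention_days)

record = ChatLogRecord(
    session_id="sess_123",
    user_id="user_456",
    created_at=datetime.now(timezone.utc),
    prompt="Review this draft severance agreement for our VP of sales...",
    output="Here are the clauses a court is likely to scrutinize...",
)
record.mark_deleted()
```

The fields that matter for discovery are the last two: "delete" flips a flag and schedules a purge, and the text itself sits in storage until that purge actually runs.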
A third party sues and discovery targets model behavior.
A publisher, regulator, or class of users alleges misuse of data and asks how the AI behaves in the wild. They want evidence from real prompts and outputs, not just training data:
- “Show us how often your model regurgitates our content.”
- “Show us how you handle personal data across real-world conversations.”
- “Demonstrate that you aren’t systematically biased.”
A judge instructs the vendor to stop deleting logs.
The court issues an order to “preserve and segregate” output logs that would otherwise be deleted. For a period of time:
- Scheduled deletion is paused for affected logs.
- “Deleted” chats are effectively frozen instead of being purged.
- Any right-to-erasure promises are subordinated to the litigation hold.
- Enterprise / zero-retention instances may be carved out by contract.
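In engineering terms, a preservation order of this kind reduces to one extra condition in the purge job. The sketch below is hypothetical (the tier names and flags are assumed, not taken from the case record), but it shows why "delete" stops meaning "erase" the moment a litigation hold lands:

```python
# Sketch of a purge job under a litigation hold. Tier names and flags are
# illustrative assumptions; the point is that the hold, not the user's delete
# button, decides whether a record is actually erased.
from datetime import datetime, timedelta, timezone

LEGAL_HOLD_ACTIVE = True                              # set when a preservation order issues
EXEMPT_TIERS = {"enterprise", "zero_data_retention"}  # carved out by contract

def should_purge(record: dict, now: datetime) -> bool:
    if LEGAL_HOLD_ACTIVE and record["tier"] not in EXEMPT_TIERS:
        # Preservation order: even explicitly deleted chats stay frozen, and
        # right-to-erasure requests are subordinated to the litigation hold.
        return False
    # Otherwise the normal retention window applies.
    return record["deleted_by_user"] and now >= record["purge_after"]

now = datetime.now(timezone.utc)
consumer_chat = {
    "tier": "plus",
    "deleted_by_user": True,                 # the user hit "delete" weeks ago
    "purge_after": now - timedelta(days=5),  # already past its scheduled purge date
}
print(should_purge(consumer_chat, now))      # False while the hold is active
```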
The vendor is ordered to hand over a massive “anonymized” dataset.
Under the existing protective order, the court approves a large-scale sampling of logs (for example, 20 million conversations). The vendor:
- Scrubs obvious identifiers, passwords, and some PII.
- Retains enough text to test the plaintiffs’ allegations.
- Delivers data into a secure review environment controlled by outside counsel and experts.
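A rough sketch of that scrubbing step (the patterns are illustrative assumptions, not the de-identification protocol actually used in the case) shows why the re-identification concern from earlier does not go away: the substance of each conversation has to survive for the data to be useful as evidence.

```python
# Toy de-identification pass over a chat log, of the kind described above.
# The patterns are illustrative assumptions; real pipelines are more elaborate,
# but the trade-off is the same: the content has to remain analyzable.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}_REDACTED]", text)
    return text

chat = ("Email my attorney at jane.doe@example.com about the acquisition of "
        "our Tulsa plant; my cell is 918-555-0142.")
print(scrub(chat))
# The acquisition, the city, and the fact pattern all survive the scrub.
```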
Your prompts are now sitting in someone else’s case file.
The vendor’s logs, containing your conversations, are now in the hands of:
- Opposing counsel and their expert teams
- Third-party vendors providing hosting, analytics, and search tools
- Potentially other courts in related or follow-on proceedings
Your own risk posture now depends on what you wrote into those prompts and whether your paper with the vendor actually carved you out of the consumer firehose.
1. Treat AI chat logs as first-class ESI, not a side channel
The orders in NYT v. OpenAI are a loud signal that:
- courts will treat AI prompts/outputs exactly like emails, Slack, and server logs for discovery purposes;
- preservation and production scopes can reach millions of records if that’s what proportionality analysis supports. (huntress.com)
If your clients route sensitive work through consumer AI tools, assume those chats are discoverable and persistent, regardless of front-end UX.
2. Align your own privacy promises with worst-case litigation
OpenAI is hardly unique; most SaaS privacy notices say some version of:
- “we delete or minimize,” and
- “we keep and disclose when required by law.”
For your own products, the safer drafting posture is to:
- make the “except when law requires otherwise” caveat extremely explicit,
- describe in plain language what happens if a preservation or production order hits, and
- avoid over-selling deletion or “privacy like a lawyer” analogies that will look bad in a discovery fight. (OpenAI)
3. Segment commercial customers and negotiate real data-governance terms
One of the most practical lessons from the preservation order is that Enterprise and zero-retention deals were carved out. (The Verge)
For corporate users, that means:
- insist on clear B2B terms around retention, deletion, and litigation holds,
- aim for separate data stores and legal entities where possible, and
- understand exactly which buckets of your data sit in the same place as the consumer firehose that might get swept into an NYT-style order.
4. Don’t rely on “AI privilege”
Sam Altman’s rhetorical push for “AI privilege” is interesting policy advocacy, but today’s courts are treating AI chats as ordinary third-party communications – not privileged and often inconsistent with maintaining privilege over the underlying subject matter. (TechRadar)
If your client feeds confidential or work-product-laden material into a third-party AI platform:
- you should assume that content can later be discoverable against your client, the vendor, or both;
- non-waiver strategies (e.g., enterprise instances, contractual limits, private deployments) become crucial if you want any credible argument that privilege was preserved.
The broader lesson: discovery rules didn’t suddenly change, but the scale did 📊
Nothing in NYT v. OpenAI changes the black-letter rules of civil discovery. Relevance, proportionality, and protective orders were already there.
What’s new is:
- the volume of conversational data centralized in one vendor;
- the court’s willingness to order preservation and sampling at that scale; and
- the way that order instantly collided with carefully crafted privacy narratives and ToS language.
From a corporate-law and product-counsel perspective, the case is less about who’s right on copyright and more about this question:
“If a judge made our AI vendor preserve everything and hand over 20 million logs tomorrow, would our ToS, privacy policy, and internal data-governance posture survive that stress test?”
That’s the question to bake into your next NDA, DPA, or AI-usage policy review—before someone else’s discovery demand answers it for you.