Copyright Infringement, Unauthorized Scraping & Licensing Disputes

| Plaintiff Type | Defendants | Core Claims | Status (2025) |
|---|---|---|---|
| News organizations (NYT, etc.) | OpenAI, Microsoft | Copyright infringement via scraping paywalled articles for training; outputs reproduce works | Pending in SDNY |
| Authors (Silverman, others) | OpenAI, Meta | Books scraped without permission; model outputs (e.g., ChatGPT) infringe | Mixed rulings on motions to dismiss |
| Visual artists | Stability AI, Midjourney, DeviantArt | Artwork scraped for Stable Diffusion training; outputs are derivative works | Partially survived motions to dismiss |
| Programmers (GitHub Copilot) | Microsoft, GitHub, OpenAI | Code scraped; Copilot outputs infringing code | Ongoing |
| Music publishers | Anthropic, others | Lyrics reproduced in outputs without license | Pending |

| Rights-Holder Type | Strongest Claims | Evidence Needed |
|---|---|---|
| News publishers / journalists | Copyright infringement (articles); DMCA §1202 (CMI removal); ToS breach (paywall circumvention) | Articles in training datasets; paywall breach evidence; outputs reproducing articles |
| Book authors | Copyright infringement; derivative works | Books in datasets (e.g., Books3 corpus); AI outputs containing passages; registration certificates |
| Visual artists / photographers | Copyright infringement; derivative works; right of publicity (if person depicted) | Images in training sets (LAION, etc.); outputs mimicking style; similarity analysis |
| Musicians / composers | Copyright infringement (compositions, sound recordings) | Music in training data; outputs reproducing melodies/lyrics |
| Programmers / software developers | Copyright infringement (code); license violations (GPL, etc.) | Code repositories scraped; Copilot outputs containing copyrighted code |
| Individuals (privacy claims) | CCPA violations; misappropriation of likeness; privacy torts | Personal data/images in training sets; lack of consent; outputs using likeness |

Challenges for plaintiffs:

Evidence plaintiffs can gather:

| Damage Theory | Calculation | Challenges |
|---|---|---|
| Statutory damages (copyright) | $750–$30,000 per work (up to $150,000 per work if willful) | Requires timely registration; "per work" definition unclear for massive datasets |
| Licensing fees | What you would have charged for an AI training license | No established market rates yet; AI companies argue $0 (fair use) |
| Lost market value | AI outputs compete with your work, reducing sales/licensing | Hard to prove causation and to isolate the substitution effect |
| Unjust enrichment | AI company's profits attributable to using your work | Difficult to trace profits to specific training data |
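
To make the statutory damages row concrete, here is a minimal sketch of the arithmetic, assuming the per-work figures in the table (the range set by 17 U.S.C. § 504(c)) and counting only timely registered works. The function name, the `DamagesRange` type, and the 40-work portfolio are hypothetical and purely illustrative. A court picks a figure within the range for each work, so the result is a band of exposure, not a prediction.

```python
from dataclasses import dataclass

# Per-work statutory figures taken from the damages table above.
MIN_PER_WORK = 750
MAX_PER_WORK = 30_000
MAX_PER_WORK_WILLFUL = 150_000


@dataclass(frozen=True)
class DamagesRange:
    """Lowest and highest total statutory award, in US dollars."""
    low: int
    high: int


def statutory_damages_range(registered_works: int, willful: bool = False) -> DamagesRange:
    """Return the statutory exposure band for a count of timely registered works.

    Assumption: every work counts separately. How "per work" applies to
    compilations and massive datasets is unsettled, as the table notes.
    """
    if registered_works < 0:
        raise ValueError("registered_works must be non-negative")
    ceiling = MAX_PER_WORK_WILLFUL if willful else MAX_PER_WORK
    return DamagesRange(
        low=registered_works * MIN_PER_WORK,
        high=registered_works * ceiling,
    )


if __name__ == "__main__":
    # Hypothetical portfolio: 40 timely registered articles.
    print(statutory_damages_range(40))                # low=30000, high=1200000
    print(statutory_damages_range(40, willful=True))  # low=30000, high=6000000
```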

AI training cases are natural class actions:

| Section | Content |
|---|---|
| Your works & ownership | Identify copyrighted works, registration status, commercial value, market position |
| Evidence of use in training | How you know your works were scraped/used (dataset disclosures, outputs, server logs) |
| Infringement theories | Copyright (copying for training), derivative works (outputs), DMCA §1202, ToS violations |
| Fair use rebuttal | Why training is NOT transformative; commercial use; market harm; why the claim that outputs do not substitute for the originals is false |
| Damages calculation | Statutory damages potential, licensing fees, lost market value |
| Demand | Cease using works in training; remove from datasets; destroy derivative models; licensing negotiation OR litigation |
| Deadline | 30–60 days (longer than typical IP demands given complexity) |

For News Publishers:

For Individual Creators:

For Software Developers:

Four-factor analysis:

| Factor | AI Company Argument | Rights-Holder Counter |
|---|---|---|
| 1. Purpose & character | Transformative: training creates new tool; outputs are new works, not copies | Commercial use; outputs compete with originals; no transformation of individual works |
| 2. Nature of work | Many training works are factual (news, code); less protection | Also includes highly creative works (fiction, art, music); core of copyright |
| 3. Amount used | Entire work needed for training; outputs use minimal amounts | Copied entire works; many outputs substantially reproduce training data |
| 4. Market effect | AI tools complement, don't substitute; new markets created | Direct substitution; users get content without licensing; lost licensing revenue |

Growing trend toward negotiated licenses:

I represent content creators asserting rights against unauthorized AI training and AI companies defending against infringement claims. This emerging area requires understanding both copyright law and AI technology.
Book a call to discuss your AI training dispute. I'll assess the strength of infringement or fair use claims, evaluate litigation vs. licensing options, and recommend strategy for resolution or defense.
Email: owner@terms.law