AI Training & Data Use Demand Letters

Copyright Infringement, Unauthorized Scraping & Licensing Disputes

AI Training & Copyright – Evolving Legal Landscape
⚠️ Rapidly Evolving Law: AI training copyright law is in flux. Major lawsuits by news organizations, authors, artists, and programmers against OpenAI, Microsoft, Google, Meta, and others are pending (2023-2025). No definitive Supreme Court or appellate rulings yet establish clear standards.
Current Litigation Landscape
Plaintiff Type | Defendants | Core Claims | Status (2025)
News organizations (NYT, etc.) | OpenAI, Microsoft | Copyright infringement via scraping paywalled articles for training; outputs reproduce works | Pending in SDNY
Authors (Silverman, others) | OpenAI, Meta | Books scraped without permission; ChatGPT outputs infringe | Mixed motions to dismiss rulings
Visual artists | Stability AI, Midjourney, DeviantArt | Artwork scraped for Stable Diffusion training; outputs are derivative works | Partially survived motions to dismiss
Programmers (GitHub Copilot) | Microsoft, GitHub, OpenAI | Code scraped; Copilot outputs infringing code | Ongoing
Music publishers | Anthropic, others | Lyrics reproduced in outputs without license | Pending
Plaintiff Legal Theories
  • Direct copyright infringement: Copying entire works into training datasets without authorization
  • Derivative works: AI outputs are derivative works of training data
  • DMCA §1202 violations: Removing or altering copyright management information (CMI) during scraping
  • Misappropriation: Unfair competition, unjust enrichment for commercial use of creative works
  • Privacy violations: Scraping and using personal data without consent (CCPA, other privacy laws)
  • Terms of Service violations: Scraping paywalled or ToS-restricted content
Defense Theories (AI Companies)
  • Fair use: Training is transformative; creates new outputs, doesn't substitute for originals
  • No substantial similarity: Outputs don't reproduce training data (except in rare memorization cases)
  • Publicly available data: Scraping public internet content is lawful
  • No market harm: AI tools complement rather than substitute for original works
  • Licensing deals: Growing number of licensing agreements with publishers (e.g., OpenAI-News Corp, Google-publishers)
💡 Business Reality: While litigation proceeds, AI companies are increasingly entering licensing deals with major content owners. This creates a two-track system: large publishers get paid, while individuals and small creators pursue litigation.
Key Open Questions
  • Is training on copyrighted works fair use or infringement?
  • Does scraping alone constitute infringement, or only if outputs reproduce?
  • What level of similarity between output and training data is actionable?
  • Can rights-holders opt out of AI training?
  • Are DMCA §1201 anti-circumvention claims viable (bypassing paywalls, ignoring robots.txt)?
Claims for Rights-Holders
Who Can Assert Claims?
Rights-Holder Type | Strongest Claims | Evidence Needed
News publishers / journalists | Copyright infringement (articles); DMCA §1202 (CMI removal); ToS breach (paywall circumvention) | Articles in training datasets; paywall breach evidence; outputs reproducing articles
Book authors | Copyright infringement; derivative works | Books in datasets (e.g., Books3 corpus); AI outputs containing passages; registration certificates
Visual artists / photographers | Copyright infringement; derivative works; right of publicity (if person depicted) | Images in training sets (LAION, etc.); outputs mimicking style; similarity analysis
Musicians / composers | Copyright infringement (compositions, sound recordings) | Music in training data; outputs reproducing melodies/lyrics
Programmers / software developers | Copyright infringement (code); license violations (GPL, etc.) | Code repositories scraped; Copilot outputs containing copyrighted code
Individuals (privacy claims) | CCPA violations; misappropriation of likeness; privacy torts | Personal data/images in training sets; lack of consent; outputs using likeness
Proving AI Training Infringement

Challenges for plaintiffs:

  • Black box problem: Training datasets and model architectures are often not publicly disclosed
  • Discovery needed: Compelling disclosure of training data sources usually requires litigation
  • Probabilistic outputs: It is hard to prove that a specific output is a "copy" rather than a coincidental similarity

Evidence plaintiffs can gather:

  • Public dataset disclosures: Common Crawl, LAION, Books3 have been documented; check if your works are included
  • Prompting for reproduction: Test AI with prompts designed to elicit your copyrighted work (e.g., "Write article about [topic] in style of [Your Name]")
  • Metadata analysis: Some outputs contain artifacts suggesting training on specific sources
  • Company statements: Public disclosures about training data sources
  • Scraped content logs: Web server logs showing AI company bots scraping your site
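When testing a model by prompting for reproduction, it can help to quantify how much of an output repeats your text word for word instead of relying on impression alone. The following is a minimal sketch using Python's standard `difflib` module. The function names and the 8-word window are illustrative assumptions, and the numbers it produces are supporting evidence only, not a legal test for substantial similarity.

```python
from difflib import SequenceMatcher


def longest_shared_passage(original: str, output: str) -> str:
    """Return the longest run of words found verbatim (ignoring case) in both texts."""
    original_words = original.split()
    # Compare lowercase word lists so capitalization differences do not hide a match.
    a = [w.lower() for w in original_words]
    b = [w.lower() for w in output.split()]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    # Return the passage as it appears in the original work.
    return " ".join(original_words[match.a:match.a + match.size])


def shared_ngram_ratio(original: str, output: str, n: int = 8) -> float:
    """Fraction of the output's n-word sequences that also occur in the original."""

    def ngrams(text: str) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    output_grams = ngrams(output)
    if not output_grams:
        return 0.0  # output is shorter than n words
    return len(output_grams & ngrams(original)) / len(output_grams)
```

Saving each prompt, the full output, and these two figures together gives a repeatable record that can be attached to a demand letter as an enclosure.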
Calculating Damages
Damage Theory | Calculation | Challenges
Statutory damages (copyright) | $750–$30,000 per work (up to $150,000 if willful) | Requires timely registration; "per work" definition unclear for massive datasets
Licensing fees | What you would have charged for AI training license | No established market rates yet; AI companies argue $0 (fair use)
Lost market value | AI outputs compete with your work, reducing sales/licensing | Hard to prove causation; substitution effect
Unjust enrichment | AI company's profits attributable to using your work | Difficult to trace profits to specific training data
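The statutory damages row reduces to simple multiplication, which the sketch below works through. It assumes the per-work figures quoted above (the range set by 17 U.S.C. § 504(c)) and that every counted work was timely registered. The names `StatutoryExposure` and `statutory_exposure` are hypothetical, and the result is a theoretical range, not a prediction of what a court would award.

```python
from dataclasses import dataclass

# Per-work statutory damages figures quoted in the table above.
MIN_PER_WORK = 750
MAX_PER_WORK = 30_000
MAX_WILLFUL_PER_WORK = 150_000


@dataclass(frozen=True)
class StatutoryExposure:
    low: int           # minimum award across all works
    high: int          # maximum award without a willfulness finding
    high_willful: int  # maximum award if infringement is found willful


def statutory_exposure(registered_works: int) -> StatutoryExposure:
    """Multiply the per-work range by the number of timely registered works."""
    if registered_works < 0:
        raise ValueError("registered_works must be zero or greater")
    return StatutoryExposure(
        low=registered_works * MIN_PER_WORK,
        high=registered_works * MAX_PER_WORK,
        high_willful=registered_works * MAX_WILLFUL_PER_WORK,
    )


# Example: an author with 12 registered novels.
exposure = statutory_exposure(12)
print(f"${exposure.low:,} to ${exposure.high:,} (up to ${exposure.high_willful:,} if willful)")
```

For 12 registered works the range is $9,000 to $360,000, rising to $1,800,000 if willfulness is found. Unregistered works should be left out of the count, since statutory damages depend on timely registration.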
Class Action Viability

AI training cases are natural class actions:

  • Common questions: Did defendants scrape/train on works without permission? Is it fair use?
  • Large classes: Millions of creators whose works were scraped
  • Settlement leverage: Class certification creates existential risk for AI companies
  • Opt-out rights: Class members can opt out to pursue individual claims if they have strong damages cases
Drafting AI Training Demand Letters
Strategic Considerations
💡 Licensing vs. Litigation: Many AI companies prefer licensing deals over litigation. If you're a significant content owner, a demand letter can open a negotiation rather than serve as a purely adversarial step.
  • Individual vs. collective action: Consider joining existing class actions vs. individual demand
  • Publicity: Public demands/lawsuits attract media attention, putting pressure on AI companies
  • Licensing opportunity: Frame as "We're open to licensing our content for AI training at fair rates"
  • Discovery needs: Litigation may be necessary to uncover what training data was used
Letter Structure
Section | Content
Your works & ownership | Identify copyrighted works, registration status, commercial value, market position
Evidence of use in training | How you know your works were scraped/used (dataset disclosures, outputs, server logs)
Infringement theories | Copyright (copying for training), derivative works (outputs), DMCA §1202, ToS violations
Fair use rebuttal | Why training is NOT transformative; commercial use; market harm; why outputs do substitute for the originals
Damages calculation | Statutory damages potential, licensing fees, lost market value
Demand | Cease using works in training; remove from datasets; destroy derivative models; licensing negotiation OR litigation
Deadline | 30–60 days (longer than typical IP demands given complexity)
Tone & Positioning
  • Firm but business-oriented: "We recognize AI's potential but demand fair compensation for our creative works"
  • Open to licensing: "We're willing to negotiate reasonable licensing terms for training use"
  • Cite precedent: Reference licensing deals AI companies have made with other publishers
  • Collective strength: If representing multiple creators, emphasize scale of infringement
Special Considerations

For News Publishers:

  • Emphasize paywall circumvention and ToS violations
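  • Reference the NYT and other publisher litigation as context (these cases are pending, so they show the claims are being pursued but are not yet binding precedent)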
  • Highlight existing licensing deals (OpenAI-News Corp, etc.) as proof of market value
  • DMCA §1202 claims for copyright management information removal

For Individual Creators:

  • Consider joining class actions rather than individual demands (cost-effective)
  • If pursuing individually, focus on works with clear commercial value and registration
  • Document attempts to opt out (robots.txt, no-scraping notices)

For Software Developers:

  • Open-source license violations (GPL requires attribution and share-alike licensing; plaintiffs allege Copilot output provides neither)
  • Specific code snippets reproduced in outputs
  • Loss of attribution and credit
⚠️ Fair Use Defense: AI companies will assert fair use. Courts haven't ruled definitively, but the transformative use doctrine favors defendants in many cases. Demands should acknowledge the uncertainty and offer licensing as a win-win alternative.
Sample Demand Letters
Sample 1: News Publisher / Content Creator
[Your Organization Name]
[Address]
[Email / Phone]

[Date]

[AI Company Name]
Legal Department
[Address]

Re: Unauthorized Use of Copyrighted Works in AI Training – Demand for Licensing & Compensation

Dear [AI Company]:

We are writing regarding your unauthorized use of our copyrighted content to train your AI models, including [Model Names - e.g., GPT-4, Claude, Gemini].

OUR COPYRIGHTED WORKS:

[Your Organization] is a [description - e.g., news publisher, content platform] that creates and owns copyrighted articles, photographs, videos, and other creative works. We invest millions of dollars annually in journalism and content creation. Our works are protected by copyright and distributed via [website] under clear Terms of Service prohibiting scraping and unauthorized use.

EVIDENCE OF YOUR INFRINGEMENT:

We have determined that you have scraped and used our copyrighted works without authorization:

1. Dataset Evidence: Public disclosures indicate your models were trained on [Common Crawl, specific datasets] that include scraped content from our website.
2. Output Evidence: When prompted, your AI model [Model Name] reproduces content substantially similar to our copyrighted articles, including [specific examples with prompts and outputs attached].
3. Server Logs: Our records show your web scrapers accessed our site [number] times between [dates], downloading [amount] of content, including paywalled articles requiring subscription.
4. DMCA Violations: Your scraping removed copyright management information, including author attribution, copyright notices, and publication metadata, violating 17 U.S.C. §1202.

LEGAL VIOLATIONS:

Your conduct constitutes:

1. Direct Copyright Infringement (17 U.S.C. §501): Copying our works into training datasets without authorization;
2. Creation of Derivative Works (17 U.S.C. §106(2)): Your AI outputs are derivative works incorporating our copyrighted expression;
3. DMCA §1202 Violations: Removing/altering copyright management information;
4. Breach of Terms of Service: Our website ToS expressly prohibits scraping and commercial use without license;
5. Unfair Competition & Unjust Enrichment: Commercial exploitation of our works without compensation.

FAIR USE DOES NOT APPLY:

You may claim "fair use," but this defense fails:

• Purpose: Your use is purely commercial (charging for AI subscriptions/APIs);
• Nature: Our works are creative, published works at the core of copyright protection;
• Amount: You copied entire articles, photographs, and works;
• Market Effect: Your AI outputs compete directly with our content, reducing traffic and subscriptions; users can obtain summaries and information without visiting our site.

Courts have not established that AI training is transformative fair use. The question is unsettled, and multiple pending lawsuits (NYT v. OpenAI, etc.) will determine this issue.

MARKET FOR LICENSING:

You have entered licensing agreements with other publishers (e.g., [list known deals: OpenAI-News Corp, Google-publishers]). This establishes that you recognize the value of content and the need for licenses. We are entitled to equivalent compensation.

DEMAND:

We demand:

1. Immediate cessation of using our copyrighted works in training any current or future AI models;
2. Removal of our works from all training datasets;
3. Destruction or retraining of models incorporating our copyrighted content;
4. Licensing negotiation: We are open to negotiating a fair licensing agreement for prospective authorized use at commercially reasonable rates comparable to your other publisher deals;
5. Retroactive compensation: $[Amount] representing fair licensing fees for past unauthorized use from [date] to present;
6. Transparency: Disclosure of all instances where our content appears in training data and methodology for removal.

TIMELINE:

Given the complexity of this matter, we request your substantive response within 60 days. If we cannot reach a licensing agreement, we will pursue litigation for copyright infringement, DMCA violations, and related claims, seeking statutory damages of up to $150,000 per work, injunctive relief, and attorney's fees.

We prefer a business resolution. Please contact [Contact Name] at [Email/Phone] to discuss licensing terms.

Sincerely,
[Your Name / Title]
[Organization]

Enclosures:
- Examples of AI outputs reproducing our content
- Copyright registrations
- Evidence of scraping activity
Sample 2: Individual Author / Artist
[Your Name]
[Address]
[Email / Phone]

[Date]

[AI Company]
Legal Department
[Address]

Re: Copyright Infringement – Unauthorized Use of My Works in AI Training

Dear [Company]:

I am the author/creator of [describe works - e.g., "12 published novels," "portfolio of 500+ digital artworks"]. My works are registered with the U.S. Copyright Office [list registration numbers if available] and commercially available through [publishers, galleries, platforms].

I have recently learned that my copyrighted works were included in training datasets for your AI model [Model Name] without my authorization or compensation.

EVIDENCE:

1. My books were included in the "Books3" dataset, which has been publicly documented as part of your training corpus.
2. When prompted with [specific prompts], your AI generates outputs that [reproduce passages from my work / mimic my distinctive artistic style with substantial similarity].
3. I never granted permission for my works to be used for AI training.

I am a working creator who depends on income from my copyrighted works. Your unauthorized use:

• Deprives me of licensing revenue;
• Allows users to obtain content similar to mine without purchasing my works;
• Diminishes the market value of my original works.

I am aware of ongoing litigation by authors and artists raising similar claims (Silverman v. OpenAI, Andersen v. Stability AI, etc.). However, I am reaching out directly to you first to seek resolution.

DEMAND:

1. Remove my works from all training datasets immediately;
2. Retrain or modify models to exclude outputs derived from my works;
3. Compensate me $[Amount - e.g., $50,000] representing:
   • Licensing fee for past unauthorized use
   • Statutory damages exposure ($750–$150k per work × [number] works = potential liability of $[X])
4. OR: Negotiate an ongoing licensing agreement at fair market rates for prospective authorized use.

If I do not receive a satisfactory response within 30 days, I will join the pending class action litigation [cite case if applicable] OR file an individual copyright infringement lawsuit.

I am open to discussing reasonable licensing arrangements. Please contact me at [Email/Phone].

Sincerely,
[Your Name]
Sample 3: Software Developer (Code Training)
[Your Name]
[Address]
[Email / Phone]

[Date]

[AI Company - e.g., Microsoft, GitHub]
Legal Department
[Address]

Re: Copyright Infringement & Open Source License Violations – GitHub Copilot

Dear [Company]:

I am a software developer and owner of copyrighted source code repositories hosted on GitHub. My code is licensed under [GPL-3.0 / MIT / Apache 2.0 / other license]. Your AI coding assistant, GitHub Copilot, was trained on my copyrighted code without complying with my license terms.

INFRINGEMENT:

1. My repositories [list repo names] containing [number] lines of copyrighted code were scraped and used to train Copilot.
2. Copilot generates code suggestions that reproduce substantial portions of my copyrighted code, including [specific examples].
3. Copilot outputs do NOT include required attribution or license notices, and do not comply with copyleft requirements (if GPL).

This violates:

• My copyright (17 U.S.C. §106);
• My open-source license terms (GPL/MIT/etc. require attribution and license compliance);
• DMCA §1202 (removal of copyright management information - my license headers).

Open-source does NOT mean "free for any use." My GPL license requires that derivative works also be open-sourced and attributed. Copilot violates this by incorporating my code into proprietary suggestions without attribution.

DEMAND:

1. Remove my repositories from Copilot training data;
2. Implement filtering to prevent Copilot from suggesting code derived from my works;
3. Compensate me $[Amount] for past violations;
4. OR: Comply with my license terms (provide attribution, comply with copyleft).

I am aware of the ongoing Doe v. GitHub Copilot litigation and may join as a plaintiff or file an individual suit if this is not resolved within 30 days.

Sincerely,
[Your Name]
Defense Strategies & Fair Use
AI Company Defenses (If You Receive a Demand)
📥 Received an AI Training Demand? If you're an AI company or have received a claim about training on copyrighted works, several defenses are available, though the law is unsettled.
Fair Use Defense – 17 U.S.C. §107

Four-factor analysis:

Factor | AI Company Argument | Rights-Holder Counter
1. Purpose & character | Transformative: training creates new tool; outputs are new works, not copies | Commercial use; outputs compete with originals; no transformation of individual works
2. Nature of work | Many training works are factual (news, code); less protection | Also includes highly creative works (fiction, art, music); core of copyright
3. Amount used | Entire work needed for training; outputs use minimal amounts | Copied entire works; many outputs substantially reproduce training data
4. Market effect | AI tools complement, don't substitute; new markets created | Direct substitution; users get content without licensing; lost licensing revenue
Other Defenses
  • No substantial similarity: Outputs don't reproduce copyrighted expression; only rare "memorization" cases show copying
  • No market harm: Plaintiffs can't show lost sales/licenses caused by AI training
  • Implied license: Publicly posting content online creates implied license for certain uses
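  • First sale doctrine: AI company lawfully acquired copies (e.g., purchased books) and argues it can use them for training; note that first sale limits the distribution right, not the reproduction right, so this argument is weak where training involves making new copies
  • Statute of limitations: Copyright claims must be brought within 3 years of accrual; defendants argue claims accrued when training occurred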
Responding to Demands
  • Evaluate claim strength: Is plaintiff's work actually in training data? Can they prove it?
  • Discovery burden: Plaintiffs need litigation to compel disclosure of training data; expensive for individuals
  • Settlement vs. licensing: Consider whether licensing deal is cheaper than litigation
  • Collective approach: Industry-wide licensing standards emerging; individual deals may set precedent
  • Policy advocacy: Support legislative solutions creating AI training exemptions or compulsory licensing
⚠️ Litigation Risk: Even if you have strong fair use arguments, litigation is expensive ($1M–$10M+ for complex AI cases) and outcomes are uncertain. Early settlement or licensing may be more cost-effective than fighting to establish precedent.
Emerging Licensing Models

Growing trend toward negotiated licenses:

  • Publisher deals: OpenAI-News Corp, Google-AP, etc. - typically $X million annually for training access
  • Opt-in registries: Some platforms allow creators to register works for AI training for compensation
  • Collective licensing: CMOs (collective management organizations) for music/publishing could administer AI licenses
  • Statutory licensing: Possible future legislation creating compulsory licenses (like music mechanical licenses)
Attorney Services for AI Training Disputes
AI Training Copyright Dispute?

I represent content creators asserting rights against unauthorized AI training and AI companies defending against infringement claims. This emerging area requires understanding both copyright law and AI technology.

For Rights-Holders (Creators, Publishers)
  • Evaluate whether your works were used in AI training (dataset analysis, output testing)
  • Draft demand letters and licensing proposals to AI companies
  • Negotiate licensing agreements for authorized AI training use
  • Join or initiate class action litigation
  • File individual copyright infringement lawsuits when damages justify
  • Pursue DMCA §1202 claims for CMI removal
  • Assert ToS violations and breach of contract claims
For AI Companies / Tech Platforms
  • Assess fair use and other defenses to training-based infringement claims
  • Respond to demand letters and evaluate settlement vs. litigation
  • Negotiate licensing agreements with content owners
  • Defend copyright infringement and class action lawsuits
  • Advise on training data sourcing and documentation
  • Implement opt-out mechanisms and respect robots.txt / no-scraping signals
  • Develop industry-standard licensing frameworks
Why Specialized Counsel Matters
Cutting-Edge Legal Issues: AI training copyright law is unsettled, with billion-dollar stakes. Cases require understanding both technical aspects of ML training and nuanced copyright doctrines (transformative use, substantial similarity, derivative works). Generic IP counsel may lack AI-specific expertise.
Representative Matters
  • News publisher claims against LLM developers
  • Book author and artist class actions
  • Software developer Copilot disputes
  • Licensing negotiations for AI training data
  • Fair use defense in AI training cases
  • DMCA §1202 CMI removal claims
  • Privacy-based claims (CCPA, BIPA) for facial recognition / personal data training
Schedule a Call

Book a call to discuss your AI training dispute. I'll assess the strength of infringement or fair use claims, evaluate litigation vs. licensing options, and recommend strategy for resolution or defense.

Contact Information

Email: owner@terms.law

Frequently Asked Questions
Whether AI training on copyrighted works is fair use remains unsettled. AI companies argue it's transformative fair use (training creates a new tool, outputs are new works). Rights-holders argue it's commercial copying with a market substitution effect. There is no definitive appellate or Supreme Court ruling yet. Ongoing cases (NYT v. OpenAI, authors/artists v. various AI companies) will establish precedent. Early district court rulings are mixed: some claims survive motions to dismiss, suggesting fair use is a fact question for trial, not an automatic defense.
Finding out whether your work was used in AI training is difficult without litigation discovery. Check: (1) Public dataset disclosures: Common Crawl, Books3, and LAION have documented contents; (2) Test outputs: prompt the AI with requests related to your work and see if outputs reproduce or mimic your expression; (3) Server logs: check if AI company scrapers accessed your site; (4) Company statements: some AI companies disclose training sources generally. Definitive proof often requires litigation to compel disclosure of training datasets.
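The server-log check can be partly automated. The sketch below counts requests whose user-agent string contains a known AI crawler token. It assumes access logs in the common "combined" format used by Apache and nginx. The token list is an assumption to verify against each operator's current documentation, and `count_ai_crawler_hits` is a hypothetical helper name. User-agent strings can be spoofed or omitted, so a clean result does not show that no scraping occurred.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed crawler tokens; confirm against each operator's published documentation.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Bytespider")

# Combined log format: ... "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_ai_crawler_hits(lines: Iterable[str]) -> Counter[str]:
    """Count log lines per crawler token, matched case-insensitively in the user agent."""
    hits: Counter[str] = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                hits[token] += 1
                break  # count each request once
    return hits
```

To run it against a real file, pass an open file handle, for example `count_ai_crawler_hits(open("access.log", encoding="utf-8", errors="replace"))`, where the file name is a placeholder for your own log path.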
Opting out of AI training is technically possible but of limited practical effect. Methods: (1) robots.txt file with AI scrapers disallowed (some AI companies honor this, many don't); (2) ToS prohibiting scraping/AI training; (3) Paywalls/authentication (though some scrapers bypass); (4) Platforms like DeviantArt offer "no AI training" tags; (5) Copyright notices stating "No AI training permitted." However, enforcement requires litigation if companies ignore opt-outs. There is a growing movement toward opt-out registries and "Do Not Train" standards.
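If you rely on robots.txt as an opt-out, it is worth confirming that the file actually blocks the crawlers you intend to block, and keeping a dated record of the result. The sketch below is one way to do that with Python's standard `urllib.robotparser`. The agent names are assumptions to check against each company's documentation, and `blocked_agents` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt tokens for AI crawlers; verify before relying on them.
AI_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")


def blocked_agents(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each AI agent to True if robots.txt disallows it from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, path) for agent in AI_AGENTS}


# Example robots.txt: block GPTBot everywhere, CCBot from /premium/ only.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /premium/

User-agent: *
Allow: /
"""

for agent, blocked in blocked_agents(EXAMPLE_ROBOTS, "/premium/story").items():
    print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

In this example, GPTBot and CCBot are reported as blocked for `/premium/story`, while ClaudeBot and Google-Extended fall through to the wildcard rule and are allowed. A result like that shows where the opt-out has gaps.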
Whether to join a class action or pursue an individual claim depends on your damages and leverage. Join a class if: (1) You're an individual creator with modest claims; (2) A class action is already certified (free to participate, no legal fees); (3) You want accountability without the burden of personal litigation. Pursue an individual claim if: (1) You're a major publisher/rights-holder with substantial licensing revenue at stake; (2) Your works have provable high value and you have negotiating leverage; (3) You can secure an individual licensing deal worth more than a class action recovery. Many major publishers (NYT, others) are pursuing individual suits/licensing rather than joining classes.