When AI Outputs Infringe, Who’s On The Hook?
You ask a model for help, it spits out something slick… and later someone says:
- “That’s basically my article.”
- “That’s our proprietary code.”
- “That image is a ripoff of my photo/character/UI.”
Now what? Who is actually exposed—you, the AI vendor, or both?
This piece walks through:
- How courts and regulators are allocating responsibility
- Different roles (end user, vendor, integrator, platform)
- Contractual and practical ways to shift / manage the risk
All in “who pays when things go wrong” terms.
🔗 The Basic Question: “Who Published This?”
Most IP and content liability flows from a simple idea:
The party that uses, distributes, or profits from infringing output is usually the first one on the hook.
AI doesn’t change that. It adds extra actors.
| Actor | What They Do | How They Can Get Dragged In |
|---|---|---|
| End user | Prompts the model, selects output, posts it / ships it. | Direct infringement (copyright, trademark, publicity) if they publish infringing material. |
| Customer / company | Approves, embeds, and commercializes AI outputs. | Vicarious/contributory infringement; breach of contract; regulatory liability. |
| AI vendor / model provider | Trains models, sets defaults, maybe distributes outputs (chatbots, APIs). | Direct/secondary infringement; unfair competition; contract breach vs training data licensors. |
| Integrator / SaaS built on top | Wraps a foundation model in a product (e.g., drafting, coding, legal research tools). | Infringement and misrepresentation if their tool systematically regurgitates protected works. |
| Platform / marketplace | Hosts content (app stores, social, UGC platforms). | Standard UGC host issues: DMCA/takedown, safe harbor limits, repeat infringer policies. |
So: users and their companies are not insulated just because “the AI wrote it.” They’re still the ones publishing.
Vendors can also be liable—especially when:
- Their systems systematically regurgitate protected content, or
- They used copyrighted sources in training in a way courts find unfair/unlicensed.
⚖️ Training vs Output: Different Legal Fights
Litigation so far splits into two tracks:
- Training claims – arguing that ingesting large corpora to train models is infringing.
- Output claims – arguing that specific outputs copy protected works.
Courts are treating them differently:
- Some decisions suggest that purely internal copying for training may be fair use in some contexts (intermediate copying; think Google Books–style reasoning), but courts are much more skeptical when outputs substitute for the original product (e.g., legal headnotes, news articles, books).
- In Thomson Reuters v. Ross Intelligence, a federal court found that copying Westlaw headnotes to power a competing AI legal tool was not fair use, emphasizing that West’s editorial content was creative, and Ross’s use was commercial and substitutive.
Bottom line: outputs are where liability really bites, because that’s what the public sees and what competes in the market.
🧩 Liability Map: Who’s Likely On The Hook In Common Scenarios
Think in scenarios rather than doctrine. Here’s a matrix.
| Scenario | Primary Target | Why | Who Else Can Be Pulled In |
|---|---|---|---|
| Solo creator uses AI to rewrite a paywalled article and publishes it as a blog post | Creator / their business | They selected, edited, and published the allegedly infringing derivative work. | If the AI vendor was marketed as “safe to use others’ content” and systematically regurgitates, vendor may face secondary or contractual claims. |
| Company uses AI to generate a logo that looks confusingly like a competitor’s | Company (and sometimes designer) | They used the logo in commerce; trademark focuses on use, not training. | Vendor only if their marketing or defaults encourage impersonation, or outputs embed others’ marks. |
| Dev copies AI-generated code into a product; later it’s found to be lifted from a proprietary repo | Company | They shipped the code; typical software IP story. | Model provider if outputs show systemic copying from licensed sources, especially if those sources never consented to training. |
| AI legal research tool reproduces near-identical headnotes/case summaries from a paid service | Tool vendor / integrator | They built a system to substitute for the original service; Thomson Reuters v. Ross is the blueprint. | Potential claims against the model provider if it took in proprietary content contrary to license terms. |
| Foundation model outputs paragraphs obviously copied from a news outlet or book | User & vendor | User for publishing; vendor for creating systems that regurgitate and for training/using copyrighted material in a non-transformative way. | Data brokers / training-data licensors if they misrepresented rights. |
AI doesn’t create a new safe harbor. Courts tend to ask:
“If a human had done this with the same result, who would we hold liable?”
The presence of a model rarely improves that answer for the user.
👥 Direct vs Secondary Liability
Two main buckets (in very simplified terms):
- Direct infringement – you yourself reproduced, distributed, or publicly displayed a protected work without permission.
- Secondary liability (contributory, vicarious, inducement) – you helped or profited from someone else’s infringement.
Applied to AI:
| Actor | Direct Liability Risk | Secondary Liability Risk |
|---|---|---|
| End user | High, once they publish clearly infringing outputs. | Low–medium (e.g., encouraging others to repost). |
| Company using AI in workflows | High if they approve and distribute infringing content as their own output. | Medium; can be liable if they know employees are systematically abusing AI to copy others. |
| AI vendor / integrator | Medium–high when system is architected to reproduce protected content (e.g., Westlaw headnotes, news snippets, chunks of books). | High if they know their tool is regularly used to infringe and do nothing (inducement, contributory). |
| Hosting platform | Direct liability often limited by DMCA-style safe harbors, if they respond to takedowns and have policies. | Risk grows if they curate/push infringing AI outputs themselves (e.g., built-in generators used in campaigns). |
Courts and regulators are still working out the details, but they’re not inventing brand-new liability shields just because AI is “innovative.”
🧑‍💻 “But I Didn’t Know”—Does That Help?
Ignorance helps less than people think.
- Copyright is a strict liability regime: you can infringe without knowing.
- Good faith can help on damages (e.g., avoiding willful-infringement multipliers), but it doesn’t erase liability.
- “The AI wrote it” is not a defense; at best it’s context for arguing the infringement wasn’t willful.
Where knowledge really matters is for secondary liability:
- Once a vendor or platform knows specific content is infringing and leaves it up or keeps generating similar copies, the case for contributory or vicarious liability strengthens.
- Likewise, if a team knows their workflow is essentially “copy competitor docs into AI and ship the paraphrase,” that knowledge becomes evidence of intentional infringement.
🧾 Contract Shields: How Parties Try To Shift The Risk
Look at any modern AI / SaaS contract and you’ll see warranty and indemnity language trying to move the risk around.
Typical Contract Positions
| Party | What They Promise | What They Push Back On |
|---|---|---|
| AI vendor | – “We own or have rights to our tech.” – “We won’t knowingly infringe IP in our base models.” – Sometimes: “We’ll defend you against third-party IP claims based on our technology, subject to caps.” | – Broad IP indemnities covering all uses and prompts. – Responsibility for user-supplied prompts/data. |
| Customer | – “We have rights to the data we feed in.” – “We won’t use the service to infringe others’ rights.” | – Accepting all risk for outputs, especially when the vendor is essentially providing creative content. |
| Integrator / agency | – Warrants that deliverables are original or properly licensed. | – Unlimited responsibility for any model hallucination that happens to copy someone. |
What To Actually Look For
If you’re the customer:
- Does the vendor indemnify you if an output, used as instructed, triggers a third-party IP claim that is traceable to the model or training data itself?
- Are you required to review and approve outputs before use—and does the contract make that explicit? (This is where vendors try to shift liability back: “you chose what to publish.”)
- Does the agreement limit your recourse to refunds only, or is there real coverage (defense costs, settlements, damages) up to a sensible cap?
If you’re the vendor:
- Do you carve out situations where the user aims at infringement on purpose (e.g., prompts like “rewrite this exact New York Times article to avoid plagiarism”)?
- Are you clear that users must review outputs and that you are not giving legal clearance?
- Do you restrict high-risk use cases (e.g., generating client-ready legal briefs or medical diagnostics) in your ToS unless under a separate, bespoke contract?
Good contracts don’t eliminate liability, but they decide who starts paying lawyers first.
🧪 Risk Levels For Common AI Output Uses
Here’s a pragmatic risk gradient for outputs:
| Use Case | Infringement Risk | Who’s First On The Hook |
|---|---|---|
| Internal-only use of AI summaries of public docs, never published | Low | You, in theory, but the exposure is mostly academic; still avoid copying whole works into shared wikis. |
| Publishing AI-generated blog posts that paraphrase multiple sources but don’t track one specific work | Medium | Publisher (you/company); depends how close specific passages are to originals. |
| Using AI to recreate paywalled/legal treatises/headnotes for a competing service | Very High | Your product/company; see Thomson Reuters v. Ross. Vendor may join the party if it encouraged this use. |
| AI-generated product descriptions based on your own data (catalogs, specs) | Low | You, but risk is more about your licensors (e.g., OEMs) than third parties. |
| AI-generated logos or characters that look like a competitor’s brand | High (trademark/trade dress) | Your company; you’re the one using them in commerce. |
| AI-generated images that mirror a specific Getty photo or an artist’s signature work | High | You as publisher; model vendor faces increasing risk if regurgitation is systemic. |
| AI-generated code used in a commercial app, later found to match closed-source code | High | Your company; possible claims upstream if vendor trained on repos without rights. |
🧭 Practical Ways To Reduce Your Exposure
Regardless of how the law evolves, you can do a lot with process and contracts.
For Companies Using AI
- Policy line: “AI outputs are drafts, not final.” Require human review for IP, confidentiality, and regulatory issues.
- Usage guardrails:
  - Ban prompts like “rewrite this exact competitor manual/terms/treatise.”
  - Forbid using AI to strip or evade licenses (e.g., “rewrite this licensed article so we don’t have to pay”).
- Tool selection:
  - Prefer vendors that offer IP indemnities, documented training sources, and enterprise endpoints.
  - For high-stakes uses (code, design, legal/medical content), use tools with strong provenance controls and/or restricted training sets.
- Logging & approvals:
  - Keep records of which prompts produced which outputs in important deliverables (a minimal logging sketch follows this list).
  - Have a designated reviewer (legal/compliance/lead) sign off before publication.
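To make the logging point concrete, here is a minimal sketch of what an AI-usage audit record could look like. Everything in it (the `AiUsageRecord` name, the fields, the JSON-lines file) is illustrative rather than any standard or vendor API; the point is simply that each important deliverable gets a prompt, an output hash, and a named approver attached to it.

```python
# Minimal sketch of an AI-usage audit log for important deliverables.
# All names here (AiUsageRecord, log_ai_usage, the log path) are hypothetical,
# not part of any vendor SDK; adapt to your own tooling.
import hashlib
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class AiUsageRecord:
    deliverable: str                  # e.g., "Q3 launch blog post"
    model: str                        # which model/endpoint produced the draft
    prompt: str                       # the prompt actually used
    output_sha256: str                # hash of the raw output, so the exact text is traceable
    reviewed_by: str | None = None    # human who signed off before publication
    approved: bool = False
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def log_ai_usage(deliverable: str, model: str, prompt: str, output: str,
                 path: str = "ai_usage_log.jsonl") -> AiUsageRecord:
    """Append one prompt/output record to a JSON-lines audit log."""
    record = AiUsageRecord(
        deliverable=deliverable,
        model=model,
        prompt=prompt,
        output_sha256=hashlib.sha256(output.encode("utf-8")).hexdigest(),
    )
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return record
```

Even a log this small answers the two questions that matter most in a dispute: exactly what went out, and who signed off on it.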
For AI Vendors / Integrators
- Tune and test models to minimize verbatim regurgitation of training data (a rough overlap check is sketched after this list).
- Block obvious high-risk prompts (“give me headnotes from X database,” “write the full text of Y book”).
- Be honest in marketing: don’t promise “safe, non-infringing outputs” you can’t technically guarantee.
- Offer a meaningful IP indemnity where your own architecture and datasets actually justify it; otherwise be explicit about limits.
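As a rough illustration of the first two bullets, here is a minimal sketch of a prompt blocklist plus a verbatim-overlap check. The patterns, n-gram length, and threshold are made up for the example; real filtering pipelines are far more sophisticated (fuzzy matching, embeddings, source attribution), but even a crude check catches the “whole paragraph lifted verbatim” failures that drive the worst claims.

```python
# Sketch of two cheap guardrails: a prompt blocklist and an n-gram overlap check.
# Patterns, n-gram size, and threshold are illustrative, not a legal standard.
import re

HIGH_RISK_PATTERNS = [
    r"full text of",        # "write the full text of Y book"
    r"headnotes? from",     # "give me headnotes from X database"
    r"word[- ]for[- ]word",
]

def is_high_risk_prompt(prompt: str) -> bool:
    """True if the prompt matches an obviously risky pattern (very rough heuristic)."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in HIGH_RISK_PATTERNS)

def _ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(output: str, reference: str, n: int = 8) -> float:
    """Fraction of the output's n-word sequences that also appear verbatim in the reference."""
    out_ngrams = _ngrams(output, n)
    if not out_ngrams:
        return 0.0
    return len(out_ngrams & _ngrams(reference, n)) / len(out_ngrams)

def flag_possible_regurgitation(output: str, references: list[str],
                                threshold: float = 0.05) -> bool:
    """True if any reference shares more than `threshold` of the output's 8-grams."""
    return any(overlap_ratio(output, ref) > threshold for ref in references)
```

A pipeline would call `is_high_risk_prompt` before generation and `flag_possible_regurgitation` against a corpus of content you are worried about before returning output to the user.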
The Short, Uncomfortable Answer
When AI outputs infringe, the law doesn’t ask “Who owns the model?” first.
It asks:
- Who used this content in the world? (That’s usually the user or their company.)
- Who built and marketed the system that made this kind of copying predictable? (That’s the vendor/integrator.)
- Who provided the training data under what terms? (That’s data licensors, aggregators, and sometimes legacy content providers.)
AI is not a liability firewall. It’s a very fast, very powerful tool that can:
- Help you create a lot of original value—or
- Help you create a lot of very efficient infringement if you aim it at the wrong thing.
Your job is to put enough policy, review, and contract structure around it so when something does go wrong, you’re not the one standing alone in the spotlight.