AI In NDAs: How To Stop Your Secrets From Becoming Training Data

Published: December 5, 2025 • AI, NDA

Everyone is feeding documents into AI tools now:

  • Sales teams drop pitch decks into chatbots to “polish the copy.”
  • Engineers paste logs into coding copilots.
  • Lawyers and vendors use AI to draft emails, specs, and responses.

Classic NDAs were written for a world where “disclosure” meant a human looking at a document. They did not anticipate:

  • Third-party AI vendors holding prompts and files,
  • Logs and telemetry containing sensitive snippets,
  • Model training that turns your data into part of someone’s global system.

This is where AI-aware NDAs come in: same NDA skeleton, but with explicit rules around where confidential information may be sent and whether it can be used for model training, analytics, or “service improvement.”

Below is a practical guide (with “visuals” in the form of matrices and tables) you can drop almost directly into your drafting playbook.


Why AI Breaks Old NDA Assumptions

Traditional NDAs quietly assume:

  • The recipient’s people read the information,
  • Maybe they store it on their own systems,
  • They don’t redistribute it beyond a tight circle of need-to-know humans.

Once AI is involved, that breaks in at least four ways:

| ⚠️ Risk | What Actually Happens | Why The Old NDA Fails |
|---|---|---|
| Third-party model access | Recipient pastes your confidential info into a consumer AI or SaaS that logs and processes it. | Old NDA rarely treats this as a sub-processor or third-party disclosure. |
| Cross-tenant training | AI vendor uses your info to train global models serving other customers. | NDA never contemplated your data morphing into “weights” used for competitors. |
| Long-tail logging | Prompts and outputs (with real names, prices, incidents) live in logs and analytics beyond the contract term. | Old NDA has no concept of LLM logs or vector indexes as “copies.” |
| Residuals + embeddings | Individuals and models retain embedded knowledge of your material even after deletion. | Residual clauses, if any, refer to human memory, not model memory. |

So the NDA has to do three things:

  1. Decide if AI can be used at all.
  2. If yes, define where confidential info may go and under what conditions.
  3. Define what happens with training, logs, embeddings, and residuals.

Visual Matrix: NDA Variants For The AI Era

This matrix gives you a quick way to think about flavors of NDAs, depending on how nervous the disclosing party is and how much the recipient relies on AI internally.

| NDA Variant | AI Use By Recipient | Model Training On Disclosing Party Data | “Residuals” (People & Models) | Typical Use Case | Risk For Disclosing Party |
|---|---|---|---|---|---|
| Classic NDA (no AI language) | Silent. AI use is a gray area; recipient assumes they can use general tools. | Silent. Vendor’s default ToS might allow training. | Maybe generic human-memory residuals, nothing about models. | Legacy forms, low-sophistication parties, no one thinking about AI. | ❌ High: secrets can leak into third-party AI with no clear breach. |
| AI-Prohibited NDA | “Recipient shall not input Confidential Information into any generative AI or machine learning system except as expressly approved in writing.” | Prohibited. No training, no “service improvement,” no embeddings outside recipient’s systems. | Human residuals allowed (or not); model residuals banned. | Sensitive M&A, litigation, trade-secret tech, regulated data. | ✅ Very low leakage risk; ❌ hard on recipients who rely on AI. |
| AI-Guardrails NDA (Preferred Modern Default) | AI tools permitted only if (a) enterprise-grade, (b) under DPA/no-training terms, (c) listed as subprocessors. | Tenant-local fine-tuning or feature-level models allowed; cross-customer training banned unless separately agreed. | Human residuals allowed; model residuals allowed only within recipient’s tenant and for the duration of the NDA. | Most B2B SaaS, vendor evaluations, commercial partnerships. | ⚖️ Balanced: protects against global training while allowing practical AI use. |
| AI-Collaboration NDA (Joint Lab) | Both parties may use collaborative AI platforms and may jointly define training datasets. | Jointly defined training allowed on agreed corpora; rights to resulting models / weights shared or licensed. | Residuals recognized and allocated; maybe shared model IP. | Joint R&D, co-development of AI tools, strategic partnerships. | Depends on negotiation; can be high upside but complex. |
| Vendor-Friendly AI NDA | AI use broadly allowed, including third-party tools “in the ordinary course of business.” | Training allowed on de-identified / aggregated data; sometimes even identified data with broad “service improvement” language. | Residuals for both humans and models; vendor may keep learned patterns indefinitely. | Data-hungry AI/SaaS vendors, click-through online NDAs. | 🔴 High: your confidential data can become part of vendor’s product. |

For most serious commercial work, you want to move away from “Classic” and either:

  • Use AI-Prohibited for truly sensitive matters; or
  • Use AI-Guardrails as your default “we’re modern but careful” NDA.

Clause Building Blocks: How To Make An NDA AI-Aware

Below is a clause-design matrix you can treat as a menu. Pick the column that fits the deal.

1. Definition Of Confidential Information

| Topic | Conservative (Discloser-Friendly) | Balanced | Permissive (Recipient/Vendor-Friendly) |
|---|---|---|---|
| AI reference | Includes “any derived data, embeddings, model weights, logs, and outputs generated from or incorporating Confidential Information.” | Includes “derived data, embeddings, and indexes created solely for providing the Services to Discloser.” | Silent on derived artifacts; only original documents explicitly covered. |

2. Permitted Uses & AI Tools

| Topic | Conservative | Balanced | Permissive |
|---|---|---|---|
| Use of AI tools | “Recipient shall not input Confidential Information into any generative AI, LLM, or similar system.” | “Recipient may use AI tools only if they: (a) are enterprise-grade; (b) contractually prohibit training on Customer Data; (c) are listed as authorized subprocessors.” | “Recipient may use AI and machine learning tools in the ordinary course of its business.” |

3. Third-Party Subprocessors / Model Providers

| Topic | Conservative | Balanced | Permissive |
|---|---|---|---|
| Subprocessors | No third-party AI vendors unless pre-approved in writing by Discloser. | Third-party AI vendors allowed if bound by written terms at least as protective, including a “no training” obligation. | “Recipient may engage third-party providers” with broad, generic confidentiality language. |

4. Training & “Service Improvement”

| Topic | Conservative | Balanced | Permissive |
|---|---|---|---|
| Training use | “Recipient shall not use Confidential Information to train, fine-tune, or otherwise improve any machine learning or AI model for the benefit of Recipient or any third party.” | “Training limited to models used exclusively within Recipient’s tenant / environment for providing the Services to Discloser; no cross-customer use.” | “Recipient may use de-identified and/or aggregated information for analytics, machine learning, and service improvement.” |

5. Residuals

| Topic | Conservative | Balanced | Permissive |
|---|---|---|---|
| Human residuals | Optional: allow individuals to use unassisted memory but not specific documents. | Standard “residual knowledge” carve-out with an explicit exclusion for trade secrets. | Broad residuals: both humans and models can “remember and reuse” patterns. |
| Model residuals | Explicitly disallowed: “No right to retain weights or embeddings trained on Confidential Information beyond the term.” | Model residuals permitted inside the tenant for the term; must be purged or isolated at the end. | Silent; downstream model reuse implied by “service improvement” language. |

Sample AI Clauses You Can Adapt

These are deliberately generic; you’d tune them for one-way vs. mutual NDAs, the parties’ roles, and the sector.

AI Tool Use – Guardrails Version

Use of Artificial Intelligence Tools.
Recipient shall not input or upload any Confidential Information into any publicly available or consumer-grade generative artificial intelligence, large language model, or similar machine learning system. Recipient may use enterprise AI tools and services in connection with the Permitted Purpose only where: (a) such tools are provided under written agreements that treat Discloser’s Confidential Information as confidential, (b) such agreements expressly prohibit the provider from using Discloser’s Confidential Information (including prompts, files, and outputs) to train or improve models made available to other customers, and (c) the identity of such provider is disclosed to Discloser upon request.

No Training On Discloser’s Data

No Model Training.
Recipient shall not use Confidential Information, nor permit Confidential Information to be used, to train, fine-tune, or otherwise improve any machine learning or artificial intelligence model for the benefit of Recipient or any third party, except to the limited extent necessary to operate models dedicated solely to providing the Permitted Purpose to Discloser within Recipient’s environment. Under no circumstances shall any model trained or fine-tuned on Confidential Information be used to generate outputs for any person other than Discloser.

Derived Data, Embeddings, Logs

Derived Data and Technical Artifacts.
For clarity, “Confidential Information” includes any derivative data, embeddings, indexes, logs, and model parameters generated by or on behalf of Recipient that encode or are reasonably capable of revealing the substance of such Confidential Information. Recipient shall not retain such artifacts beyond the term of this Agreement except as required by law or as aggregated and irreversibly de-identified so that no individual Confidential Information of Discloser can be reconstructed.

Residual Knowledge Carve-Out (Modernized)

Residual Knowledge.
Notwithstanding the foregoing, nothing in this Agreement shall prevent Recipient’s personnel from using unaided memories of Confidential Information retained in the ordinary course of work, provided that such personnel do not intentionally memorize Confidential Information for the purpose of avoiding the restrictions of this Agreement. This residual knowledge carve-out does not authorize: (a) the use of any machine learning or artificial intelligence system trained on Confidential Information beyond the Permitted Purpose, or (b) the use or disclosure of Discloser’s trade secrets in a manner that would misappropriate such trade secrets.


Quick Matrices You Can Reuse In NDAs Or Policies

Matrix: What Your Internal Policy Should Say About NDAs + AI

| Role | Allowed With NDA-Covered Data | Not Allowed |
|---|---|---|
| Sales / BD | Use an approved enterprise AI tool fed via a secured CRM, if the vendor is bound by NDA/DPA and a no-training clause. | Pasting prospect NDAs, pricing sheets, or customer names into public chatbots for email drafting. |
| Engineering | Use an enterprise coding assistant approved by security; repositories under NDA hosted on company-controlled systems. | Feeding third-party source code or confidential specs from an NDA into personal GitHub Copilot/consumer tools. |
| Legal / Ops | Use internal AI tools hosted within the firm’s or vendor’s enterprise tenant with explicit confidentiality commitments. | Running a counterparty’s NDA or term sheet through a free AI website with unknown logging/training. |
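If your security team encodes this role matrix in tooling (for example, a DLP gateway or chatbot proxy), it reduces to an allow-list lookup. The sketch below is illustrative only; the role names and tool identifiers are made-up placeholders, not references to any real product:

```python
# Minimal sketch of an NDA-aware AI tool policy check.
# Role names and tool IDs below are hypothetical placeholders; a real
# deployment would load them from your security team's policy config.
APPROVED_TOOLS = {
    "sales": {"enterprise-crm-assistant"},
    "engineering": {"enterprise-coding-assistant"},
    "legal": {"firm-internal-llm"},
}

def may_use_tool(role: str, tool: str, data_under_nda: bool) -> bool:
    """Permit a tool only if it is on the role's approved list.
    Data not covered by an NDA falls outside this particular check."""
    if not data_under_nda:
        return True  # other policies may still apply
    return tool in APPROVED_TOOLS.get(role, set())

print(may_use_tool("engineering", "enterprise-coding-assistant", True))  # True
print(may_use_tool("sales", "public-chatbot", True))                     # False
```

The point of the structure: the default answer for NDA-covered data is “no” unless the tool is explicitly approved for that role, mirroring the matrix’s allowed/not-allowed split.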

Matrix: Which NDA Flavor To Use

| Scenario | Recommended NDA Variant |
|---|---|
| Pre-acquisition diligence (private company, sensitive tech) | AI-Prohibited NDA – or AI allowed only via explicitly listed tools under strict no-training terms. |
| Evaluating a SaaS vendor, normal B2B deal flow | AI-Guardrails NDA – allow AI, but ban cross-tenant training and require enterprise-grade LLM endpoints. |
| Joint AI product development, shared datasets | AI-Collaboration NDA plus a separate Data / Model Use Addendum spelling out rights to training data and models. |
| Simple low-stakes vendor quote, no confidential tech | A Classic NDA may be tolerable, but it’s often easier to use your AI-Guardrails template anyway for consistency. |
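If you want the flavor-selection logic above in machine-checkable form, say for a contract-intake questionnaire, it reduces to a few boolean flags. This is an illustrative sketch under assumed flag names, not legal advice; real intake forms ask more questions:

```python
def recommend_nda_variant(sensitive_matter: bool,
                          joint_ai_development: bool,
                          confidential_tech: bool) -> str:
    """Map the scenario table to a recommended NDA variant.
    Flag names are hypothetical; order encodes priority (most
    protective scenarios are checked first after collaboration)."""
    if joint_ai_development:
        return "AI-Collaboration NDA + Data/Model Use Addendum"
    if sensitive_matter:
        return "AI-Prohibited NDA"
    if confidential_tech:
        return "AI-Guardrails NDA"
    return "AI-Guardrails NDA (Classic may be tolerable)"

print(recommend_nda_variant(True, False, True))   # AI-Prohibited NDA
print(recommend_nda_variant(False, False, True))  # AI-Guardrails NDA
```

The design choice worth noting: the fallback is still AI-Guardrails rather than Classic, matching the article’s advice to default to the guardrails template for consistency.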

How To Upgrade Your Existing NDA Template

If you already have a house NDA, you don’t have to rewrite from scratch. A minimal upgrade path:

  1. Add AI tool and model-training clauses (like the ones above).
  2. Clarify derived artifacts (embeddings, logs, weights) in the definition of Confidential Information.
  3. Tighten residuals to make clear they don’t extend to models.
  4. Add a subprocessors / third-party providers clause that explicitly covers AI vendors.
  5. Optionally, create two versions:
    • a “red” version (AI-Prohibited) for critical matters;
    • a “blue” version (AI-Guardrails) for ordinary commercial use.

Once you’ve done that, you’ve basically dragged your NDA library into the AI era:

  • Counterparties know what they can and can’t paste into tools.
  • AI vendors are forced into enterprise-grade, no-training lanes.
  • And your clients’ secrets are much less likely to end up as part of someone else’s model weights.

More from Terms.Law