AI and data licensing
I am Sergei Tokmakov, a California attorney (CA Bar #279869). AI contract work has become a meaningful share of my practice over the last two years. This page covers training-data licenses, output rights, model service agreements, AI-vendor contracts, and the privacy-law overlay that surrounds them, particularly CCPA/CPRA and GDPR for cross-border data. It is written for founders, data providers, model builders, and operators who are buying or selling AI services and need to know what the paper actually says about training, output ownership, and downstream risk.
Matters I handle in this area
- Training-data license agreements. Drafting and reviewing licenses for image, text, audio, code, and proprietary dataset use in model training. Scope of use, sublicensing, derivative dataset rights, and audit clauses.
- Output rights and IP ownership. Who owns the model output, what license back to the platform survives, what carve-outs exist for prompts and fine-tuned models, and how the contract handles output that mirrors training data.
- Model service agreements (MSAs) with AI vendors. Customer-side review of OpenAI, Anthropic, Cohere, Google, AWS Bedrock, and Azure OpenAI terms; vendor-side drafting for AI products built on those APIs.
- AI service provider contracts under CCPA/CPRA. Service-provider versus contractor versus third-party classification, which of the three your AI vendor actually qualifies as under the contract, and what use restrictions that classification places on the vendor.
- GDPR cross-border AI deployment. Article 28 processor terms, international data transfer mechanisms (SCCs, UK IDTA), and the EU AI Act overlay for general-purpose AI providers and deployers.
- AI-vendor indemnification fights. Negotiating IP indemnity scope (training versus output), exclusions for customer-supplied prompts, and the recent industry move toward output-side IP indemnity.
- AI product TOS and usage policies. Drafting customer-facing terms for AI products, including prohibited-use lists, acceptable-use enforcement, output disclaimers, and dispute clauses.
Why this is a separate practice area
AI contracts look like SaaS contracts on the surface, but they are not. The substantive risk allocation runs through clauses that did not exist five years ago: training-data warranties, model-output IP, prompt confidentiality, abuse-monitoring access, and the chain of CCPA service-provider and GDPR processor language that has to flow down through model, API, and integrator. I treat this as a separate practice area because the diligence checklist is materially different.
Anonymized case studies
Dataset provider negotiated a license to a foundation-model builder
Facts: A specialty data curator owned a roughly half-million-image dataset cleared for commercial use. A foundation-model builder wanted a license for training a multi-modal model. The model builder's standard agreement included broad sublicense rights, perpetual use after termination, and no audit rights for the data owner.
What I did: I rewrote the license as field-of-use limited (model training only), prohibited sublicensing to third-party model trainers without consent, limited the perpetual-use carve-out to a defined list of named models, and added an audit right with a confidentiality wrapper. I added a representation that the trained model would not be intentionally configured to regenerate identifiable training images at output time.
Outcome: The model builder accepted the field-of-use limit and the named-model list. The audit right was narrowed to a third-party auditor under a mutual NDA. The license fee was paid up front rather than in installments.
SaaS customer adopting an AI feature built on a third-party foundation model
Facts: A mid-market SaaS company wanted to add an AI summarization feature for its enterprise customers. The feature was built on a major foundation-model API. The customer-side enterprise agreements promised "no use of customer data for model training" but the SaaS company's vendor agreement with the foundation-model provider did not back this promise cleanly.
What I did: I reviewed the customer-facing enterprise terms and the foundation-model API agreement side by side. I identified the gap: the foundation-model agreement allowed limited training use unless the customer was on a specific enterprise tier and certain settings were enabled. I drafted a corrective DPA addendum, a one-page internal control checklist, and revised customer-facing language that was accurate to the actual technical configuration.
Outcome: The customer-facing terms were updated to match the underlying configuration. The SaaS company moved to the enterprise tier of the foundation-model API and confirmed the no-training setting was enabled by default. The exposure window was disclosed to the largest affected customer with a written attestation.
AI vendor facing a customer indemnification demand over alleged output IP infringement
Facts: A code-generation AI vendor received a demand letter from a customer whose downstream client had been accused of using AI-generated code that allegedly mirrored an open-source codebase under a copyleft license. The customer demanded indemnification under the vendor's standard agreement, which included an IP indemnity but excluded "customer prompts" from coverage.
What I did: I represented the AI vendor. I reviewed the prompt history, the model's output filters, and the indemnity clause's actual scope. The output in question was generated from a customer-supplied prompt that explicitly requested code matching a specific open-source style. Under the contract, that prompt-driven scenario fell inside the prompt exclusion. I drafted a response letter explaining the analysis and offering, without admission, a structured cooperation package, including indemnity for legal fees in the downstream matter up to a defined cap.
Outcome: The customer accepted the cooperation package. The downstream matter was resolved at the customer's level. The AI vendor revised its onboarding flow to include a prompt-warning prior to generating code in styles closely associated with copyleft codebases.
Controlling California statutes and federal authority
Below is the working list of authority I most often invoke. AI law is moving; I confirm citations against the current statutory text before they enter a client deliverable.
- Cal. Civ. Code section 1798.100 et seq. (CCPA, as amended by CPRA), including the service-provider and contractor definitions and required contract terms.
- California Privacy Protection Agency regulations, including the 2025-2026 rules on automated decisionmaking technology and risk assessments.
- Cal. Civ. Code section 3344, right of publicity, when AI output uses a name, image, or voice without consent.
- Cal. Bus. and Prof. Code section 17200, Unfair Competition Law, applied to deceptive AI marketing and undisclosed data use.
- Cal. Civ. Code section 3110 et seq. (AB 2013, enacted in 2024), where applicable: AI training-data transparency obligations for generative AI developers operating in California.
- Cal. Civ. Code section 1798.99.20 et seq., automated decisionmaking and adverse-action notice obligations in specific industries.
- 17 U.S.C. section 102 et seq. (federal Copyright Act), including the line of cases on AI-generated output and human authorship under Thaler v. Perlmutter and Copyright Office guidance from 2023-2025.
- 17 U.S.C. section 1201 (DMCA), including the anti-circumvention exemptions for AI security research.
- Federal Defend Trade Secrets Act, 18 U.S.C. section 1836, for training-data misappropriation.
- EU Artificial Intelligence Act (Regulation (EU) 2024/1689), including general-purpose AI obligations and the prohibited-practice list.
- GDPR Articles 5, 6, 22, 28, 32, and 46, on lawfulness, automated decisions, processor terms, security, and international transfer.
- UK Data Protection Act 2018 and ICO automated-decision guidance, when the customer or data subjects are in the UK.
- Case law: Andersen v. Stability AI (N.D. Cal., pending) and the consolidated training-data cases; Authors Guild v. OpenAI (S.D.N.Y., pending); New York Times v. Microsoft and OpenAI (S.D.N.Y., pending). These are moving targets; I cite them as pending where they remain so.
Sample contract issues I check on every AI review
- Training-data scope: what data the vendor uses, whether customer inputs are used for training, and what the opt-out mechanism is.
- Output ownership: who owns the output, what license back to the vendor survives, what carve-outs apply to fine-tunes and custom models.
- IP indemnity: is the indemnity output-side, training-side, or both, and what are the exclusions for customer prompts and customer-supplied data.
- Service-provider classification under CCPA/CPRA: is the contract drafted to qualify, and what restrictions on use does that impose.
- GDPR Article 28 terms: are processor obligations, sub-processor flow-down, and breach windows present and accurate.
- Data residency: where are training and inference performed, and what international transfer mechanism applies.
- Abuse-monitoring access: does the vendor have a contractual right to read prompts and outputs for safety review, and how is that reconciled with customer confidentiality.
- Termination and deletion: do customer data and any fine-tuned weights survive termination, and on what schedule are they deleted.
- Acceptable use: are the prohibited-use categories clear, are they reciprocal, and what is the cure-and-suspension mechanism.
- EU AI Act overlay: is the vendor a general-purpose AI provider, and are the technical-documentation and disclosure obligations addressed.
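For readers who track reviews in a script rather than a spreadsheet, the list above can be sketched as a simple open-items tracker. This is an illustrative sketch only: the item labels are shortened from the list, and the names `REVIEW_ITEMS` and `open_items` are hypothetical, not part of any tool I use.

```python
# Shortened labels for the ten review items listed above (illustrative only).
REVIEW_ITEMS = (
    "training-data scope",
    "output ownership",
    "IP indemnity",
    "CCPA/CPRA service-provider classification",
    "GDPR Article 28 terms",
    "data residency",
    "abuse-monitoring access",
    "termination and deletion",
    "acceptable use",
    "EU AI Act overlay",
)


def open_items(findings: dict[str, str]) -> list[str]:
    """Return the checklist items that have no recorded finding, in list order."""
    return [item for item in REVIEW_ITEMS if not findings.get(item, "").strip()]


if __name__ == "__main__":
    # Hypothetical partial review: two items have notes, the rest are still open.
    notes = {
        "training-data scope": "Inputs excluded from training on enterprise tier.",
        "IP indemnity": "Output-side only; customer prompts excluded.",
    }
    for item in open_items(notes):
        print("OPEN:", item)
```

The point of the sketch is only that every item gets a written finding before the review closes; an empty or whitespace-only note counts as still open.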
Where the privacy-law overlay actually bites
The privacy overlay on AI contracts is more than a check-the-box exercise. Here are three concrete examples where the privacy posture defines the deal.
CCPA service-provider classification. A SaaS company embedding a third-party AI vendor in its product almost always wants the AI vendor to qualify as a "service provider" under the CCPA. The qualification is contractual: the vendor cannot use the data for its own purposes, cannot retain it beyond the contract period, and is restricted in cross-context behavioral use. If the AI vendor's contract reserves training-use rights, the vendor may not be a service provider, which means the SaaS company's "we do not sell or share your data" statement to its own customers becomes inaccurate. The exposure is on the SaaS company, not the AI vendor. I review the contract for this gap on every AI vendor review.
GDPR international transfer mechanism. When the AI vendor processes EU personal data, the contract has to designate a transfer mechanism: Standard Contractual Clauses, the EU-US Data Privacy Framework (where the vendor is self-certified), or in narrow cases an adequacy decision. Many AI vendor contracts gesture at this without naming the mechanism. The contract has to actually name it. The customer's own DPIA depends on the answer.
Automated decisionmaking under CCPA/CPRA and the EU AI Act. California's automated decisionmaking technology rules and the EU AI Act both impose disclosure and risk-assessment obligations on certain uses of AI. The contract should disclose whether the vendor is providing a "high-risk" AI system (EU AI Act) or a "covered" automated decisionmaking technology (California CPPA rules), so the customer can fulfill its own downstream obligations. A vendor that disclaims this disclosure in the contract is offloading regulatory risk onto the customer; sometimes that is acceptable, sometimes it is not. I flag it explicitly in every AI vendor review.
The single most important question for any AI deal
One question separates a defensible AI vendor relationship from a problem waiting to happen: does the customer's data become part of the vendor's model? The contract has to answer this clearly. A sales or marketing assurance that "we do not use customer data for training" is not enough; the contract itself needs to say so, the technical configuration has to support the contract, and the carve-outs (abuse monitoring, safety review, aggregated and de-identified data) have to be defined narrowly enough that they do not swallow the rule. The recurring failure mode is a vendor contract under which training use is on by default unless the customer opts out, while the customer's own customer-facing terms promise that no customer data is used for training. The two documents diverge silently until something else triggers a review. Catching this on day one is most of the value I provide on AI vendor work.
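The three-layer comparison described above (customer-facing promise, vendor contract, technical configuration) can be sketched as a mechanical consistency check. This is a minimal illustration, not a legal test: the `TrainingPosture` record, its field names, and `find_gaps` are all hypothetical, and the yes/no answers still have to come from reading the documents and confirming the settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPosture:
    """Hypothetical one-deal summary; every field name is illustrative."""
    customer_terms_promise_no_training: bool  # what the customer-facing terms say
    vendor_contract_prohibits_training: bool  # what the vendor agreement says
    no_training_setting_enabled: bool         # what the technical configuration does


def find_gaps(p: TrainingPosture) -> list[str]:
    """Return readable mismatches between the promise, the contract, and the config."""
    gaps = []
    if p.customer_terms_promise_no_training and not p.vendor_contract_prohibits_training:
        gaps.append("customer-facing promise is not backed by the vendor contract")
    if p.customer_terms_promise_no_training and not p.no_training_setting_enabled:
        gaps.append("customer-facing promise is not backed by the technical configuration")
    if p.vendor_contract_prohibits_training and not p.no_training_setting_enabled:
        gaps.append("vendor contract and technical configuration disagree")
    return gaps


if __name__ == "__main__":
    # The failure mode described above: a promise made downstream with nothing behind it.
    deal = TrainingPosture(
        customer_terms_promise_no_training=True,
        vendor_contract_prohibits_training=False,
        no_training_setting_enabled=False,
    )
    for gap in find_gaps(deal):
        print("GAP:", gap)
```

An empty result means only that the three layers agree with each other; it says nothing about whether the carve-outs are drafted narrowly enough.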
Typical fee ranges
Frequent questions on AI and data licensing
Are AI outputs copyrightable? Under current US Copyright Office guidance, purely AI-generated output without meaningful human authorship is not copyrightable. Prompts alone generally do not supply enough human control under that guidance, but human selection, arrangement, or modification of AI output may produce a protectable work. The line is fact-specific and moving. I track the Copyright Office guidance and the pending litigation; for any specific product I will tell the client where the asset sits on that line and what the documentation strategy should be.
Will US lawsuits over training data invalidate my product? Almost certainly not for users of foundation-model APIs. Output-side IP indemnity is now standard from the major foundation-model vendors for enterprise customers; that indemnity is the practical risk transfer mechanism. For model builders training on third-party data, the risk is real and is the subject of pending cases. I read the pending docket monthly and adjust client advice accordingly.
How do I draft customer-facing terms for an AI product? Output disclaimer, accuracy and reliability disclaimer, prohibited-use list, customer data and training opt-out, and a clear allocation of who owns the prompt and the output. I draft these as one document, not as separate boilerplate clauses, so the customer can read it in one sitting and understand what they are agreeing to.
Do I need a DPA with my foundation-model vendor? If you process personal data (most customer-facing AI features do), yes. The major foundation-model vendors offer DPAs; many are linked from the vendor's trust center and require manual countersignature. A surprising number of customers run AI features for months without ever executing the DPA.
What about the EU AI Act? If you serve EU users, the EU AI Act overlay applies. General-purpose AI obligations (transparency, technical documentation, copyright training-data summary) sit on the model provider; deployer obligations sit on you. The obligations phase in through 2026 and 2027; the practical advice is to map your product against the risk categories now, not later.
When to engage me, when to handle it internally, when to go to a large firm
Engage me when you are signing an AI vendor contract above the standard self-serve tier, when you are licensing a dataset or being licensed one, when you are launching an AI product and need a customer-facing terms package that does not break your CCPA or GDPR position, or when you have a single-counterparty dispute over output IP or training-data use. I am the right fit for founders, in-house counsel, and operators who want a working redline and a one-paragraph clear answer.
Handle it internally when you are buying AI services at a self-serve tier with standard terms and your use case is low-risk (internal productivity, not customer-facing). The vendor's standard agreement is rarely worth a redline at that level. Confirm that your team is not pasting customer data into prompts, and you have done what you need to do.
Go to a large firm when you are litigating a class action over training data, when you are responding to a regulator under the EU AI Act, the FTC, or the CPPA enforcement teams, or when you are negotiating a foundation-model partnership at the level that triggers an antitrust review. Cooley, Wilson Sonsini, Latham, and Gunderson have full benches for AI; for a major litigation or a multi-jurisdictional regulatory matter, hire them and use me for a second-view read on specific clauses if you want.
Send the AI agreement or the matter summary
Email me with the agreement attached and a few lines on your role. I respond personally, usually within one business day.
What to include: the agreement file or product TOS link, whether you are vendor, customer, or licensor, the deal value or risk amount, your jurisdictions (US states, EU, UK), and one paragraph on what you want changed or recovered.