Significant development this week that everyone in this thread should know about: a key ruling in the consolidated AI training data cases held that scraping copyrighted content for AI training is not automatically shielded by fair use. Critically, the court also rejected the argument that robots.txt non-compliance is irrelevant to the copyright analysis.
The ruling specifically stated that a content owner's technical measures to prevent scraping (robots.txt, terms of service prohibitions, crawler blocking) are relevant to the fair use analysis because they demonstrate the copyright holder's intent to restrict the use. This is a notable shift from earlier rulings that treated robots.txt as legally meaningless. The court analogized it to "No Trespassing" signs -- they do not create the property right, but they are evidence that the owner did not consent to the use.
For practical purposes, this means the steps many of us took early on -- blocking AI crawlers, updating terms of service, sending cease-and-desist letters -- are now directly relevant to strengthening legal claims. If you have not already documented when you implemented these technical measures, do so now. Timestamps matter.
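For anyone still putting crawler blocking in place, here is a minimal robots.txt sketch. The user-agent tokens below (GPTBot for OpenAI, CCBot for Common Crawl) are the publicly documented ones at the time of writing -- treat the list as illustrative, not exhaustive, and verify current tokens before deploying:

```
# Block known AI training crawlers (tokens as publicly documented
# by the operators; verify the current list before relying on it)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# All other crawlers may index normally
User-agent: *
Disallow:
```

One practical tip on the timestamp point: keep this file (and your terms of service) under version control, so commit history independently documents when each blocking measure was added.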
The ruling also opens the door to state-law trespass to chattels claims in addition to federal copyright claims, because unauthorized scraping that circumvents technical barriers may constitute interference with computer systems. Several state attorneys general, including California and New York, have signaled interest in pursuing enforcement actions against AI companies that systematically ignored robots.txt exclusions.