Clifford Chance

Talking Tech

The Boundaries of Playing "Fair" When Training AI

Artificial Intelligence | Intellectual Property | 25 March 2025

Can the fair use doctrine serve as a defense for training artificial intelligence (AI) on copyrighted materials? This is one of the key questions in the developing AI legal landscape, and one highlighted recently in OpenAI's and Google's comments on President Trump's U.S. AI Action Plan, where the tech giants argued that freedom to innovate is, among other things, a matter of national security. More than 400 entertainment industry members responded, urging the Trump administration not to jeopardize copyright protections for America's creative and knowledge industries for the sake of AI training.

The answer to the question is not simple, and the applicability of the fair use defense may vary depending on the type of AI system. In Thomson Reuters Enterprise Centre GmbH and West Publishing Corp. v. Ross Intelligence Inc., No. 1:20-cv-613-SB (D. Del.) (Thomson), a recent case involving a tool that uses natural language processing techniques, the court determined that fair use was not an effective defense. Although this decision was fact-specific, it, along with similar cases, is beginning to shape this area of AI law, with potentially larger and significant implications for society.

The Basics of the Fair Use Doctrine

The fair use defense is a legal doctrine favored by defendants in many copyright cases, including recent cases involving generative AI. This affirmative defense aims to balance the interests of copyright holders with the public interest in the broader distribution and utilization of creative works by permitting limited use of copyrighted material without permission from the copyright owner. Section 107 of the U.S. Copyright Act of 1976 sets out a list of examples of fair use, including "use by reproduction in copies or phonorecords" for purposes such as criticism, comment and research. The section then outlines four factors to consider:

1. Purpose and Character of the Use. This factor considers why and how the work is used, including whether the use is for commercial or nonprofit educational purposes. Uses that are transformative (i.e., that add new expression or meaning to the original work) are more likely to be considered fair use.

2. Nature of the Copyrighted Work. This factor looks at whether the work is more factual than creative. Uses of factual works are more likely to be considered fair use than uses of creative works, such as novels or movies.

3. Amount and Substantiality of the Portion Used. This factor examines the quantity and significance of the used portion in relation to the copyrighted work as a whole. Use of smaller, less significant portions is more likely to be considered fair use.

4. Effect of the Use on the Market for the Original Work. This factor assesses whether the use negatively impacts the market or potential market for the original work. If the use could replace the original work and reduce its sales, it is less likely to be considered fair use.

Courts consider all four factors collectively, and no single factor is determinative for whether a defendant can successfully leverage fair use. The factors are also not exhaustive, and a specific fair use analysis may consider other relevant parameters. 

Recent Litigation Shapes the Boundaries of the Fair Use Doctrine

Although the boundaries of the fair use doctrine are defined in certain areas, such as music and film (where the used content must not be the "heart" of the work and must not negatively affect the market value of the original work), they are in flux in the context of AI and generative AI. Most of the AI-related cases are still in the pleadings and discovery stages, and the doctrine's application to AI will likely remain largely untested for some time.

A recent well-known case, The New York Times Company v. Microsoft Corporation et al., No. 1:23-cv-11195 (S.D.N.Y.), involves a group of news organizations, led by The New York Times, suing OpenAI and Microsoft Corporation. The plaintiffs claim that the defendants' generative AI tools, including ChatGPT and Copilot, use millions of their copyrighted works without permission or payment. The arguments extend to questions of societal good, as OpenAI's and Microsoft's practices allegedly undermine high-quality independent journalism. Although the parties initially aimed to negotiate an appropriate license for this type of use, the negotiations broke down. The defendants argued fair use, and specifically that the AI tools' output was "transformative" under the first factor of the doctrine. "In this case The New York Times uses its might and its megaphone to challenge the latest profound technological advance…[but] copyright law is no more an obstacle to the LLM than it was to the VCR (or the player piano, copy machine, personal computer, internet or search engine)", wrote Microsoft's lawyers in a court filing. The current deadline for amending the complaint is April 15, 2025, and fact discovery is set to be completed by April 30, 2025, barring any additional extensions granted by the court.

The argument that output from models trained on copyrighted material is transformative and does not harm the market value of the original works appears in other ongoing litigation, including Basbanes, et al. v. Microsoft Corporation, et al., No. 1:24-cv-00084 (S.D.N.Y.) (Basbanes) and Andersen, et al. v. Stability AI Ltd., et al., No. 3:23-cv-00201 (N.D. Cal.) (Andersen). In Basbanes, two well-known nonfiction authors, Nicholas Basbanes and Nicholas Gage, filed a class action lawsuit against OpenAI and Microsoft Corporation, accusing the defendants of using the authors' copyrighted works to develop their AI systems. The authors purport to represent other authors whose works are "systematically pilfered" by Microsoft and OpenAI. "[T]hey're no different than any other thief", the lawsuit states. The defendants in Basbanes asserted the fair use defense. This case was consolidated with two other cases that made similar assertions, Authors Guild, et al. v. OpenAI Inc., et al., No. 1:23-cv-08292 and Jonathan Alter et al. v. OpenAI Inc., et al., No. 1:23-cv-10211. The consolidated class action is currently in the discovery phase.

In Andersen, a group of artists filed a putative class action against Stability AI and certain other AI developers, alleging copyright infringement and challenging the use of their art as training material for an AI platform, Stable Diffusion. The defendants argue that such use should be deemed fair use as a matter of law, given the large size of the training datasets and plaintiffs' inability to use Stable Diffusion to reproduce works that are substantially similar to the copyrighted materials. The case is currently in the discovery phase.

In Thomson, the defendant, which used copyrighted works to train AI, also relied on the transformative use argument, along with several others under the fair use doctrine. Ross Intelligence (Ross), an AI startup, wished to train its AI platform with resources of Thomson Reuters, the owner of an extensive database of U.S. judicial decisions, including Westlaw headnotes. Ross initially sought a license for this training but subsequently engaged a third party, LegalEase Solutions, to produce "Bulk Memos", which were later found by the court to closely resemble Thomson Reuters' copyrighted materials. Thomson Reuters sued Ross for copyright infringement, and Ross claimed fair use as a defense, contending that the AI's transformative use justified Ross' actions.

The court determined that the Westlaw headnotes were sufficiently original to qualify for copyright protection and that Ross' use constituted direct copying and was not transformative. The court also found that Ross' use primarily served a commercial purpose. On the second factor, the court considered the Westlaw headnotes to have limited creativity, which weighed in Ross' favor, but this factor held less influence in the court's decision. Although Ross did not publicly distribute copied materials, the court emphasized the "market effect" factor, noting Ross' intent to develop a competing product. As a result, the court rejected the fair use defense in this case.

While the court clarified that Ross' AI system does not qualify as generative AI, the court's opinion may influence how fair use is evaluated in generative AI-related litigation, especially with respect to the interpretation of the fourth factor regarding market impact and licensing potential. On a larger scale, it will be interesting to see whether and how copyright law meets the challenges posed by AI, and particularly generative AI.

Drafting Your Playbook for Training AI

As AI and generative AI technologies evolve and as litigation progresses, practical guidance for assessing each of the four fair use factors will be key in enabling innovation and avoiding or defending against litigation. Relevant considerations include:

1. Purpose and Character of the Use. When reviewing this factor, it is essential to demonstrate that the use is transformative, adding new expression, meaning, or message to the original work. For example, if the AI model generates new insights or analyses instead of merely replicating the original content, this could support a fair use claim. The commercial nature of the use might also be scrutinized more heavily, particularly if the AI system competes directly with the original content provider. Non-commercial uses are more likely to be considered fair use.

2. Nature of the Copyrighted Work. Courts favor fair use when the work is factual rather than highly creative, and published works are more likely to be subject to fair use than unpublished ones. For instance, text-based content may often contain both factual and creative elements, and a court might weigh the creative aspects more heavily, making it harder to successfully claim fair use. Where text-based content has been published, this factor might still favor fair use, but the creative nature of the work could counterbalance this.

3. Amount and Substantiality of the Portion Used. Using only the amount of the original work necessary for the purpose, and avoiding substantial portions or key excerpts of such work, increases the likelihood of success of a fair use defense. A court might look unfavorably on AI systems where the output reproduces the "heart" of a work on which the AI was trained.

4. Effect of the Use on the Market for the Original Work. It may be more challenging to demonstrate no market harm if an AI system provides content similar to what is offered by the original work. However, if an AI system serves a different market or purpose, or complements rather than substitutes for the original content, its use might be more likely to be deemed fair use. Because generative AI models create new content across different mediums (e.g., data, text, music), the argument that the output is transformative and has no effect on the market for the original work may be stronger.

The rapid pace of AI development and attendant intellectual property and other considerations will continue to warrant careful attention. Our team is tracking these developments and would be happy to discuss our findings and recommendations.