What's wrong with this picture? AI-created images and content: copyright ownership and infringement
Countless social media posts, images and articles have been generated using AI tools such as ChatGPT, Stable Diffusion and DALL-E. But who – if anyone – owns the copyright in the outputs, and do the tools themselves infringe copyright? Our IP & Tech//Digital teams take a deep dive into the issues and the recent infringement claims.
Prompt-based AI tools raise two key questions under UK copyright law:
1. Are the outputs from AI tools protected by copyright at all and, if so, who owns those outputs? This determines whether a user can prevent others from copying their AI-created content.
2. Do the tools themselves infringe copyright in the underlying materials used to train the tools, and can the owners of copyright in those materials prevent the use and exploitation of (i) the AI tools themselves and (ii) the outputs of AI tools?
Who owns the outputs?
Under UK copyright law, if copyright subsists at all (see below) the user probably owns the output generated by prompt-based AI tools such as ChatGPT or Stable Diffusion.
Section 9(3) of the Copyright Designs and Patents Act 1988 (CDPA) provides that the author of a computer-generated work is "the person by whom the arrangements necessary for the creation of the work are undertaken". 'Computer-generated' works are defined as those "generated by computer in circumstances such that there is no human author of the work".
Although as yet untested in court, it is likely that entering a prompt to cause ChatGPT to generate an output constitutes the "undertaking" of "arrangements necessary for the creation of the work". In that case, the user of the tool – and not the producer of the tool – would be the first owner. Although the only judgment on section 9(3) CDPA to date held that image frames generated in the course of playing a video game belonged to the game's publisher rather than the game's player, the courts are likely to see the position of the user of an AI tool as fundamentally different to that of the player of a video game.
Any remaining doubt about ownership as between the user and the creator of the tool can be resolved by contract. For example, the user terms of ChatGPT assign copyright in the outputs to the user, although the Stable Diffusion licence is silent regarding the ownership of outputs.
Are the outputs protected by copyright at all?
If copyright does not subsist in the first place, questions about copyright ownership become academic. In particular, a copyright work must be sufficiently original to qualify for copyright protection.
The English courts have for several decades applied a low threshold for originality, requiring the application of a limited degree of "skill, judgment and labour" in producing a work, which cannot itself have been copied from another work. This was the relevant originality standard at the time section 9(3) CDPA (see above) was enacted. It is assumed that the UK legislature considered that works without a human author could meet this originality threshold, as section 9(3) would otherwise have been pointless. Applying UK-only copyright legislation, AI-produced works would probably be protectable by copyright.
However, EU law has developed a different standard of originality. EU Directives on software and databases introduced the requirement that a work must be the "author's own intellectual creation" before copyright can subsist. The Court of Justice has since applied this standard more broadly to encompass copyright works beyond software and databases (see for example the Painer and Cofemel judgments).
The "author's own intellectual creation" is generally regarded as requiring a higher standard of originality than the English case law standard. Many commentators consider that AI-created works that do not have a human author cannot meet this higher standard. This would mean that AI-generated works do not attract copyright protection under EU law, although AI-assisted works may still be protected. Even prior to Brexit, there had been debate as to how widely the EU originality standard should apply to different categories of copyright works under UK law, and how different the EU standard actually is in practice (see paragraphs 19-20 of NLA v Meltwater versus paragraphs 29-37 of SAS Institute v World Programming).
For now, there is scope for argument whether the EU standard or the UK standard of originality should apply to literary and artistic works produced by AI systems such as ChatGPT, Stable Diffusion or DALL-E. This is an area where UK law may diverge from EU law following Brexit if the higher courts are given the opportunity to consider the issue, as the UK Supreme Court has been regarding patents for AI-generated inventions.
This is not a uniquely European problem. The US Copyright Office has consistently refused to register copyright works without a human author (see Thaler's application and the recent refusal to register an image created using the Midjourney platform), and has now issued guidance on works containing material generated by AI.
Do AI tools infringe copyright in the underlying training materials?
Regardless of whether users acquire any copyright of their own in the outputs of AI tools, the providers of the tools – and in some cases the users of the tools – could be liable for infringing third parties' copyright in the underlying training materials, depending on the specific implementation of the AI model and training process.
On 16 January 2023, stock-image supplier Getty Images filed proceedings against Stability AI Ltd, provider of popular AI image-generator Stable Diffusion, in the English High Court. At the time of writing, it is understood that the English claim has not yet been formally served on Stability AI Ltd, and the Particulars of Claim are not yet publicly available.
However, on 3 February 2023, Getty Images, Inc. also brought a claim in the Delaware courts against Stability AI, Inc, the Delaware-incorporated parent of Stability AI Ltd. The Complaint in the Delaware proceedings is available online.
Stable Diffusion takes text prompts and generates detailed images which can be indistinguishable from genuine photographs or artworks to the untrained eye (although Getty Images' Delaware claim is partly directed at brand damage caused by allegedly low-quality images generated by Stable Diffusion).
At the time of writing, Stability AI is yet to file a Defence, and it is currently unclear which of the factual allegations will be admitted or denied. Without wishing to state any opinion on the merits of any specific allegations, the Delaware Complaint raises the following points that are likely to be common to other AI-related disputes:
- Databases of stock images can be valuable sources of training data, as images are often accompanied by detailed textual labelling of the image content on which to train AI models, saving the cost of human labelling of data.
- Infringement risks can be mitigated or avoided altogether by obtaining a licence from the copyright and database owner(s), both for the training process and the use of the AI tool.
- Where an AI model is trained on an open-source dataset, this makes it easier for copyright owners to identify which of their works have been used, providing more certainty at the outset of an infringement claim.
- The processes of: (i) compiling the training dataset/corpus; (ii) training the AI model; and (iii) generating new images from the model each raise different copyright infringement considerations. These processes might each be undertaken by distinct, unconnected legal entities, some of whom may have significant liability for copyright infringement, while others who have contributed to the same product might have no liability at all.
- When assessing infringement liability, attention must be paid to the specific acts that each entity undertakes, the geographical location of those acts, and which territory's laws apply accordingly. For example:
- The creators of a database that consists only of weblinks to online copies of images for use in the training process might not themselves have copied any individual copyright works or made those works available to the public, thereby avoiding any liability for infringement;
- Whether the process of training an AI model (as distinct from compiling a database for use in training) infringes copyright will vary between jurisdictions. In Europe, this may depend on whether copies of the images are made at all during training and, if copies are made, whether those copies are temporary and transient, automatically deleted via a technical process, or stored permanently. Some countries provide copyright liability exceptions for 'data mining', but others do not. The question of liability may turn on where the training process took place; and
- Even if the training process involved infringements of copyright, the use of the AI tool may not itself infringe copyright. If the model stores and utilises copies of training images in generating new images, the infringement risk will be higher than for models which store only abstracted mathematical parameters and datapoints.
The answers to these questions will depend on the technical details of each individual product and training process.
- An AI-generated image output from the model (as opposed to the model itself) may not reproduce any part of any single image from the training set, let alone a substantial part, which is required for infringement under UK law. However, where an output image is derived from a small dataset or bears a close resemblance to a small number of identifiable images within the training set, a finding of infringement is far more likely.
- The scope of 'fair use' defences varies significantly between jurisdictions. In general, where the AI tool is used commercially, or where it competes with or affects the economic interests of the copyright owner, the chances of falling within an applicable fair use defence are reduced.
- Damages for copyright infringement vary dramatically between jurisdictions. In the US, for example, statutory damages of between $750 and $30,000 per infringement may be awarded, with the upper limit increased to $150,000 in the case of wilful infringements. If copyright works have been copied in their millions during a training process, the damages exposure can escalate very quickly. In the UK, monetary damages would be lower but injunctions are typically easier to obtain than in the US.
Copyright infringement is not the only potential claim:
- Training databases compiled via web-scraping could attract liability for breach of website terms and conditions, database right infringements, and tortious interference;
- Many countries have laws ancillary to copyright regarding the unauthorised removal or application of copyright protection information or watermarks, which an AI model may do automatically as part of training and image generation; and
- Getty Images' Delaware Complaint cites examples of images generated by Stable Diffusion that bear a Getty Images watermark, probably as a result of the high number of Getty Images photographs within the training set. The watermark itself comprises registered trade marks, and its application by Stable Diffusion's AI is alleged to constitute trade mark infringement.
A flurry of litigation
We have previously written about claims against GitHub's AI-assisted Co-Pilot product. A separate claim was filed against Stability AI, Midjourney and DeviantArt in California in January 2023. As tools based on unlicensed training content are adopted ever more widely, further litigation seems inevitable.
Plans for UK reform?
The UK Intellectual Property Office (IPO) ran a public consultation on Artificial Intelligence and IP from October 2021 to January 2022. The Government response in June 2022 stated simply that "computer-generated works are currently protected in UK law", and did not directly address the obstacles posed by EU law. It decided not to make any changes to the existing law on the subsistence or ownership of copyright in computer-generated works.
That same Government response had recommended introducing a new exception from liability for copyright infringement for 'text and data mining' to train AI systems. In early February 2023 the UK Government confirmed that it no longer intended to introduce that exception, which had been strongly opposed by copyright owners. However, the story does not end there.
On 15 March 2023 the report of Sir Patrick Vallance on the Pro-innovation Regulation of Technologies Review was published, stating as follows:
"If the government’s aim is to promote an innovative AI industry in the UK, it should enable mining of available data, text, and images (the input) and utilise existing protections of copyright and IP law on the output of AI. There is an urgent need to prioritise practical solutions to the barriers faced by AI firms in accessing copyright and database materials. The government should work with the AI and creative industries to develop ways to enable TDM for any purpose, and to include the use of publicly available content including that covered by intellectual property as an input to TDM (including databases). The opportunity here is to focus on clarifying a simple process concerning the input to AI models; IP rights and their enforcement would apply to the output of any product. We also recommend a code of practice and a requirement for altered images to be labelled as generated or assisted by AI."
The Government's response states that, to provide clarity, the UK Intellectual Property Office will produce a code of practice by summer 2023, and that "an AI firm which commits to the code of practice can expect to be able to have a reasonable licence offered by a rights holder in return" (see recommendation 2 of the response). Legislation is envisaged only where the code of practice is not adopted.
The UK Chancellor also referred to Sir Patrick's report in his Budget speech on 15 March, stating that the Government would "work at pace with the Intellectual Property Office to get clarity on IP rules so that generative AI companies can access the material they need".