Copyright and AI: Part 2 – Infringement by machine?
Earlier this year, we wrote the first part of our series looking at copyright and generative AI, Copyright and AI: Part 1 – Teaching the machine. As we discussed then, for an AI tool (like ChatGPT or Stable Diffusion) to follow an instruction to do something, like write something or draw something, it needs to have been trained on information that it can later use and draw on.
We focused on possible copyright infringing acts at the “input” stage if an AI is taught with unlicensed data that has perhaps been scraped from the World Wide Web. In addition, we looked at the possible exceptions to infringing acts that may be available in the UK now, but also the possible future expansion of the text and data mining (TDM) exception given that providing an “AI-friendly” regulatory environment while protecting the rights holders is a relatively recent but pressing demand facing most governments.
In this second article, we look at possible copyright infringing acts that occur at the “output” stage – infringements that we might colloquially say are “by machine” – when a user uses generative AI to create a new work. We then touch on what the UK Government’s next move may be, following the latest report and recommendations on the topic by the Culture, Media and Sport Committee1.
The situation we have in mind when considering output infringement is one where an AI tool has been used to create something, like an image or some text, under instruction by the user.
Now, suppose that the output work resembles in some way a copyright work owned by a third party. What are the options for that third party?
(It is worth pointing out that we are not addressing here the question of whether the output work is itself a copyright work. The truth is that, under UK law at least, it might be or it might not be. That is a separate and certainly interesting question – perhaps worth a separate article in due course. Our focus here, though, is on the possible infringement of third party works.)
Rights of owner of copyright work
In the UK, the owner of a copyright work, that is an original literary, dramatic, musical or artistic work, has various exclusive rights in relation to that work including but not limited to the rights to copy it, issue copies of it to the public, communicate it to the public and adapt it. If others do these acts or authorise them to be done, without the consent of the copyright owner, they may be infringing copyright if they cannot rely on one of the statutory exceptions. This means that, in some circumstances, infringement can occur as soon as the work is created.
For the purpose of this article, we will focus mainly on infringement by copying. In the UK, an output work created using a generative AI tool may infringe copyright if it copies, in other words reproduces, another copyright work in a material form – which is how copying is defined in the Copyright, Designs and Patents Act 1988.
In order for infringement to have taken place, two requirements must be satisfied: (1) there must be objective similarity between the output work and the whole or a substantial part of the original copyright work that is the author’s expression of their own intellectual creation; and (2) the output work must be derived from the original copyright work.
If the user explicitly instructs the AI to create a work and this new work closely resembles the copyright owner’s original work or a part of it, then establishing the first of the two criteria above, namely the requisite objective similarity between the two works, may be quite easy. What is likely to be much more difficult and, in some cases, practically impossible, is for the copyright owner to demonstrate that the “output” work is derived from their original copyright work.
How might a copyright owner establish their claim? And how might a user of AI satisfy themselves an output work does not infringe?
Perhaps the most clear-cut kind of situation is one where the original copyright work is not within the dataset on which the AI was taught. In such a case the output work cannot be derived from the original and it cannot directly infringe it. So, if the dataset is known, there could be an effective defence to an infringement claim.
Even in more “opaque” situations, it will often be a major challenge for a copyright owner to sustain an infringement claim in respect of such a possibly infringing output. For one thing, at the outset, the copyright owner bringing the claim will generally not know what is in the dataset and, depending on the circumstances, finding out could be very challenging.
Another practical difficulty is that there could be a number of works in the dataset all of which are similar to the original copyright work, but not all of which have been created by the original copyright owner. This would make it harder for the copyright owner to demonstrate that it is their work that has been infringed.
By contrast, it will be easier for the copyright owner to establish their claim in the situation where they can demonstrate that their original copyright work was within the dataset that taught the AI and easier still, if there are no similar works to this within the dataset or if the output work is extremely similar to their original copyright work. One can imagine cases where arguments are put forward that an output resembles a copyright work so closely that the only reasonable explanation is that copying must have taken place.
Along the same lines, the claim will be much easier if the output work contains identical or very similar distinctive features of the original, like a watermark or signature (in the case of images), or perhaps a line of distinctive code in a software context. Such clear markers will, however, be the exception rather than the rule.
So, because people will often not know what data has been used to teach an AI tool, or because they are likely to find it difficult to obtain and then effectively search that data, both users of the tool and those who consider that their copyright may have been infringed will face challenges.
On one hand, it is very difficult for a user generating a new output work using an AI system to be entirely sure that the new work does not infringe copyright. Unless they are confident that an exception will apply covering their creation and/or use of the work or they know exactly what was within the teaching dataset, there is always a possible infringement risk.
On the other hand, it can also be very difficult for the copyright owner to establish their claim in ordinary circumstances, and this may have the effect of reducing the risk to the user of facing an infringement action. These challenges, which would likely lead to increased costs for both parties in litigation, help to explain why it has been predominantly larger entities or groups bringing claims or class actions, and why the claims are against well-resourced AI technology owners rather than users. The chances of a substantial recovery, and of the defendant being able to pay substantial amounts, are much higher.
A final note on infringement by the outputs of generative AI tools: use of the created work could, in some circumstances, result in the infringement of various other intellectual property rights, not only copyright. For instance, if the work created was similar to a registered trade mark or a design protected by registered or unregistered design right, then some uses of the work may infringe these other rights.
Approach of generative AI businesses to infringement
In spite of the evidential challenges faced by potential claimants, there are clear risks for users who are using AI solutions to create new outputs. Help is at hand, however, and some businesses offering such AI solutions are looking to reassure their users.
Microsoft, for example, are very clear on their position regarding their AI Copilot product, offering their Copilot Copyright Commitment:
"As customers ask whether they can use Microsoft's Copilot services and the output they generate without worrying about copyright claims, we are providing a straightforward answer: yes, you can, and if you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved."2
Similarly, Google offer indemnities to users of some of their AI-powered services in respect of losses they may incur for IP infringement. “To our knowledge,” they have said, “Google is the first in the industry to offer a comprehensive, two-pronged approach to indemnity”3, with cover that provides protection in respect of both the outputs from their Google Cloud and Workspace platforms and the inputs, i.e. copyright claims in relation to the training of their systems.
Needless to say, terms and conditions will apply to these respective indemnities and reassurances. It is likely, for example, that deliberate attempts to infringe, or acts that are reckless as to whether third party rights are infringed, would be excluded.
Input infringement update
On the topic of “inputs”, as we touched upon in our first article, in their January 2023 report the House of Lords’ Communications and Digital Committee criticised the UK Intellectual Property Office’s June 2022 suggestion that the text and data mining (TDM) copyright exception should be broadened to allow for the mining of all copyright works with no opt-out option for rights holders. In its current form, that exception only allows TDM for non-commercial research by those who already have lawful access to the copyright works.
Following the Government’s confirmation that it will not expand the exception, on 30 August 2023, the Culture, Media and Sport Committee published a report1 endorsing that decision. The Committee remarks in the report that it considers the current exception to be “an appropriate balance between innovation and creator rights”. Above, we looked at some of the inherent difficulties around bringing a claim, particularly if you are a copyright owner with limited resources, and it is unsurprising that in its report the Committee has also called on the Government to “consider how creatives can ensure transparency and, if necessary, recourse and redress if they suspect AI developers are wrongfully using their works in AI development”.
Clearly, the decision not to broaden the TDM exception is positive for copyright owners. The UK’s Intellectual Property Office is at the time of writing (November 2023) working with users and rights holders to develop an AI and IP code of practice. One of its aims is to make licences for data mining more available, and it will seek to help surmount some of the barriers AI providers face while also protecting the interests of rights holders. It will be a voluntary code of practice, but the Government has indicated it will legislate if that ultimately proves necessary. We are intrigued to see how the balance plays out between the concerns of the creatives on the one hand and the desire to encourage AI innovation and development on the other.