Copyright and AI: Part 1 – Teaching the machine
Much has been written about the ChatGPT chatbot and the latest iteration of its underlying large language model, GPT-4. These relatively new technologies have caused quite a stir, and for good reason. For anyone who has not yet come across it, ChatGPT is a chatbot created, designed and trained by OpenAI, the research company originally backed by Elon Musk and in which Microsoft now holds a substantial share. Its purpose is to interact with the user in a conversational way, and it produces remarkably realistic and reliable content. It is not infallible but, according to CNN, it passed law exams in four courses at the University of Minnesota, after completing 95 multiple choice questions and 12 essay questions, and also passed a business management course at Wharton.
As well as ChatGPT, OpenAI have produced DALL-E 2 and Stability AI have produced Stable Diffusion, both AI tools that produce visual “art” works. Stable Diffusion is described as a “latent text-to-image diffusion model capable of generating photo-realistic images given any text input”.
As IP lawyers, we want to explain some of the intellectual property issues raised by these tools and, in doing so, flag some of the differences in approach between the UK and the US. The creators and owners of copyright works have expressed disquiet and a number of lawsuits have been issued against AI companies, including one here in the UK brought by Getty Images against Stability AI in respect of Stable Diffusion, and others in the US.
The starting point is copyright. Copyright law exists to protect the rights of creators of original works, including literature, music, and software. However, with the rise of artificial intelligence (AI) and machine learning, there are important new questions about how copyright law applies to both the creation and the use of AI-generated works.
There is a distinction between infringement by the training of an AI and infringement by its output. In this first article, we look at possible infringements that take place when the AI is trained.
To train an AI, it must be provided with material it can learn from. In practice, this likely means providing it with a huge number of files that may have been collated from various sources, potentially including material scraped from public websites. In principle, the more useful and relevant information an AI is given to learn from, the higher the quality of the output it should be able to produce. For instance, AI-generated written works may have better syntax, or artistic works may more successfully reflect the instructions given.
A possible copyright infringement may occur where copyright-protected material is used to train an AI and, crucially, that material is used without the permission of the copyright owner. If permission has not been sought from the copyright owner in respect of a work, and licences or terms and conditions do not provide permission, it is easy to see how, absent a defence or exception, copyright could be infringed where large datasets are compiled by scraping websites or otherwise, analysed or “mined”, and then processed by an AI.
Governments are therefore faced with the difficult task of protecting the creator and IP rights holder while at the same time providing a legal framework that accommodates and encourages the development of AI. This accommodation may take the form of exceptions or defences to infringement for those who use material to train an AI. Additionally, governments could legislate to make it easier for copyright protection to be afforded to work that an AI has produced. Ultimately, governments may use legislation to try to obtain a competitive advantage over other countries by positioning themselves as particularly AI-friendly or forward-thinking.
Among the potentially interesting exceptions in the UK are, first, text and data mining for non-commercial purposes and, secondly, the making of temporary copies which are transient or incidental. In the UK, the mining exception has recently been considered by the House of Lords’ Communications and Digital Committee. At present, for the mining exception to apply, the person making the copies must have lawful access to the works and must be analysing them solely to conduct non-commercial research. A rights holder who is aware of their rights and makes their works unobtainable without a licence, for instance, appears to be in a strong position in the UK. This would likely include those who put their works behind a paywall.
Separately, if copies of works are stored permanently in teaching the AI, any exception for temporary copies should not apply either.
In the US, the position is somewhat different. Under US federal law, the concept of transformation is crucial. Has the original copyright work been changed, transcended or transformed to such an extent that the new derivative work does not infringe the copyright in it? This is relevant at the training stage but also, more obviously, at the output stage, when a new work has been created.
Returning to the UK, the UK IPO indicated in June 2022 that it would favour a new approach, expanding the mining exception to those analysing works for any purpose, with no opt-out available for rights holders. There would still be a requirement that the works are accessed lawfully, so rights holders would not be entirely without protection, but their position and ability to license works would be severely hampered by this stance. The UK IPO’s proposal was, as expected, met with considerable criticism from a number of rights holder representative bodies.
The House of Lords’ Communications and Digital Committee responded to the rights holders’ concerns by disagreeing with the UK IPO’s expansion proposal, and the stance that the Government will ultimately take is far from settled. It is an exciting area with a lot at stake and, while the technology continues to develop at pace, the Government will be under pressure to adopt a supportive yet competitive position.