"AI Battlefields" Conference - Some Highlights
I was lucky enough to go to Milan last week for the IBA’s “AI Battlefields” conference. Huge thanks to the organisers and speakers for a very slick event. It was a pleasure to meet with industry friends old and new. I wanted to share some of the points I found most interesting.
How much human intellectual labour is required for an AI-generated work to be copyright protected? In one Chinese case (Li v Liu), copyright was recognised in an “AI-generated work” created with around 150 prompts plus various filters and processing techniques: the court found sufficient human intellectual input. By contrast, in a Czech case in which a single short prompt was used to create an image, the court ruled that there was no copyright in the image. The Prague court noted that no evidence was submitted to show the image resulted from the prompt. One also wonders whether, even if such evidence had been submitted, there would have been enough human intellectual input/creativity to give rise to a copyright work anyway; likely not.
EU-Centricity in Copyright Compliance? Article 53 of the AI Act requires providers of general-purpose AI (GPAI) models to put in place a policy ensuring compliance with EU copyright law. This strongly implies that if a model is to be placed on the EU market, it must have been trained in compliance with EU copyright standards. That could be a problem for AI systems built on models trained under more “permissive” copyright laws in, say, Japan or Singapore. They might be lawful at home, but beware Article 53 if you want to place them on the market here in the EU.
How do GenAI text generators reduce the “plagiarism” risk? Text generation in these models works by predicting a probability distribution over the next word (token) given the context so far. The model usually picks the highest-probability word, but occasionally samples the second or third most likely word to add diversity. To avoid generating text that is too similar to existing content, models often include anti-plagiarism mechanisms: if the generated output gets too close to known text, the sampler may be pushed towards less common predictions (such as the fourth or fifth most probable words) to increase originality.
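For the curious, here is a minimal sketch of that idea in Python. The corpus, the n-gram length and the candidate probabilities are invented purely for illustration; real systems are far more sophisticated, and this is not the mechanism of any particular model.

```python
# Toy next-token sampler with a simple "anti-plagiarism" tweak:
# if the top candidate would extend an n-gram found in a known
# corpus, fall back to a lower-probability candidate instead.
# KNOWN_TEXT, NGRAM and the candidate list are illustrative assumptions.

KNOWN_TEXT = "the quick brown fox jumps over the lazy dog"
NGRAM = 4  # flag overlaps of 4 consecutive words or more


def known_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


BLOCKLIST = known_ngrams(KNOWN_TEXT, NGRAM)


def pick_token(context: list[str], candidates: list[tuple[str, float]]) -> str:
    """candidates: (token, probability) pairs, sorted most to least likely."""
    for token, _prob in candidates:
        window = tuple(context[-(NGRAM - 1):]) + (token,)
        if window not in BLOCKLIST:
            return token  # first candidate that doesn't extend a known n-gram
    return candidates[-1][0]  # every candidate overlaps: take the least likely


# The highest-probability continuation here would reproduce a known
# 4-gram ("jumps over the lazy"), so the sampler falls back to the
# second most likely word.
context = "jumps over the".split()
candidates = [("lazy", 0.72), ("sleepy", 0.15), ("grumpy", 0.08)]
print(pick_token(context, candidates))  # -> "sleepy"
```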
Software code and copyright. We heard that most new software code is now written with the help of AI tools. The overall compilation may be protected by copyright – thanks to the degree of human/creative input – but individual AI-generated lines likely would not be. This is potentially problematic for developers. Could there be an argument for some kind of sui generis right for AI-generated works? The panellists were split on this.
Computing power and model training. During the ICT infrastructure session, we heard how the pre-training of AI models demands by far the most computing power. This largely self-supervised phase can get the model up to roughly high school diploma level. By contrast, post-training, involving supervised fine-tuning or reinforcement learning, is much less compute-intensive but is crucial in pushing the model's capability up towards PhD level. Post-training is where a lot of developers are focusing at the moment. There is also a lot of attention currently on smaller, more use case-specific models, rather than on the broader general-purpose AI tools.
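To give a feel for the gap between the two phases, here is a back-of-the-envelope calculation using the widely cited approximation that training compute is roughly 6 × parameters × training tokens. The model size and token counts below are made-up assumptions for illustration, not figures from the session.

```python
# Rough training-compute comparison via the common approximation
# FLOPs ≈ 6 × (parameter count) × (training tokens).
# Model size and token counts are illustrative assumptions only.

def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens


pretrain = train_flops(params=70e9, tokens=2e12)  # large pre-training run
posttrain = train_flops(params=70e9, tokens=5e9)  # modest post-training pass

print(f"pre-training : {pretrain:.2e} FLOPs")
print(f"post-training: {posttrain:.2e} FLOPs")
print(f"ratio        : {pretrain / posttrain:.0f}x")  # -> 400x
```

On these (invented) numbers, pre-training consumes hundreds of times more compute than post-training, which is consistent with the point made in the session.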
AI Act Risk Categories. Anyone with any familiarity with the Act will know that it places AI systems into four risk buckets: unacceptable risk, high risk, limited risk and minimal risk. While perhaps obvious on reflection, it was nonetheless interesting to hear a little about the ongoing dialogue between industry and regulators as to where each case sits in this framework.
Data localisation trend. Many countries want more data sovereignty. One example given was AWS collaborating with various Indian public bodies on AI projects and building servers in India.