A Modern Marriage: How AI Powered By Blockchain Could Protect IP Rights

12 March 2024

“The creative industries have long been raising concerns that their IP is being unfairly used to train AI systems without consent and without compensation. The lack of even a voluntary code will not allay these concerns.”

“The industry is asking for transparency on what models have and haven’t been trained on, and what works are being used. The IPO hasn’t found answers to those questions.”

The quotes, from the Culture, Media and Sport Committee and the Design and Artists Copyright Society respectively, surfaced from the rubble after the collapse of recent talks between rights holders and the Intellectual Property Office to establish a UK voluntary code of practice on copyright and AI.¹

The Culture, Media and Sport Committee, the Design and Artists Copyright Society and senior bosses from the UK’s creative sectors were among the participants who were unable to find a solution to this topical issue.

While the proliferation of generative AI into mainstream society via AI models like ChatGPT and Midjourney over the last year has been breathtaking for many, it has more than likely left regulators and legislators breathless as they try to catch up to the current technological age.

Generative AI (GenAI) tools rely heavily on models which are trained on massive data sets, sometimes scraped from the World Wide Web, to generate user-prompted outputs. However, these data sets are often extremely large and it is usually hard for third parties to gain access to them or to truly know what is in them. This in turn makes it difficult for (1) copyright owners to establish that their work has been used as input or training data, (2) copyright owners to seek compensation for the unauthorised use of their copyrighted works and (3) users of AI tools to minimise the risk of inadvertent copyright infringement when using the tool to generate AI outputs that are similar to copyrighted works. These problems have been further elaborated on in my colleagues’ recent insight pieces on “Copyright and AI: Part 1 - Teaching the machine” here and “Copyright and AI: Part 2 - Infringement by machine?” here.

The Current Landscape

Currently, a large amount of data gathering is carried out by private entities behind closed doors for their own benefit. For example, Facebook constantly gathers information from its users, which is then traded and used for advertising. Some data is gathered from sources that sit behind restrictions such as paywalls. However, most data scraping draws on online sources accessible to the public on the World Wide Web. For example, the WebVid dataset contains 10m video preview clips taken solely from Shutterstock². This dataset was used, with Shutterstock’s permission, to train Meta AI’s Make-A-Video AI system. A problem arises when copyright works are used without the right owner’s permission. OpenAI’s Generative Pre-trained Transformer 3 (GPT-3) large language model (LLM) was trained using datasets totalling about 499 billion tokens³ of data from online sources. It is speculated that OpenAI, the developer of GPT, used a mixture of publicly available data and allegedly illegally acquired materials, such as the Books2 dataset containing about 294,000 titles from various authors, a few of whom have since commenced proceedings⁴.

There are questions around whether profits generated from the use of this body of data are distributed fairly. Currently, while there are numerous intellectual property court cases ongoing in the UK, the US and elsewhere, the profits earned by generative AI companies are rarely shared with the owners of rights in the materials that comprise the training dataset. Let us imagine AI companies suddenly have a magnanimous change of heart and want to start compensating right owners for the content. They create a database recording how much of each individual contributor’s content is fed in as training input, and decide to maintain this database and pay out right owners accordingly. The issue here is that right owners would need to trust not only that companies like Meta or OpenAI were compensating them fairly for the use of their data, but also that auditors were properly confirming that these companies were paying data owners what they were due for the use of their content and copyright. One must remember that these firms are non-deterministic hierarchical structures, which ultimately concentrate decision-making power in a board of people, albeit one answerable to owners or stakeholders, so there is an inherent possibility of bias or greed.

The Big Picture

Before breaking down the issues above, let us look at the big picture. In the long run, economic growth is and has been fundamentally powered by technological trajectories of innovation. For instance, the railways (industrial technology) required standardised clocks and timetables (institutional technology) to make them useful at scale.⁵ The internet (industrial technology) required institutional standards like TCP/IP (Transmission Control Protocol/Internet Protocol) to facilitate interoperability and global use at scale. The principle here is that industrial technologies create economic value at scale only when coupled with institutional technologies. GenAI is a digital technology born of mathematics and computers, but it is an ‘industrial’ technology that, in this author’s view, needs to be institutionally contained with rules to make it economically valuable.

This is where the union of the “creative”, “non-deterministic” generative AI and the “deterministic”, “reliable”, “transparent” blockchain technology and its smart contracts comes in. The blockchain, as an ‘institutional technology’, can potentially help solve more than just copyright issues for copyright owners and AI tool users, but copyright is the focus of this article. A blockchain-backed governance solution fits best going forward, where the aim is to set up new databases of safe or copyright-friendly content. At this point, you might be perplexed. Surely, blockchain technology, widely regarded simply as a hotbed for cryptocurrency scams by the mainstream media, cannot offer a legitimate solution to tackling copyright infringement and the fair compensation for copyrighted works?

The blockchain is a type of distributed ledger technology that uses cryptography to provide an immutable record of transactions on a decentralised network without any centralised authority. This framework is ideal for hosting data that cannot be compromised. Additionally, every transaction on the blockchain is transparent and available for public viewing. While the opacity of data scraping processes can often be a property of generative AI architecture, the inherent transparency and security of blockchain networks can provide on-chain guardrails for the GenAI models to work their magic to the benefit of society.
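To make the idea of an "immutable record" more concrete, the following is a minimal sketch of a hash-linked ledger. It is an illustration only: it runs in a single process, has no network, consensus mechanism or decentralisation, and the function names and record fields (`work`, `event`) are assumptions made for this example rather than part of any real blockchain platform. It shows one property only, namely that altering an earlier record becomes detectable because each block's hash depends on the block before it.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a record that is cryptographically linked to the one before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "prev_hash": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
    )


def is_valid(chain):
    """Recompute every hash; an edit to any earlier record breaks the links."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical records of works being licensed for training.
ledger = []
append_block(ledger, {"work": "photo-001", "event": "licensed for training"})
append_block(ledger, {"work": "novel-042", "event": "licensed for training"})
print(is_valid(ledger))  # True

ledger[0]["payload"]["event"] = "never licensed"  # attempt to rewrite history
print(is_valid(ledger))  # False
```

On a real network, many independent participants hold copies of the ledger and agree on new blocks, which is what removes the need for a central authority. The sketch leaves that part out entirely.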

Companies training AI models may challenge this notion and argue against using a blockchain-based system that would remove human intervention and shine a glaring light on their opaque AI-training processes. However, given the number of copyright infringement lawsuits already in the public eye, like the ones against OpenAI and Meta, the blockchain could provide the basis of a negotiated solution that enables the transparent remuneration of copyright owners.

Why use the blockchain?

In the digital era, one’s digital identity and ownership of digital assets will become increasingly important as technology develops. As more people transition their businesses and lifestyles to the online world, owners of copyrighted works will realise they are able to verify ownership of their own data via digital assets on the blockchain. One way to tokenise data would be via non-fungible tokens (NFTs), which can be used to verify origins of various forms of media including images, texts, videos and music.⁶ The question right owners may ask is, why would they go through this process and tokenise their data on the blockchain to voluntarily give GenAI models training material?
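As a rough illustration of what tokenising a work to verify its origin could involve, the sketch below registers a cryptographic fingerprint of a work against an owner and later looks it up. The `WorkRegistry` class and its methods are hypothetical, an in-memory stand-in for an on-chain token registry, and do not follow any particular NFT standard.

```python
import hashlib
from typing import Dict, Optional


class WorkRegistry:
    """Hypothetical in-memory stand-in for an on-chain token registry."""

    def __init__(self) -> None:
        self._owner_by_fingerprint: Dict[str, str] = {}

    @staticmethod
    def fingerprint(content: bytes) -> str:
        """Return the SHA-256 digest of the work's bytes as a hex string."""
        return hashlib.sha256(content).hexdigest()

    def register(self, owner: str, content: bytes) -> str:
        """Record the first claimed owner of a work and return its fingerprint."""
        fp = self.fingerprint(content)
        if fp in self._owner_by_fingerprint:
            raise ValueError("work already registered")
        self._owner_by_fingerprint[fp] = owner
        return fp

    def owner_of(self, content: bytes) -> Optional[str]:
        """Look up the registered owner of a byte-identical copy, if any."""
        return self._owner_by_fingerprint.get(self.fingerprint(content))


registry = WorkRegistry()
registry.register("alice", b"An original short story.")
print(registry.owner_of(b"An original short story."))  # alice
print(registry.owner_of(b"A different text."))         # None
```

Two limits are worth noting. An exact hash only matches byte-identical copies, so a cropped image or lightly edited text would not be recognised. The registry also records who registered a work first, not who actually created it, so it would still need to be paired with some way of checking the registrant's claim.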

Firstly, copyright infringement occurs where someone uses the whole or a substantial part of the right owner’s work without their permission or an applicable defence. If right owners voluntarily provide their copyrighted material for use, this of course eradicates the issue of copyright infringement entirely, be it from the input of training material into AI models, or the AI generated output of these models. This potentially provides an avenue for solving problems (1) and (3) set out above – although of course the right owners must be rewarded for doing so.

Secondly, by tokenising their data and giving permission for GenAI models to use it, right owners benefit by being able to track the use of their data. If the infrastructure around the data troves that GenAI models scrape is put on-chain as a base for AI machine learning, a synergistic relationship between blockchain and the GenAI model could emerge. The blockchain facilitates the transparent record of data, providing AI models with a clear framework for their operations. The immutability of the blockchain can reveal to copyright owners whether copyrighted data, and whose, is being relied upon as input for training material to generate AI outputs. Such data would also be free from leakage or tampering due to the block encryption of the learning data⁷ and the peer-to-peer nature of the technology⁸ respectively. Being able to track data input for AI outputs provides right owners and AI tool users with a level of trust and security in something which would otherwise be a grey area.

If those are not good enough reasons for using a blockchain solution to help solve this problem, maybe the third will help.

If copyright owners are incentivised, this can encourage them to provide private data for generative AI companies to learn and train their AI models. GenAI models are tools that are being used today to increase productivity and efficiency and by extension financial gain. If the wave of legal actions is to stop, these gains need to be redistributed between the model owners and the right owners.

The immutability of blockchain-enabled smart contracts is a feature that eliminates the need for human trust. There is assurance that no more and no less authority over a user’s rights or assets is available than what has been explicitly agreed to in advance. It provides certainty and predictability. When LLM operators earn revenue from models trained on material linked to these NFTs, programmable smart contracts interacting with the blockchain can, upon analysis of audit trails on the decision-making patterns of algorithms⁹, allocate part of the revenue to the right owner as a royalty payment according to the weight or proportion of the owner’s data that was used in training. This offers a route to solving problem (2).
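The allocation rule described above, a royalty pool divided in proportion to each owner's share of the training data, can be sketched in a few lines. This is plain Python rather than smart-contract code, and the function name, the use of pence as the unit and the idea of measuring "weight" as a simple count per owner are all assumptions made for illustration. Integer arithmetic is used so that the payouts always add up exactly to the pool, with any leftover pence going to the owners with the largest fractional remainders.

```python
from typing import Dict


def split_royalties(pool_pence: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split a royalty pool pro rata by weight, in whole pence.

    `weights` maps each right owner to a non-negative measure of how much
    of their data was used in training (a hypothetical unit, e.g. tokens).
    """
    total = sum(weights.values())
    if pool_pence < 0 or total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("pool must be >= 0 and weights must be non-negative "
                         "with a positive total")

    # Floor of each owner's exact proportional share.
    shares = {owner: pool_pence * w // total for owner, w in weights.items()}

    # Hand out the pence lost to rounding, largest remainder first
    # (ties broken alphabetically so the result is reproducible).
    leftover = pool_pence - sum(shares.values())
    by_remainder = sorted(
        weights, key=lambda o: (-(pool_pence * weights[o] % total), o)
    )
    for owner in by_remainder[:leftover]:
        shares[owner] += 1
    return shares


# Hypothetical example: a £100 pool and three contributors.
print(split_royalties(10_000, {"alice": 500, "bob": 300, "carol": 200}))
# {'alice': 5000, 'bob': 3000, 'carol': 2000}
```

The arithmetic is the easy part. In practice the difficult questions are how the weights are measured and who attests to them, which is where the audit trails mentioned above would have to do the work.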

As an example of arguably similar solutions already in existence, datalatte¹⁰ is a blockchain/AI solution that allows users to engage in conversations with its LLM chatbot to generate and share insightful data. This data is then tokenised into NFTs and the owner has full authority over who accesses the data and how it is used. If the tokenised data is used in a query, the owner receives a payment for it.

The Open Music Initiative is another example of a collective effort to build an open-source protocol for managing music and copyright data. Their solution provides copyright owners with control, recognition, and compensation for their works.¹¹

Conclusion

With the development of emerging technologies moving at breakneck pace, I believe the symbiotic relationship between blockchain technology and generative AI will be the most efficient and effective means to solve these copyright infringement issues that have emerged from both the training input and product output segments of the generative AI model.

Copyright owners in this new technological age may have the opportunity to effectively monetise their content while retaining control over it. At the same time, generative AI owners can have access to legitimate content to train their models. As the saying goes, change is the only constant. It is time for the industry to embrace this modern marriage.

¹ S Speight, (2024), UK fails in bid to create AI voluntary code as talks collapse
² S Willison, (2022), Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model
³ A Guadamuz, (2024), A Scanner Darkly: Copyright Liability and Exceptions in Artificial Intelligence Inputs and Outputs
⁴ As an update on 12 February 2024: The US District Judge has dismissed 5 of the 6 claims, leaving the claim for direct copyright infringement open, and has ordered the plaintiffs to file an amended complaint.
⁵ C Berg, S Davidson, J Potts, (2023), Institutions to constrain chaotic robots: why generative AI needs blockchain
⁶ Chainlink, (2023), Use Cases of AI in Blockchain
⁷ J Kim, N Park, (2020), Blockchain-Based Data-Preserving AI Learning Environment Model for AI Cybersecurity Systems in IoT Service Environments
⁸ H Luo, J Luo, A V Vasilakos, (2023), BC4LLM: Trusted Artificial Intelligence When Blockchain Meets Large Language Models
⁹ Chainlink, (2023), Use Cases of AI in Blockchain
¹⁰ www.datalatte.com
¹¹ Our Open Protocols Approach — Open Music Initiative (open-music.org)
