A Modern Marriage: How AI Powered By Blockchain Could Protect IP Rights

“The creative industries have long been raising concerns that their IP is being unfairly used to train AI systems without consent and without compensation. The lack of even a voluntary code will not allay these concerns.”

“The industry is asking for transparency on what models have and haven’t been trained on, and what works are being used. The IPO hasn’t found answers to those questions.”

The quotes, from the Culture, Media and Sport Committee and the Design and Artists Copyright Society respectively, surfaced from the rubble after recent talks between rights holders and the Intellectual Property Office to establish a UK voluntary code of practice on copyright and AI collapsed.1

The Culture, Media and Sport Committee, the Design and Artists Copyright Society and senior figures from the UK’s creative sectors were among the participants who were unable to find a solution to this topical issue.

While the proliferation of generative AI into mainstream society via AI models like ChatGPT and Midjourney over the last year has been breathtaking for many, it has more than likely left regulators and legislators breathless as they try to catch up to the current technological age.

Generative AI (GenAI) tools rely heavily on models which are trained on massive data sets, sometimes scraped from the World Wide Web, to generate user-prompted outputs. However, these data sets are often extremely large, and it is usually hard for third parties to gain access to them or to truly know what is in them. This in turn makes it difficult for (1) copyright owners to establish that their work has been used as input or training data, (2) copyright owners to seek compensation for the unauthorised use of their copyrighted works and (3) users of AI tools to minimise the risk of inadvertent copyright infringement when using the tool to generate AI outputs that are similar to copyrighted works. These problems are elaborated on further in my colleagues’ recent insight pieces, “Copyright and AI: Part 1 - Teaching the machine” and “Copyright and AI: Part 2 - Infringement by machine?”.

The Current Landscape

Currently, a large amount of data gathering is carried out by private entities behind closed doors for their own benefit. For example, Facebook constantly gathers information from its users, which is then traded and used for advertising. Some data is gathered from sources that sit behind restrictions such as paywalls. However, most data scraping is done via online sources accessible to the public on the World Wide Web. For example, the WebVid dataset contains 10m video preview clips solely from Shutterstock2. This dataset was used, with Shutterstock’s permission, to train Meta AI’s Make-A-Video AI system. A problem arises when copyright works are used without the right owner’s permission. OpenAI’s Generative Pre-trained Transformer 3 (GPT-3) large language model (LLM) was trained on roughly 499 billion tokens3 of data from online sources. It is speculated that OpenAI, the developer of GPT, used a mixture of publicly available data and allegedly illegally acquired materials, such as the Books2 dataset containing about 294,000 titles from various authors, a few of whom have since decided to commence proceedings.4

There are questions around whether profits generated from the use of this body of data are distributed fairly. Currently, while numerous intellectual property court cases are ongoing in the UK, the US and elsewhere, the profits earned by generative AI companies are rarely shared with the owners of rights in the materials that make up the training dataset. Let us imagine AI companies suddenly have a magnanimous change of heart and want to start compensating right owners for the content. They create a database recording how much of each individual contributor’s content is fed in as training input, and decide to maintain this database and pay out right owners accordingly. The issue here is that not only would right owners need to trust that companies like Meta or OpenAI would compensate them fairly for the use of their data, but they would also need to trust auditors to confirm that these companies are properly paying data owners for the use of their content and copyright. One must remember that these firms are non-deterministic hierarchical structures, which ultimately concentrate decision-making power in a board of people, albeit one answerable to owners or stakeholders, so there is an inherent possibility of bias or greed.

The Big Picture

Before breaking down the issues above, let us look at the big picture. In the long run, economic growth is and has been fundamentally powered by technological trajectories of innovation. For instance, the railways (industrial technology) required standardised clocks and timetables (institutional technology) to make them useful at scale.5 The internet (industrial technology) required institutional standards like TCP/IP (Transmission Control Protocol/Internet Protocol) to facilitate interoperability and global use at scale. The principle here is that industrial technologies create economic value at scale only when coupled with institutional technologies. GenAI is a digital technology born of mathematics and computers, but it is an ‘industrial’ technology that, in this author’s view, needs to be institutionally contained with rules to make it economically valuable.

This is where the union of “creative”, “non-deterministic” generative AI and “deterministic”, “reliable”, “transparent” blockchain technology and its smart contracts comes in. The blockchain, as an ‘institutional technology’, can potentially help solve more than just copyright issues for copyright owners and AI tool users, but copyright is what we will consider within the scope of this article. A blockchain-backed governance solution fits best going forward, where we want to set up new databases of safe or copyright-friendly content. At this point, you might be perplexed. Surely blockchain technology, widely regarded by the mainstream media simply as a hotbed for cryptocurrency scams, cannot offer a legitimate solution to tackling copyright infringement and securing fair compensation for copyrighted works?

The blockchain is a type of distributed ledger technology that uses cryptography to provide an immutable record of transactions on a decentralised network without any centralised authority. This framework is ideal for hosting data that cannot be compromised. Additionally, every transaction on the blockchain is transparent and available for public viewing. While the opacity of data scraping processes can often be a property of generative AI architecture, the inherent transparency and security of blockchain networks can provide on-chain guardrails for the GenAI models to work their magic to the benefit of society.
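To make the idea of an immutable record concrete, the following is a minimal sketch, in Python, of the hash-linking that underlies a blockchain ledger. It is an illustration under simplifying assumptions, not a description of any real network: there is no consensus mechanism, no peer-to-peer distribution and no digital signatures, and the function names and sample records are hypothetical.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new block that commits to the block before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append(
        {
            "index": index,
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": block_hash(index, payload, prev_hash),
        }
    )
    return chain


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["payload"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical records, for illustration only.
ledger = []
append_block(ledger, {"work": "photo-001", "owner": "alice"})
append_block(ledger, {"work": "song-042", "owner": "bob"})
print(is_valid(ledger))  # True

ledger[0]["payload"]["owner"] = "mallory"  # tamper with an earlier record
print(is_valid(ledger))  # False
```

Because each block's hash covers the previous block's hash, altering an earlier record invalidates the chain from that point on unless every later block is also rewritten. On a real decentralised network, that rewriting would additionally have to be accepted by the other participants.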

Companies training AI models may challenge this notion and argue against using a blockchain-based system that would remove human intervention and shine a glaring light on their opaque AI-training processes. However, given the number of copyright infringement lawsuits already in the public eye, like the ones against OpenAI and Meta, the blockchain could provide the basis of a negotiated solution that enables the transparent remuneration of copyright owners.

Why use the blockchain?

In the digital era, one’s digital identity and ownership of digital assets will become increasingly important as technology develops. As more people transition their businesses and lifestyles to the online world, owners of copyrighted works will realise they are able to verify ownership of their own data via digital assets on the blockchain. One way to tokenise data would be via non-fungible tokens (NFTs), which can be used to verify origins of various forms of media including images, texts, videos and music.6 The question right owners may ask is, why would they go through this process and tokenise their data on the blockchain to voluntarily give GenAI models training material?

Firstly, copyright infringement occurs where someone uses the whole or a substantial part of the right owner’s work without their permission or an applicable defence. If right owners voluntarily provide their copyrighted material for use, this of course eradicates the issue of copyright infringement entirely, be it from the input of training material into AI models, or the AI generated output of these models. This potentially provides an avenue for solving problems (1) and (3) set out above – although of course the right owners must be rewarded for doing so.

Secondly, by tokenising their data and giving permission for GenAI models to use it, right owners benefit by being able to track the use of their data. If the infrastructure around the data troves that GenAI models scrape is put on-chain as a base for AI machine learning, a synergistic relationship between blockchain and the GenAI model could emerge. The blockchain facilitates a transparent record of data, providing AI models with a clear framework for their operations. The immutability of the blockchain can reveal to copyright owners whether copyrighted data, and whose, is being relied upon as training input to generate AI outputs. Such data would also be protected from leakage and tampering by the block encryption of the learning data7 and the peer-to-peer nature of the technology8 respectively. Being able to track the data input behind AI outputs gives right owners and AI tool users a level of trust and security in what would otherwise be a grey area.
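The tracking described above can be sketched as a simple registry that fingerprints each work, records its owner and logs every training use. This is a hedged, in-memory stand-in for what would be an on-chain record in practice. The class and method names are hypothetical, and the sketch assumes a work is identified by a SHA-256 hash of its exact bytes, so it would not recognise edited or re-encoded copies.

```python
import hashlib
from collections import defaultdict


def fingerprint(content: bytes) -> str:
    """Identify a work by the SHA-256 digest of its exact bytes."""
    return hashlib.sha256(content).hexdigest()


class ProvenanceRegistry:
    """In-memory stand-in for an on-chain register of works and their uses."""

    def __init__(self):
        self.owners = {}     # fingerprint -> owner
        self.usage_log = []  # append-only list of (model_id, fingerprint)

    def register(self, owner, content):
        """Record the owner of a work; the first registration wins."""
        digest = fingerprint(content)
        self.owners.setdefault(digest, owner)
        return digest

    def record_training_use(self, model_id, content):
        """Log that a model used a registered work as training input."""
        digest = fingerprint(content)
        if digest not in self.owners:
            raise KeyError("work is not registered; no permission on record")
        self.usage_log.append((model_id, digest))
        return digest

    def uses_by_owner(self, owner):
        """Count, per model, how often this owner's works were used."""
        counts = defaultdict(int)
        for model_id, digest in self.usage_log:
            if self.owners[digest] == owner:
                counts[model_id] += 1
        return dict(counts)


# Hypothetical owners, works and model identifiers.
registry = ProvenanceRegistry()
registry.register("alice", b"alice's illustration bytes")
registry.register("bob", b"bob's short story text")
registry.record_training_use("model-a", b"alice's illustration bytes")
registry.record_training_use("model-b", b"alice's illustration bytes")
print(registry.uses_by_owner("alice"))  # {'model-a': 1, 'model-b': 1}
```

The design choice worth noting is that the usage log is append-only and refuses unregistered works. This mirrors the idea that only permitted material enters the training set and that every use leaves a trace the right owner can query.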

If those are not good enough reasons for using a blockchain solution to help solve this problem, maybe the third will help.

If copyright owners are incentivised, this can encourage them to provide private data for generative AI companies to use in training their AI models. GenAI models are tools that are being used today to increase productivity and efficiency and, by extension, financial gain. If the wave of legal actions is to stop, these gains need to be redistributed between the model owners and the right owners.

The immutability of blockchain-enabled smart contracts is a feature that eliminates the need for human trust. There is assurance that no more and no less authority over a user’s rights or assets is available than what has been explicitly agreed in advance. This provides certainty and predictability. When LLM providers earn revenue from models trained on material linked to these NFTs, programmable smart contracts interacting with the blockchain can, following analysis of audit trails on the decision-making patterns of the algorithms9, allocate part of that revenue to the right owner as a royalty payment according to the weight or proportion of the owner’s data that was used in training. This conveniently solves problem (2).
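The proportional allocation described above comes down to simple arithmetic, sketched below in Python. In practice this logic would live in a smart contract, and this is only an off-chain illustration under stated assumptions. The function name, the royalty rate expressed in basis points and the use of token counts as weights are all hypothetical choices, not a scheme taken from any existing platform.

```python
def allocate_royalties(revenue_pence, royalty_rate_bps, weights):
    """Split a royalty pool between owners pro rata to their weights.

    revenue_pence    -- revenue attributed to the model, in whole pence
    royalty_rate_bps -- share of revenue paid as royalties, in basis points
    weights          -- owner -> non-negative integer weight (e.g. tokens used)
    """
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")

    pool = revenue_pence * royalty_rate_bps // 10_000

    shares = {}
    remainders = []
    for owner, weight in weights.items():
        numerator = pool * weight
        shares[owner] = numerator // total_weight  # whole pence, rounded down
        remainders.append((numerator % total_weight, owner))

    # Hand out the pence lost to rounding, largest remainder first,
    # breaking ties by owner name so the result is deterministic.
    leftover = pool - sum(shares.values())
    for _, owner in sorted(remainders, key=lambda r: (-r[0], r[1]))[:leftover]:
        shares[owner] += 1
    return shares


# Hypothetical figures: £10,000 of revenue, a 10% royalty pool,
# and weights equal to the number of tokens each owner contributed.
payout = allocate_royalties(1_000_000, 1_000, {"alice": 500, "bob": 300, "carol": 200})
print(payout)  # {'alice': 50000, 'bob': 30000, 'carol': 20000}
```

Integer arithmetic is used throughout so that the shares always sum exactly to the pool, which matters when the payout rule is meant to be deterministic and auditable.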

As an example of arguably similar solutions already in existence, datalatte10 is a blockchain/AI solution that allows you to engage in conversations with its LLM chatbot to generate and share insightful data. This data is then tokenised into NFTs, and the owner has full authority over who accesses the data and how it is used. If the tokenised data is used in a query, the owner receives a payment for it.

The Open Music Initiative is another example of a collective effort to build an open-source protocol for managing music and copyright data. Their solution provides copyright owners with control, recognition, and compensation for their works.11 

Conclusion

With the development of emerging technologies moving at breakneck pace, I believe the symbiotic relationship between blockchain technology and generative AI will be the most efficient and effective means to solve these copyright infringement issues that have emerged from both the training input and product output segments of the generative AI model.

Copyright owners in this new technological age may have the opportunity to monetise their content effectively while retaining control over it. At the same time, generative AI owners can have access to legitimate content to train their models. As the saying goes, change is the only constant. It is time for the industry to embrace this modern marriage.

 

 

1 S Speight (2024), UK fails in bid to create AI voluntary code as talks collapse
2 S Willison (2022), Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model
3 A Guadamuz (2024), A Scanner Darkly: Copyright Liability and Exceptions in Artificial Intelligence Inputs and Outputs
4 As an update on 12 February 2024: the US District Judge has dismissed 5 of the 6 claims, leaving the claim for direct copyright infringement open, and has ordered the plaintiffs to file an amended complaint.
5 C Berg, S Davidson, J Potts (2023), Institutions to constrain chaotic robots: why generative AI needs blockchain
6 Chainlink (2023), Use Cases of AI in Blockchain
7 J Kim, N Park (2020), Blockchain-Based Data-Preserving AI Learning Environment Model for AI Cybersecurity Systems in IoT Service Environments
8 H Luo, J Luo, A V Vasilakos (2023), BC4LLM: Trusted Artificial Intelligence When Blockchain Meets Large Language Models
9 Chainlink (2023), Use Cases of AI in Blockchain
10 www.datalatte.com
11 Open Music Initiative, Our Open Protocols Approach (open-music.org)
