AI and Data Protection
Key Takeaways
- AI should be trained and deployed in ways that uphold core data protection principles.
- Web scraping for AI training presents challenges when identifying an appropriate lawful basis and fulfilling transparency obligations.
- Agentic AI introduces new complexities including unclear controllership roles and elevated cyber risks due to greater autonomy.
Understanding the ICO's Approach to Responsible AI
The UK Government and UK Information Commissioner’s Office (ICO) recognise that AI has enormous potential to transform how businesses operate. Rather than rushing to introduce strict AI-specific laws, the UK Government has chosen a more pragmatic path. Its focus is on giving businesses practical guidance to manage data protection risks throughout the entire AI lifecycle, from development to deployment. This approach differs from that of the European Union, which has introduced dedicated AI legislation.
The ICO is also keeping a close eye on new AI technologies. In its January Tech Futures report, the regulator examined the privacy challenges posed by agentic AI, which refers to AI systems capable of acting more autonomously.
Data Protection Considerations in AI Training
Data protection law applies not just to what an AI tool produces, but also to the personal data used to train it. This creates several practical challenges for organisations. Before using personal data to train AI, businesses must consider whether they have a lawful basis to do so under the UK GDPR, such as relying on legitimate interests.
Informing Individuals and Providing Transparency
Organisations face the difficult task of informing individuals when their personal data is being used to train AI. This can be particularly challenging when there is no direct relationship with those individuals. Businesses must also think carefully about how to communicate their use of AI in a way that builds trust rather than creating suspicion or scepticism among the public.
Lawful Basis for Using Personal Data in AI Training
Under data protection law, organisations need a valid lawful basis to process personal data for AI training. The ICO has confirmed that legitimate interests is the only realistic lawful basis when using personal data scraped from the internet to train generative AI models. However, the ICO has warned that meeting the necessary balancing test is difficult in these circumstances, given the high-risk nature of such processing and the fact that it often happens without individuals' knowledge. The ICO's findings on web scraping align broadly with the European Data Protection Board's view that such techniques can breach transparency, data minimisation, and accuracy obligations.
Key Findings from the ICO's GenAI Outcomes Report
Following a consultation series on generative AI that ran from January to September 2024, the ICO published an outcomes report covering five important areas that will drive regulatory focus:
- Purpose limitation.
- Accuracy of training data and AI outputs.
- Allocation of controllership responsibilities across the GenAI supply chain.
- Embedding individual rights into AI model design.
- Assessing the lawful basis for web scraping.
Data Protection Risks Relating to AI Outputs
Core data protection principles such as lawfulness, fairness, accuracy, and accountability can be difficult to apply to AI systems. This is because there is often little visibility into how these systems reach their conclusions, a problem commonly referred to as the "black box" issue. Organisations that use AI without careful thought risk unknowingly breaching fundamental data protection rules.
Risks Linked to Data Quality and Provenance
AI systems, particularly large language models, are often trained on enormous datasets whose origins may be unclear. This means that important decisions about people could be based on data that is low quality, outdated, inaccurate, or biased. The consequences can be serious, including not only poor and unreliable outputs but also results that discriminate against individuals based on protected characteristics such as gender, race, or age.
Additional Risks Introduced by Agentic AI
The ICO's recent report highlights that agentic AI introduces further risks. These include:
- uncertainty about which organisation is responsible for data protection when multiple parties are involved;
- agents accessing more personal data than necessary;
- agents drawing inferences about sensitive personal information; and
- new cybersecurity threats arising from the autonomous nature of these systems.