
The Legal Copyright Battle Against AI: An Introduction to the EU’s Requirements

by Futuristic LawyerMay 31st, 2023

Too Long; Didn't Read

The Center for AI Safety released a one-sentence statement on existential risks of AI. The statement was signed by 350 prominent AI researchers and industry leaders. The tension between generative AI models and copyright law is imminent and the dangers are easy to understand and explain. In this post, I will provide some relevant background information on the EU’s upcoming AI Act.


Yesterday, the non-profit organization Center for AI Safety released a one-sentence statement about the existential risks of AI:


Mitigating the risk of extinction from A.I. should be a global priority alongside other societal-scale risks, such as pandemics and nuclear war.


The statement was signed by 350 prominent AI researchers and industry leaders, including OpenAI CEO Sam Altman, Google DeepMind CEO Demis Hassabis, and Anthropic CEO Dario Amodei. Earlier this month, the same three tech leaders met with the Biden administration to discuss future AI regulations. AI is one of the few industries where major companies ask regulators to impose boundaries on what they can and cannot do; usually, it works the other way around.


The mid- to long-term AI risks are serious, although difficult to understand and predict. In this light, the new copyright infringement issues posed by gigantic generative AI models may not seem like a big deal. There is certainly no life-or-death urgency involved in clarifying the legal landscape. On the other hand, the tension between generative AI models and copyright law is imminent, and the dangers are easy to understand and explain.


The Copyright Issues

Before OpenAI’s commercial breakthrough with ChatGPT, and before the competitive race among BigTech companies to develop larger and more impressive AI models, legal scholars were discussing who would own the rights to a piece of original work created by an AI. As it turns out, the question of authorship has had very little practical relevance. Neither providers nor users of generative AI models have shown much interest in claiming rights over content generated by AI.


More relevant and pressing issues are:


1. Whether using large volumes of copyright-protected material to train large AI models infringes the rights of copyright holders, and


2. What happens when outputs generated by AI models look suspiciously similar to copyrighted works used in the training process.


In my next post, I will attempt to answer both questions by looking at the AI image model Stable Diffusion, which was recently targeted by lawsuits on two different fronts. In this post, I will provide some relevant background information on the EU’s upcoming AI Act and the debate about transparency requirements.



The EU AI Act and Foundation Models

The European Union agreed on a Compromise Text for the EU AI Act on the 11th of May. Among the changes from previous drafts is additional regulation of so-called “foundation models”.

A foundation model is defined as "an AI system model that is trained on broad data at scale, is designed for the generality of output, and can be adapted to a wide range of distinctive tasks". Examples of AI models that fall into this category are ChatGPT and Stable Diffusion.


The Compromise Text sets out requirements for developers of foundation models to register their models in an EU database. In this context, they have to disclose certain information and technical documentation on a range of factors related to the development of the model. According to Section C in ANNEX VIII, foundation model providers such as OpenAI and Google would be required to disclose the data sources and training resources used in the development of their models.



Source: EU Compromise Text for the AI Act


In addition to the requirements in ANNEX VIII, providers of foundation models are, among other things, required to “document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law” (Article 28b (4) (c)).


This provision is interesting and I am curious to learn how it will be interpreted in practice. OpenAI has famously not revealed anything about how or with what data GPT-4 was trained. Once the EU's AI Act enters into force, this has to change if they wish to continue operating in the EU.


OpenAI, Upcoming Regulations & Copyright Issues

It seems like OpenAI is speaking in two tongues. On one hand, Sam Altman and his team support AI regulation and worry about the future implications of AI and its dangers to humanity. On the other hand, they have so far refused to open up GPT-4 to public scrutiny, citing competitive concerns. OpenAI’s split personality is understandable: the company has a stated mission “to ensure that artificial general intelligence benefits all of humanity”, while it is also riding on a ten-billion-dollar investment from Microsoft. This strange mixture of altruistic concern for humanity and colossal funding from BigTech makes me wonder whether OpenAI is willing to play ball with regulators, or whether it will prove more profitable for the company to remain closed.


While the training process behind GPT-4 is closed to public scrutiny, we can be sure of one thing: it includes the use of copyright-protected material on an unfathomably large scale. The Washington Post analyzed Google’s C4 dataset, which consists of more than 15 million domains, and it still only accounts for a relatively small part of GPT-3’s complete training data set. Presumably, the large majority of the text GPT-3 analyzed and learned from was protected by copyright.


I wonder if and how OpenAI and other providers of foundation models can prepare a “sufficiently detailed summary" of all this data in accordance with the EU's AI Act. How "sufficiently detailed" is interpreted will be important.
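To make the scale of the problem concrete: the AI Act does not prescribe any format for such a summary, but one could imagine providers aggregating per-source counts over their training corpus. The sketch below is purely hypothetical (the record fields `domain` and `license` are my own invention, not anything from the Compromise Text) and only illustrates why a "sufficiently detailed" summary over billions of documents is a non-trivial bookkeeping exercise.

```python
from collections import Counter

# Hypothetical metadata records for documents in a training corpus.
# Real corpora like C4 span 15M+ domains, mostly without clean
# license metadata -- which is exactly the practical difficulty.
corpus = [
    {"domain": "wikipedia.org", "license": "CC BY-SA"},
    {"domain": "nytimes.com", "license": "all rights reserved"},
    {"domain": "nytimes.com", "license": "all rights reserved"},
    {"domain": "gutenberg.org", "license": "public domain"},
]

def summarize(records):
    """Count documents per (domain, license) pair."""
    return Counter((r["domain"], r["license"]) for r in records)

summary = summarize(corpus)
for (domain, license_), n in sorted(summary.items()):
    print(f"{domain}\t{license_}\t{n} document(s)")
```

Even this trivial aggregation presupposes that each document's provenance and license status are known, which is rarely the case for web-scraped training data.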


Last Wednesday, Sam Altman floated the idea of leaving the EU if OpenAI could not comply with the upcoming regulations. He has since walked back the statement in a tweet from last Friday, clarifying: "We are excited to continue to operate here and of course have no plans to leave."


At a recent event in London during Sam Altman’s “Euro tour”, the OpenAI CEO said:


"The current draft of the EU AI Act would be over-regulating, but we have heard it's going to get pulled back."


Dragos Tudorache, an EU parliament member who is leading the drafting of the EU's proposals, disagrees:


"I don't see any dilution happening anytime soon (...) These provisions relate mainly to transparency, which ensures the AI and the company building it are trustworthy. I don't see a reason why any company would shy away from transparency."