Yesterday, the non-profit organization Center for AI Safety published a short statement on the risk of extinction from AI:
“Mitigating the risk of extinction from A.I. should be a global priority alongside other societal-scale risks, such as pandemics and nuclear war”
The statement was signed by 350 prominent AI researchers and industry leaders, including OpenAI CEO Sam Altman, Google DeepMind CEO Demis Hassabis, and Anthropic CEO Dario Amodei. Earlier this month, the same three tech leaders met with US Vice President Kamala Harris at the White House to discuss the risks of AI.
The mid-to-long-term AI risks are serious, although difficult to understand and predict. In this light, the new copyright infringement issues posed by gigantic generative AI models may not seem like a big deal. There is certainly no life-or-death urgency involved in clarifying the legal landscape. On the other hand, the tension between generative AI models and copyright law is immediate, and the dangers are easy to understand and explain.
Before OpenAI’s commercial breakthrough with ChatGPT, and before the competitive race among BigTech companies to develop ever larger and more impressive AI models, legal scholars were discussing who would own the rights to an original work created by an AI. As it turns out, the question of authorship has had very little practical relevance. Neither providers nor users of generative AI models have shown much interest in claiming rights over AI-generated content.
More relevant and pressing issues are:
1) Is it copyright infringement to use copyrighted works as training data for AI models?
2) What if outputs generated by AI models look suspiciously similar to copyrighted works used in the training process?
In my next post, I will attempt to answer both questions by looking at the AI image model Stable Diffusion, which was recently targeted with lawsuits on two different fronts. In this post, I will provide some relevant background on the EU’s upcoming AI Act and the debate about its transparency requirements.
The European Union recently agreed on a Compromise Text of its upcoming AI Act, which introduces specific obligations for providers of so-called foundation models.
A foundation model is defined as "an AI system model that is trained on broad data at scale, is designed for the generality of output, and can be adapted to a wide range of distinctive tasks". Examples of AI models that fall into this category are ChatGPT and Stable Diffusion.
The Compromise Text sets out requirements for providers of foundation models to register their models in an EU database. In this context, they have to disclose certain information and technical documentation covering a range of factors related to the development of the model. According to Section C of ANNEX VIII, foundation model providers such as OpenAI and Google would be required to disclose their data sources and which training resources were used in the development of their models.
In addition to the requirements in ANNEX VIII, providers of foundation models are, among other things, required to “document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law” (Article 28b(4)(c)).
This provision is interesting, and I am curious to see how it will be interpreted in practice. OpenAI has famously revealed next to nothing about how, or on what data, GPT-4 was trained. Once the EU's AI Act enters into force, this will have to change if the company wishes to continue operating in the EU.
OpenAI seems to be speaking with two tongues. On the one hand, Sam Altman and his team support AI regulation and worry about the future implications of AI and its dangers to humanity. On the other hand, they have so far refused to open up GPT-4 to public scrutiny, citing competitive concerns. OpenAI’s split personality is perhaps unsurprising: the company has a stated mission “to ensure that artificial general intelligence benefits all of humanity”, while it is also riding on a ten-billion-dollar investment from Microsoft. This strange mixture of altruistic concern for humanity and colossal funding from BigTech makes me wonder whether OpenAI is willing to play ball with regulators, or whether it will prove more profitable to remain closed.
While the training process behind GPT-4 is closed to public scrutiny, we can be sure of one thing: it involved the use of copyright-protected material on an unfathomably large scale.
I wonder if, and how, OpenAI and other providers of foundation models can prepare a “sufficiently detailed summary” of all this data in accordance with the EU's AI Act. Much will depend on how that phrase is interpreted.
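What such a summary might look like is anyone's guess, since the Compromise Text prescribes no format. As a purely hypothetical sketch, a machine-readable version could resemble the following; the field names, the source-level granularity, and the JSON serialization are all my own assumptions, not anything drawn from the Act:

```python
import json

# Purely hypothetical sketch of an Article 28b(4)(c) training-data summary.
# The AI Act does not prescribe any format; every field below is an assumption.
training_data_summary = {
    "model": "ExampleModel-1",  # hypothetical model name
    "sources": [
        {
            "name": "Common Crawl",  # large public web scrape, widely used for LLM training
            "share_of_tokens": 0.60,  # proportion of training tokens (assumed reporting unit)
            "contains_copyrighted_works": True,
            "licensing_basis": "text-and-data-mining exception (DSM Directive, Art. 4)",
        },
        {
            "name": "Licensed news archive",  # hypothetical directly licensed corpus
            "share_of_tokens": 0.05,
            "contains_copyrighted_works": True,
            "licensing_basis": "direct license from publisher",
        },
    ],
}

# Print the summary as formatted JSON, as one might publish it in a registry.
print(json.dumps(training_data_summary, indent=2))
```

Whether a coarse, source-level overview like this would qualify as “sufficiently detailed”, or whether regulators would demand something closer to per-work disclosure, is precisely the open question.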
Last Wednesday, Sam Altman warned that OpenAI might "cease operating" in the EU if the company is unable to comply with the upcoming AI Act. At an event in London during his tour of Europe, Altman told reporters:
"The current draft of the EU AI Act would be over-regulating, but we have heard it's going to get pulled back"
Dragos Tudorache, a Member of the European Parliament who is co-leading the drafting of the AI Act, disagrees:
"I don't see any dilution happening anytime soon (..) These provisions relate mainly to transparency, which ensures the AI and the company building it are trustworthy. I don't see a reason why any company would shy away from transparency."