The Center for Investigative Reporting Claims OpenAI Exploited Its Work Without Permission

by Legal PDF: Tech Court Cases, August 13th, 2024

Too Long; Didn't Read

CIR, the oldest nonprofit newsroom in the U.S., has sued Microsoft and OpenAI for allegedly using its journalistic content without permission to train their AI models. The lawsuit claims that the defendants' AI products, including ChatGPT and Copilot, incorporated CIR's work without compensation, undermining CIR’s revenue and copyright protections. CIR seeks damages for the unauthorized use and violation of the DMCA.

The Center for Investigative Reporting Inc. v. OpenAI Court Filing, retrieved on June 27, 2024, is part of HackerNoon’s Legal PDF Series. You can jump to any part in this filing here. This part is 1 of 18.


1. Plaintiff The Center for Investigative Reporting, Inc. (“CIR”), through its attorneys Loevy & Loevy, for its complaint against Defendants Microsoft Corporation (“Microsoft”) and OpenAI, Inc., OpenAI GP LLC, OpenAI LLC, OpenAI OpCo LLC, OpenAI Global LLC, OAI Corporation, LLC, OpenAI Holdings, LLC, (collectively “OpenAI” and, with Microsoft, “Defendants”) alleges the following:

NATURE OF THIS ACTION

2. Independent, nonprofit news reporting is a critical and unique voice in the United States media landscape. Founded in 1976, CIR is the oldest nonprofit newsroom in the country. CIR’s sole purpose is to benefit the public by reporting investigative stories about underrepresented voices in our democracy. For decades CIR has published valuable, one-of-a-kind, award-winning reporting that highlights diverse communities that are often overlooked. In just the last few months, CIR was awarded the George Polk Award, a Peabody Award, a Webby Award, and Robert F. Kennedy Human Rights Award for its unique reporting on diverse subjects, including prosecution of alleged sexual assault victims, abuse in the Mormon Church, and police procedures that injure families.


3. To sustain itself in today’s notoriously challenging media market, CIR has worked especially hard to survive while continuing to tell stories that are usually untold and left unseen. CIR has developed ways to gain revenue for its reporting, including license, advertising, and affiliate revenue, and has created partnership agreements and programs compatible with its mission to bring in new revenue. CIR has dedicated staff to develop streams of revenue to fund its reporting, including staff dedicated to licensing, advertising, revenue, and partnerships.


4. Defendants are companies responsible for the creation and development of the highly lucrative ChatGPT and Copilot artificial intelligence (AI) products, which are built on uncompensated and unauthorized use of the creative works of humans. According to the award-winning website Copyleaks, nearly 60% of the responses provided by Defendants’ GPT-3.5 product contained some form of plagiarized content, and over 45% contained text that was identical to pre-existing content.


5. These systems, and the large language models (LLMs) that power them, are trained using human works. In particular, AI systems and LLMs ingest and use human-made journalism to attempt to mimic how humans write and speak in an effort to compete for the attention of consumers to generate profits. These training sets have included hundreds of thousands, if not millions, of works of journalism, including works created by CIR.


6. Defendants copied, used, abridged, and displayed CIR’s valuable content without CIR’s permission or authorization, and without any compensation to CIR. Defendants’ products undermine and damage CIR’s relationship with potential readers, consumers, and partners, and deprive CIR of subscription, licensing, advertising, and affiliate revenue, as well as donations from readers.


7. At the same time, Defendants greatly benefit from CIR’s distinct voice in the marketplace, as CIR provides a unique perspective, especially regarding investigative topics impacting diverse communities. If limited to a homogenous dataset, Defendants’ large language models would be stunted in growth and power. Their success depends on content creators like CIR and other members of the news media that are unique in their style and voice.


8. Protecting these unique voices is one of the fundamental purposes of copyright law. Since the founding of the United States, the Copyright Clause of the U.S. Constitution promises to “promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” The Copyright Act similarly empowers Congress to protect works of human creativity that persons have worked hard to create, encouraging people to devote substantial effort and resources to all manner of creative enterprises by providing confidence that creators’ works will be shielded from unauthorized encroachment and that creators will be properly compensated.


9. Further recognizing that emerging technologies could be used to evade statutory protections, Congress passed the Digital Millennium Copyright Act (DMCA) in 1998. The DMCA prohibits the removal of author, title, copyright, and terms of use information from protected works where there is reason to know that it would induce, enable, facilitate, or conceal a copyright infringement. Unlike copyright infringement claims, which require copyright owners to incur significant and often prohibitive registration costs as a prerequisite to enforcing their copyrights, a DMCA claim does not require registration.


10. When they populated their training sets with works of journalism, Defendants had a choice: to respect works of journalism, or not. Defendants chose the latter. They copied copyrighted works of journalism when assembling their training sets. Their LLMs memorized and at times regurgitated those works. They distributed those works and abridgements of them to each other and the public. They contributed to their users’ own unlawful copying. They removed the works’ copyright management information. They trained ChatGPT not to acknowledge or respect copyright. And they did this all without permission.


11. CIR brings this lawsuit seeking actual damages and Defendants’ profits, or statutory damages of no less than $750 per infringed work and $2,500 per DMCA violation.




About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This court case, retrieved on June 27, 2024, from motherjones.com, is part of the public domain. The court-created documents are works of the federal government and, under copyright law, are automatically placed in the public domain and may be shared without legal restriction.