
The Times v. Microsoft/OpenAI: Unauthorized Reproductions of Times Works in GPT Models (11)

by Legal PDF, January 2nd, 2024


The New York Times Company v. Microsoft Corporation court filing of December 27, 2023, is part of HackerNoon’s Legal PDF Series. This is part 11 of 27.

IV. FACTUAL ALLEGATIONS

C. Defendants’ Unauthorized Use and Copying of Times Content

2. Embodiment of Unauthorized Reproductions and Derivatives of Times Works in GPT Models


98. As further evidence of being trained using unauthorized copies of Times Works, the GPT LLMs themselves have “memorized” copies of many of those same works encoded into their parameters. As shown below and in Exhibit J, the current GPT-4 LLM will output near-verbatim copies of significant portions of Times Works when prompted to do so. Such memorized examples constitute unauthorized copies or derivative works of the Times Works used to train the model.


99. For example, in 2019, The Times published a Pulitzer Prize-winning, five-part series on predatory lending in New York City’s taxi industry. The 18-month investigation included 600 interviews, more than 100 records requests, large-scale data analysis, and the review of thousands of pages of internal bank records and other documents, and ultimately led to criminal probes and the enactment of new laws to prevent future abuse. OpenAI had no role in the creation of this content, yet with minimal prompting, GPT-4 will recite large portions of it verbatim:[26]



Exhibit J at 5.


100. Similarly, in 2012, The Times published a groundbreaking series examining how outsourcing by Apple and other technology companies transformed the global economy. The series was the product of an enormous effort across three continents. Reporting this story was especially challenging because The Times was repeatedly denied both interviews and access. The Times contacted hundreds of current and former Apple executives, and ultimately secured information from more than six dozen Apple insiders. Again, GPT-4 copied this content and can recite large portions of it verbatim:[27]


Exhibit J at 3.


101. Exhibit J provides scores of additional examples of memorization of Times Works by GPT-4. Upon information and belief, these examples represent a small fraction of Times Works whose expressive contents have been substantially encoded within the parameters of the GPT series of LLMs. Each of those LLMs thus embodies many unauthorized copies or derivatives of Times Works.





[26] For original article, see Brian M. Rosenthal, As Thousands of Taxi Drivers Were Trapped in Loans, Top Officials Counted the Money, N.Y. TIMES (May 19, 2019), https://www.nytimes.com/2019/05/19/nyregion/taximedallions.html.


[27] For original article, see Charles Duhigg & Keith Bradsher, How the U.S. Lost Out on iPhone Work, N.Y. TIMES (Jan. 21, 2012), https://www.nytimes.com/2012/01/22/business/apple-america-and-a-squeezed-middleclass.html.




About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This filing in court case 1:23-cv-11195, retrieved on December 29, 2023, from nycto-assets.nytimes.com, is part of the public domain. The court-created documents are works of the federal government and, under copyright law, are automatically placed in the public domain and may be shared without legal restriction.