New York Times Articles Are a 'Tiny Part' of ChatGPT's Training Data

by @legalpdf

Too Long; Didn't Read

OpenAI wants portions of the NYT's lawsuit against the company to be dismissed, arguing that the paper presented misleading evidence to the court.

The New York Times Company v. OpenAI Update Court Filing, retrieved on February 26, 2024, is part of HackerNoon’s Legal PDF Series. This part is 4 of 15.

C. Reliance on Longstanding Fair Use Principles

By July 2020, OpenAI had disclosed that Times articles were a tiny part of the diverse datasets that had been used to train these language models. And according to the Complaint, by the time GPT-3 was released in mid-2020, OpenAI had already established itself as a “commercial enterprise.” Compl. ¶ 57. The Times itself reported in 2020 that “OpenAI plans to sell access to GPT-3 via the internet, turning it into a widely used commercial product.” Metz, supra note 21. While its reporters joked they “might be put out to pasture by a machine,” Manjoo, supra note 20, the Times never accused OpenAI of violating copyright law. Instead, the Times enthusiastically and factually reported that the technology could be “enormously useful” and “open[] the door to a wide range of new possibilities.” Manjoo, supra note 20; Metz, supra note 21.


Indeed, it has long been clear that the non-consumptive use of copyrighted material (like large language model training) is protected by fair use—a doctrine as important to the Times itself as it is to the American technology industry.[22] Since Congress codified that doctrine in 1976, see H.R. Rep. No. 94-1476, at 65–66 (1976) (courts should “adapt” defense to “rapid technological change”), courts have used it to protect useful innovations like home video recording, internet search, book search tools, reuse of software APIs, and many others.[23]


These precedents reflect the foundational principle that copyright law exists to control the dissemination of works in the marketplace—not to grant authors “absolute control” over all uses of their works. Google Books, 804 F.3d at 212. Copyright is not a veto right over transformative technologies that leverage existing works internally—i.e., without disseminating them—to new and useful ends, thereby furthering copyright’s basic purpose without undercutting authors’ ability to sell their works in the marketplace. See supra note 23. And it is the “basic purpose” of fair use to “keep [the] copyright monopoly within [these] lawful bounds.” Oracle, 141 S. Ct. at 1198. OpenAI and scores of other developers invested billions of dollars, and the efforts of some of the world’s most capable minds, based on these clear and longstanding principles.





[22] See, e.g., Edmund White, In ‘The Talented Mr. Ripley,’ A Shape-Shifting Protagonist Who’s Up to No Good, N.Y. Times Style Magazine (Mar. 24, 2021), https://www.nytimes.com/2021/03/24/t-magazine/talented-mr-ripleypatricia-highsmith.html (including continuous 200-word excerpt from published novel).


[23] Sony Corp. of Am. v. Universal City Studios, Inc., 464 U.S. 417, 454–55 (1984); Kelly v. Arriba Soft Corp., 336 F.3d 811, 818–22 (9th Cir. 2003); Google Books, 804 F.3d at 209; Oracle, 141 S. Ct. at 1209 (2021).


About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This court case, retrieved on February 26, 2024, from fingfx.thomsonreuters.com, is part of the public domain. The court-created documents are works of the federal government and, under copyright law, are automatically placed in the public domain and may be shared without legal restriction.