ChatGPT Never Produced "Wholesale Copies" of NYT's Articles
by @legalpdf



Too Long; Didn't Read

OpenAI wants portions of the NYT's lawsuit against the company to be dismissed, arguing that the paper presented misleading evidence to the court.

The New York Times Company v. OpenAI Update Court Filing, retrieved on February 26, 2024, is part of HackerNoon’s Legal PDF Series. This part is 13 of 15.

C. The DMCA Claim Fails for Multiple Independent Reasons

Count V is a claim for violation of Section 1202(b) of the Copyright Act, which prohibits the “[r]emoval or [a]lteration” of copyright management information or “CMI.” 17 U.S.C. § 1202(b). Congress passed that provision in the early days of the internet in recognition of the ease with which unauthorized copies of images and other works might proliferate in cyberspace. The provision encourages rightsholders to affix CMI to their works (and prohibits its removal) so that, if their works do proliferate on the internet, the public will be able to trace those works back to their owner. S. Rep. No. 105-190, at 16–17 (1998) (CMI intended to “track[] and monitor[]”). But Congress limited the statute with a “double-scienter requirement” that prevents its application when the CMI removal occurs as an unintended result of an “automatic [] process.” Zuma Press, Inc. v. Getty Images (US), Inc., 845 F. App’x 54, 57–58 (2d Cir. 2021). A typical CMI case might involve the surreptitious removal of a photograph’s “gutter credit” to conceal a failure to seek a license from the rightsholder. Mango v. BuzzFeed, Inc., 970 F.3d 167, 169–70, 173 (2d Cir. 2020).



1. The Times Did Not Specify the CMI at Issue


Count V should be dismissed at the outset for failure to specify the CMI at issue. The Complaint’s relevant paragraph fails to state what CMI is included in what work, and simply repeats the statutory text. Compl. ¶ 182 (alleging “one or more forms of [CMI]” and parroting language of Section 1202(c)).[36] The only firm allegation states that the Times placed “copyright notices” and “terms of service” links on “every page of its websites.” Compl. ¶ 125. But, at least for some articles, it did not.[37] And when it did, the information was not “conveyed in connection with” the works, 17 U.S.C. § 1202(c) (defining CMI), but hidden in small text at the bottom of the page.[38] Judge Orrick of the Northern District of California rejected similar allegations as deficient in another recent AI case. Andersen v. Stability AI Ltd., No. 23-cv-00201, 2023 WL 7132064, at *11 (N.D. Cal. Oct. 30, 2023) (must plead “exact type of CMI included in [each] work”).[39]


2. The Training-Based Section 1202 Claim Fails


The first Section 1202 violation alleged in the Complaint asserts that OpenAI “removed” CMI “in building the training datasets” in violation of Section 1202(b)(1) of the DMCA. Compl. ¶ 184. As a preliminary matter, to the extent this claim is based on the “building [of] training datasets” that occurred more than three years ago, it is time-barred. 17 U.S.C. § 507(b).


The Complaint also fails to plausibly allege that any CMI was removed. The Times advances three theories of removal: (1) removal of CMI when OpenAI allegedly “scraped” articles from the Times’s website; (2) removal of CMI “from third-party datasets,” i.e., Common Crawl; and (3) removal of CMI during “the training process,” which the Times alleges “does not preserve any [CMI]” “by design.” Compl. ¶¶ 184, 187. As a preliminary matter, each theory fails for failure to allege what CMI was removed or “not preserve[d],” which is particularly damning as the Times concedes that some CMI was preserved.[40] Judge Martínez-Olguín of the Northern District of California recently held an identical set of allegations to be insufficient. See Tremblay v. OpenAI, Inc., No. 3:23-cv-03416, 2024 WL 557720, at *4 (N.D. Cal. Feb. 12, 2024) (allegation that “training process does not preserve any CMI” “by design” was “conclusory”).


Each theory fails separately as well. The first theory fails because, while the Times states that OpenAI “scraped” its articles “directly from [its] websites,” none of the specific allegations actually suggest OpenAI designed its alleged “scrap[ing]” process to omit CMI. Compl. ¶ 184.[41] And the only allegations about OpenAI “scraping” articles from Times websites relate to the creation of WebText, which occurred over three years before this lawsuit. Supra Section IV.A.[42] The second theory fails because the Complaint lacks allegations about the inclusion (or exclusion) of the Times’s CMI in any “third-party datasets” like Common Crawl,[43] much less about OpenAI scrubbing any CMI from those datasets. First Nationwide Bank v. Gelt Funding Corp., 27 F.3d 763, 771 (2d Cir. 1994) (courts ignore “unwarranted deductions of fact”). And the third theory fails because there is no allegation in the Complaint supporting the conclusion that the “training process” excludes CMI “[b]y design.” Compl. ¶ 187; see also Tremblay, 2024 WL 557720, at *4.


Moreover, the Times fails to allege facts that could show how the alleged CMI removal could “induce, enable, facilitate, or conceal an infringement” of copyright—much less how OpenAI could have “reasonable grounds to know” it would. 17 U.S.C. § 1202(b). The “point of CMI” is to provide information to “the public,” not to govern purely internal databases. Roberts v. BroadwayHD LLC, 518 F. Supp. 3d 719, 737 (S.D.N.Y. 2021); Compl. ¶ 59. As Judge Martínez-Olguín explained, it is far from obvious how “the alleged removal of CMI in an internal database [could] enable infringement.” Tremblay, 2024 WL 557720, at *4 (dismissing claim).[44]


3. The Output-Based Section 1202 Claim Fails


The second category of Section 1202 violation in the Complaint alleges that (1) OpenAI violated Section 1202(b)(1)’s removal prohibition by failing to include the Times’s CMI in model outputs, Compl. ¶¶ 185–86; and (2) by displaying those outputs via ChatGPT or its API, OpenAI violated Section 1202(b)(3)’s prohibition on “distribut[ing]” works “knowing that [CMI] has been removed,” Compl. ¶ 189. Neither theory states a claim for relief.


As a preliminary matter, the Times’s Section 1202(b)(3) claim fails because the Complaint does not allege that OpenAI “distribute[d]” any outputs. In this context, “distribution” requires a “sale or transfer of ownership extending beyond that of a mere public display.” Wright v. Miah, No. 22-cv-4132, 2023 WL 6219435, at *7 (E.D.N.Y. Sept. 7, 2023) (emphasis added).[45] But “mere public display” of outputs is all the Complaint alleges. See, e.g., Compl. ¶ 102.


Regardless, this “output” theory fails because the outputs alleged in the Complaint are not wholesale copies of entire Times articles. They are, at best, reproductions of excerpts of those articles, some of which are little more than collections of scattered sentences. Supra 12. If the absence of CMI from such excerpts constituted a “removal” of that CMI, then DMCA liability would attach to any journalist who used a block quote in a book review without also including extensive information about the book’s publisher, terms and conditions, and original copyright notice. See supra note 22 (example of the Times including 200-word block quote in book review).


To avoid such anomalous results, courts have cabined applications of Section 1202(b)(1) and (3) to circumstances in which the works in question were “substantially or entirely reproduced.” Fischer v. Forrest, 286 F. Supp. 3d 590, 609 (S.D.N.Y. 2018). As such, failure to include original CMI in anything less than an identical reproduction of all (or almost all) of the work does not qualify as CMI removal. Tremblay, 2024 WL 557720, at *5 (dismissing claim because “Plaintiffs have not alleged that [OpenAI] distributed their books or copies of [them]”).[46] As the Times has not alleged that OpenAI reproduced entire articles, the output-based claim fails.


Even setting that aside, the Times’s output-based CMI claim fails for the independent reason that there was no CMI to remove from the relevant text. The Exhibit J outputs, for example, feature text from the middle of articles. Ex. J. at 2–126. As shown in the exhibit, the “Actual text from NYTimes” contains no information that could qualify as CMI. See, e.g., id. at 3; 17 U.S.C. § 1202(c) (defining CMI). So too for the ChatGPT outputs featured in the Complaint, which request the “first [and subsequent] paragraph[s]” from Times articles. See, e.g., Compl. ¶¶ 104, 106, 118, 121. None of those “paragraphs” contains any CMI that OpenAI could have “removed.”


4. The Times Fails to Allege a CMI-Based Injury


Count V separately fails for lack of standing. “[T]o have standing” to sue for a DMCA violation, the Times “must show that [it] was injured by that violation.” Steele v. Bongiovi, 784 F. Supp. 2d 94, 97–98 (D. Mass. 2011); see also 17 U.S.C. § 1203(a). Here, the Complaint’s “Harm to the Times” section relates entirely to its inability to receive speculative licensing revenue, see Compl. ¶¶ 155–56, and the possibility that ChatGPT will “divert readers,” see id. ¶ 157. Neither injury has any nexus to CMI. Nor is there any imaginable harm here: because all of the Complaint’s outputs were either generated using the original Times article itself, see Ex. J, or referenced the Times by name, see, e.g., Compl. ¶ 104, any user who encountered those outputs would have no doubt as to the provenance of the text and could easily find it on the Times’s website (as ChatGPT often invites them to do, see id. ¶¶ 106, 134). Cf. Kelly, 77 F. Supp. 2d at 1122 (DMCA claim failed because users who encounter images are “given the name of the Web site from which Defendant obtained the image, where any associated [CMI] would be available”).





[36] Design Pics Inc. v. PBH Network, Inc., No. 20-cv-1096, 2020 WL 8413512, at *4 (E.D.N.Y. Oct. 27, 2020) (dismissing CMI claim based on “conclusory, boilerplate parroting of the statutory text”).


[37] See, e.g., John Branch, Snow Fall: The Avalanche at Tunnel Creek, N.Y. Times, https://www.nytimes.com/projects/2012/snow-fall/index.html#/?part=tunnel-creek (last accessed Feb. 11, 2024); see also Compl. ¶¶ 104–05 & n.28 (citing this article).


[38] See, e.g., Pete Wells, As Not Seen on TV, N.Y. Times (Nov. 13, 2012), https://www.nytimes.com/2012/11/14/dining/reviews/restaurant-review-guys-american-kitchen-bar-in-timessquare.html; see also Compl. ¶¶ 106–07 & n.29 (citing and quoting this article).


[39] See also Wood v. Observer Holdings, LLC, No. 20-cv-07878, 2021 WL 2874100, at *8 (S.D.N.Y. July 8, 2021) (terms on “separate website” not CMI); GC2 Incorporated v. Int’l Game Tech. PLC, 255 F. Supp. 3d 812, 821–22 (N.D. Ill. 2017) (“terms of use notice near a copyrighted work” not “conveyed in connection with” work).


[40] See, e.g., Compl. ¶ 106 (ChatGPT responding to query naming “Pete Wells” with a completion correctly identifying the publication date of Wells’ article); see also id. ¶ 104 (same, for title).


[41] Kelly v. Arriba Soft Corp., 77 F. Supp. 2d 1116, 1122 (C.D. Cal. 1999), rev'd on other grounds by 336 F.3d 811 (9th Cir. 2003) (no Section 1202 liability vs. search engine “crawler [that] did not include [CMI]”).


[42] The Times suggests that OpenAI’s “Browse with Bing” feature “scrap[es] Times Works from The Times’s websites,” Compl. ¶ 185, but the Complaint does not include a single allegation supporting that conclusion, see supra 13 & n.32 (noting that Browse with Bing fetched content from third-party sites, not the Times’s website).


[43] OpenAI cannot have removed CMI from datasets that “contained no such [CMI]” in the first place. McGucken v. Shutterstock, Inc., No. 22-cv-00905, 2023 WL 6390530, at *11 (S.D.N.Y. Oct. 2, 2023) (rejecting DMCA claim).


[44] See Victor Elias Photography, LLC v. Ice Portal, Inc., 43 F.4th 1313, 1325 (11th Cir. 2022) (requiring “some identifiable connection between the defendant’s actions and the infringement or the likelihood of infringement.”).


[45] Id. at *10 (endorsing Section 1202(b)(3) claim where defendant distributed artwork on Etsy); MyPlayCity, Inc. v. Conduit Ltd., No. 10-cv-1615, 2012 WL 1107648, at *12 (S.D.N.Y. Mar. 30, 2012) (“distribution” means “actual dissemination of copies”); FurnitureDealer.Net, Inc v. Amazon.com, Inc, No. 18-cv-232, 2022 WL 891473, at *23 (D. Minn. Mar. 25, 2022) (“[P]ublic display does not constitute distribution, and thus is not a [DMCA] violation.”).


[46] See also Doe 1 v. GitHub, Inc., No. 22-cv-06823, 2024 WL 235217, at *9 (N.D. Cal. Jan. 22, 2024) (dismissing Section 1202(b) claim against OpenAI because outputs were “not identical” to originals); A’Lor Int’l, Ltd. v. Tapper Fine Jewelry, Inc., No. 12-cv-02215, 2012 WL 12921035, at *10 (C.D. Cal. Aug. 8, 2012) (“the plain language of the statute encompasses only removal and alteration;” does not “include [mere] omissions”); Faulkner Press, L.L.C. v. Class Notes, L.L.C., 756 F. Supp. 2d 1352, 1358–59 (N.D. Fla. 2010) (rejecting claim where “word for word” text was “copied into a different form and [] incorporated into” commercial materials); Kelly, 77 F. Supp. 2d at 1121–22 (“displaying thumbnails of Plaintiffs’ images without [] the corresponding [CMI]” was not CMI “removal”).


About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This court case, retrieved on February 26, 2024, from fingfx.thomsonreuters.com, is part of the public domain. The court-created documents are works of the federal government and, under copyright law, are automatically placed in the public domain and may be shared without legal restriction.