Coders Who Brought GitHub Lawsuit Believe Their Code Was Used to Train Copilot — Without Evidence

by Legal PDF: Tech Court Cases, September 22nd, 2023

Too Long; Didn't Read

The Complaint also is not clear on whether or to what extent the training of Copilot forms the basis of the Plaintiffs’ claims for relief.

GitHub Motion to Dismiss Court Filing, retrieved on January 26, 2023, is part of HackerNoon’s Legal PDF Series. You can jump to any part in this filing here. This part is 5 of 26.

ALLEGATIONS OF THE OPERATIVE COMPLAINT

C. Plaintiffs Sue Based On An Attribution Theory.


Plaintiffs are two anonymous GitHub users. Compl. ¶¶ 19-20. They both claim to have (at an unspecified time) “published Licensed Materials they owned a copyright interest in to at least one GitHub repository under one of the Suggested licenses.” Compl. ¶¶ 19-20. But they do not allege, either expressly or on information and belief, that their Licensed Materials were used to train Codex or Copilot. The most charitable reading of the Complaint is that because their Licensed Materials were in public repositories, and since public repositories were used to train Codex and Copilot, Compl. ¶ 82, they believe their Licensed Materials were used to do so. The Complaint also is not clear on whether or to what extent the training of Copilot forms the basis of the Plaintiffs’ claims for relief.


The crux of the Complaint, instead, is Copilot’s suggestions, which Plaintiffs term “Output.” Plaintiffs allege that these suggestions may sometimes match snippets of code used to train Copilot, but without providing information like authorship or licensing status. According to Plaintiffs, Copilot’s “Output is often a near-identical reproduction of code from the training data,” Compl. ¶ 46, and Copilot “has not been trained to provide Attribution.” Compl. ¶ 56. On this basis, Plaintiffs allege that “Defendants stripped Plaintiffs’ and the Class’s attribution, copyright notice, and license terms from their code.” Compl. ¶¶ 10-11, 78-81. Beyond that conclusory allegation, however, Plaintiffs do not connect their own code to any such Outputs. They identify no Output that has matched any of Plaintiffs’ Licensed Materials. They identify no prompt that might produce such a match. They allege only a few examples of code Outputs matching other authors’ code that have nothing to do with Plaintiffs. Compl. ¶¶ 48-81. And, while they assert that Copilot’s suggestions will “often” match existing code, they point only to a study suggesting that “about 1% of the time, a suggestion … may contain some code snippets longer than ~150 characters that matches” some preexisting code. Compl. ¶ 90. Plaintiffs’ allegations concerning Copilot do nothing to connect their own code to an Output.


The Complaint nominally advances a dozen claims for relief related to Codex and Copilot. The first claim encompasses multiple types of alleged violations and both direct and secondary theories of liability. Compl. ¶¶ 142-71 (Count I). Other claims specify a remedy with multiple legal theories, or otherwise rely on multiple theories. Compl. ¶¶ 204-10 (Count VI); id. ¶¶ 211-14 (Count VII). Plaintiffs also tack on a theory of “civil conspiracy” covering most of the claims. Compl. ¶¶ 240-44.


But this long and complex Complaint speaks loudest with what it doesn’t say. Plaintiffs fail to identify any of the “Licensed Materials” they allegedly placed in a GitHub public repository that reflect purported “copyright interests,” or to tell us anything at all about those materials. The Complaint nowhere identifies any copyrighted work owned by either of the Plaintiffs, or any registration of such work. The Complaint fails to identify any use of their Licensed Materials. Although the case is supposedly about “software piracy on an unprecedented scale,” Plaintiffs make no copyright infringement claim. And, Plaintiffs identify no personal identifying information that they stored in their public repositories on GitHub, or say how it was allegedly exposed by Codex or Copilot.





About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This court case 4:22-cv-06823-JST, retrieved on September 11, 2023, from documentcloud.org, is part of the public domain. The court-created documents are works of the federal government and, under copyright law, are automatically placed in the public domain and may be shared without legal restriction.