How Could an AI Model's Study of Publicly Available Code Harm Programmers Using GitHub?

by Legal PDF, September 22nd, 2023

Too Long; Didn't Read

Plaintiffs do not explain how tangible harm to them results from an AI model’s study of such code for training purposes.

GitHub Motion to Dismiss Court Filing, retrieved on January 26, 2023, is part of HackerNoon’s Legal PDF Series. You can jump to any part in this filing here. This part is 7 of 26.

ARGUMENT

I. PLAINTIFFS LACK ARTICLE III STANDING AND THEREFORE SUBJECT MATTER JURISDICTION BECAUSE THEY HAVE NOT ALLEGED ACTUAL OR THREATENED INJURY.


A. Plaintiffs’ Lack-Of-Attribution Theory Is Insufficient To Confer Standing.


Both of the Plaintiffs allege that they are GitHub users who have published unspecified source code subject to unspecified open source licenses at an unspecified time. See Compl. ¶¶ 19-20. For all their references to “copyright interests” and a “Brave New World of Software Piracy,” neither of the Plaintiffs alleges ownership or infringement of any copyrighted (let alone registered, see 17 U.S.C. § 411) work. They do not even identify a copyrighted work. Plaintiffs thus allege no invasion of their copyright interests, an allegation that would run headlong into the doctrine of fair use. See Google LLC v. Oracle Am., Inc., 141 S. Ct. 1183 (2021); Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015); Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146 (9th Cir. 2007); Kelly v. Arriba Soft Corp., 336 F.3d 811 (9th Cir. 2003); Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992).


Nor do Plaintiffs identify any harm that has come from the bare use of the contents of public repositories to train Codex or Copilot. Plaintiffs admittedly chose to make their source code freely available for inspection by anyone. Plaintiffs assert no legal interest that would entitle them to restrict the study by human or machine of freely available code. Plaintiffs do not explain how tangible harm to them results from an AI model’s study of such code for training purposes. And the lack of such an allegation is no surprise, since the First Amendment and fair use generally protect such study. The open source principles embodied in common license agreements embrace learning and understanding from published code.


Plaintiffs instead conjure a theory of harm related not to copyright interests or training, but to lack of attribution. The apparent contention is that Copilot may generate suggested snippets of code that match snippets in existing GitHub projects, without providing credit. As explained below in connection with Microsoft and GitHub’s Rule 12(b)(6) motion, Plaintiffs fail to state a viable claim based on this theory. But even assuming such a claim is conceivable in the abstract, there is no factual allegation in the Complaint suggesting that these Plaintiffs have suffered injury under their snippet-without-credit theory. They allege a few examples of a Copilot suggestion matching someone else’s code. Compl. ¶¶ 66-76. They perform a spurious back-of-the-envelope calculation suggesting 12,000 users whose code might be matched by a Copilot suggestion. Compl. ¶ 91. Nothing in the Complaint, however, suggests that either of the Plaintiffs has been or will be among those users. That leaves no one in this case claiming that Copilot has actually caused them any harm at all, let alone pleading a legal claim based on that harm.


Plaintiffs also fail to allege facts supporting a reasonable inference that their code plausibly would become a Copilot Output. What code do they claim an interest in? What problem does that code solve? Where and how frequently has their published code been replicated by others such that Copilot would identify it as “the most likely solution to a given prompt,” Compl. ¶ 79? What prompt could potentially generate a match, and why is it likely that a user would enter this prompt? Without these types of allegations, Plaintiffs allege at best the sort of “conjectural or hypothetical” injury that cannot confer standing. Lujan, 504 U.S. at 560.


Indeed, Plaintiffs’ failure to identify themselves in violation of Rule 10 underscores the wholly speculative nature of the case. The “use of fictitious names runs afoul of the public’s common law right of access to judicial proceedings” and is counter to “Rule 10(a)’s command that the title of every complaint ‘include the names of all the parties.’” Does v. Advanced Textile Corp., 214 F.3d 1058, 1067 (9th Cir. 2000) (quoting Fed. R. Civ. P. 10(a)). If what Plaintiffs truly sought was attribution for their code, one would expect them to identify themselves and that code. Instead, they offer only abstraction.





About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This court case 4:22-cv-06823-JST retrieved on September 11, 2023, from documentcloud.org is part of the public domain. The court-created documents are works of the federal government, and under copyright law, are automatically placed in the public domain and may be shared without legal restriction.