Copyright Issues in Github Copilot by @legalpdf


by Legal PDF: Tech Court Cases, September 3rd, 2023

Too Long; Didn't Read

This excerpt sets out the allegation that GitHub Copilot, which is powered by a modified version of OpenAI Codex, outputs code that is nearly a verbatim copy of copyrighted textbook examples while omitting the attribution, copyright notice, and license terms those works require.

DOE vs. Github (amended complaint) Court Filing (Redacted), June 8, 2023 is part of HackerNoon’s Legal PDF Series. This is part 15 of 38.

VII. FACTUAL ALLEGATIONS

C. Copilot Outputs Copyrighted Materials Without Following the Terms of the Applicable Licenses

70. GitHub Copilot works in a similar way to OpenAI Codex. As mentioned above, a modified version of Codex is used as the engine that powers Copilot.


71. Copilot is installed by the end user as an extension to various code editors, including Microsoft’s Visual Studio and VS Code. As the user types into the editor, their code is uploaded in real time to Microsoft’s Azure cloud platform, where it becomes a prompt for Copilot.


72. When we give Copilot the same prompt discussed above in Paragraph 52, “function isEven(n) {”, it interprets the prompt, just as Codex does, as the beginning of a function written in JavaScript that will test whether a number is even.


73. However, Copilot’s Output in response to the prompt differs from that of Codex, namely:


function isEven(n) {



74. This function is much closer to what a human programmer might write than Codex’s older, inaccurate offering. It handles all values and types of “n” correctly. Unlike the Codex Output, it does not cause a stack overflow for larger values of “n”.
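A stack overflow for large inputs is characteristic of a recursive solution. The sketch below assumes, for illustration only, that the older Codex Output used a recursive “subtract two” approach; that Output appears earlier in the filing and is not reproduced in this part. The names isEvenRecursive, isEvenModulo, and overflowsStack are hypothetical. The remainder-based version does a constant amount of work and uses constant stack depth no matter how large “n” is.

```javascript
// Hypothetical recursive version (an assumption about the general shape of
// the older Output): each call subtracts 2, so the call depth grows linearly
// with |n| and very large inputs exhaust the call stack.
function isEvenRecursive(n) {
  if (n === 0) return true;
  if (n === 1) return false;
  if (n < 0) return isEvenRecursive(-n);
  return isEvenRecursive(n - 2);
}

// Remainder-based version: one operation, no recursion.
// For a negative odd n, n % 2 is -1, so the comparison with 0 is still correct.
function isEvenModulo(n) {
  return n % 2 === 0;
}

// Returns true if calling fn(n) throws a RangeError, which is the error
// V8 (Node.js, Chrome) raises when the maximum call stack size is exceeded.
function overflowsStack(fn, n) {
  try {
    fn(n);
    return false;
  } catch (err) {
    return err instanceof RangeError;
  }
}

console.log(isEvenModulo(1e7)); // true
console.log(overflowsStack(isEvenRecursive, 1e7)); // true under Node.js
```

Because V8 does not perform tail-call elimination, every recursive call adds a stack frame, so an input of ten million needs millions of frames, far more than the default stack allows.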


75. Copilot’s Output, like Codex’s, is derived from existing code: namely, sample code that appears in the online book Mastering JS, written by Valeri Karpov.[10] Like Codex’s Output, Copilot’s is based upon copyrighted educational material. Mastering JS is a set of educational exercises for programmers. As with Eloquent JavaScript, there are many copies of Karpov’s exercises stored in public repositories on GitHub, where programmers working through Mastering JS store their answers.


76. If Copilot is prompted with the name of a function that will test whether a number is prime (that is, a number greater than 1 that can be evenly divided only by 1 and itself), namely “function isPrime(n) {”, it returns:


function isPrime(n) {



77. Though this function will work, it contains an error often made by beginner programmers that makes it much slower than it could be. Namely, the loop in the middle, which checks possible divisors, does not need to check every divisor smaller than “n”, only the divisors up to and including the square root of “n”. As with Codex, Copilot has no understanding of how the code works. It knows that more functions called “isPrime” contain the portion that checks for all divisors smaller than “n”, so that is what it offers. It does not return what it “thinks” is best; it returns what it has seen the most. It is not writing; it is reproducing (i.e., copying).
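The faster loop described in paragraph 77 can be sketched as follows. It stops once i * i exceeds n, which checks every candidate up to and including the square root of n without calling Math.sqrt. That bound is enough: if n = a × b with both a and b greater than the square root of n, then a × b would exceed n, so any composite n has a divisor within the range. The name isPrimeSqrt is hypothetical; this illustrates the standard trial-division technique and is not code from the filing or from any book it cites.

```javascript
// Trial division bounded by the square root of n (hypothetical helper).
function isPrimeSqrt(n) {
  if (n < 2) {
    return false;
  }
  // i * i <= n covers every candidate divisor up to and including sqrt(n).
  for (let i = 2; i * i <= n; i++) {
    if (n % i === 0) {
      return false;
    }
  }
  return true;
}

// Primes below 30, found with the bounded loop.
const primes = [];
for (let k = 0; k < 30; k++) {
  if (isPrimeSqrt(k)) primes.push(k);
}
console.log(primes.join(", ")); // 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
```

Using i * i <= n rather than i < Math.sqrt(n) keeps perfect squares such as 25 correct, since 5 itself must be checked.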


78. Like the other examples above, and most of Copilot’s Output, this output is nearly a verbatim copy of copyrighted code. In this case, it is substantially similar to the “isPrime” function in the book Think JavaScript by Matthew X. Curinga et al.,[11] which is:


```javascript
function isPrime(n) {
  if (n < 2) {
    return false;
  }
  for (let i = 2; i < n; i++) {
    if (n % i === 0) {
      return false;
    }
  }
  return true;
}
```
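To put a number on “much slower than it could be”, the sketch below counts how many candidate divisors each loop examines. countNaive mirrors the loop in the function quoted above, and countBounded stops once i * i exceeds n. Both helpers are hypothetical and exist only for this comparison.

```javascript
// Counts divisor checks made by the quoted loop (i runs from 2 to n - 1).
function countNaive(n) {
  let checks = 0;
  for (let i = 2; i < n; i++) {
    checks++;
    if (n % i === 0) break; // a divisor was found, so n is not prime
  }
  return checks;
}

// Counts divisor checks when the loop stops once i * i exceeds n.
function countBounded(n) {
  let checks = 0;
  for (let i = 2; i * i <= n; i++) {
    checks++;
    if (n % i === 0) break;
  }
  return checks;
}

// 97 and 7919 are prime, so neither loop exits early.
for (const p of [97, 7919]) {
  console.log(p, countNaive(p), countBounded(p));
}
// 97 95 8
// 7919 7917 87
```

For a prime input, the quoted loop examines n - 2 candidates while the bounded loop examines about the square root of n, so the gap widens quickly as n grows.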


79. As with the other examples above, the source of Copilot’s Output is a programming textbook. And, as with the books the other examples were taken from, there are many copies of Curinga’s code stored in public repositories on GitHub, where programmers working through Curinga’s book keep copies of their answers.


80. The material in Curinga’s book is made available under the GNU Free Documentation License. Although this is not one of the Suggested Licenses, it contains similar attribution provisions, namely that “You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License.”[12]


81. As with Codex, Copilot does not provide the end user any attribution of the original author of the code, nor any information about the license requirements. There is no way for the Copilot user to know that they must provide attribution, a copyright notice, or a copy of the license’s text. And with regard to the GNU Free Documentation License, Copilot users would not be aware that they are limited in what conditions they can place on the use of derivative works they make using this copyrighted code. Had the Copilot user found this code in a public GitHub repository or in a copy of the book in which it was originally published, they would have found the GNU Free Documentation License at the same time and been aware of its terms. Copilot finds that code for the user but excises the license terms, copyright notice, and attribution. This practice allows its users to assume that the code can be used without restriction. It cannot.





About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This court case, 4:22-cv-06823-JST, retrieved on August 26, 2023, from Storage Courtlistener, is part of the public domain. The court-created documents are works of the federal government, and under copyright law, are automatically placed in the public domain and may be shared without legal restriction.