
DOE v. Github: Defendants Never Sought Permission to Use Others' Code to Train Codex or Copilot

by Legal PDF: Tech Court Cases, September 7th, 2023

Too Long; Didn't Read

Defendants did not attempt to contact Plaintiffs to obtain authority to remove or alter CMI from the Licensed Materials.


DOE v. Github (original complaint) Court Filing, retrieved on November 3, 2022 is part of HackerNoon’s Legal PDF Series. You can jump to any part in this filing here. This is part 24 of 37.

VIII. CLAIMS FOR RELIEF

COUNT I

VIOLATION OF THE DIGITAL MILLENNIUM COPYRIGHT ACT

17 U.S.C. §§ 1201–1205

(Direct, Vicarious, and Contributory)

(Against All Defendants)


138. Plaintiffs and the Class hereby repeat and incorporate by reference each preceding and succeeding paragraph as though fully set forth herein.


139. Plaintiffs and members of the Class own the copyrights to Licensed Materials used to train Codex and Copilot. Copilot was trained on millions—possibly billions—of lines of code publicly available on GitHub. Copilot runs on Microsoft’s Azure cloud platform exclusively and Microsoft had input in the creation of Copilot. Microsoft is aware that Copilot ignores License Terms and that it was trained almost exclusively on Licensed Materials.


140. Plaintiffs and members of the Class included the following Copyright Management Information (as defined in Section 1202(c) of the DMCA) (“CMI”) in the Licensed Materials:

a. copyright notices;


b. the title and other information identifying the Licensed Materials;


c. the name of, and other identifying information about, the authors of the Licensed Materials;


d. the name of, and other identifying information about, the copyright owners of the Licensed Materials;


e. terms and conditions for use of the Licensed Materials, specifically the Suggested Licenses; and


f. identifying numbers or symbols referring to CMI or links to CMI.


141. Defendants did not contact Plaintiffs and the Class to obtain authority to remove or alter CMI from the Licensed Materials within the meaning of the DMCA.


142. Defendants knew that they did not contact Plaintiffs and the Class to obtain authority to remove or alter CMI from the Licensed Materials within the meaning of the DMCA.


143. As part of the scheme, Defendants did not attempt to contact Plaintiffs to obtain authority to remove or alter CMI from the Licensed Materials within the meaning of the DMCA. In fact, Defendants’ removal of CMI made it difficult or impossible to contact Plaintiffs and the Class to obtain authority to remove or alter CMI from the Licensed Materials within the meaning of the DMCA. Rather, Defendants removed or altered CMI from open-source code that is owned by Plaintiffs and the Class after the code was uploaded to a GitHub repository by incorporating it into Copilot with its CMI removed.


144. Without the authority of Plaintiffs and the Class, Defendants intentionally removed or altered CMI from the Licensed Materials after they were uploaded to one or more GitHub repositories.


145. Defendants had access to but were not licensed by Plaintiffs or the Class to train any machine learning, AI, or other pseudo-intelligent computer program, algorithm, or other functional prediction engine using the Licensed Materials.


146. Defendants had access to but were not licensed by Plaintiffs or the Class to incorporate the Licensed Materials into Copilot.


147. Defendants had access to but were not licensed by Plaintiffs or the Class to create Derivative Works[32] based upon the Licensed Materials.


148. Defendants had access to but were not licensed by Plaintiffs or the Class to distribute the Licensed Materials as they do through Copilot.


149. Without the authority of Plaintiffs and the Class, Defendants distributed CMI knowing that the CMI had been removed or altered without authority of the copyright owner or the law with respect to the Licensed Materials.


150. Defendants distributed copies of the Licensed Materials knowing and intending that CMI had been removed or altered without authority of the copyright owner or the law, with respect to the Licensed Materials.


151. Defendants removed or altered CMI from the Licensed Materials knowing and intending that it would induce, enable, facilitate, or conceal infringement of copyright.


152. Without the CMI associated with the Licensed Materials, Copilot users are induced or enabled to copy the Licensed Materials. Because CMI has been removed, Copilot users do not know whether Output is owned by someone else and subject to restrictions on use. Without the CMI, copyright infringement is facilitated or concealed, because Plaintiffs and the Class are prevented from knowing or learning that the Output is based upon one or more of the Licensed Materials. Use of the Licensed Materials is not infringement when the terms of the applicable Suggested License are followed. Had the CMI not been removed, Copilot users would be aware of the Licenses and their obligations under them. The terms of the applicable Suggested License would have allowed those users to use the Licensed Materials without infringement. By withholding and concealing license information and other CMI, Defendants prevented Copilot users from making non-infringing use of the Licensed Materials. This contradicts the express wishes of Plaintiffs and the Class, which are set forth explicitly in the Suggested Licenses under which the Licensed Materials are offered.


153. Defendants removed or altered CMI from Licensed Materials owned by Plaintiffs and the Class while possessing reasonable grounds to know that it would induce, enable, facilitate, and/or conceal infringement of copyright in violation of the DMCA. By omitting and concealing CMI from Copilot’s Output, Defendants have reasonable grounds to know that innocent infringers are induced or enabled to copy the Licensed Materials, because CMI has been removed. Without the CMI, Defendants have reasonable grounds to know copyright infringement is facilitated or concealed, because Plaintiffs and the Class have the difficult or impossible task of proving the Licensed Materials belong to them.


154. Defendants knowingly provided and distributed CMI that is false with respect to the Licensed Materials. Defendants have a business practice of asserting and/or implying that Copilot is the author of the Licensed Materials.


155. Defendants provided or distributed false CMI from the Licensed Materials with respect to Copilot’s Output with the intent and foreseeable result to induce, enable, facilitate, or conceal infringement. Defendants have a business practice of asserting and/or implying that Copilot is the author of the Licensed Materials. This false CMI induces or enables Defendants or Copilot users to copy the Licensed Materials. Defendants’ false description of the source of Copilot’s Output facilitated or concealed infringement by Defendants and Copilot users because Plaintiffs and the Class have the difficult or impossible task of proving that the copyrights to the suggested portions of their Licensed Materials belong to them once those Licensed Materials have been delinked from all identifying information and all license terms governing their use.


156. The profits attributable to Defendants’ violation of the DMCA include the revenue from: Copilot subscription fees, sales of or subscriptions to Defendants’ Copilot-related products and/or services that are used to run Copilot, hosting Copilot on Azure, and any other of Defendants’ products that contain copies of the Licensed Materials without all the original CMI. The Licensed Materials add nearly all value to the Copilot product because the purpose of Copilot is to provide code and the source of that code is the Licensed Materials. Without the Licensed Materials, Copilot would not be functional.


157. On information and belief, Defendants could have trained Copilot to include attribution, copyright notices, and license terms when it provides Output covered by a License.


158. Defendants did not request or obtain permission from Plaintiffs and the Class to use the Licensed Materials for Defendants’ Copilot product.


159. Defendants’ use of the Licensed Materials does not follow the requirements of the Suggested Licenses associated with the Licensed Materials. In particular, Copilot fails to provide attribution for either the creator or the owner of the Work. Copilot fails to include the required copyright notice included in the License. Copilot fails to include the applicable Suggested License’s text.


160. Defendants are sophisticated with respect to intellectual property matters related to open-source code. Microsoft in particular has extensive experience granting licenses, obtaining licenses, and enforcing license terms. Its most recent Annual Report states:


We protect our intellectual property investments in a variety of ways. We work actively in the U.S. and internationally to ensure the enforcement of copyright, trademark, trade secret, and other protections that apply to our software and hardware products, services, business plans, and branding. We are a leader among technology companies in pursuing patents and currently have a portfolio of over 69,000 U.S. and international patents issued and over 19,000 pending worldwide. While we employ much of our internally-developed intellectual property exclusively in our products and services, we also engage in outbound licensing of specific patented technologies that are incorporated into licensees’ products. From time to time, we enter into broader cross-license agreements with other technology companies covering entire groups of patents. We may also purchase or license technology that we incorporate into our products and services. At times, we make select intellectual property broadly available at no or low cost to achieve a strategic objective, such as promoting industry standards, advancing interoperability, supporting societal and/or environmental efforts, or attracting and enabling our external development community. Our increasing engagement with open source software will also cause us to license our intellectual property rights broadly in certain situations.


Microsoft Corporation Annual Report, Form 10-K at 27 (July 28, 2022) (emphasis added).[33]


161. GitHub, which offers the Copilot product jointly with OpenAI, also has extensive experience with the DMCA. GitHub knows or reasonably should know that the Licensed Materials it hosts are subject to copyright. It provides the language of the Suggested Licenses to users, all of which include copyright notices. Its 2022 Transparency Report—January to June[34] states: “Copyright-related takedowns (which we often refer to as DMCA takedowns) are particularly relevant to GitHub because so much of our users’ content is software code and can be eligible for copyright protection.”[35] In the first six months of 2022, GitHub processed 1,220 DMCA takedown requests. Its DMCA Takedown Policy[36] notes “GitHub probably never would have existed without the DMCA.”


162. GitHub also knows or reasonably should know the portions of the DMCA giving rise to Plaintiffs’ claim. In its 2021 Transparency Report, GitHub states: “Before removing content based on alleged circumvention of copyright controls (under Section 1201 of the US DMCA or similar laws in other countries), we carefully review both the legal and technical claims, and we sponsor a Developer Defense Fund to provide developers with meaningful access to legal resources.”[37]


163. GitHub is aware that Copilot’s removal of CMI is illegal. For example, it states that “publishing or sharing tools that enable circumvention are not [permitted]”[38] and “Distributing tools that enable circumvention is prohibited, even if their use by developers falls under the exemption [for security research].”[39] GitHub has also frequently published articles discussing the DMCA, its application, and the Copyright Office’s guidance on its scope and exceptions.[40]


164. Unless Defendants are enjoined from violating the DMCA, Plaintiffs and the Class will suffer great and irreparable harm by depriving them of the right to identify and control the reproduction and/or distribution of their copyrighted works, to have the terms of their open-source licenses followed, and to pursue copyright-infringement remedies. Defendants will not be damaged if they are required to comply with the DMCA. Plaintiffs and the Class members are therefore entitled to an injunction barring Defendants from violating the DMCA and impounding any device or product that is in the custody or control of Defendants and that the court has reasonable cause to believe was involved in a violation of the DMCA.


165. Plaintiffs and the Class are further entitled to recover from Defendants the actual or statutory damages Plaintiffs and the Class sustained pursuant to 17 U.S.C. § 1203(c) and for Plaintiffs’ and the Class’s costs and attorneys’ fees in enforcing the Licenses. Plaintiffs and the Class are also entitled to recover as restitution from Defendants for any unjust enrichment, including gains, profits, and advantages that Defendants have obtained as a result of their breach of the Licenses.


166. Defendants conspired together and acted jointly and in concert pursuant to their scheme to commit the acts that violated the DMCA alleged herein.


167. Defendants induced Copilot users to unknowingly violate the DMCA by withholding attribution, licensing, and other information as described herein.




[32] “Derivative Works” as used herein refers to Copilot’s Output to the extent they are derived from Licensed Materials. The definition also includes the Copilot product itself, which is a Derivative Work based upon a large corpus of Licensed Materials.


[33] https://microsoft.gcs-web.com/static-files/07cf3c30-cfc3-4567-b20f-f4b0f0bd5087/.


[34] https://github.blog/2022-08-16-2022-transparency-report-january-to-june/.


[35] https://github.blog/2022-08-16-2022-transparency-report-january-to-june/.


[36] https://docs.github.com/en/site-policy/content-removal-policies/dmca-takedownpolicy#what-is-the-dmca/.


[37] https://github.blog/2022-01-27-2021-transparency-report/.


[38] https://github.blog/2020-11-19-take-action-dmca-anti-circumvention-and-developerinnovation/#what-dmca-exemptions-do-not-do/.


[39] https://github.blog/2021-11-23-copyright-office-expands-security-research-rights/.

[40] See, e.g., Footnotes 34–39.





About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.


This court case 3:22-cv-06823-KAW retrieved on September 5, 2023, from Storage.Courtlistener is part of the public domain. The court-created documents are works of the federal government, and under copyright law, are automatically placed in the public domain and may be shared without legal restriction.