DOE vs. Github (amended complaint) Court Filing (Redacted), June 8, 2023, is part of HackerNoon’s Legal PDF Series. This is part 29 of 38.
183. Plaintiffs and the Class hereby repeat and incorporate by reference each preceding and succeeding paragraph as though fully set forth herein.
184. As described herein, Defendants have intentionally removed or altered CMI from Plaintiffs’ code in violation of Section 1202(b)(1) of the DMCA.
185. As described herein, Defendants have distributed copies of Plaintiffs’ code knowing that CMI has been removed or altered while knowing or having reasonable grounds to know that it will induce, enable, facilitate, or conceal infringement in violation of Section 1202(b)(3) of the DMCA.
186. Plaintiffs and members of the Class own the copyrights to Licensed Materials used to train Codex and Copilot. Copilot was trained on millions—possibly billions—of lines of code publicly available on GitHub. Copilot runs on Microsoft’s Azure cloud platform exclusively and Microsoft had input in the creation of Copilot. Microsoft is aware that Copilot ignores License Terms and that it was trained almost exclusively on Licensed Materials.
187. Plaintiffs and members of the Class included the following Copyright Management Information (as defined in Section 1202(c) of the DMCA) (“CMI”) in the Licensed Materials:
a. copyright notices;
b. the title and other information identifying the Licensed Materials;
c. the name of, and other identifying information about, the authors of the Licensed Materials;
d. the name of, and other identifying information about, the copyright owners of the Licensed Materials;
e. terms and conditions for use of the Licensed Materials, specifically the Suggested Licenses; and
f. identifying numbers or symbols referring to CMI or links to CMI.
188. Defendants did not contact Plaintiffs and the Class to obtain authority to remove or alter CMI from the Licensed Materials within the meaning of the DMCA.
189. Defendants knew that they did not contact Plaintiffs and the Class to obtain authority to remove or alter CMI from the Licensed Materials within the meaning of the DMCA.
190. As part of the scheme, Defendants did not attempt to contact Plaintiffs to obtain authority to remove or alter CMI from the Licensed Materials within the meaning of the DMCA. In fact, Defendants’ removal of CMI made it difficult or impossible to contact Plaintiffs and the Class to obtain authority to remove or alter CMI from the Licensed Materials within the meaning of the DMCA. Rather, Defendants removed or altered CMI from open-source code that is owned by Plaintiffs and the Class after the code was uploaded to a GitHub repository by incorporating it into Copilot with its CMI removed.
191. Without the authority of Plaintiffs and the Class, Defendants intentionally removed or altered CMI from the Licensed Materials after they were uploaded to one or more GitHub repositories.
192. Defendants had access to but were not licensed by Plaintiffs or the Class to train any machine learning, AI, or other pseudo-intelligent computer program, algorithm, or other functional prediction engine using the Licensed Materials.
193. Defendants had access to but were not licensed by Plaintiffs or the Class to incorporate the Licensed Materials into Copilot.
194. Defendants had access to but were not licensed by Plaintiffs or the Class to create Derivative Works[34] based upon the Licensed Materials.
195. Defendants had access to but were not licensed by Plaintiffs or the Class to distribute the Licensed Materials as they do through Copilot.
196. Without the authority of Plaintiffs and the Class, Defendants distributed CMI knowing that the CMI had been removed or altered without authority of the copyright owner or the law with respect to the Licensed Materials.
197. Defendants distributed copies of the Licensed Materials knowing and intending that CMI had been removed or altered without authority of the copyright owner or the law, with respect to the Licensed Materials.
198. Defendants removed or altered CMI from the Licensed Materials knowing and intending that it would induce, enable, facilitate, or conceal infringement of copyright.
199. Without the CMI associated with the Licensed Materials, Copilot users are induced or enabled to copy the Licensed Materials. Because CMI has been removed, Copilot users do not know whether Output is owned by someone else and subject to restrictions on use. Without the CMI, copyright infringement is facilitated or concealed, because Plaintiffs and the Class are prevented from knowing or learning that the Output is based upon one or more of the Licensed Materials. Use of the Licensed Materials is not infringement when the terms of the applicable Suggested License are followed. Had the CMI not been removed, Copilot users would be aware of the Licenses and their obligations under them. The terms of the applicable Suggested License would have allowed those users to use the Licensed Materials without infringement. By withholding and concealing license information and other CMI, Defendants prevented Copilot users from making non-infringing use of the Licensed Materials. This contradicts the express wishes of Plaintiffs and the Class, which are set forth explicitly in the Suggested Licenses under which the Licensed Materials are offered.
200. Defendants removed or altered CMI from Licensed Materials owned by Plaintiffs and the Class while possessing reasonable grounds to know that it would induce, enable, facilitate, and/or conceal infringement of copyright in violation of Sections 1202(b)(1) and 1202(b)(3) of the DMCA.
201. By omitting, altering and/or concealing CMI from Copilot’s Output, Defendants have reasonable grounds to know that innocent infringers are induced or enabled to copy the Licensed Materials, because CMI has been removed. Without the CMI, Defendants have reasonable grounds to know copyright infringement is facilitated or concealed, because Plaintiffs and the Class have the difficult or impossible task of proving the Licensed Materials belong to them.
202. The profits attributable to Defendants’ violation of the DMCA include the revenue from: Copilot subscription fees, sales of or subscriptions to Defendants’ Copilot-related products and/or services that are used to run Copilot, hosting Copilot on Azure, and any other of Defendants’ products that contain copies of the Licensed Materials without all the original CMI. The Licensed Materials add nearly all value to the Copilot product because the purpose of Copilot is to provide code and the source of that code is the Licensed Materials. Without the Licensed Materials, Copilot would not be functional.
203. On information and belief, Defendants could have trained Copilot to include attribution, copyright notices, and license terms when it provides Output covered by a License.
204. Defendants did not request or obtain permission from Plaintiffs and the Class to use the Licensed Materials for Defendants’ Copilot product.
205. Defendants’ use of the Licensed Materials does not follow the requirements of the Suggested Licenses associated with the Licensed Materials. In particular, Copilot fails to provide attribution for either the creator or the owner of the Work. Copilot fails to include the required copyright notice included in the License. Copilot fails to include the applicable Suggested License’s text.
206. Defendants are sophisticated with respect to intellectual property matters related to open-source code. Microsoft in particular has extensive experience granting licenses, obtaining licenses, and enforcing license terms. Its most recent Annual Report states:
We protect our intellectual property investments in a variety of ways. We work actively in the U.S. and internationally to ensure the enforcement of copyright, trademark, trade secret, and other protections that apply to our software and hardware products, services, business plans, and branding. We are a leader among technology companies in pursuing patents and currently have a portfolio of over 69,000 U.S. and international patents issued and over 19,000 pending worldwide. While we employ much of our internally-developed intellectual property exclusively in our products and services, we also engage in outbound licensing of specific patented technologies that are incorporated into licensees’ products. From time to time, we enter into broader cross-license agreements with other technology companies covering entire groups of patents. We may also purchase or license technology that we incorporate into our products and services. At times, we make select intellectual property broadly available at no or low cost to achieve a strategic objective, such as promoting industry standards, advancing interoperability, supporting societal and/or environmental efforts, or attracting and enabling our external development community. Our increasing engagement with open source software will also cause us to license our intellectual property rights broadly in certain situations.
Microsoft Corporation Annual Report, Form 10-K at 27 (July 28, 2022) (emphasis added).[35]
207. GitHub, which offers the Copilot product jointly with OpenAI, also has extensive experience with the DMCA. GitHub knows or reasonably should know that the Licensed Materials it hosts are subject to copyright. It provides the language of the Suggested Licenses to users, all of which include copyright notices. Its 2022 Transparency Report—January to June[36] states: “Copyright-related takedowns (which we often refer to as DMCA takedowns) are particularly relevant to GitHub because so much of our users’ content is software code and can be eligible for copyright protection.”[37] In the first six months of 2022, GitHub processed 1220 DMCA takedown requests. Its DMCA Takedown Policy[38] notes “GitHub probably never would have existed without the DMCA.”
208. GitHub also knows or reasonably should know the portions of the DMCA giving rise to Plaintiffs’ claim. Its 2021 Transparency Report states: “Before removing content based on alleged circumvention of copyright controls (under Section 1201 of the US DMCA or similar laws in other countries), we carefully review both the legal and technical claims, and we sponsor a Developer Defense Fund to provide developers with meaningful access to legal resources.”[39]
209. GitHub is aware that Copilot’s removal of CMI is illegal. For example, it states that “publishing or sharing tools that enable circumvention are not [permitted]”[40] and “Distributing tools that enable circumvention is prohibited, even if their use by developers falls under the exemption [for security research].”[41] GitHub has also frequently published articles discussing the DMCA, its application, and the Copyright Office’s guidance on its scope and exceptions.[42]
210. Unless Defendants are enjoined from violating the DMCA, Plaintiffs and the Class will suffer great and irreparable harm by being deprived of the right to identify and control the reproduction and/or distribution of their copyrighted works, to have the terms of their open-source licenses followed, and to pursue copyright-infringement remedies. Defendants will not be damaged if they are required to comply with the DMCA. Plaintiffs and the Class are therefore entitled to an injunction barring Defendants from violating the DMCA and impounding any device or product that is in the custody or control of Defendants and that the court has reasonable cause to believe was involved in a violation of the DMCA.
211. Plaintiffs and the Class are further entitled to recover from Defendants the actual or statutory damages Plaintiffs and the Class sustained pursuant to 17 U.S.C. § 1203(c) and for Plaintiffs’ and the Class’s costs and attorneys’ fees in enforcing the Licenses. Plaintiffs and the Class are also entitled to recover as restitution from Defendants for any unjust enrichment, including gains, profits, and advantages that Defendants have obtained as a result of their breach of the Licenses.
212. Defendants conspired together and acted jointly and in concert pursuant to their scheme to commit the acts that violated the DMCA alleged herein.
213. Defendants induced Copilot users to unknowingly violate the DMCA by withholding attribution, licensing, and other information as described herein.
This court case 4:22-cv-06823-JST retrieved on August 26, 2023, from Storage Courtlistener is part of the public domain. The court-created documents are works of the federal government, and under copyright law, are automatically placed in the public domain and may be shared without legal restriction.