GitHub Arctic Code Vault: Overview

by Shubham Rattra, September 3rd, 2020

Too Long; Didn't Read

The GitHub Arctic Code Vault is a data repository preserved in the Arctic World Archive, a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. The project began on February 2, 2020, when GitHub took a snapshot of all of its active public repositories to store them in the vault. The archive is located in a decommissioned coal mine in the Svalbard archipelago, closer to the North Pole than the Arctic Circle.

Are you an Arctic Code Vault Contributor, or have you seen someone posting about it without knowing what it is? Let's take a look at what an Arctic Code Vault Contributor is and who gets this badge.

GitHub, the world’s largest open-source platform for software, has safely locked away data of huge value and magnitude in a coal mine near the Norwegian town of Longyearbyen in the Arctic.

The GitHub Arctic Code Vault was first announced in November 2019.

The GitHub Arctic Code Vault is a data repository preserved in the Arctic
World Archive (AWA), a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. The archive is located in a decommissioned coal mine in the Svalbard archipelago, closer to the North Pole than the Arctic Circle.

In 2019, GitHub said that it planned to capture a snapshot of every active
public repository on 02/02/2020 and preserve that data in the Arctic
Code Vault.

The project began on February 2, 2020, when the company took a snapshot of all of
GitHub’s active public repositories to store them in the vault. The team initially intended to travel to Norway and personally escort the world’s open-source technology to the Arctic, but those plans were derailed by the global pandemic. Instead, they had to wait until July 8 for the data to be deposited in the Arctic Code Vault.

GitHub announced that the code was successfully deposited in the Arctic Code Vault on July 8, 2020. Over the preceding months, GitHub worked
with its archive partner Piql to write the 21 TB of GitHub repository data to 186 reels of piqlFilm (digital photosensitive archival film).
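To get a feel for those numbers, here is a quick back-of-the-envelope calculation of how much data ends up on each reel. It assumes the data is spread evenly across the reels and that a terabyte means 1,000 gigabytes; neither assumption comes from GitHub, so treat the result as a rough average only.

```python
# Rough average only: assumes an even spread across reels and
# decimal units (1 TB = 1000 GB). Both are assumptions, not GitHub's figures.
TOTAL_TB = 21   # repository data written to film, per the article
REELS = 186     # number of piqlFilm reels, per the article

gb_per_reel = TOTAL_TB * 1000 / REELS
print(f"{gb_per_reel:.1f} GB per reel")  # 112.9 GB per reel
```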

GitHub's director of strategic programs, Julia Metcalf, wrote a blog post
on the company’s website announcing that the deposit was completed on July 8th. Discussing the objective of the Archive Program, Metcalf wrote, “Our mission is to preserve open-source software for future generations by storing your code in an archive built to last a thousand years.”

The Arctic Code Vault is only a small part of the wider GitHub Archive
Program, however, in which the company partners with the Long Now
Foundation, Internet Archive, Software Heritage Foundation, Microsoft
Research and others.


How will the cold storage last 1,000 years?

Svalbard has been regulated by the international Svalbard Treaty as a demilitarized zone. Home to the world’s northernmost town, it is one of the most remote and geopolitically stable human habitations on Earth.

The AWA is a joint initiative between Norwegian state-owned mining company Store Norske Spitsbergen Kulkompani (SNSK) and very-long-term digital preservation provider Piql AS. AWA is devoted to archival storage in perpetuity. The film reels will be stored in a steel-walled container inside a sealed chamber within a decommissioned coal mine on the remote archipelago of Svalbard. The AWA already preserves historical and cultural data from Italy, Brazil, Norway, the Vatican, and many others.

What’s in the 02/02/2020 snapshot?

The 02/02/2020 snapshot archived in the GitHub Arctic Code Vault
sweeps up every active public GitHub repository, in addition to significant dormant repos.

The snapshot includes every repo with any commits between the announcement at GitHub Universe on November 13th, 2019 and 02/02/2020,
every repo with at least 1 star and any commits from the year before the snapshot (02/03/2019 – 02/02/2020), and every repo with at least 250 stars.
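The three inclusion rules above can be written down as a small check. This is only a sketch of the published criteria, not GitHub's actual selection code: the `Repo` record and the `in_snapshot` function are hypothetical names made up for illustration, and `last_commit` is assumed to be the date of the most recent commit as of snapshot day.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ANNOUNCEMENT = date(2019, 11, 13)  # GitHub Universe announcement
YEAR_BEFORE = date(2019, 2, 3)     # start of the one-year window
SNAPSHOT = date(2020, 2, 2)        # snapshot day


@dataclass
class Repo:
    """Hypothetical record for a public repository (illustration only)."""
    stars: int
    last_commit: Optional[date]  # most recent commit as of snapshot day


def in_snapshot(repo: Repo) -> bool:
    """Apply the three criteria described in the article."""
    # Rule 3: popular repos are kept even if dormant.
    if repo.stars >= 250:
        return True
    if repo.last_commit is None:
        return False
    # Rule 1: any commit between the announcement and the snapshot.
    if ANNOUNCEMENT <= repo.last_commit <= SNAPSHOT:
        return True
    # Rule 2: at least one star and a commit in the year before the snapshot.
    return repo.stars >= 1 and YEAR_BEFORE <= repo.last_commit <= SNAPSHOT
```

For example, a repo with no stars and a last commit in January 2020 qualifies under the first rule, while the same repo with a last commit in June 2019 would need at least one star.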

The snapshot consists of the HEAD of the default branch of each repository, minus any binaries larger than 100 KB in size; depending on available space, repos with more stars may retain binaries. Each repository is packaged as a single TAR file. For greater data density and integrity, most of the data is stored QR-encoded and compressed. A human-readable index and guide itemizes the location of each repository and explains how to recover the data.
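To make the packaging step concrete, here is a minimal sketch of how one repository checkout could be turned into a single TAR file while leaving out large binaries. It is not GitHub's pipeline: the `package_repo` and `looks_binary` helpers are hypothetical, the null-byte test for "binary" is a common heuristic chosen here for illustration, "100 KB" is read as 100 × 1024 bytes, and the QR encoding and compression steps are not modelled at all.

```python
import os
import tarfile

MAX_BINARY_BYTES = 100 * 1024  # assumption: "100 KB" read as 100 KiB


def looks_binary(path, probe=8192):
    """Heuristic (an assumption): a null byte near the start means binary."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(probe)


def package_repo(checkout_dir, tar_path):
    """Write one TAR for a checked-out repo; return the paths left out."""
    skipped = []
    with tarfile.open(tar_path, "w") as tar:
        for root, dirs, files in os.walk(checkout_dir):
            # Only the checked-out HEAD is wanted, not the Git history.
            dirs[:] = sorted(d for d in dirs if d != ".git")
            for name in sorted(files):
                full = os.path.join(root, name)
                # Keep the sketch simple: regular files only.
                if os.path.islink(full) or not os.path.isfile(full):
                    continue
                rel = os.path.relpath(full, checkout_dir)
                if os.path.getsize(full) > MAX_BINARY_BYTES and looks_binary(full):
                    skipped.append(rel)
                    continue
                tar.add(full, arcname=rel)
    return skipped
```

Calling `package_repo("my-checkout", "my-checkout.tar")` on a local checkout (both paths are placeholders) would write the archive and return the list of files that were dropped for being large binaries.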

The company further shared that every reel of the archive includes a copy
of the “Guide to the GitHub Code Vault” in five languages, written with input from GitHub’s community and available at the Archive Program’s own GitHub repository.

The archive also includes a human-readable reel documenting the
technical history and cultural context of the archive’s contents, which the company calls the Tech Tree. It primarily consists of existing works, selected to provide a detailed understanding of modern computing, open source and its applications, modern software development, popular programming languages, etc.

What is the reason for doing this?

This project aims to preserve open-source software for future generations
by storing it in an archive built to last a thousand years.

GitHub hopes that one day the open-source data can be used by historians or
future civilizations to understand the dawn of computing: the present.

In addition to the repositories, GitHub also saved a few classic works of humanity and an introductory letter in case the archive is discovered after an apocalypse, or by aliens, or by something that doesn’t know much about present humanity. The letter begins: “This archive, the GitHub Code Vault, was established by the GitHub Archive Program, whose mission is to preserve open-source software for future generations.”

Who gets this badge?

The snapshot included any public repository that had at least 250 stars, that had at least one star and had been updated in the past year, or that had no stars but had been updated in the previous eighty days. If you were active on GitHub around that time, you probably got your name and a creation stored in the Arctic. Clicking on the Arctic Code Vault Contributor badge in the highlights section of a profile will reveal which of a user’s projects were saved in this snapshot.

GitHub created the Arctic Code Vault Contributor badge to honor the millions of developers worldwide who contributed to the open-source projects in the archive. This
badge is displayed in the highlights section of the developer’s GitHub profile.

So if you have the Arctic Code Vault Contributor badge, congratulations: your code or project should be safe for at least 1,000 years, and hopefully someone in those times will find it useful.

Have a look at this video to see where your code or project is stored and how it is stored.