
How and Why We Choose to Clone all Data on Github

by Debricked, January 20th, 2021

Too Long; Didn't Read

Debricked has achieved a not-so-small feat: we can now actively keep and maintain a clone of all data on GitHub. To understand the whys and hows, we interviewed our Head of Data Science, Emil Wåréus. The short answer is that a clone gives us a better and faster representation of the data we need to serve our customers. Cloning all the repositories amounts to about 20 terabytes of data. There are about 10,000 to 30,000 pull requests each hour, 100,000 open source issues and 12,000 active users.

Debricked

@debricked

Solving the problem of vulnerabilities & compliance when using Open Source in product development
