How and Why We Choose to Clone all Data on GitHub

by @debricked


Too Long; Didn't Read

Debricked has achieved a not-so-small feat: we can now actively keep and maintain a clone of all data on GitHub. To understand the whys and hows, we interviewed our Head of Data Science, Emil Wåréus. The short answer is that we want a better and faster representation of the data we need to serve our customers. Cloning all the repositories amounts to about 20 terabytes of data. There are about 10,000 to 30,000 pull requests each hour, 100,000 open source issues and 12,000 active users.


@debricked

Debricked

Solving the problem of vulnerabilities & compliance when using Open Source in product development

