Is There a 'GitHub For Data Scientists'?

by Karthik Bhandary, March 17th, 2022

Photo by Richard Horvath on Unsplash

What if I told you there is a place where you can not only store your Data Science projects but also experiment on them right then and there?

If I had been asked this question before I got to know DagsHub, I would probably have laughed 😂😅. But it is real!

Let me cut to the chase: you could say that DagsHub is the GitHub for Data Scientists. The 'only difference', in my honest opinion, is that DagsHub can do a lot more than GitHub and GitLab.

For example, we can cover the entire ML life cycle without needing any DevOps.

The layout and design of the website closely mirror GitHub's, which makes it easy for us to start a project. It also works the same way, with additional features on top.

I am writing this article so that fellow data scientists and data science enthusiasts can learn that DagsHub exists and reap its benefits too.

We can run experiments on our models and see the insights we get from them. We can also use tools such as MLflow, DVC, New Relic, and Jenkins, which can be integrated into DagsHub.
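As one example of such an integration: as I understand it, every DagsHub repository comes with a hosted MLflow tracking server at a URI of the form `https://dagshub.com/<owner>/<repo>.mlflow`. Below is a minimal sketch, assuming MLflow is installed and that MLflow's standard `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` environment variables hold your DagsHub credentials. The helper names `dagshub_mlflow_uri` and `log_run` are my own, not part of DagsHub or MLflow.

```python
def dagshub_mlflow_uri(owner: str, repo: str) -> str:
    """Build the MLflow tracking URI DagsHub hosts for a repository.

    Hypothetical helper name; the URI pattern follows DagsHub's
    <owner>/<repo>.mlflow convention.
    """
    return f"https://dagshub.com/{owner}/{repo}.mlflow"


def log_run(tracking_uri: str, params: dict, metrics: dict) -> None:
    """Log one experiment run (hyperparameters + results) to an MLflow server.

    Hypothetical helper name. Authentication is assumed to come from the
    MLFLOW_TRACKING_USERNAME / MLFLOW_TRACKING_PASSWORD environment variables.
    """
    # Imported lazily so the URI helper above works without MLflow installed.
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run():
        mlflow.log_params(params)    # e.g. learning rate, model depth
        mlflow.log_metrics(metrics)  # e.g. accuracy, loss
```

For example, `log_run(dagshub_mlflow_uri("your-username", "your-repo"), {"learning_rate": 0.01}, {"accuracy": accuracy})`, where `accuracy` is whatever your evaluation computed, would record one run that should then appear in the repository's experiments view.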

DagsHub makes it possible to organize a project in an orderly fashion: for example, your notebooks live in one place, your data in another, and so on.

As you can see above, it is quite similar to GitHub. It has several tabs in the middle: All, Data, Models, Notebooks, DVC, and Git. This way of organizing things is very effective.

The next thing I want to talk about is collaboration, which DagsHub handles very well.

Commenting on work, sharing with the team, reproducing models with the desired results, and comparing different experiments are all very efficient.

Take a look at the image below.

As I said earlier, we can perform experiments. By that, I mean we can try out different hypotheses on our models to see how they perform.
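On DagsHub, comparing runs happens in the experiments dashboard, so no code is required. Still, to make "trying out hypotheses" concrete, here is a hypothetical plain-Python sketch of what a comparison boils down to: pick the run with the best metric, then see which parameters differ. `Run`, `best_run`, and `param_diff` are illustrative names I made up, not a DagsHub API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Run:
    """One experiment (hypothetical structure): the hypothesis and its results."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)     # what we tried
    metrics: Dict[str, float] = field(default_factory=dict)  # what we got


def best_run(runs: List[Run], metric: str, higher_is_better: bool = True) -> Run:
    """Return the run with the best value of `metric`."""
    def score(run: Run) -> float:
        return run.metrics[metric]
    return max(runs, key=score) if higher_is_better else min(runs, key=score)


def param_diff(a: Run, b: Run) -> Dict[str, Tuple[Any, Any]]:
    """Return only the params whose values differ, as (a_value, b_value) pairs."""
    keys = sorted(set(a.params) | set(b.params))
    return {
        k: (a.params.get(k), b.params.get(k))
        for k in keys
        if a.params.get(k) != b.params.get(k)
    }
```

For example, given a baseline run and a second run that changes only the tree depth from 3 to 6, `param_diff` returns just `{"depth": (3, 6)}`, which isolates the hypothesis being tested.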

It is Open-Source

The main reason for starting this platform was to tackle the problem of collaboration, because the existing tools are geared more toward software development than toward Data Science.

In this respect, it is similar to GitLab. Being open source makes the development process transparent, and everyone can contribute.

So DagsHub has similarities with both GitHub and GitLab: it brings together GitLab's open-source approach and GitHub's efficient organization of the data.

DagsHub uses open-source protocols, so it’s fully portable and extensible.

DagsHub lets you quickly build, share, and reuse machine learning and data science projects, so teams don't have to start from scratch every time. The following features make it stand out from other traditional platforms:

Built-in tools: Git for source code tracking, DVC for data version tracking, and MLflow for experiment tracking, which let you connect everything in one place with zero configuration.

It also supports data science tools & frameworks you already use.
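To give a feel for what DVC does under the hood: instead of committing a large dataset to Git, DVC identifies each file by a hash of its contents (MD5) and commits only a small `.dvc` metadata file, while the data itself is pushed to remote storage, which DagsHub can provide. The sketch below is a simplified, hypothetical illustration of that idea in plain Python; `file_md5` and `pointer_text` are my own names, and the pointer format is not DVC's exact file layout.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in 1 MiB chunks so large datasets need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pointer_text(path: Path) -> str:
    """Render a small, Git-friendly pointer to a data file.

    Simplified, .dvc-like layout for illustration only; real .dvc files
    carry more fields.
    """
    return f"outs:\n- md5: {file_md5(path)}\n  path: {path.name}\n"
```

Changing even one byte of the dataset changes the hash, so the small pointer tracked in Git changes too. That is what lets you check out an older commit and retrieve the matching version of the data.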

We can track experiments using the provided dashboard, and even compare and visualize them. You can go from an experiment to its source with a single click.

CONCLUSION

Even though GitHub and GitLab exist, I think that, as a Data Scientist, you will benefit more from using DagsHub.

Not only does it store your data, it also helps you improve the project by enabling experimentation on the platform and collaboration with your team.

I hope that you found this article helpful and interesting. Let me know your thoughts in the comment section.

Follow me on LinkedIn.


Also published here.