You don't need a fancy PC to get started with data science and machine learning. In fact, you can run all your code in cloud-based notebooks without even worrying about setting up an environment locally.
Even if you're new to data science, you've probably heard of Jupyter Notebooks. They have become the number one way to do data science. Jupyter Notebooks make it much easier to run your code, write commentary, and see your output all in one place. Almost all cloud platforms use some kind of Jupyter-like environment.
In this blog post, I am going to share 6 ways you can do data science in the cloud. Each of these platforms lets you do this completely for free, and each works really well.
In my opinion, the 2 biggest upsides of using cloud platforms for data science are:
- Speed of setup: You can get set up in just a few minutes and have almost everything you need to do machine learning available to you. You don't need to go through the hassle of setting up an environment locally before you start writing code and analyzing data.
- Collaboration: Being able to share your work and collaborate on projects is a big upside of any kind of cloud platform. However, collaboration is not available for all the platforms listed here. Even when it is offered, the degree of collaboration differs from platform to platform.
Six platforms will be covered in this post:
- DataCamp Workspaces
- Kaggle Notebooks
- Google Colab
- Deepnote
- Datalore by JetBrains
- Gradient Notebooks
There are a few things you should note about how I have judged these platforms. I have actually tried each of these platforms myself and am giving my own opinions on what I think of them. My primary use of cloud platforms is to work on personal projects and not for company or enterprise use.
These are the criteria that I am using to compare these platforms:
- Price: they should be free or at least offer a decent free plan (not just a trial)
- Speed of setup and low admin: I shouldn't have to "babysit" my projects in case I go over "allowed hours". I'd like to be able to log on, work on a project, and log off without worrying about whether I shut the server down.
- Aesthetic and intuitive: the app should look good and be intuitive and easy to use
- Collaboration: I'd like to be able to share my work with friends and collaborate on it live. I'd also like to have the option to share securely if I want to, without my project being available publicly.
Great, now that I've cleared all that up, let's get into it.
DataCamp Workspaces
DataCamp's philosophy with the initial launch of Workspaces is that they want it to be as easy to do data science as it is to learn it. They have already created an incredible interactive environment for learning data science, and Workspaces seem like a natural next step for anyone who now wants to easily apply their skills and start putting together a portfolio.
The Workspaces are completely free and you can choose between R or Python. So far, their notebook editor looks good and it is intuitive to use. There is also no admin involved in running or maintaining the workspace which makes it hassle-free.
Kaggle Notebooks
To me,
Kaggle notebooks are free to use, with the option to choose between R and Python and they integrate well with other services. You can even connect your Kaggle notebook to Google Cloud Services to beef up the hardware if you need it, although this will come at an additional cost, of course. Kaggle does provide access to a GPU and for personal projects, it is usually more than enough.
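If you want to confirm that a GPU is actually attached to your notebook session, a quick check along these lines can help. This is only a minimal sketch: it assumes an NVIDIA GPU and that the `nvidia-smi` command-line tool is installed in the environment, and the helper name `gpu_visible` is my own, not something provided by Kaggle.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if an NVIDIA GPU appears to be attached to this session.

    Hypothetical helper: relies on the `nvidia-smi` tool being on the PATH,
    which is an assumption about the notebook environment.
    """
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        # Tool not installed, so we cannot see a GPU from here.
        return False
    try:
        # `nvidia-smi -L` lists the GPUs the driver can see, one per line.
        result = subprocess.run(
            [nvidia_smi, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

Running this in a cell before starting a long training job is a cheap way to avoid discovering halfway through that the session was started without an accelerator.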
Collaboration on Kaggle notebooks is limited. Similar to DataCamp, you can share your notebooks with other people (such as your teammates in a competition) but you effectively work on different versions of the notebook so there is no live collaboration feature.
Google Colab
Naturally, sharing on Colab notebooks is built-in. However, it does not seem to be capable of live collaboration (i.e. with 2 or more people editing a notebook together in real time). I find this to be disappointing since Google basically wrote the book on real-time collaboration with Docs and Sheets.
Colab is also not a particularly pretty app, especially when compared with some of the other platforms I'll be covering on this list. However, since almost everyone interested in data science most likely has at least one Google account, setup is by far the fastest.
Colab is free to use, but resources are not guaranteed and there are several usage limits that change depending on demand. Your usage limits could even be different from mine if your code uses more resources.
Deepnote
I particularly like their publishing feature: you can publish your notebook as an article or as an interactive app or dashboard. I just love the presentation of the articles and dashboards. Your profile on Deepnote also acts as a portfolio and it is a great viewing experience for anyone looking through your work.
You can get up to 750 hours on their standard machines and each notebook comes with a nifty little feature that automatically shuts the machine down after 15 minutes of inactivity. This keeps the admin pretty low on this platform.
Datalore
There is a bit of admin involved with this platform. On the free plan, you get 120 hours/month of basic machine time. However, once you open a notebook, so long as it is an open tab in your browser it will consume resources and eat into your available hours. Because of this, it is important to either close the tab or manually shut down the machine.
Also, if you've shared one of your notebooks with someone and they forget to close their browser window at the end of the day, it'll eat into your available quota. So that's something to keep in mind.
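To get a feel for how quickly an open tab can burn through the free quota, here is a rough back-of-the-envelope sketch in Python. The 120 hours/month figure is the free-plan number quoted above. The function name, and the assumption that every hour a tab stays open counts fully against the quota, are mine and only for illustration.

```python
def days_until_quota_runs_out(
    monthly_quota_hours: float, hours_open_per_day: float
) -> float:
    """Estimate how many days a monthly quota lasts at a steady daily usage.

    Hypothetical helper: assumes every hour the notebook tab stays open
    counts fully against the quota.
    """
    if hours_open_per_day <= 0:
        raise ValueError("hours_open_per_day must be positive")
    return monthly_quota_hours / hours_open_per_day


FREE_PLAN_HOURS = 120  # free-plan figure quoted in this post

# A tab that is never closed runs 24 hours a day.
print(days_until_quota_runs_out(FREE_PLAN_HOURS, 24))  # 5.0
# Closing the tab after a 4-hour session each day stretches the quota.
print(days_until_quota_runs_out(FREE_PLAN_HOURS, 4))   # 30.0
```

Under these assumptions, a single forgotten tab could use up the whole month's allowance in about five days, which is why closing the tab or shutting the machine down matters here.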
Gradient Notebooks
Getting set up on the platform and launching a notebook can take a good couple of minutes since all that additional hardware that they offer needs to be provisioned. There is also limited functionality for publishing notebooks and building a portfolio of projects using their platform. They are
Gradient is the platform to go to if you need more resources and more compute for your project and you don't want to pay a cloud provider like Google Cloud or AWS to get it.
Closing Thoughts
Choosing which cloud platform to use depends on your own goals and needs.
If you're looking for a place to build a portfolio and get involved in a large community while you're still learning and improving your data science skills, then I'd recommend Kaggle or DataCamp.
If you're looking for a platform that's got all the bells and whistles, allows you to build a professional-looking portfolio, and offers real-time collaboration, then I'd recommend Deepnote.
If you've started branching out into deep learning, NLP, and computer vision, then I'd recommend giving Gradient a try.