You don’t need a fancy PC to get started with data science and machine learning. In fact, you can run all your code in cloud-based notebooks without even worrying about setting up an environment locally.
Even if you’re new to data science, you’ve probably heard of Jupyter Notebooks. They have become the number one way to do data science. Jupyter Notebooks make it so much easier to run your code, write commentary and see your output all in one place. Almost all cloud platforms use some kind of Jupyter-like environment.
In this blog post, I am going to share 5 ways you can do data science in the cloud. Each of these platforms allows you to do this completely for free and they each work really well.
In my opinion, the 2 biggest upsides of using cloud platforms for data science are:
5 platforms will be covered in this post:
There are a few things you should note about how I have judged these platforms. I have actually tried each of these platforms myself and am giving my own opinions on what I think of them. My primary use of cloud platforms is to work on personal projects and not for company or enterprise use.
These are the criteria that I am using to compare these platforms:
Great, now that I’ve cleared all that up, let’s get into it.
DataCamp’s philosophy with the initial launch of Workspaces is that they want it to be as easy to do data science as it is to learn it. They have already created an incredible interactive environment for learning data science and Workspaces seem like a natural next step for anyone who now wants to easily apply their skills and start putting together a portfolio.
The Workspaces are completely free and you can choose between R and Python. So far, their notebook editor looks good and it is intuitive to use. There is also no admin involved in running or maintaining the workspace, which makes it hassle-free.
To me,
Kaggle notebooks are free to use, with the option to choose between R and Python and they integrate well with other services. You can even connect your Kaggle notebook to Google Cloud Services to beef up the hardware if you need it, although this will come at an additional cost, of course. Kaggle does provide access to a GPU and for personal projects, it is usually more than enough.
Collaboration on Kaggle notebooks is limited. Similar to DataCamp, you can share your notebooks with other people (such as your teammates in a competition) but you effectively work on different versions of the notebook so there is no live collaboration feature.
Naturally, sharing on Colab notebooks is built-in. However, it does not seem to be capable of live collaboration (i.e., with 2 or more people editing a notebook together in real time). I find this disappointing since Google basically wrote the book on real-time collaboration with Docs and Sheets.
Colab is also not a particularly pretty app, especially when compared with some of the other platforms I’ll be covering on this list. However, since almost everyone interested in data science most likely has at least 1 Google account, setup is by far the fastest.
Colab is free to use, but resources are not guaranteed and there are several usage limits that change depending on demand. Your usage limits could even be different from mine if your code uses more resources.
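Since the hardware you get is not guaranteed, it can help to check what your session has actually been given before you start a long run. Here is a minimal sketch that uses only the Python standard library. The function name `describe_session` is my own, and it assumes that an NVIDIA GPU, if one is attached, exposes the `nvidia-smi` command-line tool on the machine. It is not specific to Colab and should work in any notebook that runs Python.

```python
import os
import shutil
import subprocess


def describe_session():
    """Return a small summary of the hardware visible to this session."""
    info = {"cpu_count": os.cpu_count(), "gpu": None}

    # nvidia-smi is normally only on the PATH when an NVIDIA driver is installed.
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        return info

    try:
        result = subprocess.run(
            [nvidia_smi, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The tool exists but could not be run; treat this as "no GPU found".
        return info

    if result.returncode == 0 and result.stdout.strip():
        info["gpu"] = result.stdout.strip()
    return info


print(describe_session())
```

If `gpu` comes back as `None`, the session is most likely running on CPU only, and you may need to change the notebook's hardware settings or try again later.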
I particularly like their publishing feature – you can publish your notebook as an article or as an interactive app or dashboard. I just love the presentation of the articles and dashboards. Your profile on Deepnote also acts as a portfolio and it is a great viewing experience for anyone looking through your work.
You can get up to 750 hours on their standard machines and each notebook comes with a nifty little feature that automatically shuts the machine down after 15 minutes of inactivity. This keeps the admin pretty low on this platform.
There is a bit of admin involved with this platform. On the free plan, you get 120 hours/month of basic machine time. However, once you open a notebook, so long as it is an open tab in your browser it will consume resources and eat into your available hours. Because of this, it is important to either close the tab or manually shut down the machine.
Also, if you’ve shared one of your notebooks with someone and they forget to close their browser window down at the end of the day, it’ll eat into your available quota. So that’s something to keep in mind.
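To get a feel for how quickly an open tab can eat through that quota, here is a rough back-of-the-envelope sketch. The 120 hours/month figure is the one quoted above. The daily usage patterns are hypothetical examples of mine, and the function name is made up for illustration.

```python
def days_until_quota_runs_out(monthly_quota_hours, hours_open_per_day):
    """Return how many days a monthly quota lasts at a given daily usage."""
    if hours_open_per_day <= 0:
        raise ValueError("hours_open_per_day must be positive")
    return monthly_quota_hours / hours_open_per_day


FREE_PLAN_HOURS = 120  # free-plan figure quoted in this post

# Hypothetical pattern 1: the notebook tab stays open for a full 8-hour day.
print(days_until_quota_runs_out(FREE_PLAN_HOURS, 8))   # 15.0

# Hypothetical pattern 2: someone forgets to close the tab and it runs 24/7.
print(days_until_quota_runs_out(FREE_PLAN_HOURS, 24))  # 5.0
```

In other words, a tab that is never closed could use up the whole month's allowance in about 5 days, which is why shutting the machine down manually is worth the habit.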
Getting set up on the platform and launching a notebook can take a good couple of minutes since all that additional hardware that they offer needs to be provisioned. There is also limited functionality for publishing notebooks and building a portfolio of projects using their platform. They are
Gradient is the platform to go to if you need more resources and more compute for your project and you don’t want to pay a cloud provider like Google Cloud or AWS to get it.
Choosing which cloud platform to use depends on your own goals and needs.
If you’re looking for a place to build a portfolio and get involved in a large community while you’re still learning and improving your data science skills then I’d recommend Kaggle or DataCamp.
If you’re looking for a platform that’s got all the bells and whistles, allows you to build a professional-looking portfolio, and offers real-time collaboration, then I’d recommend Deepnote.
If you’ve started branching out into deep learning, NLP, and computer vision then I’d recommend giving Gradient a try.