As a beginner, jumping into a new machine learning project can be overwhelming. The process starts with picking a data set. The next step is to study that data set to find out which class or type of machine learning algorithm fits it best.
Here are some tips from experts on how to get started:
Now that we have a couple of general tips for getting started on an ML project, let’s take a look at 10 interesting examples that will teach you how to use ML algorithms, how to tune them, and how to analyze the given data.
The Iris Flowers dataset is seen as the “Hello World” of ML because it is the classic example of classification. It offers a great introduction because it requires you to learn how to load and explore data. The benefit of this dataset is that it is small enough to load into memory (150 rows) and has only four properties: petal length, petal width, sepal length, and sepal width.
The project involves identifying three different species of Iris flowers using the four known properties. Because the data is labeled, you can use a supervised learning algorithm. Unsupervised learning, by contrast, looks for hidden structure in unlabeled data.
Classification type? We are using multiclass classification here: there are more than two classes, and the model should accurately predict which one a data point belongs to.
Goal: Classify flowers among three species based on the properties of the flower: dimensions of petals and sepals.
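To make the goal concrete, here is a minimal sketch of multiclass classification using a k-nearest-neighbors vote in plain Python. The technique, the function name `predict_species`, and the nine hand-typed sample rows are assumptions chosen for illustration; a real project would load all 150 rows and hold some back for testing.

```python
import math
from collections import Counter

# A few hand-typed sample rows: (sepal length, sepal width, petal length,
# petal width) in cm, plus the species label. Illustrative values only;
# the full dataset has 150 rows.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def predict_species(features, train=TRAIN, k=3):
    """Return the majority label among the k nearest training rows."""
    # Sort labeled rows by Euclidean distance to the new flower.
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict_species((5.0, 3.4, 1.5, 0.2)))
    print(predict_species((6.5, 3.0, 5.8, 2.2)))
```

Because the labels are known up front, this is supervised learning: the labeled rows act as the training set, and each new flower is assigned to one of the three classes.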
Machine learning has been a trending topic for years now, but many popular services are inaccessible to most developers, primarily because of cost. A group called GNY is addressing that by decentralizing their machine learning platform, which will be free to download and install. The platform is embedded within a blockchain so that a user’s data is protected from potential hacks.
The team has released a demo that shows how this platform can predict groups of retail transactions through their neural net, and a fully downloadable and customizable version of the platform is launching this Summer. GNY will have a library of machine learning code sets that can be selected depending on each individual’s requirements and applied to their sidechain (GNY will use Lisk’s sidechain technology).
Why is this so important? Almost all businesses are looking for an affordable way to unlock hidden value in their data, but not if it exposes them to security risks. The inherent structure of a blockchain helps to control data consistency and allows you to remain in control of your data.
Performance increases because validation of the subsequent block can already start while the previous block is still active. Validation includes checking whether the user has sufficient balance. This work only needs to be redone for wrongly predicted transactions.
This demo is a fun starter project for people who want to predict simple numbers and the full platform launching this Summer should provide developers with much more power and customization. A good data set can be found at MLWave for predicting repeat buyers using purchase history.
Goal: Predict future transactions based on spending history.
One interesting application of machine learning is sentiment analysis. Sentiment analysis has seen a major breakthrough with the rise of cryptocurrencies. Many have tried to build trading bots that incorporate sentiment analysis to make better trading decisions.
There are many other platforms that can be used for sentiment analysis like Reddit, Facebook, or LinkedIn as they all offer easy-to-use APIs for retrieving data. However, due to the consistent format of the data on the Twitter platform, this is the preferred data for machine learning. It is also much easier to pre-process as the tweets mainly consist of text, URLs, and hashtags.
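Since tweets are mostly text, URLs, and hashtags, pre-processing can be sketched with a few regular expressions. The function names `clean_tweet` and `extract_hashtags` and the exact cleaning rules are assumptions for illustration; a real pipeline may want to treat mentions, emoji, and retweets as well.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def clean_tweet(text):
    """Drop URLs, keep hashtag words without the '#', lowercase the rest."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.lower().split())


def extract_hashtags(text):
    """Return the hashtags in a tweet, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


if __name__ == "__main__":
    tweet = "Loving #Bitcoin today! https://t.co/abc"
    print(clean_tweet(tweet))
    print(extract_hashtags(tweet))
```

Keeping the hashtag word while removing the symbol is a design choice: the word often carries the topic, and the `#` only adds noise for a text model.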
Many client libraries are available for integrating the Twitter API into your project. The Python wrapper can be installed via pip with !pip install python-twitter (the leading ! is only needed inside a notebook cell). However, be careful when using the API, as excessive usage can get you blacklisted. Twitter provides guidelines on how to avoid being rate limited. If you require real-time data, the Twitter streaming API is the better fit.
A couple of fun examples to analyze:
Goal: A sentiment analyzer learns the various sentiments behind a piece of content. This task helps you think about designing various models to label a tweet as positive or negative. In a later phase, we can label tweets in a more nuanced way like ‘neutral’, ‘angry’, ‘optimistic’, …
GitHub: Overview of all Twitter-related data sets.
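Before training a model, it can help to have a simple baseline to compare against. The sketch below labels a tweet by counting words from two small lists. Note that this is a word-list baseline, not a learned model, and both the word lists and the function name `label_tweet` are made up for illustration.

```python
import re

# Tiny hand-picked word lists, purely illustrative. A trained model would
# learn these signals from labeled tweets instead.
POSITIVE = {"good", "great", "love", "bullish", "win"}
NEGATIVE = {"bad", "terrible", "hate", "bearish", "scam"}


def label_tweet(text):
    """Label a tweet 'positive', 'negative', or 'neutral' by word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(label_tweet("I love this coin, great team"))
    print(label_tweet("This project is a scam, terrible"))
    print(label_tweet("Just bought coffee"))
```

A baseline like this gives you a number to beat: if a trained classifier cannot outperform simple word counting on held-out tweets, the features or the labels probably need another look.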
Recommender systems are one of the most successful and widespread applications of machine learning technologies in business. You find recommender systems everywhere in your daily life. For example, when you watch YouTube videos, the YouTube algorithm recommends videos based on your own watching habits and on patterns learned by running ML algorithms on the viewing behavior of people all across the world.
We can find two types of algorithms for recommender systems:
Currently, MovieLens provides one of the most popular data sets for movie ratings, which makes it an ideal dataset for beginners to experiment with.
Goal: Predict which movies users will like based on their ratings.
Tutorial: Towards Data Science provides a tutorial for building a simple recommender system in Python.
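As a rough idea of what predicting from ratings can look like, here is a minimal sketch of user-based collaborative filtering in plain Python. The ratings are made-up values (not taken from MovieLens), and the names `cosine` and `recommend` are hypothetical. Each unseen movie is scored by the similarity-weighted sum of other users' ratings.

```python
import math

# Hypothetical ratings (user -> {movie: stars}); made-up values for
# illustration, not taken from MovieLens.
RATINGS = {
    "ann": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 5, "Heat": 5, "Brazil": 4},
    "cat": {"Up": 5, "Coco": 5, "Alien": 1},
}


def cosine(a, b):
    """Cosine similarity computed over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings=RATINGS, top_n=2):
    """Rank movies the user has not rated by similarity-weighted rating sum."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, other_ratings)
        for movie, stars in other_ratings.items():
            if movie not in seen:
                scores[movie] = scores.get(movie, 0.0) + sim * stars
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("ann"))
```

In this toy data, ann's tastes line up with bob's far more than with cat's, so bob's ratings carry more weight when ranking the movies ann has not seen yet.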
A stock price predictor is a system that learns about the performance of a company and predicts future stock prices. The tricky thing with stock price predictions is that many types and sources of data can be used:
The benefit of analyzing the stock market is that it has shorter feedback cycles, which makes it easier to validate your predictions. If you don’t know market cycles, I suggest reading up on the topic to understand what a typical cycle looks like.
To start off easy, you can pick up a simple machine learning example in which we predict the 6-month price movement based on fundamental indicators from an organization’s quarterly report.
Goal: Predict future price using fundamental and technical indicators.
Download: Stock market datasets from Quandl.com or Quantopian.com.
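One of the simplest technical indicators to compute from downloaded prices is a moving average. The sketch below compares a short and a long simple moving average to produce a rough trend signal. The function names, the window sizes, and the price series are all assumptions for illustration, and the signal describes the recent trend rather than forecasting a price.

```python
def simple_moving_average(prices, window):
    """Return the average of each run of `window` consecutive prices."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[i - window:i]) / window
        for i in range(window, len(prices) + 1)
    ]


def trend_signal(prices, short=3, long=5):
    """Compare the latest short and long moving averages.

    Returns 'up' if the short average is above the long one,
    'down' if it is below, and 'flat' if they are equal.
    """
    short_sma = simple_moving_average(prices, short)[-1]
    long_sma = simple_moving_average(prices, long)[-1]
    if short_sma > long_sma:
        return "up"
    if short_sma < long_sma:
        return "down"
    return "flat"


if __name__ == "__main__":
    closes = [10, 11, 12, 13, 14, 15]  # hypothetical closing prices
    print(simple_moving_average(closes, 3))
    print(trend_signal(closes))
```

An indicator like this can serve as one input feature next to fundamental data from quarterly reports, rather than as a predictor on its own.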