January 9th 2020

For newcomers, machine learning algorithms can seem boring and complicated, and to some extent that is true. In most cases you stumble upon a description several pages long for each algorithm, and yes, it’s hard to find the time and energy to work through every detail. However, if you truly, madly, deeply want to become an ML expert, you have to brush up on your knowledge of these algorithms; there is no other way. But relax: today I will try to simplify this task and explain the core principles of the 10 most common algorithms in simple words (each includes a brief description, guides, and useful links). So, breathe in, breathe out, and let’s get started!

The mathematical representation of linear regression is a linear equation that combines a specific set of input values (x) to predict the output value (y) for that set of inputs. The linear equation assigns a scale factor to each input variable; these factors are called coefficients and are represented by the Greek letter beta (β).
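To make the equation concrete, here is a minimal sketch for the simplest case, one input variable, where the line is y = β0 + β1·x. The function names and the toy data are my own assumptions for illustration; the coefficients are found with the standard least-squares formulas.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) that minimise squared error for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def predict(x, beta0, beta1):
    """Apply the fitted linear equation to a new input value."""
    return beta0 + beta1 * x


# Hypothetical toy data that lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1, predict(5.0, beta0, beta1))
```

With more than one input variable the idea is the same, there is simply one coefficient per input.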

KNN learns as it goes: it does not need an explicit training phase. Instead, it keeps the labelled data points and classifies each new point by a majority vote of its neighbours.

The object is assigned to the class which is most common among its k nearest neighbours.
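The majority vote can be sketched in a few lines. This is an illustrative version under my own assumptions (Euclidean distance, made-up 2D points and labels, a hypothetical `knn_classify` helper), not a tuned implementation.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` with the most common label among its k nearest training points."""
    # There is no training step: we just measure the distance to every stored point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(train_points, train_labels, (2, 2)))
print(knn_classify(train_points, train_labels, (7, 8)))
```

The choice of k matters: a small k reacts to noise, while a large k blurs the boundary between classes.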

In the SVM algorithm, a hyperplane is created which serves as a demarcation between the categories. When the SVM algorithm processes a new data point, it classifies the point into one of the classes depending on the side of the hyperplane on which it appears.
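The "which side of the hyperplane" test is easy to show in code. Note that this sketch only covers the classification step: the weights and bias below are hand-picked for illustration, whereas a real SVM would learn them from training data by maximising the margin between the classes.

```python
def decision_value(point, weights, bias):
    """Signed score w . x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(point, weights, bias):
    """Return +1 or -1 depending on the side of the hyperplane the point falls on."""
    return 1 if decision_value(point, weights, bias) >= 0 else -1


# Hypothetical hyperplane x1 + x2 = 9.5 (hand-picked, not learned).
weights = (1.0, 1.0)
bias = -9.5

print(classify((2, 2), weights, bias))  # below the line
print(classify((8, 7), weights, bias))  # above the line
```

In two dimensions the hyperplane is just a line; with more features the same dot product applies unchanged.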

A Random Forest is made up of decision trees, which are graphs of decisions representing a course of action or a statistical probability. Each tree is typically built as a Classification and Regression Tree (CART) model, and the forest combines the predictions of its many trees, by majority vote for classification or by averaging for regression.
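Here is a heavily simplified sketch of the idea of combining many trees. It is my own illustration, not a full random forest: each "tree" is a one-split stump trained on a bootstrap sample of the data, and the forest takes a majority vote. A real random forest grows deeper trees and also picks a random subset of features at each split. All function names and the toy data are assumptions.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(points, labels):
    """One-split tree: choose the (feature, threshold) with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for feature in range(len(points[0])):
        values = sorted({p[feature] for p in points})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [l for p, l in zip(points, labels) if p[feature] <= threshold]
            right = [l for p, l in zip(points, labels) if p[feature] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = sum(l != left_label for l in left)
            errors += sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:
        # Every sampled point was identical, so no split exists: predict a constant.
        label = majority(labels)
        return lambda point: label
    _, feature, threshold, left_label, right_label = best
    return lambda point: left_label if point[feature] <= threshold else right_label


def train_forest(points, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(points)) for _ in points]
        forest.append(train_stump([points[i] for i in idx], [labels[i] for i in idx]))
    return forest


def forest_predict(forest, point):
    """Majority vote over the predictions of all trees."""
    return majority([tree(point) for tree in forest])


# Hypothetical toy data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["low", "low", "low", "high", "high", "high"]
forest = train_forest(points, labels)
print(forest_predict(forest, (2, 2)), forest_predict(forest, (8, 8)))
```

Because every tree sees a slightly different sample, individual trees can be wrong while the vote as a whole stays stable.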

A simple example: given data on football players, we can use K-means clustering to group them according to their similarity. The resulting clusters could reflect, say, a striker’s preference for scoring from free kicks or a player’s number of successful tackles, even though the algorithm is not given pre-defined labels to start with.

K-means clustering would be beneficial to traders who feel that there might be similarities between different assets which cannot be seen on the surface.
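The football example can be sketched as follows. The player numbers are invented, and each player is described by two made-up features: free-kick goals and successful tackles. To keep the run deterministic I use the first k points as starting centroids, which is an assumption of this sketch; real implementations usually start from random or k-means++ centroids.

```python
import math


def kmeans(points, k, max_iters=100):
    """Plain K-means: alternate between assigning points and moving centroids."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic deterministic start
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical players: (free-kick goals, successful tackles). No labels are given.
players = [(8, 1), (1, 9), (9, 2), (0, 8), (7, 1), (2, 10)]
labels, centroids = kmeans(players, k=2)
print(labels)
print(centroids)
```

The algorithm never sees the words "striker" or "defender"; the two groups emerge purely from how close the players are to each other.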

For example, to estimate the probability that you will be late to the office, you would like to know whether you will face any traffic on the way.

However, the Naive Bayes classifier assumes that the input features are independent of each other given the class, which simplifies the calculations to a large extent. Initially thought of as nothing more than an academic exercise, Naive Bayes has proved to work remarkably well in the real world.

The Naive Bayes algorithm can be used to find simple relationships between different parameters without having complete data.
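Sticking with the late-to-the-office example, here is a small sketch of a Naive Bayes classifier for yes/no features. The commute records are invented, the function names are my own, and I assume add-one (Laplace) smoothing so that unseen combinations do not produce a zero probability.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count how often each class occurs and each feature value occurs within a class."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] = number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts


def posterior(model, row, n_values=2):
    """Return P(class | row), treating the features as independent given the class."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(feature_i = value | class).
            score *= (feature_counts[label][i][value] + 1) / (count + n_values)
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Hypothetical commute records: (traffic, rain) -> outcome.
rows = [
    ("yes", "yes"), ("yes", "no"), ("yes", "yes"), ("no", "yes"),
    ("no", "no"), ("no", "no"), ("yes", "no"), ("no", "yes"),
]
labels = ["late"] * 4 + ["on_time"] * 4
model = train_naive_bayes(rows, labels)
result = posterior(model, ("yes", "yes"))
print(result)
```

The "naive" part is the multiplication inside the loop: each feature contributes its own factor, as if traffic and rain had nothing to do with each other once the outcome is known.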

One way to explain the advantage of an RNN over a normal neural network is to suppose that we process a word character by character. If the word is “trading”, a normal neural network node would have forgotten the character “t” by the time it moves to “d”, whereas a recurrent neural network will remember it because it has its own memory.
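The "memory" can be illustrated with a single recurrent unit. This is a toy sketch under my own assumptions: the weights are fixed by hand rather than trained, characters are encoded as a number between 0 and 1, and only lowercase letters are expected. It compares "trading" with "grading", two words that differ only in their first character.

```python
import math

# Hypothetical, untrained weights chosen only for illustration.
W_INPUT = 0.5
W_HIDDEN = 0.8


def encode(char):
    """Map a lowercase letter to a number in [0, 1]."""
    return (ord(char) - ord("a")) / 25


def feedforward_output(char):
    """A memoryless unit: the output depends on the current character only."""
    return math.tanh(W_INPUT * encode(char))


def rnn_final_state(word):
    """A recurrent unit: the hidden state carries information from earlier characters."""
    hidden = 0.0
    for char in word:
        hidden = math.tanh(W_INPUT * encode(char) + W_HIDDEN * hidden)
    return hidden


# The memoryless unit sees the same last character, so it cannot tell the words apart.
print(feedforward_output("trading"[-1]) == feedforward_output("grading"[-1]))
# The recurrent unit ends in a different state because the first character left a trace.
print(rnn_final_state("trading"), rnn_final_state("grading"))
```

The trace of the first character fades with every step, which is one reason practical RNN variants such as LSTMs add extra machinery to preserve long-range information.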

So these were the top machine learning algorithms that you should focus on in the near future.

References:

- https://blog.quantinsti.com/top-10-machine-learning-algorithms-beginners/
- https://towardsdatascience.com/top-10-machine-learning-algorithms-for-data-science-cdb0400a25f9
- https://blog.quantinsti.com/machine-learning-trading-predict-stock-prices-regression/
- https://blog.quantinsti.com/machine-learning-k-nearest-neighbors-knn-algorithm-python/
- https://blog.quantinsti.com/k-means-clustering-pair-selection-python/