What is Machine Learning?

by Angshuman Guha, August 2nd, 2017

a very short guide for non-technical people

Taxonomy

  1. Supervised learning: A model predicts y from x by learning from examples of (x, y) pairs. This kind of ML falls into two broad categories: classification (each y is one of many classes, for example, the identity of a person in a photo) and regression (each y is a number, for example, the risk of a credit application). A minimal sketch of the supervised case follows this list.
  2. Unsupervised learning: There is no y, only x. Examples of this kind of ML include [a] clustering of data samples, [b] anomaly detection in a collection of data samples, [c] forecasting from a time-series data sequence, [d] learning latent representations, for example, representing English words as vectors in a high-dimensional space such that related words are close to each other. [You can argue there is a ‘y’ in forecasting, but it’s implicit, kind of.]
  3. Reinforcement learning: There is a y, but it is in the form of a reward or punishment. Typically the data arrives as a dynamic stream: you interact with a system and get rewarded or punished for your actions. For example, an autonomous car learning to drive.
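
To make the supervised case concrete, here is a minimal sketch in Python; scikit-learn, the iris dataset, and logistic regression are my illustrative assumptions, not anything this guide prescribes.

```python
# A minimal supervised-learning sketch: learn y from (x, y) pairs.
# Library and dataset are illustrative assumptions, not from the article.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# x: flower measurements; y: one of three species (a classification task).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)            # learn from (x, y) examples
print(model.score(X_test, y_test))     # accuracy on unseen x
```

Swapping the classifier for a regressor (and y for a numeric target) gives the regression case.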

The following are four ML technologies that are currently prominent, in decreasing order of success in real-world applications. In principle, any ML technology can be applied to any part of the taxonomy above.

Neural Nets

These models are inspired by the connections of neurons in the brain. They are like cartoons of the real thing. Since their invention in the 1950s, the technology has gone through several winters in research and funding. Currently, it’s spring. The basic building block of neural nets is an artificial neuron that computes a weighted sum of its inputs and applies a nonlinear activation function to that sum to produce its output. These neurons are connected to each other in a network, typically in a layered topology without a feedback loop. The basic method of training these nets is backpropagation. Convolutional neural nets are one particularly potent variety. If you add feedback within neural networks, you get recurrent nets, of which LSTM nets have been very successful recently. Deep Learning is a somewhat ill-defined term applied to some spectacularly successful neural net applications. To most people, it refers to a large network with many layers. I think deep learning also refers to other nuanced changes in the practice of neural network modeling, including an aversion to feature engineering and an affinity for automatically learning (hierarchical) feature abstractions.
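
The building block described above fits in a few lines. This is only a sketch in NumPy; the sigmoid activation and the specific numbers are arbitrary choices on my part.

```python
# Sketch of an artificial neuron: a weighted sum of inputs passed
# through a nonlinear activation. All values here are illustrative.
import numpy as np

def sigmoid(z):
    # A common nonlinear activation; squashes any number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias, then the nonlinearity.
    return sigmoid(np.dot(weights, inputs) + bias)

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.6])   # weights (learned via backpropagation)
print(neuron(x, w, bias=0.2))    # the neuron's output
```

A network stacks many such neurons in layers; backpropagation adjusts the weights to reduce prediction error.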

Decision Trees

A decision tree is a computer science version of the game of 20 Questions. You make each decision based on the value of one attribute (example: “does it have four legs?” decides whether “it” can be a bird). The decisions are branches in a tree, where each successive choice of attribute depends on previous decisions. CART and ID3 are two of the earliest algorithms for learning decision trees, both dating from the late 1970s and early 1980s. You can gather multiple decision trees into forests, where the predictions of individual trees are aggregated, for example, by voting. One such ensemble is a Random Forest. There are two main methods of assembling forests from trees: boosting and bagging; both techniques apply to ensembles of non-tree ML models as well.
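
As a rough illustration of one tree versus a voting forest, here is a sketch with scikit-learn; the dataset and hyperparameters are arbitrary assumptions.

```python
# Sketch: a single decision tree vs. a Random Forest (bagged trees
# whose predictions are aggregated by voting). Details are illustrative.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=3)          # one "20 Questions" player
forest = RandomForestClassifier(n_estimators=100)   # many trees, votes combined

print(cross_val_score(tree, X, y).mean())     # accuracy of one tree
print(cross_val_score(forest, X, y).mean())   # the forest usually does at least as well
```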

Linear Models

The artificial neurons in neural nets are linear models. A linear model produces an output that is a linear combination (weighted sum) of the inputs, possibly followed by a transformation (activation) function. Compared to full-fledged neural network models, they are extremely simple. However, sometimes a linear model is the only practical option when faced with very large, sparse problems. Such problems may have millions of dimensions (variables), most of which are absent in any specific example. Google’s 24/7 data pipeline for predicting clicks and conversions for ads employed such a system (as of 2009). Linear models can be quite effective when combined with appropriate training and optimization techniques, and they continue to be a ripe field of research.
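
Here is a sketch of a linear model on a sparse problem, loosely in the spirit of click prediction; scikit-learn, the hashing trick, and the toy data are my assumptions and say nothing about Google’s actual pipeline.

```python
# Sketch: a linear model over a huge, sparse feature space, where each
# example touches only a handful of the ~1M dimensions. Illustrative only.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

docs = ["cheap flights to paris", "buy shoes online", "paris hotel deals"]
clicked = [1, 0, 1]   # toy labels: did the user click the ad?

vectorizer = HashingVectorizer(n_features=2**20)   # ~1M dimensions
X = vectorizer.transform(docs)                     # sparse matrix

model = SGDClassifier(loss="log_loss")   # logistic regression, trained by SGD
model.fit(X, clicked)
print(model.predict(vectorizer.transform(["paris flights"])))
```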

Support Vector Machines

The idea behind SVMs is beautifully captured by a well-known analogy. If you had balls of two colors on a table and you wanted to separate the red from the blue using a stick, an SVM would try to put the stick in the best possible place, leaving as big a gap on either side of the stick as possible. However, there may not be any position of the stick that separates the colors (the problem may not be linearly separable). In that case you flip the table, throwing the balls in the air. With the balls magically suspended in space at different heights, you may be able to position a sheet of paper separating the colors. The flip of the table is a mathematical transformation using a kernel. One problem with SVMs is that learning is slow, if not impractical, for large amounts of data.
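
The table-flip can be seen in a few lines: an SVM with a kernel separates data that no straight stick can. This sketch uses scikit-learn; the toy dataset and parameters are illustrative assumptions.

```python
# Sketch: concentric "red inside, blue outside" data is not linearly
# separable; an RBF kernel plays the role of flipping the table.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)   # the stick: no good position exists
rbf = SVC(kernel="rbf").fit(X, y)         # the kernel: an implicit lift to higher dimensions

print(linear.score(X, y))   # poor, even on the training data itself
print(rbf.score(X, y))      # near perfect
```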

[image: xkcd #1725, https://xkcd.com/1725/]

Other ML

There are many other ML technologies not described here. Examples include graphical models, Bayesian networks, hidden Markov models, rule-based learning, genetic algorithms, fuzzy logic, Kohonen networks, mixture models, K-means clustering, principal component analysis, kernel methods, etc. Some of these categories overlap.

Parting Thought

There is a tradeoff between accuracy and interpretability of ML predictions. This often manifests as a point of contention between business and technical needs of commercial ML efforts. It has been argued that you cannot have both.