a very short guide for non-technical people

Taxonomy

Supervised learning: A model predicts y from x by learning from examples of (x, y) pairs. This kind of ML falls into two broad categories: classification (each y is one out of many classes; example: the identity of a person in a photo) and regression (each y is a number; example: the risk in a credit application).

Unsupervised learning: There is no y, only x. Examples of this kind of ML include [a] clustering of data samples, [b] anomaly detection in a collection of data samples, [c] forecasting from a time series data sequence, and [d] learning latent representations; example: representing English words as vectors in a high-dimensional space such that related words are close to each other. [You can argue there is a 'y' in forecasting, but it's implicit, kind of.]

Reinforcement learning: There is a y, but it is in the form of a reward or punishment. Typically the data arrives as a dynamic stream. You interact with a system and get rewarded or punished for your actions. Example: an autonomous car learning to drive.

Following are four ML technologies that are currently prominent, in decreasing order of success in real-world applications. In principle, any ML technology can be applied to any part of the taxonomy above.

Neural Nets

These models are inspired by connections of neurons in the brain. They are like cartoons of the real thing. Since their invention in the 1950s, the technology has gone through several winters in research and funding. Currently, it's spring. The basic building block of neural nets is an artificial neuron that computes a weighted sum of its inputs and applies a nonlinear activation function to that sum to produce its output. These neurons are connected to each other in a network, typically in a layered topology without a feedback loop. The basic method of training these nets is backpropagation. Convolutional neural nets are one particularly potent variety. If you add feedback within neural networks, you get recurrent nets, of which LSTM nets have been very successful recently.

Deep Learning is a somewhat ill-defined term applied to some spectacularly successful neural net applications. To most people, it refers to a large network with a high number of layers. I think deep learning also refers to other nuanced changes to the practice of neural network modeling, including an aversion to feature engineering and an affinity to automatically learning (hierarchical) feature abstractions.

Decision Trees

A decision tree is a computer science version of the game of 20 Questions. You make each decision based on the value of one attribute (example: "does it have four legs?" decides whether "it" can be a bird). The decisions are branches in a tree, where each successive choice of attribute depends on previous decisions. CART and ID3 are two of the earliest algorithms for learning decision trees, both from the 1970s. You can gather multiple decision trees together into forests, where predictions from individual trees can be aggregated, for example, by voting. One such ensemble is a Random Forest. There are two main methods of assembling forests from trees: boosting and bagging, but both these techniques apply to ensembles of non-tree ML models as well.

Linear Models

The artificial neurons in neural nets are linear models. A linear model produces an output that is a linear combination (weighted sum) of the inputs, possibly followed by a transformation (activation) function. Compared to full-fledged neural network models, they are extremely simple. However, sometimes a linear model is the only practical option when faced with very large sparse problems. Such problems may have millions of dimensions (variables), most of which are absent in any specific example. Google's 24/7 data pipeline for predicting clicks and conversions for ads employed such a system (as of 2009).
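The weighted-sum-plus-activation idea, which is both the artificial neuron and the linear model described above, can be sketched in a few lines of Python. This is a toy illustration only; the feature names and weight values are invented, not from any real system:

```python
import math

def sigmoid(z):
    # Nonlinear activation that squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# A linear model for a sparse problem: weights are stored only for features
# the model knows about, and absent features contribute nothing to the sum.
# (These feature names and weights are made up for illustration.)
weights = {
    "ad_is_about_shoes": 1.2,
    "user_searched_shoes": 2.0,
    "hour_is_3am": -0.7,
}

def predict_click_probability(features):
    # Weighted sum of the inputs -- only the features present in this example...
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    # ...followed by an activation function: exactly one artificial neuron.
    return sigmoid(z)

example = {"user_searched_shoes": 1.0, "hour_is_3am": 1.0}
probability = predict_click_probability(example)
```

With millions of possible features but only a handful present per example, this dictionary-lookup form of the weighted sum is why linear models stay practical on very large sparse problems.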
Linear models can be quite effective when combined with appropriate training and optimization techniques. It continues to be a ripe field of research.

Support Vector Machines

The idea behind SVMs is beautifully explained here (unfortunately, you will have to load each image). If you had balls of two colors on a table and you wanted to separate the red from the blue using a stick, an SVM will try to put the stick in the best possible place by having as big a gap on either side of the stick as possible. However, there may not be any position of the stick that separates the colors (the problem may not be linearly separable). In that case, you flip the table, throwing the balls in the air. With the balls magically suspended in space at different heights, you may be able to position a sheet of paper separating the colors. The flip of the table is a mathematical transformation using a kernel. One problem with SVMs is that learning is slow, if not impractical, for large amounts of data.

[image source: https://xkcd.com/1725/]

Other ML

There are many other ML technologies that are not described here. Examples include graphical models, Bayesian networks, hidden Markov models, rule-based learning, genetic algorithms, fuzzy logic, Kohonen networks, mixture models, K-means clustering, principal component analysis, kernel methods, etc. Some of these are mutually overlapping.

Parting Thought

There is a tradeoff between accuracy and interpretability of ML predictions. This often manifests as a point of contention between the business and technical needs of commercial ML efforts. It has been argued that you cannot have both.
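The SVM section's "flip of the table" can be made concrete with a tiny sketch. This is not a real SVM implementation, just the lifting step that a kernel corresponds to, using made-up points: two colors on a one-dimensional line that no single cut can separate become separable once each point gains a second coordinate.

```python
# Toy illustration of the kernel ("flip the table") idea, not a real SVM.
# Red points sit between -1 and 1; blue points sit outside that range,
# so blue appears on BOTH sides of red and no single cut on the line works.
red = [-0.5, 0.0, 0.7]
blue = [-2.0, -1.5, 1.8, 2.5]

def lift(x):
    # Map each 1-D point into 2-D: (position, height). This is the
    # "balls thrown in the air" step -- a transformation into a
    # higher-dimensional space where a flat separator can exist.
    return (x, x * x)

red_lifted = [lift(x) for x in red]
blue_lifted = [lift(x) for x in blue]

# In the lifted space, the flat "sheet of paper" at height 1 separates
# the colors: every red point lands below it, every blue point above it.
separable = (all(height < 1.0 for _, height in red_lifted)
             and all(height > 1.0 for _, height in blue_lifted))
```

A real SVM never computes the lifted points explicitly; the kernel lets it work with them implicitly, which is what makes the trick affordable.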