In recent years, artificial intelligence, machine learning, and deep learning have become buzzwords. For those who have been working in these fields for a while, this is somewhat amusing and occasionally even gratifying. I certainly don’t view it as a bad thing that these areas of technology are receiving more widespread attention. With this increased attention also comes the need to explain these concepts to the masses, which is a noble challenge to tackle.
Nonetheless, there is one recent trend that has a similar effect on me as fingernails scratching a blackboard:
Come and learn the difference between machine learning and deep learning!
Aaaaaah!!!!!!! Shivers down my spine!!!
Take this tweet for example:
Or this one:
Or one of my all-time “favorites”:
One would think that the VP of AI and deep learning at IBM, Sumit Gupta, would be a reliable source on the matter, but sadly, he is leading his audience astray.
Deep learning is not something inherently different from machine learning; it is, in fact, one of the latest innovations in the field of machine learning, and one that brings us closer to the original goals toward which machine learning pioneers strove. Let us briefly review those goals.
As I recounted in an earlier article, machine learning is not a particularly new idea. The fundamental idea was already articulated by Alan Turing as early as 1947:
What we want is a machine that can learn from experience.
This basic desire has been re-articulated over the years by several generations of machine learning researchers, such as Arthur Samuel, Claude Shannon, Nils Nilsson, and Tom Mitchell. Consider this passage, published in 1959, by Arthur Samuel:
At the outset it might be well to distinguish sharply between two general approaches to the problem of machine learning. One method, which might be called the Neural-Net Approach, deals with the possibility of inducing learned behavior into a randomly connected switching net (or its simulation on a digital computer) as a result of a reward-and-punishment routine.
For those familiar with deep learning, this should sound strikingly familiar. In fact, the basic ideas in artificial neural networks were already established in the 1940s. These ideas were discussed and studied widely in the machine learning community over many decades, but at some point they fell out of fashion.
A curious thing happened in the field of machine learning sometime around the 1980s. I haven’t been able to pinpoint exactly when it occurred, but over time, researchers put more and more effort into coming up with clever “features” to serve as inputs to machine learning algorithms. For example, instead of feeding the raw pixel values of images into an image recognition algorithm, fancy “pre-processing” algorithms were devised to detect edges, corners, textures, and other higher-level features from the images. These would then serve as the input features into the recognition algorithms. This somewhat laborious process is essentially what is depicted in Mr. Gupta’s slide with the heading “Machine Learning”, shown above.
This process, which is generally known as feature engineering, also tends to be application specific. That is, features that work well in, say, computer vision tasks are usually completely useless in other tasks, such as activity recognition from motion sensors.
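To make the idea of a hand-crafted feature concrete, here is a minimal sketch of an edge-based feature extractor using the standard Sobel kernels. The function names (`edge_strength`, `edge_features`) and the choice of summary statistics are my own illustrative assumptions, not taken from any particular system.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_strength(image):
    """Return the gradient magnitude at every interior pixel of a 2D image."""
    height, width = len(image), len(image[0])
    result = []
    for r in range(height - 2):
        row = []
        for c in range(width - 2):
            gx = sum(SOBEL_X[i][j] * image[r + i][c + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i][c + j]
                     for i in range(3) for j in range(3))
            row.append(math.hypot(gx, gy))
        result.append(row)
    return result


def edge_features(image):
    """Hypothetical hand-engineered feature vector: mean and max edge strength."""
    values = [v for row in edge_strength(image) for v in row]
    return [sum(values) / len(values), max(values)]


# A 6x6 toy image: dark left half, bright right half (one vertical edge).
image = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
features = edge_features(image)
print(features)  # [2.0, 4.0]
```

Note how much human judgment is baked in: someone decided that edges matter, which kernel to use, and which statistics to keep. None of that carries over to, say, accelerometer data.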
In many cases, the number of available features might even become large and unwieldy. In comes the idea of feature selection. New algorithms had to be devised just to figure out which are the best features to use for a given machine learning task.
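As a rough illustration of feature selection, the sketch below ranks candidate features by the absolute value of their Pearson correlation with the label and keeps the top few. This is only one simple filter-style approach among many, and the helper names (`pearson`, `select_top_k`) and the toy data are assumptions made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def select_top_k(rows, labels, k):
    """Return the indices of the k features most correlated with the labels."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Toy data: feature 0 tracks the label, feature 1 is constant, feature 2 is mostly noise.
rows = [[0.1, 5.0, 0.3], [0.9, 5.0, 0.2], [0.2, 5.0, 0.9], [0.8, 5.0, 0.8]]
labels = [0, 1, 0, 1]
best = select_top_k(rows, labels, k=2)
print(best)  # [0, 2]
```

Even this toy version shows the extra machinery that accumulates around the learning algorithm once the feature set grows.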
This emphasis on feature engineering and feature selection, however, is not what defines machine learning. If anything, an outsized focus on these areas is a distraction from the ultimate aim of machine learning, which is to develop computer systems that automatically learn from experience. This is because in feature engineering the researcher or engineer inserts himself or herself into the equation, trying to optimize the learning process with clever new ideas for features (or by deciding which sets of features to use).
I won’t go into the nitty-gritty of what deep learning is and how it works; there are plenty of good resources on that topic, such as Michael Nielsen’s excellent online book Neural Networks and Deep Learning. Instead, I just want to point out how deep learning, in many ways, rectifies the above issues and brings machine learning back closer to its original goals.
Instead of coming up with elaborate feature sets, deep learning largely works with high-dimensional “raw data” (e.g. pixels in an image). For sure, deep learning still employs application-specific tricks, such as the idea of convolutional neural networks used mostly in image recognition. In fact, one could argue that restricting a neural network in such a fashion is a form of feature engineering. But even so, these techniques are considerably more general than ad hoc features, such as edge detectors from images or “variance of the dynamic acceleration in the horizontal plane” from accelerometer data. The basic idea is to suck whatever data you have into a large and deep neural network and let the learning algorithm do the hard work. Lack of expressiveness in features is made up for with an abundance of data and multiple hidden layers in the network that automatically learn useful features.
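The contrast can be sketched with a toy network that is handed raw inputs and left to work out its own internal features. The example below trains a tiny one-hidden-layer network on XOR with plain stochastic gradient descent. It is shallow rather than deep, and the hyperparameters (four hidden units, learning rate 0.5, 5000 epochs) are arbitrary assumptions for illustration, so treat it as a sketch of the principle rather than a realistic deep learning setup.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Compute the hidden activations and the output for one raw input vector."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def train_xor(epochs=5000, lr=0.5, n_hidden=4, seed=0):
    """Train on raw XOR inputs; returns the parameters and the per-epoch mean squared error."""
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            hidden, out = forward(x, w1, b1, w2, b2)
            total += (out - y) ** 2
            # Gradient of 0.5 * (out - y)^2 with respect to the output pre-activation.
            d_out = (out - y) * out * (1 - out)
            for j in range(n_hidden):
                # Backpropagate through hidden unit j before updating its outgoing weight.
                d_hidden = d_out * w2[j] * hidden[j] * (1 - hidden[j])
                w2[j] -= lr * d_out * hidden[j]
                for i in range(2):
                    w1[j][i] -= lr * d_hidden * x[i]
                b1[j] -= lr * d_hidden
            b2 -= lr * d_out
        losses.append(total / len(data))
    return (w1, b1, w2, b2), losses


params, losses = train_xor()
print(f"mean squared error: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, round(forward(x, *params)[1], 3))
```

Nobody told the network which combinations of the two inputs matter; whatever intermediate representation the hidden units end up with is a by-product of training, which is the point the paragraph above makes about learned features.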
By all measures, however, this is still what we call machine learning. Setting aside “deep learning” as something different from “machine learning” is just bad pedagogy. It reveals a lack of understanding of what the goal of machine learning has been ever since the concept was first formulated. The relationship between the two is best expressed in the diagram below. Deep learning falls under the general umbrella of machine learning, which is itself part of the wider picture of artificial intelligence.
In summary, when you are impressing your hot date by describing what deep learning is all about, try not to make the mistake of saying it’s something completely new and way cooler than that old-school machine learning stuff. And by all means, stop retweeting those nonsensical tweets pitting machine learning against deep learning. For in reality, deep learning is one of machine learning’s long-lost friends.