In this series, I introduce machine learning at different technical levels, with the aim of providing a basic framework that helps you understand machine learning regardless of your background. We start at the highest level.
In traditional programming, programmers write programs, which are made of lines of code that instruct computers to perform certain tasks. For example, a programmer can write a program to detect whether the word “book” exists in a news article. Each program has an input and an output; in this case, when you input the text of a news article, the program outputs whether the word “book” exists.
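To make this concrete, here is a minimal sketch of what such a rule-based program might look like (in Python; the function name and details are my own and are purely illustrative):

```python
def contains_book(article_text: str) -> bool:
    """Traditional program: an explicit, hand-written rule.

    Input: the text of a news article.
    Output: whether the word "book" appears in it.
    """
    words = article_text.lower().split()
    return "book" in words


print(contains_book("She wrote a book about whales."))  # True
print(contains_book("The stock market fell today."))    # False
```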
Machine learning programs are just like traditional computer programs: you give them an input, and they give you an output. As a typical example, a handwriting recognition machine learning program takes as input an image of a handwritten digit and gives you as output which digit it is (e.g. 3).
So if machine learning programs are so similar to traditional programs, why is there all this hype about machine learning? Where is the magic?
The magic lies in the process of writing ML programs. Whereas traditional programs are written by hand as explicit rules, machine learning programs are written (trained) using probability and statistics.
As a simplified example, if we want to write a program to recognize Ed Sheeran, in traditional programming you would tell the computer: “OK, Ed Sheeran has red hair, green eyes, pale skin, and a wide mouth. If you see these features, it’s Ed Sheeran.”
In machine learning, instead of telling the computer what Ed Sheeran looks like, you show it some photos of Ed Sheeran vs. not Ed Sheeran; the computer then learns, via statistics, what Ed Sheeran looks like, and when given a new image, it can tell you whether it is Ed Sheeran.
As you can tell, in cases like this the machine learning approach is much more effective, because a rule-based system would require you to enumerate far too many scenarios: what if in this image Ed Sheeran is wearing glasses? What if he has a beard? Even if we find “red hair, green eyes, pale skin, wide mouth” in the image, how do we know it’s Ed Sheeran as opposed to some random ginger dude? With the statistics-based approach, on the other hand, you can train the computer to recognize Ed Sheeran simply by showing it enough examples of Ed Sheeran vs. not Ed Sheeran.
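As a rough sketch of what “showing the computer examples” can look like in code, the snippet below assumes each photo has already been turned into a small vector of numeric features (in practice the learning system extracts these itself), and uses scikit-learn’s LogisticRegression as a stand-in for the learning step. The numbers are made up purely for illustration:

```python
from sklearn.linear_model import LogisticRegression

# Each row: hypothetical features extracted from one photo (made-up values).
photos = [
    [0.9, 0.8, 0.7],   # photo of Ed Sheeran
    [0.8, 0.9, 0.6],   # photo of Ed Sheeran
    [0.1, 0.2, 0.3],   # photo of someone else
    [0.2, 0.1, 0.4],   # photo of someone else
]
labels = [1, 1, 0, 0]  # 1 = Ed Sheeran, 0 = not Ed Sheeran

# "Writing" the program: the computer learns the pattern from the examples.
model = LogisticRegression()
model.fit(photos, labels)

# Using the program: given a new photo's features, predict who it is.
new_photo = [[0.85, 0.75, 0.65]]
print(model.predict(new_photo))  # e.g. [1] -> Ed Sheeran
```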
This new programming paradigm has created so much hype because it allows us to create programs that were previously impossible under the traditional programming paradigm: examples include voice recognition, object recognition, autonomous driving, etc.
In particular, computers can now be taught (programmed) to perform tasks that traditionally could only be done by humans, and can thereby replace human jobs; just visit any Amazon warehouse to see for yourself.
Note that people too often conflate the execution (inference) of ML programs with the training of ML programs. This distinction is quite important, especially for understanding concepts such as AI DApps.
While the process of writing an ML program is different from writing a traditional program, at runtime an ML program behaves much like a traditional program: given an input, it gives you an output.
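Here is a rough sketch of that distinction, using scikit-learn’s built-in handwritten-digit dataset as a stand-in for the handwriting-recognition example above (the model choice is my own, for illustration only):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()

# Training ("writing" the ML program): the slow, statistics-heavy part.
model = LogisticRegression(max_iter=5000)
model.fit(digits.data, digits.target)

# Inference (executing the ML program): at runtime it behaves like any other
# program -- input an image, output a digit.
some_image = digits.data[0:1]
print(model.predict(some_image))  # e.g. [0]
```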
In the next article, I will go into further details about how exactly a machine can learn via statistics and probability. Stay tuned!
- Some nuances are left out for the sake of readability. Please leave a comment if you find certain parts confusing.
- For the purpose of this article, ML model = ML program; ML inference = ML execution; writing an ML program = training an ML program.