In recent years, artificial intelligence (AI) has been the subject of intense media hype. Machine Learning and Deep Learning (abbreviated in this article as AA and AP, from the Spanish Aprendizaje Automático and Aprendizaje Profundo), along with AI, have been mentioned in countless articles, regularly outside the realm of purely technological publications. We are promised a future of smart chatbots, autonomous cars and digital assistants, a future sometimes painted in a gloomy tint and other times in a utopian one, where jobs will be scarce and most economic activity will be managed by robots and machines embedded with AI.
For the current or future Machine Learning practitioner, it is vitally important to be able to recognize the signal in the noise, so that we can identify and communicate the developments that are really changing our world rather than the exaggerations commonly seen in the media. If, like me, you are a practitioner of Machine Learning, Deep Learning or another field of AI, we will probably be the people in charge of developing those intelligent machines and agents, and we will therefore have an active role to play in this and future society.
For this purpose, this article aims to answer questions such as: What has Deep Learning achieved so far? How significant are these achievements? What awaits us in the future? Do we really have to believe in the media hype of the moment?
First of all, let's clearly define what we are talking about when we talk about artificial intelligence. What are Artificial Intelligence, Machine Learning and Deep Learning? And what is the relationship between these terms?
Artificial Intelligence
Artificial intelligence (AI) was born in the 1950s at the hands of a few pioneers in the nascent field of computer science, who began to wonder whether computers could be made to "think." A concise definition of the field would therefore be: the effort to automate intellectual tasks normally performed by humans. As such, AI is a general field that contains Machine Learning (AA) and Deep Learning (AP), but it also includes other sub-fields that do not necessarily involve "learning" as such.
The first programs that played chess, for example, only involved rigid rules created by programmers, so they do not qualify as machine learning. For a long time, many experts believed that human-level AI could be achieved by having programmers handcraft a set of rules large enough to manipulate knowledge and thus generate intelligent machines. This approach is known as symbolic AI, and it was the dominant paradigm in the field from the 1950s to the late 1980s, peaking in popularity during the expert systems boom of the 1980s.
Although symbolic AI proved suitable for solving logical, well-defined problems such as playing chess, finding explicit rules turned out to be intractable for much more complex problems, such as image classification, speech recognition, and translation between natural languages (such as Spanish or English, as opposed to artificial languages such as programming languages). A new approach then emerged to take the place of symbolic AI: Machine Learning.
Machine Learning
In Victorian England, in the 1830s and 1840s, Charles Babbage designed the Analytical Engine: the first general-purpose mechanical computer. It was meant to use mechanical operations to automate certain computations from the field of mathematical analysis, hence the name Analytical Engine. However, the Analytical Engine had no pretensions of originating anything new. It could only do what it was ordered to compute, and its only purpose was to assist mathematicians in something they already knew how to do.
Then, in 1950, Alan Turing introduced the Turing test and concluded that general-purpose computers might be able to "learn" and "be original." AA then arose from questions such as:
Can a computer go beyond what we order it to do and learn by itself how to perform a specific task? Could a computer surprise us? And, instead of programmers specifying rule by rule how to process data, could a computer automatically learn those rules directly from the data we pass to it?
These questions opened the door to a new programming paradigm. In the classic symbolic AI paradigm, humans supply rules (a program) and data to be processed according to those rules, and the program outputs answers. With Machine Learning, humans supply the data together with the answers expected for that data, and the output is the set of rules that maps inputs to their corresponding outputs. These rules can then be applied to new data to produce original answers, that is, answers generated automatically by the rules the system "learned" rather than by rules explicitly coded by programmers.
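To make the contrast between the two paradigms concrete, here is a minimal sketch in plain Python. The temperature-conversion task, the function names and the use of ordinary least squares are all assumptions chosen for illustration; they are not part of the original discussion.

```python
# Classical programming: a human writes the rule, the program applies it to data.
def fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32


# Machine learning: we pass inputs AND expected answers, and get a rule back.
def learn_linear_rule(inputs, answers):
    """Fit answers ~ slope * inputs + intercept by ordinary least squares."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(answers) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, answers))
    var = sum((x - mean_x) ** 2 for x in inputs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Hypothetical training examples: temperatures in Celsius and in Fahrenheit.
celsius = [-10.0, 0.0, 20.0, 37.0, 100.0]
fahrenheit = [14.0, 32.0, 68.0, 98.6, 212.0]

learned_rule = learn_linear_rule(celsius, fahrenheit)

# The learned rule can now be applied to data it has never seen.
print(fahrenheit_rule(25.0), learned_rule(25.0))
```

Nobody typed the conversion formula into `learn_linear_rule`; it was recovered from the examples, which is the essence of the second paradigm.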
A machine learning system is "trained" instead of being explicitly "programmed". The system is presented with many examples relevant to the task at hand, and it finds the statistical structure or patterns in those examples that eventually allow it to learn the rules for automating the task. For example, if we wanted to automate the task of tagging our vacation photos, we would pass the AA system many examples of photos already tagged by humans, and the system would learn the statistical rules that allow it to associate specific photos with their respective tags.
Although AA only began to gain traction in the 1990s, it has quickly become the most popular and successful sub-field of AI, a trend driven by the availability of better hardware and giant data sets. AA is closely related to mathematical statistics, but it differs from statistics in several ways. Unlike statistics, AA tends to deal with large and complex data sets (which can contain millions of images, each with thousands of pixels) for which classical statistical analysis (such as Bayesian analysis) would be totally impractical. As a result, AA, and especially Deep Learning, has comparatively little mathematical theory behind it and is considered more of an engineering-oriented field. That is, AA is an applied discipline, in which ideas are tested empirically much more often than theoretically.
To define Deep Learning and understand the difference between AP and other AA approaches, we first need some idea of what AA algorithms do and how they work. We have just said that AA discovers rules for executing data processing tasks, given examples of what is expected as the response or output for that data. Therefore, to carry out AA we need three fundamental ingredients:
Input data: For example, if the task is speech recognition, this input data would be sound files or recordings of people talking. If the task is image tagging, this data could be photos or images.
Examples of what is expected as output: In the speech recognition task, these could be human-generated transcripts of the audio files. In the image tagging task, the expected outputs could be tags such as "dog", "cat", "person", etc.
A way to measure whether the algorithm is doing a good job: This is necessary to determine the distance between the algorithm's current output and the expected output. This measurement is used as a feedback signal to adjust the way the algorithm works. This adjustment step is what we call "learning".
These three ingredients are fundamental to all kinds of AA and AP algorithms. We will now explore what an AA or AP algorithm really does with them to produce results that look like something out of science fiction.
An AA model transforms its input data into meaningful responses, a process that is "learned" from exposing that model to previously known examples of corresponding inputs and outputs. Therefore, the central problem in AA and AP is learning useful representations of the input data, representations that bring us closer to generating or predicting the expected outputs.
Before going further, let's answer the question: what is a representation? At its core, a representation is a different way of viewing data, a different way of representing or encoding data. For example, a color image can be encoded in RGB (red-green-blue) format or in HSV (hue-saturation-value) format: these are two different representations of the same data. Some tasks that may be more difficult using one of those representations can be made much easier by using the other representation.
For example, the "select all red pixels in the image" task is much simpler in the RGB format, while the "make the image less saturated" task is simpler in the HSV format. AA models are designed to find the most appropriate representations of the information they receive as input: transformations of the data that make it more amenable to the task at hand, such as image classification.
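The RGB versus HSV point can be sketched with Python's standard `colorsys` module, which works on channel values between 0 and 1. The helper names and the threshold used to decide what counts as "red" are assumptions made for illustration only.

```python
import colorsys


def is_red(rgb, threshold=0.5):
    """In RGB, 'red' is a direct test on the three channels (assumed threshold)."""
    r, g, b = rgb
    return r > threshold and g < threshold and b < threshold


def desaturate(rgb, factor=0.5):
    """In HSV, lowering saturation is a single multiplication on one channel."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, s * factor, v)


pixel = (0.9, 0.1, 0.1)   # a strongly red pixel
print(is_red(pixel))      # easy to answer in RGB
print(desaturate(pixel))  # easy to compute by passing through HSV
```

Each task is a one-liner in the representation that suits it, and noticeably more awkward in the other one.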
Let's make this a little more concrete using the following example. Let's consider an xy cartesian plane with some points represented by their respective coordinates (x, y) as shown in the following image:
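As we can see, we have a few white dots and a few black dots. Let's say we want to develop an algorithm that takes the (x, y) coordinates of a point and outputs whether the point is black or white. In this case, the inputs are the coordinates of our points, the expected outputs are their colors, and a way to measure whether the algorithm is doing a good job could be the percentage of points that are classified correctly.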
What we need here is a new representation of our original data that cleanly separates white from black dots. One of those transformations which we could use, among many other possibilities, would be a change of coordinates, like this:
In this new coordinate system, the coordinates of our points can be said to be a new representation of our original data. And in this case it is a very good representation! With this new representation, the classification problem between black and white points can be expressed with a simple rule: "Black points are those with x > 0" and "White points are those with x < 0". This new representation basically solves the classification problem.
In this case, we defined the coordinate change by hand. But if instead we systematically searched through different possible coordinate changes, using the percentage of correctly classified points as feedback, we would be doing Machine Learning (AA). "Learning" in the AA context describes the process of automatically searching for better, more useful representations of our data.
All AA algorithms consist of automatically finding transformations that turn input data into representations that are more useful for a specific task. These operations can be changes of coordinates, linear projections, translations, nonlinear operations, etc. AA algorithms are not usually creative in finding these transformations. They merely search through a predefined set of operations, called the hypothesis space.
So, that's what AA really is, technically: searching for useful representations of the input data, within a predefined space of possibilities, using a feedback signal as a guide, so that we can make useful predictions of the expected outputs. This simple idea makes it possible to solve a wide range of intellectual tasks, from speech recognition and computer vision to autonomous driving.
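As a rough sketch of what "systematically searching for coordinate changes" could look like, the snippet below tries a fixed set of rotations and keeps the one with the highest classification accuracy. The sample points, their labels and the 15-degree search step are made up for illustration; the figures in the article use their own data.

```python
import math

# Hypothetical data: the two colors are separated by the diagonal line y = x.
points = [(-2.0, 1.0), (-1.0, 2.5), (-3.0, -0.5), (0.5, 3.0),
          (2.0, -1.0), (1.0, -2.5), (3.0, 0.5), (-0.5, -3.0)]
labels = ["white", "white", "white", "white",
          "black", "black", "black", "black"]


def rotate_x(point, angle_degrees):
    """Return the first coordinate of the point in axes rotated by the angle."""
    theta = math.radians(angle_degrees)
    x, y = point
    return x * math.cos(theta) + y * math.sin(theta)


def accuracy(angle_degrees):
    """Fraction of points classified correctly by the rule 'black if new x > 0'."""
    correct = 0
    for point, label in zip(points, labels):
        predicted = "black" if rotate_x(point, angle_degrees) > 0 else "white"
        correct += predicted == label
    return correct / len(points)


# The hypothesis space: rotations in steps of 15 degrees.
candidate_angles = range(0, 360, 15)

# The feedback signal (accuracy) guides the choice of representation.
best_angle = max(candidate_angles, key=accuracy)
print(accuracy(0), best_angle, accuracy(best_angle))
```

With the original axes (angle 0) the simple rule misclassifies some points, while the best rotation found by the search separates the two colors. The search never leaves its predefined set of rotations, which is exactly the limitation a hypothesis space imposes.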
Now that we have understood what “Learning” means in the context of AA, let's look at what makes Deep Learning so special.
Deep Learning is a specific sub-field of Machine Learning: a new attempt to learn ideal representations of data, with an emphasis on learning these representations in succession through what are called layers. The term "Deep" in Deep Learning does not refer to any kind of deeper understanding achieved by this approach. Instead, it stands for the idea of successive, hierarchical representation of data by layers. The number of layers that contribute to a model is called the "model depth". With this in mind, other appropriate names for this approach could be "Layered Representational Learning" or "Hierarchical Representational Learning".
Modern AP models normally involve tens or hundreds of successive layers of representation, and all the parameters they contain are learned automatically by exposing these models to so-called training data. Meanwhile, other approaches in AA tend to focus on learning only one or two layers of representation of the data. For this reason, these approaches are called Shallow Learning models, as opposed to Deep Learning.
In AP, these layered representations are almost always learned through models called Neural Networks, which are literally structured in layers stacked one after the other. The term Neural Networks is a reference to neurobiology, but although some of the core concepts in AP were developed in part by drawing inspiration from our understanding of the brain, AP models are not brain models. There is no evidence that the brain implements anything like the learning mechanisms used in modern AP models.
Many of us have come across articles proclaiming that AP models work like the human brain or that they were modeled on the human brain, but that is not the case. It would be confusing and counterproductive for newcomers to this subfield to think that AP is in any way related to neurobiology. For our purposes, AP is a mathematical framework for learning representations from data.
To gain a little more insight into what the representations learned by an AP algorithm look like, let's examine how a network several layers deep transforms an image of a handwritten digit in order to recognize, at its output, which digit it is.
As we can see in the previous figures, the network transforms the image of the digit into representations that are increasingly different from the original image and, in turn, more and more informative about the final result. We could think of a "Deep Neural Network" as a multi-stage information distillation operation, where information flows through successive filters and comes out increasingly purified, that is, much more useful with respect to the specific task that we want to solve, in this case the recognition of digits from images of handwritten digits.
So that's what AP is, technically: a multi-stage way of learning representations of data. It's a simple idea, but it turns out that very simple mechanisms, sufficiently scaled, can end up looking like magic.
At this point, we already know that AA is about mapping inputs (such as images) to targets (such as the "cat" tag), and that this is done by looking at many examples of inputs and their corresponding targets. We also know that Deep Neural Networks (Deep RNs, from the Spanish Redes Neuronales) perform this mapping of inputs to targets through a deep sequence of simple data transformations (the layers of the network), and that these transformations are learned by exposing the network to many input-target examples. Let us now look at how, specifically, this "learning" occurs in deep Neural Networks.
The specification of what a layer does to its inputs is stored in the so-called layer weights, which are essentially a bunch of numbers. In technical terms, we would say that the transformation performed by a layer on its input data is parameterized by its weights. These weights are therefore also called the parameters of the layer in question. In this context, "learning" means finding a set of numerical values for each of the weights in each of the layers of the RN, such that the RN correctly maps the inputs of our examples to their corresponding targets.
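To see what "parameterized by its weights" means in practice, here is a minimal sketch of a single layer in plain Python. The specific form used (a weighted sum plus a bias, followed by a ReLU) is an assumption chosen because it is a common layer type; the article does not commit to any particular one.

```python
def dense_layer(inputs, weights, biases):
    """Apply one layer: for each output unit, a weighted sum of the inputs
    plus a bias, passed through a ReLU (negative values become 0)."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, total))
    return outputs


inputs = [1.0, 2.0]

# Two different sets of weights give two different transformations
# of the very same input. All values here are arbitrary examples.
weights_a = [[0.5, -1.0], [1.0, 1.0]]
weights_b = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, -1.0]

print(dense_layer(inputs, weights_a, biases))
print(dense_layer(inputs, weights_b, biases))
```

The code of the layer never changes. Only the numbers in `weights` and `biases` do, and those numbers are what training has to find.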
However, to control something, we must first be able to observe it. To control the output of our RN, we need to be able to measure how far its predictions are from the expected or target outputs. This is the job of the network's Loss Function, sometimes also called the Cost Function or Objective Function. The loss function takes the predictions delivered by the RN along with the true target (what we wanted the RN to produce) and computes a deviation score, capturing how well the RN has done on that specific example, as we see in the following figure.
Image Source: https://miro.medium.com/max/586/1*TQZNB5lnM_W7tu5KGYQ12Q.png
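As a small illustration of a loss function, the sketch below uses the mean squared error. This is one common choice and is used here only as an example; the article does not say which loss function the network in the figure uses.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(targets)


targets = [0.0, 1.0]

# A prediction far from the target gets a higher loss than a close one.
print(mean_squared_error([0.0, 2.0], targets))
print(mean_squared_error([0.0, 1.1], targets))
```

Whatever loss is chosen, its role is the same: to reduce the quality of a prediction to a single number that can be used as feedback.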
The fundamental trick in AP is to use this deviation value as a feedback signal to adjust the value of each of the weights a little, in the direction that lowers the loss value for the current example, as shown below:
Adjusting the value of each of these weights or network parameters is the job of the Optimizer, which implements what is called the Backpropagation algorithm. This is one of the most important algorithms in AP, if not the most important.
Initially, the weights of the RN are assigned random values, so the RN merely implements a series of random transformations. Naturally, its outputs at this starting point will be far from what they should ideally be, and the loss value will be very high as well. But with each example that the RN processes, each of the weights is adjusted a little in the correct direction, that is, in the direction in which the loss value decreases.
This is the so-called Training Loop, which, repeated enough times (typically dozens of iterations over thousands or millions of examples), produces values for the RN weights that minimize the loss function. An RN with minimal loss is a network whose outputs or predictions are as close as possible to the true targets: at that point we have a Trained Network.
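The whole cycle (random start, prediction, loss, small adjustment, repeat) can be sketched on a toy model with a single weight. This is a deliberate simplification: with only one weight the gradient can be written by hand, so no real backpropagation through layers is needed. The data, learning rate and number of epochs are arbitrary values picked for the example.

```python
import random


def train(examples, learning_rate=0.01, epochs=50, seed=0):
    """Fit prediction = weight * x by repeatedly nudging the weight
    in the direction that lowers the squared error."""
    rng = random.Random(seed)
    weight = rng.uniform(-1.0, 1.0)  # random initial weight
    history = []
    for _ in range(epochs):
        for x, target in examples:
            prediction = weight * x
            error = prediction - target
            gradient = 2 * error * x  # derivative of error**2 w.r.t. weight
            weight -= learning_rate * gradient  # small step downhill
        # Record the mean squared error over all examples after this epoch.
        loss = sum((weight * x - t) ** 2 for x, t in examples) / len(examples)
        history.append(loss)
    return weight, history


# Hypothetical examples generated by the hidden rule target = 3 * x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

weight, history = train(examples)
print(weight)
print(history[0], history[-1])
```

The loss recorded after the last epoch is far smaller than after the first, and the learned weight ends up very close to 3, the value that generated the examples. A real network does the same thing with millions of weights at once.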
Although AP is a fairly old subfield of AA, it only rose to prominence in the early 2010s. In the years since, it has achieved nothing less than a revolution in the field, with notable results in perceptual problems such as vision and hearing (problems involving abilities that seem natural and intuitive to us humans but have long been complex and elusive for machines).
In particular, AP has enabled the following advances, all of them in areas historically difficult for AA:
Image classification close to the human level.
Speech recognition close to the human level.
Handwriting transcription close to the human level.
Improved machine translation.
Improved text-to-speech conversion.
Digital assistants like Google Now and Amazon Alexa.
Autonomous driving close to the human level.
Improved ad targeting, as used by Google, Baidu and Bing.
Improved web search results.
The ability to answer questions asked in natural language.
Superhuman performance in certain games, such as Go and chess.
We are still exploring the wide range of problems to which AP can contribute. AP has begun to be applied to a wide variety of problems outside typical machine perception and natural language understanding, such as formal reasoning, causality, etc. If this succeeds, we will be entering an era in which AP assists humans in science, software development, medicine, and many other fields.
Although AP has led to remarkable achievements in recent years, expectations for what the field will be able to achieve in the next decade tend to run far ahead of what is likely to be possible. Although some world-changing applications like autonomous cars are already within reach, many others will most likely stay out of reach for a long time, such as truly believable dialog systems, human-level machine translation across arbitrary languages, and human-level understanding of natural language.
In particular, talk of human-level General Artificial Intelligence should not be taken too seriously. The risk with high short-term expectations is that, as the technology fails to deliver, investment in research will gradually dry up, slowing progress for a long time.
This has happened before. Twice in the past, AI has gone through a cycle of intense optimism followed by disappointment and skepticism, resulting in under-investment. It first happened with symbolic AI in the 1960s. In those early years, projections about AI flew high, and some pioneers in the field predicted that within 10 years the creation of general artificial intelligence would be a solved problem. However, even today in 2019 that milestone seems far from being reached, so far that we are not yet able to predict when it will happen. A few years later, as these high expectations failed to materialize, researchers and government funding moved away from the field of AI, marking the beginning of what was called the first AI winter.
And it would not be the last. In the 1980s, a new take on symbolic AI, this time in the form of expert systems, began to gain traction among large companies. A few initial success stories fueled a new wave of investment, with corporations across the world starting their own internal AI departments to develop these expert systems. Around 1985, companies were spending close to $1 billion per year on this technology. However, by the early 1990s, these systems had proved difficult to maintain, difficult to scale, and limited in scope, and interest in expert systems slowly died down. That is how the second AI winter began.
We could currently be in the third cycle of hype and disappointment in the field of AI, and right now we are in the phase of intense optimism. It is better to moderate our expectations in the short term and to make people more familiar with the technical side of the field, so that they have a clear idea of what AP can and cannot deliver.
Although our short-term expectations about AI may be unrealistic, the long-term picture looks bright. We are only getting started in applying AP to many important problems for which it looks like a truly transformative technology, from clinical diagnosis to digital assistants.
AI research has moved forward by leaps and bounds in the past five years, largely thanks to a level of investment never before seen in the short history of AI, but so far relatively little of this progress has made its way into the products and processes that shape the world. Most of the research findings in AP have not yet been applied, or at least not applied to the full range of problems that AP could solve across all industries. Our doctor does not use AI yet, nor does our accountant. We still don't use AI technologies in our daily routine.
Okay, we can ask our smartphone simple questions and get reasonable answers, we can get quite useful product recommendations from Amazon, and we can search for the word "birthday" in Google Photos and immediately find our birthday photos from last month or last year. That's a breakthrough compared to what these kinds of technologies used to be. However, these technologies are still only accessories to our daily routines. AI has yet to make the transition to become central to the way we work, think, and live.
Right now, it can be hard to believe that AI could have such a huge impact on the world, mainly because it is not yet widely deployed and applied, just as, back in 1995, it would have been hard to believe the future impact of the internet. Back then, people could not foresee how relevant the internet would become to them and how dramatically it would change their lives. The same is true for AP and AI these days. But make no mistake, AI is coming. In the not too distant future, AI will be our personal assistant, even our friend. It will answer our questions, help educate our children, and watch over our health. It will deliver our groceries to our door and drive us autonomously from point A to point B. It will be our interface to an increasingly complex and information-intensive world. And, even more importantly, AI will help humanity progress, assisting human scientists in cutting-edge new discoveries across all scientific fields, from genomics to mathematics.
Along the way, we may face some setbacks and even a new AI winter, similar to what the internet industry experienced in 2000. But we will get there, eventually. AI will end up being applied in almost every process that makes up our society and our daily lives, just as the internet is today.
We must not believe the short-term hype, but we must believe in the long-term vision. It may take a long time for AI to be deployed to its true potential, a potential that some of us have not even dreamed of, but AI is on its way, and it will transform our world in a fantastic way.
That's all for now, colleagues. For those interested in delving a little deeper into these ideas and finding additional information (in English), I recommend the excellent book Deep Learning with Python by Francois Chollet. Until next time, and do not forget to leave your applause and share the article if it was useful.