No roots, only root~#
Hello, there! In the next few minutes, we'll talk about a subject called Deep Learning. Have you heard about it?
This text is not technical, but didactic. There is no need for any theoretical background.
So, what do you know about Deep Learning? How do you think it works? Take some time to think about it.
Spoiler: It's not like this!
But seriously, how would you classify the following images:
Let's assume your answers were 'cat' and 'dog' (or 'kitten' and 'puppy', or anything close to that), respectively.
How did you come to these conclusions? Can you try to describe how your cognitive process worked to arrive at those answers?
When we receive a stimulus, be it visual, tactile, an odor or a sound, this 'input' is processed by our neurons, which are connected to other neurons through regions called synapses. Small electrical impulses are transmitted and activate, or not, certain regions of the brain.
Is the 'neural network' that recognizes animals the same one that recognizes a person by their voice? As we know, different regions of the brain are responsible for handling different tasks.
To solve a problem with deep learning, we start by describing what we need. We want to find a function that relates objects.
For example: send in a picture of an animal and find out whether it's a cat or a dog.
Thus, we define a domain (starting point) and a codomain (arrival point). Knowing where we are starting from and where we want to go, we can begin the journey along this path.
In general, neural networks are 'supervised' machine learning models, which means we need labeled (categorized) data in order to train them.
This is an example of a neural network architecture. There is an input layer (yellow), inner layers called 'hidden' (blue and green), and an output layer (pink).
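The layered structure above can be sketched in plain Python. This is a minimal sketch assuming hypothetical layer sizes (3 inputs, hidden layers of 4 and 3 neurons, 1 output) and placeholder weights all set to 0.1; each neuron just multiplies its inputs by its weights and adds them up. Real networks also apply a non-linear activation function after each hidden layer, which is left out here to keep the sketch to sums and products.

```python
# Hypothetical architecture: 3 inputs -> 4 hidden -> 3 hidden -> 1 output.
# Each layer is a list of neurons; each neuron is (weights, bias).

def make_layer(n_inputs, n_neurons, weight=0.1, bias=0.0):
    """Build a layer whose weights all start at the same placeholder value."""
    return [([weight] * n_inputs, bias) for _ in range(n_neurons)]

def layer_forward(inputs, neurons):
    """Each neuron outputs the weighted sum of its inputs plus its bias."""
    return [sum(w * x for w, x in zip(weights, inputs)) + bias
            for weights, bias in neurons]

def network_forward(inputs, layers):
    """Pass the values through every layer in order: input -> hidden -> output."""
    values = inputs
    for neurons in layers:
        values = layer_forward(values, neurons)
    return values

layers = [
    make_layer(3, 4),  # first hidden layer
    make_layer(4, 3),  # second hidden layer
    make_layer(3, 1),  # output layer
]

result = network_forward([1.0, 2.0, 3.0], layers)
print(result)
```

Swapping the placeholder weights for learned ones is exactly what training, described next, is about.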
Let's suppose that we want to develop a model that calculates a person's risk of a heart attack. Our input data will be height, weight and age, and our output will be the likelihood of a heart attack happening before age forty.
First, we will have to train our model, teaching it how to correlate the data. We do this by feeding in the examples, one by one. Our input values go through basic mathematical operations (+, -, *, ÷) and then generate an output, a result. The values that our input data are combined with throughout this process are called weights.
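Here is how one example might flow through such a model. This is a minimal sketch assuming a single layer with made-up weights and bias (a real model learns them) and inputs in metres, kilograms and years. The final sigmoid step, which squeezes any score into the 0 to 1 range so it can be read as a likelihood, is an assumption not described in the text.

```python
import math

# Hypothetical weights and bias: a real model learns these during training.
weights = {"height_m": 0.1, "weight_kg": 0.02, "age_years": 0.005}
bias = -3.0

def predict(height_m, weight_kg, age_years):
    """Forward pass for one example: weighted sum of the inputs, then a sigmoid."""
    score = (weights["height_m"] * height_m
             + weights["weight_kg"] * weight_kg
             + weights["age_years"] * age_years
             + bias)
    # Assumption: the sigmoid maps any score into (0, 1) so it reads as a likelihood.
    return 1 / (1 + math.exp(-score))

likelihood = predict(height_m=1.75, weight_kg=80.0, age_years=30.0)
print(likelihood)
```

Because the weights are arbitrary, this number means nothing yet; training is what makes it meaningful.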
At first, this result will be wrong. But since our model is supervised, we can tell it not only that it is wrong but also what the correct value should be. The difference between the result and the correct value (in its simplest form, real_value - model_result) is called the Loss. In practice this difference is usually squared, so that positive and negative errors don't cancel each other out when we add them up. The sum of these losses over all the training examples we will call the Cost.
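Loss and Cost can be written in a few lines. This sketch squares each difference (a common choice, assumed here, so positive and negative errors cannot cancel out), and the labels and model outputs are made-up numbers for illustration.

```python
def loss(real_value, model_result):
    """Loss for one example: the squared difference between truth and prediction."""
    return (real_value - model_result) ** 2

def cost(real_values, model_results):
    """Cost: the sum of the losses over all training examples."""
    return sum(loss(r, m) for r, m in zip(real_values, model_results))

real_values = [1.0, 0.0, 1.0]      # hypothetical labels (1 = heart attack, 0 = none)
model_results = [0.75, 0.25, 0.5]  # hypothetical model outputs
print(cost(real_values, model_results))  # 0.375
```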
So far we have the following concepts:
Since we defined the Cost as the sum of the losses, we want to minimize it as much as possible, right? RIGHT! That makes our model as accurate as possible and its results reliable.
Last but not least, there is a concept in mathematics called the derivative. A derivative is simply the rate of change of one quantity with respect to another.
For example, suppose we have a graph that describes a company's profit according to its number of sales.
When we calculate the derivative of profit with respect to sales, we find this rate, which represents how much each extra sale influences the result (positively or negatively).
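The profit example can be explored numerically. This sketch assumes a made-up profit curve (profit grows with sales at first, then shrinks as sales keep rising) and approximates the derivative with a small central difference rather than computing it symbolically.

```python
def profit(sales):
    """Hypothetical profit curve: 10 per sale, a growing extra cost, minus 100 fixed."""
    return 10 * sales - 0.05 * sales ** 2 - 100

def derivative(f, x, h=1e-6):
    """Approximate the rate of change of f at x using a small central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Approximately 5 and -5: one more sale adds profit at 50 sales, but loses it at 150.
print(derivative(profit, 50), derivative(profit, 150))
```

The sign of the derivative tells us the direction of the influence, and its size tells us how strong it is.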
So, does it make sense to find out how much each weight influenced the output and, therefore, the loss? YES!!! In other words, how much each weight is responsible for our error.
For each input example, a neural network usually goes through two steps: forward and backward.
Forward step:
A training example is fed into the network, which operates on it and issues a result. With this result, we calculate the loss.
Backward step:
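We calculate the derivative of the loss with respect to each weight, finding how much each of these weights was responsible for the result. With these rates of change, we update the weights across the layers.
E.g.: new_weight = old_weight - 0.1 * old_weight_derivative, where old_weight_derivative is the derivative of the loss with respect to that weight and 0.1 is the learning rate, which controls the size of each update.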
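A single backward step can be sketched for one weight. This assumes a deliberately tiny hypothetical model with one weight (prediction = w * x) and a squared-error loss, so the derivative can be worked out by hand; real networks compute these derivatives for every weight automatically, using backpropagation.

```python
learning_rate = 0.1  # the 0.1 from the update formula

def predict(w, x):
    return w * x

def loss(w, x, y):
    return (y - predict(w, x)) ** 2

def loss_derivative(w, x, y):
    """d/dw of (y - w*x)**2, worked out by hand: -2 * x * (y - w*x)."""
    return -2 * x * (y - predict(w, x))

w = 0.5          # hypothetical starting weight
x, y = 2.0, 3.0  # hypothetical training example: input 2, correct output 3

grad = loss_derivative(w, x, y)   # -2 * 2 * (3 - 1) = -8
w_new = w - learning_rate * grad  # 0.5 - 0.1 * (-8) = 1.3
print(grad, w_new, loss(w, x, y), loss(w_new, x, y))
```

The derivative is negative, so subtracting it moves the weight up, and the loss on this example drops from 4 to about 0.16.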
Mind the red arrows:
Now, with the weights updated, we will insert the next example, which will go through the same steps.
When all the examples have gone through this process, we will say that we have carried out an epoch.
In general, we train the model over several epochs. At each epoch, we sum the losses to calculate the cost, and use it as one of the metrics for observing how the model's training is progressing.
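Putting forward steps, backward steps and epochs together gives a complete (if tiny) training loop. This is a minimal sketch assuming a one-weight model, a squared-error loss and made-up data that follow y = 2 * x; the loss for each example is recorded before its weight update, and those losses are summed into the cost for the epoch.

```python
# Hypothetical training data: the "correct" relationship is y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

learning_rate = 0.1
w = 0.0  # hypothetical starting weight

def loss(w, x, y):
    return (y - w * x) ** 2

def loss_derivative(w, x, y):
    return -2 * x * (y - w * x)

costs = []
for epoch in range(10):
    epoch_cost = 0.0
    for x, y in examples:
        epoch_cost += loss(w, x, y)                       # forward step: result and loss
        w = w - learning_rate * loss_derivative(w, x, y)  # backward step: update weight
    costs.append(epoch_cost)                              # one epoch done: record the cost
    print(f"epoch {epoch + 1}: cost = {epoch_cost:.6f}, w = {w:.4f}")
```

Watching the cost shrink from epoch to epoch, while w approaches 2, is the kind of progress check described above.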
Now that we know a little about this type of technology, what do you think we can achieve with it? Any ideas?
Best regards!