An Introduction To Mathematics Behind Neural Networks by @dasaradhsk

December 23rd 2019

Today, with open source machine learning software libraries such as TensorFlow, Keras or PyTorch we can create neural networks, even with high structural complexity, in just a few lines of code. Having said that, the Math behind neural networks is still a mystery to some of us, and having the Math knowledge behind neural networks and deep learning can help us understand what’s happening inside a neural network. It is also helpful in architecture selection, fine-tuning of Deep Learning models, hyperparameter tuning and optimization.

I ignored understanding the Math behind neural networks and Deep Learning for a long time as I didn’t have good knowledge of algebra or differential calculus. A few days ago, I decided to start from scratch and derive the methodology and Math behind neural networks and Deep Learning, to know how and why they work. I also decided to write this article, which would be useful to people like me, who find it difficult to understand these concepts.

Perceptrons, invented by Frank Rosenblatt in 1957, are the simplest neural networks: they consist of n inputs, only one neuron and one output, where n is the number of features of our dataset. The process of passing the data through the neural network is known as forward propagation, and the forward propagation carried out in a Perceptron is explained in the following three steps.

The row vectors of the inputs and weights are x = [x₁, x₂, … , xₙ] and w = [w₁, w₂, … , wₙ] respectively, and their dot product is given by

x · w = x₁w₁ + x₂w₂ + … + xₙwₙ

Hence, the summation is equal to the dot product of the vectors x and w, and adding the bias b gives

z = x · w + b

ŷ = σ(z) = 1 / (1 + e⁻ᶻ)

where σ denotes the Sigmoid activation function, and the output we get after the forward propagation is known as the predicted value ŷ.
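The three steps of forward propagation can be sketched in a few lines of NumPy. The function names `sigmoid` and `forward` and the example numbers are illustrative, not from the original article:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, b):
    """Forward propagation of a single perceptron:
    dot product of inputs and weights, plus bias, through the sigmoid."""
    z = np.dot(x, w) + b      # steps 1 and 2: weighted sum plus bias
    return sigmoid(z)         # step 3: activation gives the predicted value

x = np.array([0.5, -1.0, 2.0])   # n = 3 input features
w = np.array([0.1, 0.4, -0.2])
b = 0.3
print(forward(x, w, b))          # a prediction between 0 and 1
```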

The learning algorithm consists of two parts: Backpropagation and Optimization.

The Loss function is calculated for each example, and its average over the entire training dataset is called the Cost function C. Taking the squared error as the loss, the cost is

C = (1/n) Σᵢ (ŷᵢ − yᵢ)²
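As a small sketch, assuming the squared-error loss, the cost function averages the per-example losses over the dataset; the name `cost` and the example values are illustrative:

```python
import numpy as np

def cost(y, y_hat):
    """Cost function C: the average of the per-example squared
    losses over the whole training set (mean squared error)."""
    return np.mean((y_hat - y) ** 2)

y     = np.array([1.0, 0.0, 1.0])   # actual values
y_hat = np.array([0.9, 0.2, 0.6])   # predicted values
print(cost(y, y_hat))               # average of 0.01, 0.04 and 0.16
```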

Let’s calculate the gradient of the cost function C with respect to the weight wᵢ using partial differentiation. Since the cost function is not directly related to the weight wᵢ, let’s use the chain rule:

∂C/∂wᵢ = (∂C/∂ŷ) · (∂ŷ/∂z) · (∂z/∂wᵢ)

Now we need to find the following three gradients: ∂C/∂ŷ, ∂ŷ/∂z and ∂z/∂wᵢ.

Let’s start with the gradient of the Cost function C with respect to the predicted value ŷ.

Let y = [y₁, y₂, … , yₙ] and ŷ = [ŷ₁, ŷ₂, … , ŷₙ] be the row vectors of the actual and predicted values. Hence the above equation simplifies to

∂C/∂ŷ = (2/n)(ŷ − y)

Now let’s find the gradient of the predicted value ŷ with respect to z. This will be a bit lengthy. Since ŷ = σ(z) = 1 / (1 + e⁻ᶻ), differentiating gives

∂ŷ/∂z = e⁻ᶻ / (1 + e⁻ᶻ)² = σ(z) · (1 − σ(z)) = ŷ(1 − ŷ)
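The identity ∂ŷ/∂z = ŷ(1 − ŷ) can be verified numerically by comparing it against a central finite difference; the helper names below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Analytic derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare against a central finite difference at a few points.
eps = 1e-6
for z in [-2.0, 0.0, 1.5]:
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    assert abs(numeric - sigmoid_grad(z)) < 1e-8

print("sigmoid derivative check passed")
```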

The gradient of z with respect to the weight wᵢ is

∂z/∂wᵢ = xᵢ

since in z = x₁w₁ + x₂w₂ + … + xₙwₙ + b only the term xᵢwᵢ depends on wᵢ.

Therefore we get,

∂C/∂wᵢ = (∂C/∂ŷ) · (∂ŷ/∂z) · (∂z/∂wᵢ) = (2/n)(ŷ − y) · ŷ(1 − ŷ) · xᵢ

What about the Bias? Bias is theoretically considered to have an input of constant value 1. Hence, ∂z/∂b = 1 and

∂C/∂b = (2/n)(ŷ − y) · ŷ(1 − ŷ)
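The chain-rule gradients for the weights and the bias can be sketched together. This is a minimal sketch for a single training example with the squared-error loss (so the 1/n averaging factor drops out); the function name `gradients` and the example numbers are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(x, w, b, y):
    """Chain-rule gradients of the squared-error loss (y_hat - y)^2
    for one example: dC/dw_i = dC/dy_hat * dy_hat/dz * dz/dw_i."""
    z = np.dot(x, w) + b
    y_hat = sigmoid(z)
    dC_dyhat = 2.0 * (y_hat - y)       # gradient of the loss w.r.t. y_hat
    dyhat_dz = y_hat * (1.0 - y_hat)   # sigmoid derivative
    dC_dw = dC_dyhat * dyhat_dz * x    # dz/dw_i = x_i
    dC_db = dC_dyhat * dyhat_dz        # dz/db = 1 (bias input is the constant 1)
    return dC_dw, dC_db

x = np.array([0.5, -1.0])
w = np.array([0.2, 0.3])
b = 0.1
dC_dw, dC_db = gradients(x, w, b, y=1.0)
print(dC_dw, dC_db)
```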

The weights and bias are updated as follows, and the Backpropagation and gradient descent are repeated until convergence:

wᵢ ← wᵢ − α · (∂C/∂wᵢ)
b ← b − α · (∂C/∂b)

where α is the learning rate.
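Putting everything together, here is a minimal training loop for the single neuron, assuming the squared-error cost and Sigmoid activation used above. The OR toy dataset, the learning rate of 0.5 and the epoch count are illustrative choices, not from the original article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: learn the logical OR of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)   # small random initial weights
b = 0.0
alpha = 0.5                         # learning rate

for epoch in range(5000):
    z = X @ w + b                   # forward propagation
    y_hat = sigmoid(z)
    # Backpropagation: gradients of the mean squared error.
    delta = 2 * (y_hat - y) * y_hat * (1 - y_hat) / len(y)
    grad_w = X.T @ delta            # dz/dw_i = x_i, summed over examples
    grad_b = delta.sum()            # dz/db = 1 for every example
    # Gradient descent update.
    w -= alpha * grad_w
    b -= alpha * grad_b

print(np.round(sigmoid(X @ w + b)))  # predictions after training
```

With these settings the rounded predictions typically match the OR targets, since OR is linearly separable and a single neuron can represent it.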

I hope that you’ve found this article useful and understood the Math behind Neural Networks and Deep Learning. I have explained the working of a single neuron in this article; however, these basic concepts are applicable to all kinds of Neural Networks with some modifications. If you have any questions or if you found a mistake, please let me know in the comments.