We know that in a neural network the weights are usually initialized randomly, and training from such an initialization can take many iterations to converge to the minimum loss and reach a good weight matrix. The problem is that a naive random initialization is prone to the vanishing or exploding gradient problems. One way to reduce this is to choose the random weight initialization carefully. Xavier's random weight initialization (also called Xavier's algorithm, or Glorot initialization) factors the size of the network, that is, the number of input and output neurons of each layer, into the equation and addresses these problems. Xavier Glorot and Yoshua Bengio introduced this idea of initializing better random weights. It not only reduces the chance of running into gradient problems but also helps the network converge to a low error faster.

General ways to initialize better weights:

a) If you're using the ReLU activation function in the deep net (I'm talking about the activation applied to the hidden layers' outputs): generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1, then multiply that sample by the square root of (2/ni), where ni is the number of input units for that layer. This rule is often called He initialization.

b) Likewise, if you're using the tanh activation function: generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1, then multiply that sample by the square root of (1/ni), where ni is the number of input units for that layer.

So what is Xavier's initialization? The main difference is the output term no: we also take the number of output units of the layer into account. For tanh: generate a random sample of weights from a Gaussian distribution with mean 0 and standard deviation 1, then multiply that sample by the square root of (2/(ni+no)), where ni and no are the number of input and output units for that layer, respectively. The resulting weights have variance 2/(ni+no), a compromise between keeping the forward activations (which scale with ni) and the backward gradients (which scale with no) at a stable size.
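The three recipes above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the author's code: the function names (`init_weights`, `naive_weights`, `saturated_fraction`), the scheme labels, and the layer sizes are hypothetical choices made for the example. Each scheme draws an N(0, 1) sample and multiplies it by the factor from the text; `saturated_fraction` then pushes a random batch through ten tanh layers and reports how many units end up pinned near ±1, where the tanh gradient is essentially zero.

```python
import numpy as np

# Scale factors from the text. The scheme names are illustrative labels, not a library API.
SCALES = {
    "relu": lambda ni, no: np.sqrt(2.0 / ni),           # ReLU layers: sqrt(2/ni)
    "tanh": lambda ni, no: np.sqrt(1.0 / ni),           # tanh, fan-in only: sqrt(1/ni)
    "xavier": lambda ni, no: np.sqrt(2.0 / (ni + no)),  # Glorot & Bengio: sqrt(2/(ni+no))
}


def init_weights(ni, no, scheme, rng):
    """Sample an (ni, no) matrix from N(0, 1) and multiply it by the scheme's factor."""
    if scheme not in SCALES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return rng.standard_normal((ni, no)) * SCALES[scheme](ni, no)


def naive_weights(ni, no, rng):
    """Unscaled N(0, 1) weights, used only for comparison."""
    return rng.standard_normal((ni, no))


def saturated_fraction(make_weights, depth=10, width=256, batch=500, seed=0):
    """Fraction of tanh units with |a| > 0.99 after `depth` layers (tanh gradient ~ 0 there)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((batch, width))
    for _ in range(depth):
        a = np.tanh(a @ make_weights(width, width, rng))
    return float(np.mean(np.abs(a) > 0.99))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    W = init_weights(512, 256, "xavier", rng)
    # Sample std should be close to sqrt(2 / 768), about 0.051.
    print(W.shape, float(W.std()))
    print("naive N(0,1):", saturated_fraction(naive_weights))
    print("xavier      :", saturated_fraction(lambda ni, no, r: init_weights(ni, no, "xavier", r)))
```

With plain N(0, 1) weights most tanh units saturate, so their gradients vanish, while the Xavier-scaled weights keep almost all units in the responsive range. The exact fractions depend on the seed and the sizes chosen.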
    # Python code (NumPy)
    import numpy as np

    ni, no = 256, 128  # example fan-in / fan-out of one layer (illustrative values)
    # N(0, 1) sample (randn, not rand, which is uniform on [0, 1)) scaled by sqrt(2 / (ni + no))
    W = np.random.randn(ni, no) * np.sqrt(2 / (ni + no))

Why does this initialization help prevent gradient problems? It puts the weights on a scale where each layer roughly preserves the variance of its inputs in the forward pass and of its gradients in the backward pass. If the weights are too large, activations and gradients grow layer after layer (and tanh units saturate); if they are too small, activations and gradients shrink toward zero. Keeping the per-layer scale close to 1 therefore avoids both exploding and vanishing gradients.

I learned this from Coursera's excellent Deep Learning Specialization by deeplearning.ai, in the course Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization: https://www.coursera.org/learn/deep-neural-network/

Here is the original paper: Xavier Glorot and Yoshua Bengio, "Understanding the difficulty of training deep feedforward neural networks", PMLR 9:249–256.

If you liked this article, clap it up! :) Maybe a follow? Connect with me on LinkedIn (www.linkedin.com), Facebook (www.facebook.com), or YouTube (www.youtube.com): Rakshith Vasudev.
How to Initialize weights in a neural net so it performs well?