
Gradient Clipping

Sanyam Bhutani (@init_27)

๐Ÿ‘จโ€๐Ÿ’ป H2Oai ๐ŸŽ™ CTDS.Show & CTDS.News ๐Ÿ‘จโ€๐ŸŽ“ ๐ŸŽฒ Kaggle 3x Expert

You can find me on twitter @bhutanisanyam1

During training of a deep learning model, we backpropagate gradients through the network's layers.

During training, if a gradient value grows extremely large, it causes an overflow (i.e. NaN), which is easily detectable at runtime; in less extreme cases, the model starts overshooting past our minima. This issue is called the Exploding Gradient Problem.

Gradients get exponentially large when they are repeatedly multiplied by numbers larger than 1. Consider the example:

Source: Hinton's Coursera Lecture Videos.
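A quick numeric sketch (not from the lecture, just an illustration) of why repeated multiplication by a factor greater than 1 explodes:

```python
# Repeatedly multiplying by a factor > 1 -- as a gradient is when it passes
# through many layers -- makes the value grow exponentially.
factor = 1.5
value = 1.0
for _ in range(50):
    value *= factor

print(value)  # already on the order of 10**8 after just 50 steps
```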

Gradient clipping 'clips' the gradients, capping them at a threshold value to prevent them from getting too large.

In the image above, the gradient is clipped to prevent overshooting, and our cost function follows the dotted path rather than its original trajectory.

L2 Norm Clipping

There are various ways to perform gradient clipping, but a common one is to rescale the gradients of a parameter vector whenever its L2 norm exceeds a certain threshold:

new_gradients = gradients * threshold / l2_norm(gradients)
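A minimal NumPy sketch of this rule (the helper name is hypothetical); note the rescaling is only applied when the norm actually exceeds the threshold:

```python
import numpy as np

def clip_by_l2_norm(gradients, threshold):
    """Rescale `gradients` so that their L2 norm is at most `threshold`."""
    norm = np.linalg.norm(gradients)
    if norm > threshold:
        # Same formula as above: gradients * threshold / l2_norm(gradients)
        return gradients * threshold / norm
    return gradients

g = np.array([3.0, 4.0])                  # L2 norm = 5.0
clipped = clip_by_l2_norm(g, 1.0)         # rescaled; new norm = 1.0
small = clip_by_l2_norm(np.array([0.1, 0.1]), 1.0)  # below threshold: unchanged
```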

We can do this in TensorFlow using the function

tf.clip_by_norm(t, clip_norm, axes=None, name=None)

This rescales t so that its L2 norm is less than or equal to clip_norm.

This operation is typically used to clip gradients before applying them with an optimizer.
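As a sketch of that pattern without pulling in TensorFlow, here is a tiny gradient-descent loop (hypothetical names, toy loss) that clips each gradient's L2 norm before the update, the same way clipped gradients would be passed to an optimizer:

```python
import numpy as np

def train_step(w, lr=0.1, clip_norm=1.0):
    # Gradient of the toy loss f(w) = 0.5 * ||w||^2 is just w.
    grad = w.copy()
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad *= clip_norm / norm  # clip before applying the update
    return w - lr * grad

w = np.array([30.0, 40.0])  # huge initial gradient (norm 50)
for _ in range(10):
    w = train_step(w)
```

Because the clipped gradient's norm is at most clip_norm, each update moves the weights by at most lr * clip_norm, so one huge gradient can no longer throw the weights far off course.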

Subscribe to my Newsletter for a Weekly curated list of Deep learning, Computer Vision Articles
Here and Here are two articles on my Learning Path to Self Driving Cars

