I hope you already know the basics of Recurrent Neural Networks (RNNs). If not, feel free to refer to this article. In Natural Language Processing (NLP), RNNs have played a major role in sequence modeling. Let’s see what RNNs can do and where their weak points are. Rather than jumping straight into the topic, let’s move slowly!

In an RNN, we feed the output of the previous timestep as an input to the next timestep. This works well with sequential information such as sentences. For clarity, look at the example below:

Consider predicting the next word in a sentence from the previous words. The image below depicts how the RNN behaves at each timestep. The entire hidden state is depicted as a yellow rectangle. You can see that, at the end, the hidden state contains a summary of the sentence.

This is how the neurons are connected and how the RNN looks. At each timestep, however, there is an individual loss. (You can sum them to get a single loss value.) When it comes to backpropagation, an RNN has an additional dimension besides the weight matrix: time. Timestep 5 propagates its gradient in the usual way. At timestep 4, we also have to consider the gradient from timestep 5. At timestep 3, we have to consider the gradients from all later timesteps, up to the last one. In general, for timestep r we have to propagate the gradients back from the last timestep through every timestep after r.

What can you see from the above?

✔ Rather than using only one or two previous words to predict the next word in a sentence, this approach preserves dependencies! Given just the previous word, for example ‘a’ in the context above, the neural network has many candidates to predict: a river, a student, a car, etc. (The candidates depend on the words in its corpus.) But in this RNN-based approach, we have the sequence ‘Anne bought a’ to predict the next word.
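To make the hidden-state idea concrete, here is a minimal sketch of the forward pass in plain Python. It is only an illustration under stated assumptions: the toy vocabulary, the hidden size of 4, the helper names (`one_hot`, `make_weights`, `rnn_step`, `run_rnn`) and the random, untrained weights are all hypothetical and not taken from any particular library. The point is that the same fixed-size vector is updated at every timestep, so after ‘Anne bought a’ it depends on all three words.

```python
import math
import random

HIDDEN_SIZE = 4  # assumed size of the hidden state vector
VOCAB = ["Anne", "bought", "a", "car"]  # toy vocabulary, for illustration only


def one_hot(word):
    """Encode a word as a one-hot vector over the toy vocabulary."""
    return [1.0 if word == v else 0.0 for v in VOCAB]


def make_weights(rows, cols, rng):
    """Small random (untrained) weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_step(x, h_prev, w_xh, w_hh, bias):
    """One timestep: h_t = tanh(W_xh x_t + W_hh h_prev + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = bias[i]
        total += sum(w * xj for w, xj in zip(w_xh[i], x))
        total += sum(w * hj for w, hj in zip(w_hh[i], h_prev))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(words, seed=0):
    """Feed the words one by one; return the hidden state after each timestep."""
    rng = random.Random(seed)
    w_xh = make_weights(HIDDEN_SIZE, len(VOCAB), rng)
    w_hh = make_weights(HIDDEN_SIZE, HIDDEN_SIZE, rng)
    bias = [0.0] * HIDDEN_SIZE
    h = [0.0] * HIDDEN_SIZE  # nothing remembered before the first word
    states = []
    for word in words:
        h = rnn_step(one_hot(word), h, w_xh, w_hh, bias)
        states.append(h)
    return states


states = run_rnn(["Anne", "bought", "a"])
for t, h in enumerate(states, start=1):
    print(f"timestep {t}: {[round(v, 3) for v in h]}")
```

Calling `run_rnn(["car", "bought", "a"])` with the same seed gives a different final hidden state, even though the last two words are identical. That is the sense in which the hidden state summarizes the whole sequence rather than just the previous word.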
With that much context, it is very unlikely that the next word will be ‘river’ or ‘student’. This is a significant benefit of using an RNN.

So, what’s wrong? Just recall the theory I explained with the example.

Information decay: Although we say that an RNN remembers previous content, all of its internal computation happens within mathematical bounds. The hidden layer’s output is a vector of fixed size. When the information exceeds what that vector can hold, the RNN starts forgetting. This happens over long distances. In the example above, at timestep 1 the hidden state had no word to remember, and at timestep 2 it had only one word. But at timestep 5 it had 4 words. If that exceeds the capacity of the output vector, the information decays.

Vanishing gradient: In the backpropagation process, if the gradient of the activation function is a value between 0 and 1 (for example, 0.3), the gradient from the last timestep is repeatedly multiplied by that value. For the sake of understanding, take the gradient at the last timestep as 1 and, at timesteps 4, 3, 2 and 1, multiply it by the gradient of the activation function (0.3): 1 x 0.3 x 0.3 x 0.3 x 0.3 = 0.0081. So you can see that it may approach 0 in a longer sequence. The gradient vanishes over time!

Exploding gradient: This is the opposite of the vanishing gradient. Imagine the gradient of your activation function is 4.75 and the gradient at the last timestep is 1. What happens when backpropagating over 4 timesteps? 1 x 4.75 x 4.75 x 4.75 x 4.75 ≈ 509.07. The gradient grows rapidly over time and may eventually become very large. If the sequence is very long, the gradient may exceed the value range of the data type being used and the value may be marked as ‘NaN’, which turns the entire run into a mess!
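The two pieces of arithmetic above are easy to reproduce. The sketch below is a simplification that mirrors the text rather than a real training loop: it assumes the same local gradient at every timestep and a gradient of 1 at the last timestep, and `backpropagated_gradient` is a hypothetical helper name. It also shows what happens with standard Python floats on a much longer sequence.

```python
import math


def backpropagated_gradient(local_gradient, timesteps, last_gradient=1.0):
    """Multiply the last timestep's gradient by the same local gradient once per timestep."""
    gradient = last_gradient
    for _ in range(timesteps):
        gradient *= local_gradient
    return gradient


# The two worked examples from the text: 4 timesteps back from the last one.
print(backpropagated_gradient(0.3, 4))   # about 0.0081
print(backpropagated_gradient(4.75, 4))  # about 509.07

# Longer sequences push the product toward 0 or past the float range.
for steps in (10, 50, 100, 1000):
    small = backpropagated_gradient(0.3, steps)
    large = backpropagated_gradient(4.75, steps)
    print(steps, small, large)

# Once the value has overflowed to infinity, further arithmetic can produce NaN.
overflowed = backpropagated_gradient(4.75, 1000)
print(math.isinf(overflowed), math.isnan(overflowed - overflowed))
```

With 1000 timesteps the small product underflows to exactly 0.0 and the large one overflows to infinity. Subtracting infinity from infinity is one simple way a ‘NaN’ can then appear in later calculations.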
In time, LSTM (Long Short-Term Memory) was introduced, which is able to address these pitfalls of RNNs!

First Published here

*Lead Image by chenspec from Pixabay*


What Can Recurrent Neural Networks in NLP Do?
