Attention Mechanism in Neural Networks

Written by pranoyradhakrishnan | Published 2017/10/14


In the encoder–decoder approach to neural machine translation, an Encoder reads and encodes a source sentence into a fixed-length vector.

A Decoder then outputs a translation from the encoded vector.
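
As a rough illustration, here is a minimal sketch of such an encoder, assuming a toy vanilla-RNN in NumPy with made-up dimensions (real systems use gated RNNs such as LSTMs or GRUs): however long the source sentence is, it ends up as one fixed-size vector.

```python
import numpy as np

def encode(embeddings, W_xh, W_hh, b_h):
    """Run a plain RNN over the source word embeddings and return only
    the final hidden state: the single fixed-length vector the decoder
    must translate from."""
    h = np.zeros(W_hh.shape[0])
    for x in embeddings:                 # one step per source word
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h                             # shape: (hidden_size,)

# Toy, hypothetical dimensions: 4-dim embeddings, 8-dim hidden state.
rng = np.random.default_rng(0)
emb_size, hidden_size, sentence_len = 4, 8, 11

W_xh = rng.normal(size=(hidden_size, emb_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

source = rng.normal(size=(sentence_len, emb_size))   # an 11-word sentence
fixed_vector = encode(source, W_xh, W_hh, b_h)
print(fixed_vector.shape)   # (8,) -- the same size no matter how long the sentence is
```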

Limitation

A potential issue with this encoder–decoder approach is that the network has to compress all the necessary information of a source sentence into a single fixed-length vector, which makes it difficult to cope with long sentences.

How does Attention solve the problem?

The Attention Mechanism allows the decoder to attend to different parts of the source sentence at each step of output generation.

Instead of encoding the input sequence into a single fixed context vector, we let the model learn how to generate a context vector for each output time step. That is, the model learns what to attend to based on the input sentence and what it has produced so far.
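
In the notation of the paper referenced at the end (Bahdanau et al., 2015), each output word is conditioned on its own context vector ci rather than on one shared vector:

```latex
p(y_i \mid y_1, \ldots, y_{i-1}, x) = g(y_{i-1}, s_i, c_i),
\qquad
s_i = f(s_{i-1}, y_{i-1}, c_i)
```

where si is the decoder's hidden state at output step i, and ci is computed as described below.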

Attention Mechanism

The Encoder generates annotations h1, h2, …, hT from the inputs X1, X2, …, XT. (In the referenced paper, these annotations come from a bidirectional RNN, so each hj summarizes the whole sentence with a focus on the j-th word.)

Then, for each output time step i, we have to compute a context vector ci.

How is the Context Vector for each output timestep computed?

a is the alignment model, a feedforward neural network that is trained jointly with all the other components of the proposed system.

The alignment model scores (eij) how well each encoded input (hj) matches the decoder's previous hidden state (si-1).

The alignment scores are normalized using a softmax function, giving the attention weights (αij).

The context vector ci is then a weighted sum of the annotations hj, with the normalized alignment scores αij as the weights.
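
Putting these three steps together in the notation of the referenced paper, with si-1 the previous decoder state and hj the j-th annotation:

```latex
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T} \alpha_{ij} h_j
```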

Decoding

The Decoder generates the output for the i-th timestep by looking at the i-th context vector ci and its previous hidden state s(i-1).
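
As a minimal sketch of one such decoder step with attention (in NumPy, with hypothetical weight names and sizes; the referenced paper uses a GRU decoder and a single-hidden-layer MLP as the alignment model a, simplified here to plain tanh updates):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def decoder_step(s_prev, annotations, W_a, U_a, v_a, W_s, U_s, b_s):
    """One output timestep with attention.

    s_prev      : previous decoder hidden state s(i-1), shape (n,)
    annotations : encoder annotations h1..hT, shape (T, n)
    """
    # 1. Alignment scores e_ij = a(s_{i-1}, h_j), additive (MLP) scoring
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h) for h in annotations])

    # 2. Normalize the scores into attention weights alpha_ij
    alpha = softmax(scores)

    # 3. Context vector c_i: weighted sum of the annotations
    c = alpha @ annotations                       # shape (n,)

    # 4. New decoder state s_i from s(i-1) and c_i (simplified tanh update)
    s_new = np.tanh(W_s @ s_prev + U_s @ c + b_s)
    return s_new, c, alpha

# Toy example: 5 source annotations of size 8, alignment MLP of size 10.
rng = np.random.default_rng(1)
n, T, attn = 8, 5, 10
annotations = rng.normal(size=(T, n))
s = np.zeros(n)

W_a, U_a, v_a = rng.normal(size=(attn, n)), rng.normal(size=(attn, n)), rng.normal(size=attn)
W_s, U_s, b_s = rng.normal(size=(n, n)), rng.normal(size=(n, n)), np.zeros(n)

for i in range(3):                                # generate three output steps
    s, c, alpha = decoder_step(s, annotations, W_a, U_a, v_a, W_s, U_s, b_s)
    print(f"step {i}: attention weights = {np.round(alpha, 2)}")
```

At each step, alpha is a probability distribution over the source positions, so the decoder can place most of its weight on whichever source words are most relevant to the word it is about to produce.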

Reference

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2015.

