An **Encoder** reads and encodes a source sentence into a **fixed-length vector**.

A **Decoder** then outputs a translation from the encoded vector.

#### **Limitation**

A potential issue with this encoder–decoder approach is that the neural network must compress all the necessary information of a source sentence into a single **fixed-length vector**, which makes it hard for the model to cope with long sentences.

#### **How does Attention solve the problem?**

The attention mechanism allows the decoder to attend to different parts of the source sentence at each step of the output generation.

Instead of encoding the input sequence into a **single fixed context vector**, we let the model learn **how to generate a context vector** for each output time step. That is, we let the model **learn** what to attend to, based on the input sentence and what it has produced so far.

### Attention Mechanism

Here, the **Encoder** generates the annotations **h1, h2, …, hT** from the inputs **X1, X2, …, XT**.

Then, we have to compute a **context vector c(i)** for each output time step **i**.

#### **How is the Context Vector for each output time step computed?**

![](https://hackernoon.com/hn-images/1*Agii69DmmkTAGBLNUCeN0g.png)

**a** is the **alignment model**, a **feedforward neural network** that is trained jointly with all the other components of the proposed system.

The **alignment model** scores how well each encoded input (annotation) **h(j)** matches the decoder state just before emitting output i, **s(i-1)**:

e(i, j) = a(s(i-1), h(j))

![](https://hackernoon.com/hn-images/1*VBalT1KkZ16WGvjWpjYo4g.png)

The alignment scores are normalized into attention weights using a **softmax function**:

α(i, j) = exp(e(i, j)) / Σₖ exp(e(i, k))

![](https://hackernoon.com/hn-images/1*uXp_uFfXbAqqrJx5g-xYzQ.png)

The context vector is the sum of the **annotations** **h(j)** weighted by the **normalized alignment scores**:

c(i) = Σⱼ α(i, j) · h(j)

### Decoding

The Decoder generates the output for the i'th time step by looking at the i'th context vector **c(i)** and the previous hidden state **s(i-1)**, along with the previously emitted word. A minimal code sketch of one such attention step follows the reference below.

#### Reference

- [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473), 2015.
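#### **A minimal code sketch**

To make the three steps above concrete, here is a small NumPy sketch of one attention step: scoring, softmax normalization, and the weighted sum that yields the context vector. The alignment model follows the feedforward form used in the paper, a(s, h) = vᵀ tanh(W·s + U·h); the toy dimensions, the random stand-in weights, and names like `attention_step` are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_step(s_prev, H, W_a, U_a, v_a):
    """One attention step for output time step i.

    s_prev : (n,)   previous decoder hidden state s(i-1)
    H      : (T, m) encoder annotations h(1) .. h(T), one per row
    Returns the context vector c(i) and the attention weights alpha(i, j).
    """
    # Alignment scores e(i, j) = v_a^T tanh(W_a s(i-1) + U_a h(j)), for all j at once.
    e = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a   # shape (T,)
    alpha = softmax(e)                              # normalized weights, sum to 1
    c = alpha @ H                                   # c(i) = sum_j alpha(i, j) * h(j)
    return c, alpha

# Toy sizes: T source positions, annotation size m, decoder state size n,
# alignment hidden size p. Random weights stand in for trained parameters.
rng = np.random.default_rng(0)
T, m, n, p = 5, 8, 6, 7
H = rng.standard_normal((T, m))     # encoder annotations
s_prev = rng.standard_normal(n)     # previous decoder state s(i-1)
W_a = rng.standard_normal((p, n))
U_a = rng.standard_normal((p, m))
v_a = rng.standard_normal(p)

c_i, alpha_i = attention_step(s_prev, H, W_a, U_a, v_a)
print("attention weights:", np.round(alpha_i, 3))  # one weight per source position
print("context vector shape:", c_i.shape)          # (m,)
```

Because the weights here are random, the attention pattern is meaningless; in the trained system, the alignment model learns to put high weight on the source positions most relevant to the word being generated.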