An **Encoder** reads and encodes a source sentence into a **fixed-length vector**. A **Decoder** then outputs a translation from that encoded vector.

#### **Limitation**

A potential issue with this encoder–decoder approach is that the network has to compress all the necessary information of a source sentence into a single **fixed-length vector**, which becomes harder as sentences get longer.

#### **How does Attention solve the problem?**

The Attention Mechanism allows the decoder to attend to different parts of the source sentence at each step of the output generation. Instead of encoding the input sequence into a **single fixed context vector**, we let the model learn **how to generate a context vector** for each output time step. That is, we let the model **learn** what to attend to based on the input sentence and what it has produced so far.

### Attention Mechanism

Here, the **Encoder** generates the annotations **h1, h2, …, hT** from the inputs **x1, x2, …, xT**. Then, we have to compute the **context vector ci** for each output time step.

#### **How is the Context Vector for each output time step computed?**

**a** is the **Alignment model**, a **feedforward neural network** that is trained jointly with all the other components of the proposed system.

The **Alignment model** produces a score (e) for how well each encoded input (hj) matches the decoder's previous hidden state (s(i−1)).

The alignment scores are normalized into attention weights using a **softmax function**.

The context vector ci is then a weighted sum of the **annotations** (hj), weighted by those **normalized alignment scores** (see the equations and the short code sketch at the end of this post).

### Decoding

The Decoder generates the output for the i-th time step by looking at the i-th context vector ci, its previous hidden state s(i−1), and the previously generated output.

#### Reference

- [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473), 2015.
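
#### Putting it together

Concretely, the steps above amount to the following equations from the referenced paper, where **a** is the alignment model and Tx is the length of the source sentence:

$$
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij}\, h_j
$$

The decoder state is then updated from the previous state, the previously generated word, and the new context vector: $s_i = f(s_{i-1}, y_{i-1}, c_i)$.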
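
#### A minimal sketch in code

Below is a small NumPy sketch of a single decoder step with this kind of additive attention. The scoring, softmax, and weighted-sum steps follow the equations above; everything else (the parameter names `Wa`, `Ua`, `va`, `Ws`, `Wy`, `Wc`, the dimensions, and the plain tanh state update standing in for the paper's gated recurrent unit) is an illustrative simplification, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_context(h, s_prev, Wa, Ua, va):
    """Compute the context vector c_i for one decoder step.

    h      : (T, enc_dim)  encoder annotations h_1 .. h_T
    s_prev : (dec_dim,)    previous decoder hidden state s_{i-1}
    Wa, Ua, va : parameters of the feedforward alignment model a(.)
    """
    # Alignment scores e_ij = va^T tanh(Wa s_{i-1} + Ua h_j)
    scores = np.tanh(s_prev @ Wa + h @ Ua) @ va   # (T,)
    # Normalize the scores with a softmax -> attention weights alpha_ij
    alpha = softmax(scores)                       # (T,)
    # Context vector: weighted sum of the annotations
    c = alpha @ h                                 # (enc_dim,)
    return c, alpha

def decoder_step(s_prev, y_prev, c, Ws, Wy, Wc):
    # Toy state update s_i = tanh(...): the paper uses a gated unit here,
    # this is only meant to show which quantities feed into the new state.
    return np.tanh(s_prev @ Ws + y_prev @ Wy + c @ Wc)

# --- tiny usage example with random parameters ---
rng = np.random.default_rng(0)
T, enc_dim, dec_dim, emb_dim, attn_dim = 5, 8, 6, 4, 7

h = rng.normal(size=(T, enc_dim))      # encoder annotations
s_prev = rng.normal(size=dec_dim)      # previous decoder state s_{i-1}
y_prev = rng.normal(size=emb_dim)      # embedding of the previous output word

Wa = rng.normal(size=(dec_dim, attn_dim))
Ua = rng.normal(size=(enc_dim, attn_dim))
va = rng.normal(size=attn_dim)
Ws = rng.normal(size=(dec_dim, dec_dim))
Wy = rng.normal(size=(emb_dim, dec_dim))
Wc = rng.normal(size=(enc_dim, dec_dim))

c, alpha = attention_context(h, s_prev, Wa, Ua, va)
s_new = decoder_step(s_prev, y_prev, c, Ws, Wy, Wc)
print("attention weights:", np.round(alpha, 3), "sum:", alpha.sum())
print("new decoder state shape:", s_new.shape)
```

The attention weights `alpha` are recomputed at every output time step, so each step gets its own context vector `c`; this is exactly the "learning what to attend to" described earlier, in contrast to a single fixed encoding of the whole sentence.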