Implementing an Attention Model from scratch using Keras. It is then used to convert human-readable dates such as "18th of August 1997" to a machine-readable format (1997-08-18).

RolandSherwin/neural-machine-translation


Attention Model

In a sequence-to-sequence model, the traditional approach is to run the input through a series of RNN/GRU/LSTM units that must memorize the entire input sequence. This is the encoder part of the network. It outputs a single vector, which has to carry all the information about the input sequence to a decoder, and the decoder then generates the output sequence from it.
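The bottleneck described above can be illustrated with a toy encoder. This is only a sketch, not the repository's Keras code: `rnn_encode`, the one-unit tanh recurrence, and the weights `w_in` and `w_rec` are hypothetical choices made for illustration. It shows that the summary handed to the decoder is a single value no matter how long the input is.

```python
import math

def rnn_encode(inputs, w_in=0.5, w_rec=0.8):
    """Run a toy one-unit tanh RNN over a sequence; return every hidden state.

    w_in and w_rec are made-up fixed weights (assumption); a real model learns them.
    """
    h = 0.0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

short_states = rnn_encode([1.0, -1.0, 0.5])
long_states = rnn_encode([1.0, -1.0, 0.5] * 20)

# A traditional encoder passes only the last state to the decoder,
# so 3 inputs and 60 inputs are both squeezed into one number here.
summary_short = short_states[-1]
summary_long = long_states[-1]
print(len(short_states), len(long_states))
```

The intermediate entries of `long_states` are exactly what a plain encoder-decoder throws away; keeping them available to the decoder is what lets an attention mechanism look back at individual input positions.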

This kind of model fails when a very long sequence is given, since it is hard to memorize the entire input in a single vector. We therefore use a more powerful model called an Attention Model. In an attention model, to generate the first output word, we typically only need to look at the first few words of the input and do not need to look very deep into the input sequence. We can therefore compute attention weights, alpha^{<t, t'>}, which tell how much attention to pay to a specific input word t' when generating a specific output word t.
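As a rough sketch of how such weights might be computed, the example below assumes that the weights for one output step come from a softmax over alignment scores, and that the score is a plain dot product between a decoder state and each encoder state. Both are assumptions for illustration; the text above does not specify the scoring function, and the names `softmax`, `attention_context`, `encoder_states`, and `decoder_state` are hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(decoder_state, encoder_states):
    """Return (alphas, context) for one output step t.

    alphas[t'] plays the role of alpha^{<t, t'>}: how much attention
    input position t' receives at this output step.
    """
    # Dot-product alignment score (assumption; other scoring functions exist)
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    alphas = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states
    dim = len(encoder_states[0])
    context = [sum(a * h[i] for a, h in zip(alphas, encoder_states)) for i in range(dim)]
    return alphas, context

# Made-up states for a three-word input (illustrative values only)
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [2.0, 0.0]

alphas, context = attention_context(decoder_state, encoder_states)
print(alphas)
print(context)
```

Because the decoder state here points in the same direction as the first encoder state, the first input position receives the largest weight, and the context vector is pulled toward it.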
