NLP - Summary of Sequence to Sequence Learning with Neural Networks

About the Paper

Key Contributions

Achievements

Sequence to Sequence Learning

The Model

The Recurrent Neural Network (RNN) is a natural generalization of feedforward neural networks to sequences. However, RNNs are difficult to train on problems with long-term dependencies between inputs and outputs. The Long Short-Term Memory (LSTM) is known to learn problems with long-range temporal dependencies, so LSTMs are used in this paper.
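To make this concrete, here is a minimal PyTorch sketch (not from the paper; the vocabulary and layer sizes are illustrative) showing how an LSTM consumes a sequence of embedded tokens and summarizes it in a final hidden state:

```python
# Minimal, illustrative sketch: an LSTM reading a toy sequence of embedded tokens.
# All sizes below are made up for demonstration purposes.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, 7))   # batch of 1 sentence, 7 tokens
outputs, (h_n, c_n) = lstm(embedding(tokens))   # h_n is the final hidden state
print(outputs.shape, h_n.shape)                 # (1, 7, 128), (1, 1, 128)
```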

The overall setting of the architecture is explained in the following figure.

[Figure: seq2seq model architecture]

Every sentence ends with a special end-of-sentence symbol "&lt;EOS&gt;". The input sequence is represented as "A", "B", "C", "&lt;EOS&gt;", and the output sequence as "W", "X", "Y", "Z", "&lt;EOS&gt;".

The architecture used in the paper uses two different LSTMs: one LSTM reads the input sequence, one timestep at a time, to obtain a fixed-dimensional vector representation, and a second LSTM decodes the output sequence from that vector.
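A minimal sketch of this two-LSTM encoder-decoder idea is shown below. It is illustrative rather than the paper's exact setup: the paper uses deep 4-layer LSTMs with 1000 cells per layer and beam-search decoding, while this sketch uses small made-up sizes, a hypothetical &lt;EOS&gt; token id, and greedy decoding.

```python
# Illustrative two-LSTM encoder-decoder sketch (not the paper's configuration).
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 10_000, 10_000, 128, 256  # toy sizes (paper: 160K / 80K vocab)
EOS = 2  # hypothetical end-of-sentence token id

src_embed = nn.Embedding(SRC_VOCAB, EMB)
tgt_embed = nn.Embedding(TGT_VOCAB, EMB)
encoder = nn.LSTM(EMB, HID, batch_first=True)   # first LSTM: reads the source sentence
decoder = nn.LSTM(EMB, HID, batch_first=True)   # second LSTM: emits the target sentence
proj = nn.Linear(HID, TGT_VOCAB)                # hidden state -> target vocabulary scores

def translate_greedy(src_ids, max_len=50):
    """Encode the source, then decode one token at a time until <EOS>."""
    # The encoder's final (hidden, cell) state summarizes the whole input sentence.
    _, state = encoder(src_embed(src_ids))
    out_ids = []
    prev = torch.tensor([[EOS]])  # decoding starts from the end-of-sentence symbol
    for _ in range(max_len):
        dec_out, state = decoder(tgt_embed(prev), state)
        next_id = proj(dec_out[:, -1]).argmax(dim=-1, keepdim=True)
        if next_id.item() == EOS:
            break
        out_ids.append(next_id.item())
        prev = next_id
    return out_ids

print(translate_greedy(torch.randint(0, SRC_VOCAB, (1, 6))))  # untrained, so output is arbitrary
```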

Experiments

The dataset used for the experiments is the WMT'14 English-to-French dataset. The authors trained the models on a subset of 12M sentences consisting of 348M French words and 304M English words.

As typical neural language models depend on a vector representation for each word, the authors used a fixed vocabulary for both languages: the 160K most frequent words for the source language and the 80K most frequent words for the target language. Every out-of-vocabulary word was replaced with a special "UNK" token.
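This vocabulary truncation can be sketched as follows (an illustrative snippet, not the paper's actual preprocessing code):

```python
# Illustrative sketch: keep only the most frequent words and map the rest to "UNK".
from collections import Counter

def build_vocab(sentences, max_size):
    counts = Counter(word for sent in sentences for word in sent.split())
    most_common = [word for word, _ in counts.most_common(max_size)]
    return {word: idx for idx, word in enumerate(most_common)}

def tokenize(sentence, vocab):
    return [word if word in vocab else "UNK" for word in sentence.split()]

corpus = ["the cat sat on the mat", "the dog sat on the log"]
vocab = build_vocab(corpus, max_size=5)           # paper: 160K source / 80K target
print(tokenize("the cat chased the dog", vocab))  # 'chased' falls outside the vocabulary
```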

The observations of the experiments are described below:

Conclusion

In this paper, the authors showed that a large deep LSTM with a limited vocabulary can outperform a standard SMT-based system with an unlimited vocabulary on a large-scale machine translation task.