stat946f15/Sequence to sequence learning with neural networks

Revision as of 12:25, 15 October 2015 by Rtwang (talk | contribs)

Introduction

The emergence of the Internet and other modern technology has greatly increased people's ability to communicate across vast distances and barriers. However, the fundamental barrier of language remains, and as anyone who has attempted to learn a new language can attest, it takes a tremendous amount of work to learn more than one language past childhood. The ability to translate between languages quickly and efficiently would therefore be of great importance. This is an extremely difficult problem, however, because grammar varies between languages and context always plays an important role.

The purpose of the paper is to apply multi-layer long short-term memory (LSTM) neural networks to this machine translation problem and to assess the translation accuracy of this approach.

Model

Long Short-Term Memory Recurrent Neural Network

Recurrent neural networks are a variation of deep neural networks that are capable of storing information about previous inputs. Unlike feed-forward neural networks, which take a single fixed-length vector as input and produce a fixed-length vector as output, recurrent neural networks can take a sequence of fixed-length vectors as input because of their ability to store information and maintain a connection between inputs. In a feed-forward neural network, previous inputs have no impact on the current output, whereas in a recurrent neural network they can. A recurrent neural network gains this memory through the addition of memory layers that store information about previous hidden states, which are then fed back in together with the current input.

This form of input fits naturally with language translation, since sentences are sequences of words, and it avoids many of the problems that arise when representing variable-length sentences as fixed-length vectors.
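
The recurrence described above can be illustrated with a minimal sketch. It assumes a plain tanh recurrent network rather than the LSTM used in the paper, and the function names (rnn_step, run_rnn) and the toy weights are hypothetical, chosen only for illustration. It shows two sequences that end in the same input vector but produce different final hidden states because their histories differ.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One step of a plain tanh RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    from_input = matvec(W_xh, x)        # contribution of the current input
    from_memory = matvec(W_hh, h_prev)  # contribution of the previous hidden state
    return [math.tanh(a + b + c) for a, b, c in zip(from_input, from_memory, b_h)]

def run_rnn(inputs, W_xh, W_hh, b_h):
    """Feed a sequence of fixed-length vectors through the RNN; return every hidden state."""
    h = [0.0] * len(b_h)  # the hidden state starts at zero (an assumption of this sketch)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Toy weights, picked arbitrarily (2-dimensional input, 2-dimensional hidden state).
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.4, 0.2], [-0.6, 0.3]]
b_h = [0.0, 0.0]

seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 1.0], [0.0, 1.0]]  # same final input as seq_a, different history

final_a = run_rnn(seq_a, W_xh, W_hh, b_h)[-1]
final_b = run_rnn(seq_b, W_xh, W_hh, b_h)[-1]
print(final_a)
print(final_b)
```

Because the hidden state always has the same size, run_rnn accepts sequences of any length. A feed-forward network given only the last input vector could not distinguish seq_a from seq_b.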

Input and Output Data Transformation

Error Scoring

Training and Results

Training Method

Results