stat946w18/Unsupervised Machine Translation Using Monolingual Corpora Only: Difference between revisions

Revision as of 17:16, 18 February 2018

Introduction

Neural machine translation systems must be trained on large corpora consisting of pairs of pre-translated sentences. This paper proposes an unsupervised neural machine translation system, which can be trained without using any such parallel data.

Overview of translation system

The unsupervised translation system has four components:

An unsupervised word-vector alignment system
An encoder
A decoder
A discriminator

Word vector alignment

Methods like word2vec and GloVe generate vectors in Euclidean space whose geometry reflects the semantics of the words they represent. For example, if f maps English words to vectors, then we have

[math]\displaystyle{ f(\text{king}) -f(\text{man}) +f(\text{woman}) \approx f(\text{queen}) }[/math]
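The relation above can be illustrated with a minimal sketch. The two-dimensional vectors below are hand-made toy values (an assumption for illustration, not real word2vec or GloVe output), chosen so that one axis loosely encodes "royalty" and the other "gender"; the helper name `analogy` is likewise hypothetical.

```python
import numpy as np

# Hypothetical toy embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
# Real embeddings are learned from text and have hundreds of dimensions.
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = vectors[a] - vectors[b] + vectors[c]
    # Exclude the query words themselves, as is common in analogy evaluation.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: np.linalg.norm(vectors[w] - target))


result = analogy("king", "man", "woman", vectors)
print(result)
```

With real embeddings the match is only approximate, which is why the nearest neighbour of the combined vector is used rather than an exact equality test.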

Mikolov et al. (2013) observed that this correspondence between semantics and geometry is invariant across languages. For example, if g maps French words to their corresponding vectors, then

[math]\displaystyle{ g(\text{roi}) -g(\text{homme}) +g(\text{femme}) \approx g(\text{reine}) }[/math]

This suggests that if A maps English word vectors to the word vectors of their French translations, we should expect A to be (approximately) linear.
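As a short consistency check (a sketch of the reasoning, not a proof), suppose A is linear and maps each English word vector approximately onto the vector of its French translation. Linearity then carries the English analogy over to the French one:

[math]\displaystyle{ A\big(f(\text{king}) - f(\text{man}) + f(\text{woman})\big) = A f(\text{king}) - A f(\text{man}) + A f(\text{woman}) \approx g(\text{roi}) - g(\text{homme}) + g(\text{femme}) \approx g(\text{reine}) }[/math]

This agrees with mapping the right-hand side of the English relation directly, since [math]\displaystyle{ A f(\text{queen}) \approx g(\text{reine}) }[/math]. A general nonlinear map would not be guaranteed to preserve such vector-arithmetic relations.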

Mikolov et al. and many subsequent authors used this observation to devise methods for expanding small bilingual dictionaries. The main idea is that, given a small amount of parallel data, we can solve for the linear transformation A by least squares. We can then use this learned linear transformation to translate arbitrary words, assuming we have the corresponding word vectors. See my CS698 project for details (link: https://uwaterloo.ca/scholar/sites/ca.scholar/files/pa2forsy/files/project_dec_3_0.pdf). Essentially, these schemes can be thought of as methods for aligning the point cloud of source-language word vectors with the point cloud of target-language word vectors. This line of work culminated recently in the paper of Conneau et al. (2017), which uses a GAN to align the word vectors of a source and target language using only considerations of point cloud geometry, without relying on any bilingual dictionary. The present paper uses this unsupervised cross-lingual word-vector alignment scheme for two purposes: a) to initialize the unsupervised translation scheme; and b) to embed the word vectors of the two languages in the same space, so that a single encoder and decoder can be applied to both languages.
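The supervised least-squares step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the code of any of the cited papers: the random "word vectors", the ground-truth matrix `A_true`, the dimensions, and the helper name `translate` are all assumptions made so that the example is self-contained. It covers only the dictionary-based approach, not the GAN-based alignment of Conneau et al.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs, n_test = 5, 50, 10

# Synthetic stand-ins for real embeddings (assumption): source vectors X and
# target vectors Y related by an unknown linear map A_true plus small noise.
A_true = rng.normal(size=(dim, dim))
X = rng.normal(size=(n_pairs, dim))            # rows: source word vectors
Y = X @ A_true.T + 0.01 * rng.normal(size=(n_pairs, dim))

# Solve min_W ||X W - Y||_F by least squares; A_hat = W^T so that y ~ A_hat x.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = W.T


def translate(vec, A, target_vectors, target_words):
    """Map a source vector with A and return the nearest target word (cosine)."""
    mapped = A @ vec
    sims = (target_vectors @ mapped) / (
        np.linalg.norm(target_vectors, axis=1) * np.linalg.norm(mapped)
    )
    return target_words[int(np.argmax(sims))]


# Held-out "words" not in the seed dictionary, with placeholder names.
X_test = rng.normal(size=(n_test, dim))
Y_test = X_test @ A_true.T
target_words = [f"t{i}" for i in range(n_test)]

predicted = translate(X_test[3], A_hat, Y_test, target_words)
print(predicted)
```

In this synthetic setting the learned matrix recovers the true map closely because the data really are linearly related; with real embeddings the fit is only approximate, and nearest-neighbour retrieval is what turns the mapped vector into a candidate translation.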

Overview of objective

The objective function is the sum of three terms:

The de-noising auto-encoder loss
The translation loss
The adversarial loss
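Written as a formula, the sum of the three terms listed above might look as follows. The symbol names are illustrative notation introduced here, and any relative weighting of the terms is not specified in this summary:

[math]\displaystyle{ \mathcal{L} = \mathcal{L}_{\text{auto}} + \mathcal{L}_{\text{trans}} + \mathcal{L}_{\text{adv}} }[/math]

Here [math]\displaystyle{ \mathcal{L}_{\text{auto}} }[/math] stands for the de-noising auto-encoder loss, [math]\displaystyle{ \mathcal{L}_{\text{trans}} }[/math] for the translation loss, and [math]\displaystyle{ \mathcal{L}_{\text{adv}} }[/math] for the adversarial loss.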

References

Tomas Mikolov, Quoc V Le, and Ilya Sutskever. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168, 2013.
Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. Word translation without parallel data. arXiv preprint arXiv:1710.04087, 2017.

