stat946w18/Unsupervised Machine Translation Using Monolingual Corpora Only
Introduction
Neural machine translation systems must be trained on large corpora consisting of pairs of pre-translated sentences. This paper proposes an unsupervised neural machine translation system, which can be trained without using any such parallel data.
Overview of translation system
The unsupervised translation system has four components:
- An unsupervised word-vector alignment system
- An encoder
- A decoder
- A discriminator
Word vector alignment
Methods like word2vec and GLOVE generate vectors in Euclidian space whose geometry corresponds to the semantics of the words they represent. For example, if f maps English words to vectors, then we have
[math]\displaystyle{ f(\text{king}) -f(\text{man}) +f(\text{woman}) \approx f(\text{queen}) }[/math]
Mikolov (2013), observed that this correspondence between semantics and geometry is invariant across languages. For example, if g maps French words to their corresponding vectors, then
[math]\displaystyle{ g(\text{roi}) -g(\text{homme}) +g(\text{femme}) \approx g(\text{reine}) }[/math]
It follows that if A maps English word vectors to the word vectors of their French translations, we should expect A to be linear.
Mikolov and many subsequent authors used this observation to devise methods for expanding small bilingual dictionaries. The main idea is that using given parallel data, we can solve for the linear transformation A by least squares. We can then use this learned linear transformation to translate arbitrary words, assuming we have the corresponding word vectors. See my CS698 project for details (link).
Overview of objective
The objective function is the sum of three terms:
- The de-noising auto-encoder loss
- The translation loss
- The adversarial loss
References
- Tomas Mikolov, Quoc V Le, and Ilya Sutskever. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168, 2013.