Revision as of 17:37, 19 November 2018

Introduction

The paper presents an unsupervised approach to machine translation that uses only monolingual corpora, without any alignment between sentences or documents. A monolingual corpus is a body of text written in a single language. This contrasts with the usual approach to translation, which relies on parallel corpora: two corpora that are direct translations of each other, with the translations aligned at the word or sentence level.

The general approach is to start from the unsupervised word-by-word translation model proposed by [Conneau, 2017], then iteratively improve the translation using two architectures:

A denoising auto-encoder to reconstruct noisy versions of sentences for both source and target languages.
A discriminator to align the distributions of the source and target languages in a latent space.
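The discriminator in the second item can be illustrated with two cross-entropy terms. This is a minimal sketch under stated assumptions: a binary discriminator is taken to output, for each latent code, the probability `p_src` that the code came from the source language, and the encoder's adversarial loss is written as the same cross-entropy with the language labels flipped, which is one common way of training an encoder to fool a discriminator. The function names and the two-label setup are hypothetical, and the encoder and discriminator networks themselves are omitted.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps log() finite if the discriminator outputs exactly 0 or 1


def discriminator_loss(p_src: Sequence[float], langs: Sequence[str]) -> float:
    """Mean cross-entropy of the discriminator's language predictions.

    p_src[i] is the (hypothetical) discriminator's probability that latent
    code i came from the source language; langs[i] is the true language,
    "src" or "tgt". The discriminator is trained to minimize this.
    """
    total = 0.0
    for p, lang in zip(p_src, langs):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= math.log(p if lang == "src" else 1.0 - p)
    return total / len(langs)


def encoder_adversarial_loss(p_src: Sequence[float], langs: Sequence[str]) -> float:
    """Same cross-entropy with the labels flipped. The encoder minimizes this,
    so it is rewarded for latent codes the discriminator misclassifies."""
    flipped = ["tgt" if lang == "src" else "src" for lang in langs]
    return discriminator_loss(p_src, flipped)


# Toy usage: a confident, correct discriminator on two latent codes.
p_src = [0.9, 0.2]
langs = ["src", "tgt"]
print(discriminator_loss(p_src, langs), encoder_adversarial_loss(p_src, langs))
```

When the discriminator is confident and correct, the first loss is small and the second is large; minimizing the second pushes the encoder toward latent codes whose language the discriminator cannot tell apart.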

Background

Methodology

The objective function proposed by the paper combines three component objectives:

Reconstruction loss of the denoising auto-encoder
Cross domain loss of the auto-encoder
Adversarial cross entropy loss of the discriminator
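One way to write this combination, reusing the reconstruction loss [math]\displaystyle{ \mathcal{L}_{auto} }[/math] and noise model [math]\displaystyle{ C(x) }[/math] defined below, is as a weighted sum over both languages. The weights [math]\displaystyle{ \lambda_{auto}, \lambda_{cd}, \lambda_{adv} }[/math], the translation model [math]\displaystyle{ M }[/math] in the cross-domain term, and the exact argument lists are assumptions of this sketch and should be checked against [Lample, 2018]:

```latex
\begin{align}
\mathcal{L}_{cd}(\theta_{enc}, \theta_{dec}, \mathcal{Z}, \ell_1, \ell_2) &= E_{x\sim D_{\ell_1}, \hat{x}\sim d(e(C(M(x)),\ell_2),\ell_1)}[\Delta(\hat{x},x)] \\
\mathcal{L}(\theta_{enc}, \theta_{dec}, \mathcal{Z}) &= \lambda_{auto}\left[\mathcal{L}_{auto}(\theta_{enc}, \theta_{dec}, \mathcal{Z}, src) + \mathcal{L}_{auto}(\theta_{enc}, \theta_{dec}, \mathcal{Z}, tgt)\right] \\
&\quad + \lambda_{cd}\left[\mathcal{L}_{cd}(\theta_{enc}, \theta_{dec}, \mathcal{Z}, src, tgt) + \mathcal{L}_{cd}(\theta_{enc}, \theta_{dec}, \mathcal{Z}, tgt, src)\right] \\
&\quad + \lambda_{adv}\,\mathcal{L}_{adv}(\theta_{enc}, \mathcal{Z})
\end{align}
```

Here [math]\displaystyle{ M(x) }[/math] stands for the current model's translation of [math]\displaystyle{ x }[/math] into the other language, so the cross-domain term asks the model to recover [math]\displaystyle{ x }[/math] from a noisy translation of itself.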

Noise Model

The noise model [math]\displaystyle{ C(x) }[/math] produces a randomly sampled noisy version of a sentence [math]\displaystyle{ x }[/math]. Noise is added in two ways:

Randomly dropping each word in the sentence with probability [math]\displaystyle{ p_{wd} }[/math].
Slightly shuffling the words in the sentence, so that each word ends up at most [math]\displaystyle{ k }[/math] positions away from its original position.

In practice, the authors found [math]\displaystyle{ p_{wd}= 0.1 }[/math] and [math]\displaystyle{ k=3 }[/math] to be good values.
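A minimal Python sketch of the noise model, assuming whitespace-tokenized sentences. The shuffle draws its permutation by adding uniform noise from U(0, k+1) to each position index and sorting, which keeps every word within k positions of where it started. This construction, the `add_noise` name, and the rule of keeping one word when dropout would empty the sentence are assumptions of the sketch, not details stated above.

```python
import random
from typing import List, Optional


def add_noise(words: List[str], p_wd: float = 0.1, k: int = 3,
              rng: Optional[random.Random] = None) -> List[str]:
    """Return a noisy copy C(x) of the tokenized sentence `words` (hypothetical helper)."""
    if rng is None:
        rng = random.Random()

    # 1) Word dropout: drop each word independently with probability p_wd.
    kept = [w for w in words if rng.random() >= p_wd]
    if not kept and words:
        # Assumption of this sketch: never return an empty sentence.
        kept = [rng.choice(words)]

    # 2) Local shuffle: key_i = i + U(0, k + 1), then sort by key (stable).
    # A word can only be reordered with words at most k positions away,
    # so each word moves at most k positions (measured after dropout).
    keys = [i + rng.uniform(0, k + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


sentence = "the cat sat on the mat".split()
print(add_noise(sentence, rng=random.Random(0)))
```

With k = 0 and p_wd = 0 the function returns the sentence unchanged, which is a quick sanity check on the shuffle bound.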

Reconstruction Loss

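\begin{align} \mathcal{L}_{auto}(\theta_{enc}, \theta_{dec}, \mathcal{Z}, \ell) = E_{x\sim D_\ell, \hat{x}\sim d(e(C(x),\ell),\ell)}[\Delta(\hat{x},x)] \end{align}

Here [math]\displaystyle{ e }[/math] and [math]\displaystyle{ d }[/math] are the encoder and decoder, [math]\displaystyle{ \ell }[/math] is the language (source or target), [math]\displaystyle{ D_\ell }[/math] is the monolingual corpus in language [math]\displaystyle{ \ell }[/math], [math]\displaystyle{ \mathcal{Z} }[/math] is the set of word embeddings, and [math]\displaystyle{ \Delta }[/math] is the sum of token-level cross-entropy losses between the reconstruction [math]\displaystyle{ \hat{x} }[/math] and the original sentence [math]\displaystyle{ x }[/math]. Minimizing this loss trains the model to recover [math]\displaystyle{ x }[/math] from its corrupted version [math]\displaystyle{ C(x) }[/math].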
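A Monte Carlo estimate of the reconstruction loss can be sketched in plain Python. The sketch assumes the usual teacher-forced reading of the expectation, in which [math]\displaystyle{ \Delta }[/math] is the token-level cross-entropy of the original sentence under the decoder's output distributions. The `Dist` representation and the `encode_decode` callable (standing in for [math]\displaystyle{ d(e(\cdot,\ell),\ell) }[/math] for one fixed language) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: the decoder's output is one probability
# distribution over the vocabulary (a word -> probability dict) per position.
Dist = Dict[str, float]


def delta(pred: Sequence[Dist], target: Sequence[str], eps: float = 1e-12) -> float:
    """Delta(x_hat, x): sum over positions t of -log p(x_t)."""
    if len(pred) != len(target):
        raise ValueError("need one predicted distribution per target token")
    return -sum(math.log(max(d.get(tok, 0.0), eps)) for d, tok in zip(pred, target))


def reconstruction_loss(batch: Sequence[List[str]],
                        noise: Callable[[List[str]], List[str]],
                        encode_decode: Callable[[List[str], int], List[Dist]]) -> float:
    """Average Delta over a batch of sentences x ~ D_l.

    encode_decode(c, n) stands in for d(e(c, l), l): it encodes the noisy
    sentence c and returns n decoder distributions (teacher forcing assumed).
    """
    total = 0.0
    for x in batch:
        total += delta(encode_decode(noise(x), len(x)), x)
    return total / len(batch)


def uniform_model(c: List[str], n: int) -> List[Dist]:
    # Toy "model" that ignores its input and is uniform over {"a", "b"}.
    return [{"a": 0.5, "b": 0.5}] * n


print(reconstruction_loss([["a", "b"], ["b"]], noise=lambda x: x,
                          encode_decode=uniform_model))
```

Passing the noise function as an argument keeps the loss independent of any particular corruption scheme, so the same code serves for both source and target languages.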

Cross Domain Training

Adversarial Training

Critique

Other Sources

References

[Lample, 2018] Lample, G., Conneau, A., Ranzato, M., Denoyer, L., "Unsupervised Machine Translation Using Monolingual Corpora Only". arXiv:1711.00043

[Conneau, 2017] Conneau, A., Lample, G., Ranzato, M., Denoyer, L., Jégou, H., "Word Translation without Parallel Data". arXiv:1710.04087


Unsupervised Machine Translation Using Monolingual Corpora Only: Difference between revisions
