Unsupervised Machine Translation Using Monolingual Corpora Only: Difference between revisions

From statwiki
Jump to navigation Jump to search
No edit summary
Line 2: Line 2:
The paper presents an unsupervised method to machine translation using only monoligual corpora without any alignment between sentences or documents. Monoligual corpora are text corpora that is made up of one language only. This contrasts with the usual translation approach that uses parallel corpora, where two corpora are the direct translation of each other and the translations are aligned by words or sentences.
The paper presents an unsupervised method to machine translation using only monoligual corpora without any alignment between sentences or documents. Monoligual corpora are text corpora that is made up of one language only. This contrasts with the usual translation approach that uses parallel corpora, where two corpora are the direct translation of each other and the translations are aligned by words or sentences.


The general approach of the methodology is to first use a unsupervised word-by-word translation model proposed by [Conneau, 2017], then iteratively improve on the translation by utilize 2 architectures:
The general approach of the methodology is to first use a unsupervised word-by-word translation model proposed by [Conneau, 2017], then iteratively improve on the translation by utilizing 2 architectures:


# A denoising auto-encoder to deconstruct noisy versions of sentences for both source and target languages
# A denoising auto-encoder to reconstruct noisy versions of sentences for both source and target languages.
# A discriminator to align the distributions of the source and target languages in a latent space.
# A discriminator to align the distributions of the source and target languages in a latent space.



Revision as of 16:10, 19 November 2018

Introduction

The paper presents an unsupervised method to machine translation using only monoligual corpora without any alignment between sentences or documents. Monoligual corpora are text corpora that is made up of one language only. This contrasts with the usual translation approach that uses parallel corpora, where two corpora are the direct translation of each other and the translations are aligned by words or sentences.

The general approach of the methodology is to first use a unsupervised word-by-word translation model proposed by [Conneau, 2017], then iteratively improve on the translation by utilizing 2 architectures:

  1. A denoising auto-encoder to reconstruct noisy versions of sentences for both source and target languages.
  2. A discriminator to align the distributions of the source and target languages in a latent space.

Methodology

Background

Critique


Other Sources

References

  1. [Conneau, 2017] Conneau, Alexis, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou. "Word Translation without Parallel Data". arXiv:1710.04087