Synthetic and Natural Noise Both Break Neural Machine Translation
Introduction
- Humans have surprisingly robust language processing systems which can easily overcome typos, e.g.
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae.
- A person's ability to read this text comes as no surprise given the psychology literature:
- Saberi \& Perrott (1999) found that this robustness extends to audio as well.
- Rayner et al. (2006) found that in noisier settings reading comprehension slowed by only 11\%.
- McCusker et al. (1981) found that the common case of swapping letters could often go unnoticed by the reader.
- Mayall et al. (1997) showed that we rely on word shape.
- Reicher (1969) and Pelli et al. (2003) found that we can switch between whole-word recognition and piecing words together from letters, but the first and last letter positions must stay constant for comprehension.
However, NMT (neural machine translation) systems are brittle. For example, one Arabic word is a blessing used for "good morning", while a word that differs from it by only a single character means "hunt" or "slaughter". Facebook's MT system mistakenly confused the two, a distinction that is especially challenging for a character-based NMT system.
Figure 1 shows German-to-English translation performance as a function of the percentage of German words modified. Two types of noise are shown: (1) random permutation of all the letters in a word and (2) swapping a pair of adjacent letters in the centre of a word. The key observation is that even small amounts of noise lead to substantial drops in performance.
BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is taken to be the correspondence between a machine's output and that of a human: "the closer a machine translation is to a professional human translation, the better it is". BLEU scores lie between 0 and 1 (often reported on a 0-100 scale).
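As a concrete illustration, BLEU can be computed with off-the-shelf tools. Below is a minimal sketch using NLTK's \code{sentence_bleu} on pre-tokenized text; this is an illustrative stand-in, not the paper's evaluation script:

```python
# Minimal BLEU illustration using NLTK (assumes `pip install nltk`).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the cat sat on the mat".split()]  # list of tokenized reference translations
hypothesis = "the cat is on the mat".split()    # tokenized system output

# Smoothing avoids zero scores when short sentences miss higher-order n-grams.
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")  # a value between 0 and 1
```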
This paper explores two simple strategies for increasing model robustness:
- using structure-invariant representations (e.g., a character CNN representation)
- robust training on noisy data, a form of adversarial training.
Adversarial examples
The growing literature on adversarial examples has demonstrated how dangerous it can be to have brittle machine learning systems being used so pervasively in the real world.
The paper devises simple methods for generating adversarial examples for NMT. The authors do not assume any access to the NMT models' gradients, instead relying on cognitively informed and naturally occurring language errors to generate noise.
MT systems
We experiment with three different NMT systems with access to character information at different levels.
- Use the fully character-level model \code{char2char} (Lee et al., 2017), a sequence-to-sequence model with attention that operates directly on character sequences.
- Use \code{Nematus} (Sennrich et al., 2017), a popular NMT toolkit. It is another sequence-to-sequence model with several architecture modifications, notably operating on sub-word units obtained with byte-pair encoding (BPE; a toy illustration follows this list).
- Use an attentional sequence-to-sequence model with a word representation based on a character convolutional neural network (\code{charCNN}); a sketch of such a word encoder also follows this list. The \code{charCNN} model has two long short-term memory (LSTM) layers in both the encoder and the decoder.
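For context, byte-pair encoding builds a sub-word vocabulary by repeatedly merging the most frequent pair of adjacent symbols. Below is a toy illustration of the merge procedure in the spirit of Sennrich et al. (2016); the corpus counts are invented:

```python
# Toy byte-pair encoding merges, in the spirit of Sennrich et al. (2016).
import re
from collections import Counter

def get_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Merge every occurrence of `pair` into a single new symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Invented corpus counts; symbols are characters plus an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(10):  # the number of merges controls sub-word vocabulary size
    pairs = get_stats(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_vocab(best, vocab)
    print(best)  # each merge becomes a sub-word unit, e.g. ('e', 's')
```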
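And here is a minimal sketch of what a \code{charCNN} word representation might look like; the layer sizes, vocabulary size, and kernel width are illustrative assumptions, not the paper's configuration:

```python
# Sketch of a character-CNN word encoder (illustrative sizes; assumes PyTorch).
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    def __init__(self, n_chars=100, char_dim=25, n_filters=200, kernel_size=5):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # Convolve over character positions within each word.
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, char_ids):
        # char_ids: (batch, word_len) integer character indices for one word.
        x = self.char_emb(char_ids)       # (batch, word_len, char_dim)
        x = x.transpose(1, 2)             # (batch, char_dim, word_len)
        x = torch.relu(self.conv(x))      # (batch, n_filters, word_len)
        # Max-over-time pooling yields a fixed-size word vector.
        return x.max(dim=2).values        # (batch, n_filters)

encoder = CharCNNWordEncoder()
word = torch.randint(1, 100, (1, 7))      # one seven-character "word"
print(encoder(word).shape)                # torch.Size([1, 200])
```

Because the convolution sees character order, scrambling a word changes its representation; a structure-invariant alternative, such as averaging the character embeddings, would not.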
DATA
We use the TED talks parallel corpus prepared for IWSLT 2016 (Cettolo et al., 2012) for testing all of the NMT systems.
NATURAL AND ARTIFICIAL NOISE
NATURAL NOISE
French, German, and Czech each have their own frequent natural errors.
The authors harvest naturally occurring errors (typos, misspellings, etc.) for these three languages from available corpora of edits and build a look-up table of possible lexical replacements.
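A rough sketch of how such a look-up table might be applied to inject natural noise; the table entries below are invented English examples, whereas the paper harvests real French, German, and Czech errors:

```python
import random

# Hypothetical look-up table: clean word -> observed error variants.
# (Invented English entries; the paper harvests real errors from edit corpora.)
natural_errors = {
    "because": ["becuase", "becasue"],
    "their": ["thier"],
    "definitely": ["definately", "definetly"],
}

def add_natural_noise(sentence, p=1.0):
    """Replace each word that has known error variants with probability p."""
    out = []
    for w in sentence.split():
        if w in natural_errors and random.random() < p:
            out.append(random.choice(natural_errors[w]))
        else:
            out.append(w)
    return " ".join(out)

print(add_natural_noise("their plan definitely failed because of noise"))
```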
SYNTHETIC NOISE
In addition to naturally collected sources of error, the authors also experiment with four types of synthetic noise: \code{Swap}, \code{Middle Random}, \code{Fully Random}, and \code{Keyboard Typo} (all four are sketched in code after the list).
- \code{Swap}: The first and simplest source of noise; swap two adjacent letters within a word, without altering the first or last letter (so it applies only to words of length at least four).
- \code{Middle Random}: Randomize the order of all the letters in a word except for the first and last.
- \code{Fully Random}: Completely randomize the order of all the letters in a word.
- \code{Keyboard Typo}: Randomly replace one letter in each word with an adjacent key.
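A minimal sketch of the four synthetic noise types; the keyboard-neighbour map below is a partial, illustrative assumption:

```python
import random

def swap(word):
    """Swap one pair of adjacent internal letters (words of length >= 4)."""
    if len(word) < 4:
        return word
    i = random.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def middle_random(word):
    """Shuffle all letters except the first and last."""
    if len(word) < 4:
        return word
    middle = list(word[1:-1])
    random.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def fully_random(word):
    """Shuffle every letter in the word."""
    letters = list(word)
    random.shuffle(letters)
    return "".join(letters)

# Partial, illustrative QWERTY neighbour map; a real one covers the whole
# keyboard and each language's layout.
NEIGHBOURS = {"a": "qwsz", "s": "awedxz", "d": "serfcx", "e": "wsdr"}

def keyboard_typo(word):
    """Replace one random letter with an adjacent key, where known."""
    candidates = [i for i, c in enumerate(word) if c in NEIGHBOURS]
    if not candidates:
        return word
    i = random.choice(candidates)
    return word[:i] + random.choice(NEIGHBOURS[word[i]]) + word[i + 1:]

for noise in (swap, middle_random, fully_random, keyboard_typo):
    print(noise.__name__, noise("research"))
```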
Table 3 shows BLEU scores of models trained on clean (Vanilla) texts and tested on clean and noisy texts. All models suffer a significant drop in BLEU when evaluated on noisy texts. This is true for both natural noise and all kinds of synthetic noise. The more noise in the text, the worse the translation quality, with random scrambling producing the lowest BLEU scores.