# continuous space language models


# Introduction

In certain fields of study such as speech recognition or machine translation, given an acoustic signal $\,x$ or a source sentence $\,e$ to be translated, it is common to model the problem as finding the sequence of words $\,w^*$ that has the highest probability of occurring given $\,x$ or $\,e$. This can be written as:

$w^* = \underset{w}{\arg\max}\ P(w|x) = \underset{w}{\arg\max}\ P(x|w)P(w)$

An acoustic or translation model can then be used for $\,P(x|w)$, similar to the idea behind LDA and QDA, and it remains to create a language model $\,P(w)$ to estimate the probability of any sequence of words $\,w$.
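The decision rule above can be sketched as a small noisy-channel decoder. The function names and toy scoring functions here are illustrative assumptions, not the paper's implementation:

```python
def decode(x, candidates, channel_logprob, lm_logprob):
    # Noisy-channel decoding: choose the word sequence w maximizing
    # log P(x|w) + log P(w), i.e. the log of the product P(x|w)P(w).
    return max(candidates, key=lambda w: channel_logprob(x, w) + lm_logprob(w))
```

In practice the candidate set is far too large to enumerate explicitly, so a search procedure such as beam decoding replaces the literal `max`.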

This is commonly done with the back-off n-gram model, and the purpose of this research paper is to use a neural network to better estimate $\,P(w)$.

# Back-off n-grams Model

A sequence of words will be defined as $\,w^i_1=(w_1,w_2,\dots,w_i)$ and the formula for the probability $\,P(w)$ can be rewritten as:

$P(w^n_1)=P(w_1,w_2,\dots,w_n)=P(w_1)\prod_{i=2}^n P(w_i|w^{i-1}_1)$

It is common to estimate $\,P(w_i|w^{i-1}_1)$ through:

$\,P(w_i|w^{i-1}_1)\approx\frac{\mbox{number of occurrences of the sequence }(w_1,\dots,w_i)}{\mbox{number of occurrences of the sequence }(w_1,\dots,w_{i-1})}$
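This count-based estimate can be sketched in a few lines; the toy corpus and helper names below are assumptions for illustration only:

```python
from collections import Counter

def ngram_counts(corpus, n):
    # Count every n-gram in a corpus of tokenized sentences.
    counts = Counter()
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1
    return counts

def cond_prob(history, word, counts_n, counts_nm1):
    # P(word | history) ~ count(history, word) / count(history)
    num = counts_n[tuple(history) + (word,)]
    den = counts_nm1[tuple(history)]
    return num / den if den else 0.0
```

For example, in the two-sentence corpus `[["the","cat","sat"], ["the","cat","ran"]]`, the word "cat" always follows "the", so the estimated bigram probability is 1, while "sat" follows "cat" half the time, giving 0.5.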

However, it is practically impossible for a training set to be large enough to contain every possible sequence of words once sequences grow long, so some sequences would be assigned an incorrect probability of 0 simply because they do not appear in the training set. This is known as the data sparseness problem. It is commonly mitigated by conditioning only on the last n-1 words instead of the whole context. However, even for small n, certain sequences can still be missing.

To solve this issue, a technique called back-off n-grams is used; the general formula is as follows:

$\,P(w_i|w^{i-1}_{i-n+1}) = \begin{cases} \hat{P}(w_i|w^{i-1}_{i-n+1}), & \mbox{if }C(w^i_{i-n+1})>0 \\ \alpha(w^{i-1}_{i-n+1})\,P(w_i|w^{i-1}_{i-n+2}), & \mbox{otherwise} \end{cases}$

where $\,\hat{P}$ is a discounted (smoothed) estimate, $\,C(\cdot)$ is the number of occurrences of a sequence in the training set, and $\,\alpha(\cdot)$ is the back-off weight chosen so that the probabilities sum to one. In other words, if the full n-gram was observed, its discounted estimate is used; otherwise the model backs off to the shorter (n-1)-gram context.
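A minimal back-off sketch, using a constant back-off weight in the style of "stupid backoff" rather than properly normalized Katz weights; here `counts` is assumed to map n-gram tuples of every order to frequencies, with the empty tuple keyed to the total token count:

```python
def backoff_prob(counts, history, word, alpha=0.4):
    # Use the longest available n-gram estimate; if the n-gram was never
    # seen, back off to a shorter history, scaled by a constant alpha.
    history = tuple(history)
    num = counts.get(history + (word,), 0)
    den = counts.get(history, 0)
    if num > 0 and den > 0:
        return num / den
    if not history:
        return 0.0  # unseen unigram; a real model would smooth here
    return alpha * backoff_prob(counts, history[1:], word, alpha)
```

Because alpha is a fixed constant rather than a weight computed from the discounted mass, these scores are not a true probability distribution; the sketch only shows the recursive back-off structure of the formula above.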