# stat341 / CM 361

Computational Statistics and Data Analysis is a course offered at the University of Waterloo
Spring 2009
Instructor: Ali Ghodsi

## Sampling (Generating Random numbers)

### Lecture of May 12th, 2009

In order to study statistics computationally, we need a good way to generate random numbers from various distributions using computational methods, or at least numbers whose distribution appears random (pseudo-random). Outside a computational setting this is fairly easy, at least for the uniform distribution: rolling a fair die, for example, produces uniformly distributed integers.

We begin by considering the simplest case: the uniform distribution.

#### Multiplicative Congruential Method

One way to generate pseudorandom numbers from the uniform distribution is the Multiplicative Congruential Method. It involves three integer parameters a, b, and m, and a seed value $x_0$. This method deterministically (based on the seed) generates a sequence of numbers with a seemingly random distribution (with some caveats). It proceeds as follows:

$x_{i+1} = (a x_i + b) \mod m$

For example, with $a = 13$, $b = 0$, $m = 31$, $x_0 = 1$, we have:

$x_{i+1} = 13x_{i} \mod{31}$

So,

\begin{align} x_0 &= 1 \\ x_1 &= 13 \cdot 1 \mod 31 = 13 \\ x_2 &= 13 \cdot 13 \mod 31 = 169 \mod 31 = 14 \end{align}

etc.

This method generates integers between 0 and m − 1; dividing each term by m − 1 scales the sequence into the interval [0, 1]. For correctly chosen values of a, b, and m, the sequence has full period: it visits every integer in its range before repeating (0 through m − 1 in general; 1 through m − 1 when b = 0, since 0 is then a fixed point).
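As a concrete sketch, the recurrence and the worked example above can be reproduced in a few lines of Python (the function name `mcg` is just for illustration):

```python
def mcg(a, b, m, seed, n_terms):
    """Generate n_terms pseudorandom integers via x_{i+1} = (a*x_i + b) mod m."""
    xs = []
    x = seed
    for _ in range(n_terms):
        x = (a * x + b) % m
        xs.append(x)
    return xs

# The worked example above: a = 13, b = 0, m = 31, seed x_0 = 1
xs = mcg(13, 0, 31, 1, 5)          # [13, 14, 27, 10, 6]
us = [x / 30 for x in xs]          # scale by m - 1 = 30 into [0, 1]
```

Note that the sequence is entirely determined by the seed: rerunning with the same seed reproduces the same "random" numbers.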

Of course, not all values of a, b, and m will behave in this way, and such values are not suitable for generating pseudorandom numbers. In practice, a paper published in 1988 by Park and Miller found that $a = 7^5 = 16807$, $b = 0$, and $m = 2^{31} - 1 = 2147483647$ (the maximum size of a signed integer in a 32-bit system) are good values for the Multiplicative Congruential Method.
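A quick check of these constants (a minimal sketch; the first value after seed 1 is $a$ itself, and since $16807^2$ is still below $m$ the second step needs no reduction):

```python
# Park-Miller "minimal standard" parameters from the 1988 paper
a, m = 7**5, 2**31 - 1          # a = 16807, m = 2147483647

def step(x):
    """One step of the multiplicative congruential recurrence (b = 0)."""
    return (a * x) % m

x1 = step(1)     # 16807
x2 = step(x1)    # 282475249, since 16807**2 = 282475249 < m
```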

There are, however, some drawbacks to this method:

• No integer will ever be generated twice in a row (otherwise the method would generate that integer forever from that point onwards)
• Can only be used to generate pseudorandom numbers from the uniform distribution

#### General Methods

Since the Multiplicative Congruential Method can only be used for the uniform distribution, other methods must be developed in order to generate pseudorandom numbers from other distributions.

##### Inverse Transform Method

This method uses the fact that applying the inverse of a distribution's cumulative distribution function (cdf) to a Unif[0, 1] random variable yields a random variable with exactly that cdf. This is made precise by the following theorem:

Theorem:

If $U \sim Unif[0, 1]$ is a random variable and $X = F^{-1}(U)$, where F is the continuous cumulative distribution function of some distribution, then X has cdf F; that is, $P(X \leq x) = F(x)$.

Proof:

Recall that, if f is the pdf corresponding to F, then

$F(x) = P(X \leq x) = \int_{-\infty}^x f(u)\, du$

So F is monotonically non-decreasing, since the probability that X is at most a larger number must be at least the probability that X is at most a smaller number.

Note also that in the uniform distribution on [0, 1], we have for all a within [0, 1], $P(U \leq a) = a$.

So,

\begin{align} P(F^{-1}(U) \leq x) &= P(F(F^{-1}(U)) \leq F(x)) \\ &= P(U \leq F(x)) \\ &= F(x) \end{align}

Completing the proof.

Procedure (continuous case)

This method then gives us the following procedure for finding pseudorandom numbers from a continuous distribution:

• Step 1: Draw $U \sim Unif[0, 1]$.
• Step 2: Compute $X = F^{-1}(U)$.

Example:

Suppose we want to draw a sample from $f(x) = \lambda e^{-\lambda x}$ where $x \gt 0$ (the exponential distribution).

We need to first find $F(x)$ and then its inverse, $F^{-1}$.

$F(x) = \int^x_0 \lambda e^{-\lambda u}\, du = 1 - e^{-\lambda x}$
$F^{-1}(y) = \frac{-\log(1-y)}{\lambda}$

Now we can generate our random sample $i=1\dots n$ from $f(x)$ by:

$1)\ u_i \sim Unif(0,1)$
$2)\ x_i = \frac{-\log(1-u_i)}{\lambda}$

The $x_i$ are now a random sample from $f(x)$.
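These two steps can be sketched in Python; `sample_exponential` is an illustrative name, and the seed is fixed only so the run is reproducible:

```python
import math
import random

def sample_exponential(lam, n, seed=42):
    """Sample n draws from Exp(lam) by the inverse transform:
    draw u ~ Unif(0, 1), then set x = -log(1 - u) / lam."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

samples = sample_exponential(lam=2.0, n=100_000)
mean = sum(samples) / len(samples)   # should be near 1/lam = 0.5
```

Since $1 - U$ has the same Unif(0, 1) distribution as $U$, the shortcut $x_i = -\log(u_i)/\lambda$ is also commonly used.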

The major problem with this approach is that we must find $F^{-1}$, and for many distributions it is too difficult (or impossible) to find the inverse of $F(x)$ in closed form. Further, for some distributions even $F(x)$ itself has no closed form.

Discrete Case

The above method can be easily adapted to work on discrete distributions as well.

In general in the discrete case, we have $x_0, \dots , x_n$ where:

\begin{align}P(X = x_i) &{}= p_i \end{align}
$x_0 \leq x_1 \leq x_2 \dots \leq x_n$
$\sum p_i = 1$
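In this setting F is a step function, so $F^{-1}(u)$ amounts to returning the smallest $x_i$ whose cumulative probability $p_0 + \dots + p_i$ reaches u. A minimal sketch of that adaptation (the helper name `sample_discrete` is illustrative):

```python
import random

def sample_discrete(xs, ps, rng=random.Random(0)):
    """Discrete inverse transform: draw u ~ Unif(0, 1) and return the
    first x_i whose cumulative probability reaches u."""
    u = rng.random()
    cum = 0.0
    for x, p in zip(xs, ps):
        cum += p
        if u <= cum:
            return x
    return xs[-1]  # guard against floating-point round-off in sum(ps)

# e.g. P(X=0) = 0.2, P(X=1) = 0.5, P(X=2) = 0.3
draws = [sample_discrete([0, 1, 2], [0.2, 0.5, 0.3]) for _ in range(10_000)]
```

Over many draws, each value $x_i$ appears with frequency close to its probability $p_i$.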