# stat341 / CM 361


## Revision as of 16:36, 13 May 2009

**Computational Statistics and Data Analysis** is a course offered at the University of Waterloo.

Spring 2009

Instructor: Ali Ghodsi


## Sampling (Generating Random numbers)

### Lecture of May 12th, 2009

In order to study statistics computationally, we need a good way to generate random numbers from various distributions using computational methods, or at least numbers whose distribution appears to be random (pseudo-random). Outside a computational setting, this is fairly easy (at least for the uniform distribution): rolling a fair die, for example, produces uniformly distributed integers from 1 to 6.

We begin by considering the simplest case: the uniform distribution.

#### Multiplicative Congruential Method

One way to generate pseudorandom numbers from the uniform distribution is the **Multiplicative Congruential Method**. It involves three integer parameters *a*, *b*, and *m*, and a **seed** value *x _{0}*. The method deterministically (based on the seed) generates a sequence of numbers whose distribution appears random (with some caveats). It proceeds as follows:

- [math]x_{i+1} = (ax_{i} + b) \mod{m}[/math]

For example, with *a* = 13, *b* = 0, *m* = 31, *x _{0}* = 1, we have:

- [math]x_{i+1} = 13x_{i} \mod{31}[/math]

So,

- [math]\begin{align} x_{0} &{}= 1 \end{align}[/math]
- [math]\begin{align} x_{1} &{}= 13 \times 1 + 0 \mod{31} \\ &{}= 13 \end{align}[/math]
- [math]\begin{align} x_{2} &{}= 13 \times 13 + 0 \mod{31} \\ &{}= 14 \end{align}[/math]

etc.
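The recursion above can be sketched in Python (Python code is not part of the original notes, and the function name here is ours):

```python
# A minimal sketch of the multiplicative congruential method, using the
# toy parameters from the example above (a = 13, b = 0, m = 31, seed = 1).
def lcg_sequence(a, b, m, seed, count):
    """Return the terms x_1, ..., x_count of x_{i+1} = (a*x_i + b) mod m."""
    xs = []
    x = seed
    for _ in range(count):
        x = (a * x + b) % m
        xs.append(x)
    return xs

print(lcg_sequence(13, 0, 31, 1, 5))  # first terms: [13, 14, 27, 10, 6]
```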

This method generates integers between 0 and *m* - 1. Dividing each term of the resulting sequence by *m* - 1 then scales the output into the interval [0, 1]. For correctly chosen values of *a*, *b*, and *m*, the sequence will cycle through every integer between 0 and *m* - 1 before repeating.

Of course, not all choices of *a*, *b*, and *m* behave this way, and poor choices are unsuitable for generating pseudorandom numbers. In practice, Park and Miller found in a 1988 paper that *a* = 7^{5}, *b* = 0, and *m* = 2^{31} - 1 = 2147483647 (the maximum size of a signed integer in a 32-bit system) are good values for the Multiplicative Congruential Method.
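The Park–Miller parameters can be sketched the same way, with the output scaled into (0, 1) by dividing by *m* (a Python sketch, not part of the original notes):

```python
# A sketch of the Park-Miller "minimal standard" generator:
# a = 7**5, b = 0, m = 2**31 - 1, scaled into (0, 1) by dividing by m.
def park_miller(seed, count):
    a, m = 7 ** 5, 2 ** 31 - 1
    x = seed                     # seed must be in 1, ..., m - 1 (never 0)
    out = []
    for _ in range(count):
        x = (a * x) % m
        out.append(x / m)        # uniform-looking values in (0, 1)
    return out

sample = park_miller(seed=1, count=5)
```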

There are, however, some drawbacks to this method:

- No integer will ever be generated twice in a row (otherwise the method would generate that integer forever from that point onwards)
- Can only be used to generate pseudorandom numbers from the uniform distribution

#### General Methods

Since the Multiplicative Congruential Method can only be used for the uniform distribution, other methods must be developed in order to generate pseudorandom numbers from other distributions.

##### Inverse Transform Method

This method uses the fact that applying the inverse of a distribution's cumulative distribution function (cdf) to a uniform random variable produces a random variable with that distribution. This is shown by the following theorem:

**Theorem**:

If [math]U \sim~ Unif[0, 1][/math] is a random variable and [math]X = F^{-1}(U)[/math], where F is a continuous cumulative distribution function, then the random variable X has cumulative distribution function F.

**Proof**:

Recall that, if *f* is the pdf corresponding to F, then,

- [math]F(x) = P(X \leq x) = \int_{-\infty}^x f(u)\, du[/math]

So F is monotonically non-decreasing, since the probability that X is at most a larger number can be no smaller than the probability that X is at most a smaller number.

Note also that in the uniform distribution on [0, 1], we have for all *a* within [0, 1], [math]P(U \leq a) = a[/math].

So,

- [math]\begin{align} P(F^{-1}(U) \leq x) &{}= P(F(F^{-1}(U)) \leq F(x)) \\ &{}= P(U \leq F(x)) \\ &{}= F(x) \end{align}[/math]

Completing the proof.

**Procedure (Continuous Case)**

This method then gives us the following procedure for finding pseudorandom numbers from a continuous distribution:

- Step 1: Draw [math] U~ \sim~ Unif [0,1] [/math].
- Step 2: Compute [math] X = F^{-1}(U) [/math].

**Example**:

Suppose we want to draw a sample from [math]f(x) = \lambda e^{-\lambda x} [/math] where [math]x \gt 0[/math] (the exponential distribution).

We need to first find [math]F(x)[/math] and then its inverse, [math]F^{-1}[/math].

- [math] F(x) = \int^x_0 \lambda e^{-\lambda u} du = 1 - e^{-\lambda x} [/math]

- [math] F^{-1}(y) = \frac{-\log(1-y)}{\lambda} [/math]

Now we can generate our random sample [math]i=1\dots n[/math] from [math]f(x)[/math] by:

- [math]1)\ u_i \sim Unif[0,1][/math]
- [math]2)\ x_i = \frac{-\log(1-u_i)}{\lambda}[/math]

The [math]x_i[/math] are now a random sample from [math]f(x)[/math].
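The two steps above can be sketched in Python (a sketch, not part of the original notes; the function name is ours):

```python
import math
import random

# Inverse-transform sampling from the exponential distribution
# f(x) = lambda * exp(-lambda * x):
#   Step 1: draw u ~ Unif[0, 1)
#   Step 2: return x = -log(1 - u) / lambda
def sample_exponential(lam, n, rng):
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

rng = random.Random(0)                  # fixed seed for reproducibility
xs = sample_exponential(2.0, 100_000, rng)
mean = sum(xs) / len(xs)                # expect roughly 1/lambda = 0.5
```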

The major problem with this approach is that we must find [math]F^{-1}[/math], and for many distributions it is too difficult (or impossible) to find the inverse of [math]F(x)[/math]. Further, for some distributions it is not even possible to write [math]F(x)[/math] in closed form.

**Procedure (Discrete Case)**

The above method can be easily adapted to work on discrete distributions as well.

In general in the discrete case, we have [math]x_0, \dots , x_n[/math] where:

- [math]\begin{align}P(X = x_i) &{}= p_i \end{align}[/math]
- [math]x_0 \leq x_1 \leq x_2 \dots \leq x_n[/math]
- [math]\sum p_i = 1[/math]

Thus we can define the following method to find pseudorandom numbers in the discrete case:

- Step 1: Draw [math] U~ \sim~ Unif [0,1] [/math].
- Step 2:
- If [math]U \lt p_0[/math], return [math]X = x_0[/math]
- Otherwise, if [math]U \lt p_0 + p_1[/math], return [math]X = x_1[/math]
- ...
- In general, return [math]X = x_k[/math] for the smallest *k* with [math]U \lt p_0 + \dots + p_k[/math]

**Example**:

Suppose we have the following discrete distribution:

- [math]\begin{align} P(X = 0) &{}= 0.3 \\ P(X = 1) &{}= 0.2 \\ P(X = 2) &{}= 0.5 \end{align}[/math]

The cumulative distribution function for this distribution is then:

- [math] F(x) = \begin{cases} 0, & \text{if } x \lt 0 \\ 0.3, & \text{if } 0 \leq x \lt 1 \\ 0.5, & \text{if } 1 \leq x \lt 2 \\ 1, & \text{if } 2 \leq x \end{cases}[/math]
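The discrete procedure applied to this example distribution can be sketched in Python (a sketch, not part of the original notes; names are ours):

```python
import random

# Discrete inverse-transform sampling for the example distribution
# P(X=0) = 0.3, P(X=1) = 0.2, P(X=2) = 0.5.
def sample_discrete(values, probs, rng):
    u = rng.random()
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if u < cumulative:       # first x_k with U < p_0 + ... + p_k
            return x
    return values[-1]            # guard against floating-point round-off

rng = random.Random(0)
draws = [sample_discrete([0, 1, 2], [0.3, 0.2, 0.5], rng)
         for _ in range(100_000)]
freq = {x: draws.count(x) / len(draws) for x in (0, 1, 2)}
# empirical frequencies should be close to (0.3, 0.2, 0.5)
```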