# Difference between revisions of "generating Random Numbers"

### Generating Random Numbers - May 12, 2009

Generating random numbers in a computational setting presents challenges. A good way to generate random numbers in computational statistics involves analyzing various distributions using computational methods. As a result, the probability distribution of each possible number appears to be uniform (pseudo-random). Outside a computational setting, presenting a uniform distribution is fairly easy (for example, rolling a fair die repetitively to produce a series of random numbers from 1 to 6).

We begin by considering the simplest case: the uniform distribution.

#### Multiplicative Congruential Method

One way to generate pseudo random numbers from the uniform distribution is using the Multiplicative Congruential Method. This involves three integer parameters a, b, and m, and a seed variable x0. This method deterministically generates a sequence of numbers (based on the seed) with a seemingly random distribution (with some caveats). It proceeds as follows:

$x_{i+1} = (ax_{i} + b) \mod{m}$

For example, with a = 13, b = 0, m = 31, x0 = 1, we have:

$x_{i+1} = 13x_{i} \mod{31}$

So,

\begin{align} x_{0} &{}= 1 \end{align}
\begin{align} x_{1} &{}= 13 \times 1 + 0 \mod{31} = 13 \\ \end{align}
\begin{align} x_{2} &{}= 13 \times 13 + 0 \mod{31} = 14 \\ \end{align}
\begin{align} x_{3} &{}= 13 \times 14 + 0 \mod{31} =27 \\ \end{align}

etc.

The above generator of pseudorandom numbers is called a Mixed Congruential Generator or Linear Congruential Generator, as they involve both an additive and a muliplicative term. For correctly chosen values of a, b, and m, this method will generate a sequence of integers including all integers between 0 and m - 1. Scaling the output by dividing the terms of the resulting sequence by m - 1, we create a sequence of numbers between 0 and 1, which is similar to sampling from a uniform distribution.

Of course, not all values of a, b, and m will behave in this way, and will not be suitable for use in generating pseudo random numbers.

For example, with a = 3, b = 2, m = 4, x0 = 1, we have:

$x_{i+1} = (3x_{i} + 2) \mod{4}$

So,

\begin{align} x_{0} &{}= 1 \end{align}
\begin{align} x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\ \end{align}
\begin{align} x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\ \end{align}

etc.

For an ideal situation, we want m to be a very large prime number, $x_{n}\not= 0$ for any n, and the period is equal to m-1. In practice, it has been found by a paper published in 1988 by Park and Miller, that a = 75, b = 0, and m = 231 - 1 = 2147483647 (the maximum size of a signed integer in a 32-bit system) are good values for the Multiplicative Congruential Method.

Java's Random class is based on a generator with a = 25214903917, b = 11, and m = 248<ref>http://java.sun.com/javase/6/docs/api/java/util/Random.html#next(int)</ref>. The class returns at most 32 leading bits from each xi, so it is possible to get the same value twice in a row, (when x0 = 18698324575379, for instance) without repeating it forever.

#### General Methods

Since the Multiplicative Congruential Method can only be used for the uniform distribution, other methods must be developed in order to generate pseudo random numbers from other distributions.

##### Inverse Transform Method

This method uses the fact that when a random sample from the uniform distribution is applied to the inverse of a cumulative density function (cdf) of some distribution, the result is a random sample of that distribution. This is shown by this theorem:

Theorem:

If $U \sim~ \mathrm{Unif}[0, 1]$ is a random variable and $X = F^{-1}(U)$, where F is continuous, monotonic, and is the cumulative density function (cdf) for some distribution, then the distribution of the random variable X is given by F(X).

Proof:

Recall that, if f is the pdf corresponding to F, then,

$F(x) = P(X \leq x) = \int_{-\infty}^x f(x)$

So F is monotonically increasing, since the probability that X is less than a greater number must be greater than the probability that X is less than a lesser number.

Note also that in the uniform distribution on [0, 1], we have for all a within [0, 1], $P(U \leq a) = a$.

So,

\begin{align} P(F^{-1}(U) \leq x) &{}= P(F(F^{-1}(U)) \leq F(x)) \\ &{}= P(U \leq F(x)) \\ &{}= F(x) \end{align}

Completing the proof.

Procedure (Continuous Case)

This method then gives us the following procedure for finding pseudo random numbers from a continuous distribution:

• Step 1: Draw $U \sim~ Unif [0, 1]$.
• Step 2: Compute $X = F^{-1}(U)$.

Example:

Suppose we want to draw a sample from $f(x) = \lambda e^{-\lambda x}$ where $x \gt 0$ (the exponential distribution).

We need to first find $F(x)$ and then its inverse, $F^{-1}$.

$F(x) = \int^x_0 \theta e^{-\theta u} du = 1 - e^{-\theta x}$
$F^{-1}(x) = \frac{-\log(1-y)}{\theta} = \frac{-\log(u)}{\theta}$

Now we can generate our random sample $i=1\dots n$ from $f(x)$ by:

$1)\ u_i \sim Unif[0, 1]$
$2)\ x_i = \frac{-\log(u_i)}{\theta}$

The $x_i$ are now a random sample from $f(x)$.

This example can be illustrated in Matlab using the code below. Generate $u_i$, calculate $x_i$ using the above formula and letting $\theta=1$, plot the histogram of $x_i$'s for $i=1,...,100,000$.

u=rand(1,100000);
x=-log(1-u)/1;
hist(x)


The histogram shows exponential pattern as expected.

The major problem with this approach is that we have to find $F^{-1}$ and for many distributions it is too difficult (or impossible) to find the inverse of $F(x)$. Further, for some distributions it is not even possible to find $F(x)$ (i.e. a closed form expression for the distribution function, or otherwise; even if the closed form expression exists, it's usually difficult to find $F^{-1}$).

Procedure (Discrete Case)

The above method can be easily adapted to work on discrete distributions as well.

In general in the discrete case, we have $x_0, \dots , x_n$ where:

\begin{align}P(X = x_i) &{}= p_i \end{align}
$x_0 \leq x_1 \leq x_2 \dots \leq x_n$
$\sum p_i = 1$

Thus we can define the following method to find pseudo random numbers in the discrete case (note that the less-than signs from class have been changed to less-than-or-equal-to signs by me, since otherwise the case of $U = 1$ is missed):

• Step 1: Draw $U~ \sim~ Unif [0,1]$.
• Step 2:
• If $U \lt p_0$, return $X = x_0$
• If $p_0 \leq U \lt p_0 + p_1$, return $X = x_1$
• ...
• In general, if $p_0+ p_1 + \dots + p_{k-1} \leq U \lt p_0 + \dots + p_k$, return $X = x_k$

Example (from class):

Suppose we have the following discrete distribution:

\begin{align} P(X = 0) &{}= 0.3 \\ P(X = 1) &{}= 0.2 \\ P(X = 2) &{}= 0.5 \end{align}

The cumulative density function (cdf) for this distribution is then:

$F(x) = \begin{cases} 0, & \text{if } x \lt 0 \\ 0.3, & \text{if } 0 \leq x \lt 1 \\ 0.5, & \text{if } 1 \leq x \lt 2 \\ 1, & \text{if } 2 \leq x \end{cases}$

Then we can generate numbers from this distribution like this, given $u_0, \dots, u_n$ from $U \sim~ Unif[0, 1]$:

$x_i = \begin{cases} 0, & \text{if } u_i \leq 0.3 \\ 1, & \text{if } 0.3 \lt u_i \leq 0.5 \\ 2, & \text{if } 0.5 \lt u_i \leq 1 \end{cases}$

This example can be illustrated in Matlab using the code below:

p=[0.3,0.2,0.5];
for i=1:1000;
u=rand;
if u <= p(1)
x(i)=0;
elseif u < sum(p(1,2))
x(i)=1;
else
x(i)=2;
end
end