stat341f11

From statwiki
Jump to navigation Jump to search

Editor Sign Up

Sampling - Sept 20, 2011

The meaning of sampling is to generate data points or numbers such that these data follow a certain distribution.
i.e. From [math]\displaystyle{ x \sim~ f(x) }[/math] sample [math]\displaystyle{ \,x_{1}, x_{2}, ..., x_{1000} }[/math]

In practice, it maybe difficult to find the joint distribution of random variables. Through simulating the random variables, we can inference from the data.

Sample from uniform distribution.

Computers can't generate random numbers as they are deterministic but they can produce pseudo random numbers.

Multiplicative Congruential

  • involves three parameters: integers [math]\displaystyle{ \,a, b, m }[/math], and an initial value [math]\displaystyle{ \,x_0 }[/math] we call the seed
  • a sequence of integers is defined as
[math]\displaystyle{ x_{k+1} \equiv (ax_{k} + b) \mod{m} }[/math]

Example: [math]\displaystyle{ \,a=13, b=0, m=31, x_0=1 }[/math] creates a uniform histogram.

MATLAB code for generating 1000 randoms using the multiplicative congruential method:

a=13;
b=0;
m=31;
x(1)=1;

for ii = 1:1000
    x(ii+1) = mod(a*x(ii)+b,m);
end

MATLAB code for displaying the values of x generated:

x;

MATLAB code for plotting the histogram of x:

hist(x);

Facts about this algorithm:

  • In this example, the first 30 terms in the sequence are a permutation of integers from 1 to 30 then the sequence repeats itself.
  • Values are between 0 and [math]\displaystyle{ m-1 }[/math].
  • Dividing the numbers by [math]\displaystyle{ m-1 }[/math] yields numbers in the interval [math]\displaystyle{ [0,1) }[/math].
  • MATLAB's rand function once used this algorithm with [math]\displaystyle{ a=7^5, b=0, m=2^{31}-1 }[/math], for reasons described in Park and Miller's 1988 paper "Random Number Generators: Good Ones are Hard to Find" (available online).

Inverse Transform Method

[math]\displaystyle{ P(a\lt x\lt b)=\int_a^{b} f(x) dx }[/math]
[math]\displaystyle{ cdf=F(x)=P(X\lt =x)=\int_{-\infty}^{x} f(x) dx }[/math]

Assume cdf & cdf -1 can be found

Theorem:

Take [math]\displaystyle{ U \sim~ \mathrm{Unif}[0, 1] }[/math] and let [math]\displaystyle{ x=F^{-1}(u) }[/math]. Then x has distribution function [math]\displaystyle{ F() }[/math], where [math]\displaystyle{ F(x)=P(X\lt =x) }[/math].

Let [math]\displaystyle{ F^{-1}() }[/math] denote the inverse of [math]\displaystyle{ F() }[/math]. Therefore [math]\displaystyle{ F(x)=u \implies x=F^{-1}(u) }[/math]

Take the the exponential distribution for example

[math]\displaystyle{ \,f(x)={\lambda}e^{-{\lambda}x} }[/math]
[math]\displaystyle{ \,F(x)=\int_0^x {\lambda}e^{-{\lambda}u} du }[/math]
[math]\displaystyle{ \,F(x)=1-e^{-{\lambda}x} }[/math]

Let: [math]\displaystyle{ \,F(x)=y }[/math]

[math]\displaystyle{ \,y=1-e^{-{\lambda}x} }[/math]
[math]\displaystyle{ \,ln(1-y)={-{\lambda}x} }[/math]
[math]\displaystyle{ \,x=\frac{ln(1-y)}{-\lambda} }[/math]
[math]\displaystyle{ \,F^{-1}(x)=\frac{-ln(1-x)}{\lambda} }[/math]

Therefore, to get a exponential distribution from a uniform distribution takes 2 steps.

  • Step 1. Draw [math]\displaystyle{ U \sim~ \mathrm{Unif}[0, 1] }[/math]
  • Step 2. [math]\displaystyle{ x=\frac{-ln(1-U)}{\lambda} }[/math]

Now we just have to show the generated points have a cdf of F(x)

[math]\displaystyle{ \,p(F^{-1}(u)\lt =x) }[/math]
[math]\displaystyle{ \,p(F(F^{-1}(u))\lt =F(x)) }[/math]
[math]\displaystyle{ \,p(u\lt =F(x)) }[/math]
[math]\displaystyle{ \,=F(x) }[/math]

QED

Discrete Case

This same technique can be applied to the discrete case. Generate a discrete random variable [math]\displaystyle{ \,x }[/math] that has probability mass function [math]\displaystyle{ \,p(x=x_i)=p_i }[/math] where [math]\displaystyle{ \,x_0\lt x_1\lt x_2... }[/math] and [math]\displaystyle{ \,\sum p_i=1 }[/math]

  • Step 1. Draw [math]\displaystyle{ u \sim~ \mathrm{Unif}[0, 1] }[/math]
  • Step 2. [math]\displaystyle{ \,x=x_i }[/math] if [math]\displaystyle{ \,F(x_{i-1})\lt u\lt =F(x_i) }[/math]