# Difference between revisions of "stat340s13"

If you use ideas, plots, text, code and other intellectual property developed by someone else in your wikicoursenote' contribution , you have to cite the original source. If you copy a sentence or a paragraph from work done by someone else, in addition to citing the original source you have to use quotation marks to identify the scope of the copied material. Evidence of copying or plagiarism will cause a failing mark in the course.

Example of citing the original source

Assumptions Underlying Principal Component Analysis can be found here<ref>http://support.sas.com/publishing/pubcat/chaps/55129.pdf</ref>

## Important Notes

To make distinction between the material covered in class and additional material that you have add to the course, use the following convention. For anything that is not covered in the lecture write:

In the news recently was a story that captures some of the ideas behind PCA. Over the past two years, Scott Golder and Michael Macy, researchers from Cornell University, collected 509 million Twitter messages from 2.4 million users in 84 different countries. The data they used were words collected at various times of day and they classified the data into two different categories: positive emotion words and negative emotion words. Then, they were able to study this new data to evaluate subjects' moods at different times of day, while the subjects were in different parts of the world. They found that the subjects generally exhibited positive emotions in the mornings and late evenings, and negative emotions mid-day. They were able to "project their data onto a smaller dimensional space" using PCS. Their paper, "Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures," is available in the journal Science.<ref>http://www.pcworld.com/article/240831/twitter_analysis_reveals_global_human_moodiness.html</ref>.

Assumptions Underlying Principal Component Analysis can be found here<ref>http://support.sas.com/publishing/pubcat/chaps/55129.pdf</ref>

## Introduction, Class 1 - Tuesday, May 7

### Course Instructor: Ali Ghodsi

Lecture:
001: T/Th 8:30-9:50am MC1085
002: T/Th 1:00-2:20pm DC1351
Tutorial:
2:30-3:20pm Mon M3 1006
Office Hours:
Friday at 10am, M3 4208

### Midterm

Monday June 17,2013 from 2:30pm-3:20pm

### Final

Saturday August 10,2013 from 7:30pm-10:00pm

### TA(s):

TA Day Time Location
Lu Cheng Monday 3:30-5:30 pm M3 3108, space 2
Han ShengSun Tuesday 4:00-6:00 pm M3 3108, space 2
Yizhou Fang Wednesday 1:00-3:00 pm M3 3108, space 1
Huan Cheng Thursday 3:00-5:00 pm M3 3111, space 1
Wu Lin Friday 11:00-1:00 pm M3 3108, space 1

### Four Fundamental Problems

1 Classification: Given input object X, we have a function which will take this input X and identify which 'class (Y)' it belongs to (Discrete Case)

  i.e taking value from x, we could predict y.


(For example, if you have 40 images of oranges and 60 images of apples (represented by x), you can estimate a function that takes the images and states what type of fruit it is - note Y is discrete in this case.)
2 Regression: Same as classification but in the continuous case except y is non discrete. Results from regression are often used for prediction,forecasting and etc. (Example of stock prices, height, weight, etc.)
(A simple practice might be investigating the hypothesis that higher levels of education cause higher levels of income.)
3 Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown; For example, clustering by provinces to measure average height of Canadian men.)
4 Dimensionality Reduction (also known as Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension

### Applications

Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity
Examples:

• Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning
• Search and recommendation (eg. Google, Amazon)
• Automatic speech recognition, speaker verification
• Text parsing
• Face identification
• Tracking objects in video
• Financial prediction(e.g. credit cards)
• Fraud detection
• Medical diagnosis

### Course Information

Prerequisite: (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)

Antirequisite: CM 361/STAT 341, CS 437, 457

General Information

• No required textbook
• Recommended: "Simulation" by Sheldon M. Ross
• Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)
• First midterm will be held on Monday, June 17 from 2:30 to 3:30
• Announcements and assignments will be posted on Learn.
• Other course material on: http://wikicoursenote.com/wiki/
• Log on to both Learn and wikicoursenote frequently.
• Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email instructor or TAs about the class directly to their personal accounts!

Wikicourse note (complete at least 12 contributions to get 10% of final mark): When applying for an account in the wikicourse note, please use the quest account as your login name while the uwaterloo email as the registered email. This is important as the quest id will be used to identify the students who make the contributions. Example:
User: questid
Email: questid@uwaterloo.ca
After the student has made the account request, do wait for several hours before students can login into the account using the passwords stated in the email. During the first login, students will be ask to create a new password for their account.

As a technical/editorial contributor: Make contributions within 1 week and do not copy the notes on the blackboard.

All contributions are now considered general contributions you must contribute to 50% of lectures for full marks

• A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc.) but at least half of your contributions should be technical for full marks.

Do not submit copyrighted work without permission, cite original sources. Each time you make a contribution, check mark the table. Marks are calculated on an honour system, although there will be random verifications. If you are caught claiming to contribute but have not, you will not be credited.

- you can submit your contributions multiple times.
- you will be able to edit the response right after submitting
- send email to make changes to an old response : uwstat340@gmail.com

### Tentative Topics

- Random variable and stochastic process generation
- Discrete-Event Systems
- Variance reduction
- Markov Chain Monte Carlo

## Class 2 - Thursday, May 9

### Generating Random Numbers

#### Introduction

Simulation is the imitation of a process or system over time. Computational power has introduced the possibility of using simulation study to analyze models used to describe a situation.

In order to perform a simulation study, we should: <br\> 1 Use a computer to generate (pseudo*) random numbers (rand in MATLAB).
2 Use these numbers to generate values of random variable from distributions: for example, set a variable in terms of uniform u ~ U(0,1).
3 Using the concept of discrete events, we show how the random variables can be used to generate the behavior of a stochastic model over time. (Note: A stochastic model is the opposite of deterministic model, where there are several directions the process can evolve to)
4 After continually generating the behavior of the system, we can obtain estimators and other quantities of interest.

The building block of a simulation study is the ability to generate a random number. This random number is a value from a random variable distributed uniformly on (0,1). There are many different methods of generating a random number:

Physical Method: Roulette wheel, lottery balls, dice rolling, card shuffling etc.
Numerically/Arithmetically: Use of a computer to successively generate pseudorandom numbers. The sequence of numbers can appear to be random; however they are deterministically calculated with an equation which defines pseudorandom.


(Source: Ross, Sheldon M., and Sheldon M. Ross. Simulation. San Diego: Academic, 1997. Print.)

• We use the prefix pseudo because computer generates random numbers based on algorithms, which suggests that generated numbers are not truly random. Therefore pseudo-random numbers is used.

In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a stochastic model which encapsulates randomness and probabilistic events.
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate Pseudo Random Numbers

Pseudo Random Numbers are the numbers that seem random but are actually determined by a relative set of original values. It is a chain of numbers pre-set by a formula or an algorithm, and the value jump from one to the next, making it look like a series of independent random events. The flaw of this method is that, eventually the chain returns to its initial position and pattern starts to repeat, but if we make the number set large enough we can prevent the numbers from repeating too early. Although the pseudo random numbers are deterministic, these numbers have a sequence of value and all of them have the appearances of being independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to the ease to generate and manipulate.

When people repeat the test many times, the results will be the closed express values, which make the trials look deterministic. However, for each trial, the result is random. So, it looks like pseudo random numbers.

#### Mod

Let $n \in \N$ and $m \in \N^+$, then by Division Algorithm, $\exists q, \, r \in \N \;\text{with}\; 0\leq r \lt m, \; \text{s.t.}\; n = mq+r$, where $q$ is called the quotient and $r$ the remainder. Hence we can define a binary function $\mod : \N \times \N^+ \rightarrow \N$ given by $r:=n \mod m$ which returns the remainder after division by m.
Generally, mod means taking the reminder after division by m.
We say that n is congruent to r mod m if n = mq + r, where m is an integer. Values are between 0 and m-1
if y = ax + b, then $b:=y \mod a$.

Example 1:

$30 = 4 \cdot 7 + 2$

$2 := 30\mod 7$

$25 = 8 \cdot 3 + 1$

$1: = 25\mod 3$

$-3=5\cdot (-1)+2$

$2:=-3\mod 5$

Example 2:

If $23 = 3 \cdot 6 + 5$

Then equivalently, $5 := 23\mod 6$

If $31 = 31 \cdot 1$

Then equivalently, $0 := 31\mod 31$

If $-37 = 40\cdot (-1)+ 3$

Then equivalently, $3 := -37\mod 40$

Example 3:
$77 = 3 \cdot 25 + 2$

$2 := 77\mod 3$

$25 = 25 \cdot 1 + 0$

$0: = 25\mod 25$

Note: $\mod$ here is different from the modulo congruence relation in $\Z_m$, which is an equivalence relation instead of a function.

The modulo operation is useful for determining if an integer divided by another integer produces a non-zero remainder. But both integers should satisfy $n = mq + r$, where $m$, $r$, $q$, and $n$ are all integers, and $r$ is smaller than $m$. The above rules also satisfy when any of $m$, $r$, $q$, and $n$ is negative integer, see the third example.

#### Mixed Congruential Algorithm

We define the Linear Congruential Method to be $x_{k+1}=(ax_k + b) \mod m$, where $x_k, a, b, m \in \N, \;\text{with}\; a, m \neq 0$. Given a seed (i.e. an initial value $x_0 \in \N$), we can obtain values for $x_1, \, x_2, \, \cdots, x_n$ inductively. The Multiplicative Congruential Method, invented by Berkeley professor D. H. Lehmer, may also refer to the special case where $b=0$ and the Mixed Congruential Method is case where $b \neq 0$
. Their title as "mixed" arises from the fact that it has both a multiplicative and additive term.

An interesting fact about Linear Congruential Method is that it is one of the oldest and best-known pseudo random number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications that require high randomness. They should not be used for Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)

"Source: STAT 340 Spring 2010 Course Notes"

First consider the following algorithm
$x_{k+1}=x_{k} \mod m$

such that: if $x_{0}=5(mod 150)$, $x_{n}=3x_{n-1}$, find $x_{1},x_{8},x_{9}$.
$x_{n}=(3^n)*5(mod 150)$
$x_{1}=45,x_{8}=105,x_{9}=15$

Example
$\text{Let }x_{0}=10,\,m=3$

\begin{align} x_{1} &{}= 10 &{}\mod{3} = 1 \\ x_{2} &{}= 1 &{}\mod{3} = 1 \\ x_{3} &{}= 1 &{}\mod{3} =1 \\ \end{align}

$\ldots$

Excluding $x_{0}$, this example generates a series of ones. In general, excluding $x_{0}$, the algorithm above will always generate a series of the same number less than M. Hence, it has a period of 1. The period can be described as the length of a sequence before it repeats. We want a large period with a sequence that is random looking. We can modify this algorithm to form the Multiplicative Congruential Algorithm.

$x_{k+1}=(a \cdot x_{k} + b) \mod m$(a little tip: $(a \cdot b)\mod c = (a\mod c)\cdot(b\mod c))$

Example
$\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10$
\begin{align} \text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\ \text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\ \text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\ \end{align}
$\ldots$

This example generates a sequence with a repeating cycle of two integers.

(If we choose the numbers properly, we could get a sequence of "random" numbers. How do we find the value of $a,b,$ and $m$? At the very least $m$ should be a very large, preferably prime number. The larger $m$ is, the higher the possibility to get a sequence of "random" numbers. This is easier to solve in Matlab. In Matlab, the command rand() generates random numbers which are uniformly distributed on the interval (0,1)). Matlab uses $a=7^5, b=0, m=2^{31}-1$ – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (Important part is that $m$ should be large and prime)

Note: $\frac {x_{n+1}}{m-1}$ is an approximation to the value of a U(0,1) random variable.

MatLab Instruction for Multiplicative Congruential Algorithm:
Before you start, you need to clear all existing defined variables and operations:

>>clear all
>>close all

>>a=17
>>b=3
>>m=31
>>x=5
>>mod(a*x+b,m)
ans=26
>>x=mod(a*x+b,m)


(Note:
1. Keep repeating this command over and over again and you will get random numbers – this is how the command rand works in a computer.
2. There is a function in MATLAB called RAND to generate a random number between 0 and 1.
For example, in MATLAB, we can use rand(1,1000) to generate 1000's numbers between 0 and 1. This is essentially a vector with 1 row, 1000 columns, with each entry a random number between 0 and 1.
3. If we would like to generate 1000 or more numbers, we could use a for loop

(Note on MATLAB commands:
1. clear all: clears all variables.
2. close all: closes all figures.
3. who: displays all defined variables.
4. clc: clears screen.
5. ; : prevents the results from printing.
6. disstool: displays a graphing tool.

>>a=13
>>b=0
>>m=31
>>x(1)=1
>>for ii=2:1000
x(ii)=mod(a*x(ii-1)+b,m);
end
>>size(x)
ans=1    1000
>>hist(x)


(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.)

This algorithm involves three integer parameters $a, b,$ and $m$ and an initial value, $x_0$ called the seed. A sequence of numbers is defined by $x_{k+1} = ax_k+ b \mod m$.

Note: For some bad $a$ and $b$, the histogram may not look uniformly distributed.

Note: In MATLAB, hist(x) will generate a graph representing the distribution. Use this function after you run the code to check the real sample distribution.

Example: $a=13, b=0, m=31$
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so it is important to choose $m$ large to decrease the probability of each number repeating itself too early. Values are between $0$ and $m-1$. If the values are normalized by dividing by $m-1$, then the results are approximately numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. We saw that the values between 0-30 had the same frequency in the histogram, so we can conclude that they are uniformly distributed.

If $x_0=1$, then

$x_{k+1} = 13x_{k}\mod{31}$

So,

\begin{align} x_{0} &{}= 1 \\ x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\ x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\ x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\ \end{align}

etc.

For example, with $a = 3, b = 2, m = 4, x_0 = 1$, we have:

$x_{k+1} = (3x_{k} + 2)\mod{4}$

So,

\begin{align} x_{0} &{}= 1 \\ x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\ x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\ \end{align}

Another Example, a =3, b =2, m = 5, x_0=1 etc.

FAQ:

1.Why is it 1 to 30 instead of 0 to 30 in the example above?
$b = 0$ so in order to have $x_k$ equal to 0, $x_{k-1}$ must be 0 (since $a=13$ is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group $\Z_{31}$.
2.Will the number 31 ever appear?Is there a probability that a number never appears?
The number 31 will never appear. When you perform the operation $\mod m$, the largest possible answer that you could receive is $m-1$. Whether or not a particular number in the range from 0 to $m - 1$ appears in the above algorithm will be dependent on the values chosen for $a, b$ and $m$.

Examples:[From Textbook]
$\text{If }x_0=3 \text{ and } x_n=(5x_{n-1}+7)\mod 200$, $\text{find }x_1,\cdots,x_{10}$.
Solution:
\begin{align} x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\ x_2 &{}= 117 &{}\mod{200} &{}= 117 \\ x_3 &{}= 592 &{}\mod{200} &{}= 192 \\ x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\ x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\ x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\ x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\ x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\ x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\ x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\ \end{align}

Matlab code: a=5; b=7; m=200; x(1)=3; for ii=2:1000 x(ii)=mod(a*x(ii-1)+b,m); end size(x); hist(x)

Typically, it is good to choose $m$ such that $m$ is large, and $m$ is prime. Careful selection of parameters '$a$' and '$b$' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for $m$, our results were not satisfactory in producing an output resembling a uniform distribution.

The computed values are between 0 and $m-1$. If the values are normalized by dividing by $m-1$, their result is numbers uniformly distributed on the interval $\left[0,1\right]$ (similar to computing from uniform distribution).

From the example shown above, if we want to create a large group of random numbers, it is better to have large, prime $m$ so that the generated random values will not repeat after several iterations. Note: the period for this example is 8: from '$x_2$' to '$x_9$'.

There has been a research on how to choose uniform sequence. Many programs give you the options to choose the seed. Sometimes the seed is chosen by CPU.

Theorem (extra knowledge)
Let c be a non-zero constant. Then for any seed x0, and LCG will have largest max. period if and only if
(i) m and c are coprime;
(ii) (a-1) is divisible by all prime factor of m;
(iii) if and only if m is divisible by 4, then a-1 is also divisible by 4.

We want our LCG to have a large cycle. We call a cycle with m element the maximal period. We can make it bigger by making m big and prime. Recall:any number you can think of can be broken into a factor of prime Define coprime:Two numbers X and Y, are coprime if they do not share any prime factors.

Example:

Xn=(15Xn-1 + 4) mod 7


(i) m=7 c=4 -> coprime;
(ii) a-1=14 and a-1 is divisible by 7;
(iii) dose not apply.
(The extra knowledge stops here)

In this part, I learned how to use R code to figure out the relationship between two integers division, and their remainder. And when we use R to calculate R with random variables for a range such as(1:1000),the graph of distribution is like uniform distribution.

#### Summary of Multiplicative Congruential Algorithm

Problem: generate Pseudo Random Numbers.

Plan:

1. find integer: a b m(large prime) x0(the seed) .
2. $x_{k+1}=(ax_{k}+b)$mod m

Matlab Instruction:

>>clear all
>>close all
>>a=17
>>b=3
>>m=31
>>x=5
>>mod(a*x+b,m)
ans=26
>>x=mod(a*x+b,m)


Another algorithm for generating pseudo random numbers is the multiply with carry method. Its simplest form is similar to the linear congruential generator. They differs in that the parameter b changes in the MWC algorithm. It is as follows:

1.) xk+1 = axk + bk mod m
2.) bk+1 = floor((axk + bk)/m)
3.) set k to k + 1 and go to step 1 Source

### Inverse Transform Method

Now that we know how to generate random numbers, we use these values to sample form distributions such as exponential. However, to easily use this method, the probability distribution consumed must have a cumulative distribution function (cdf) $F$ with a tractable (that is, easily found) inverse $F^{-1}$.

Theorem:
If we want to generate the value of a discrete random variable X, we must generate a random number U, uniformly distributed over (0,1). Let $F:\R \rightarrow \left[0,1\right]$ be a cdf. If $U \sim U\left[0,1\right]$, then the random variable given by $X:=F^{-1}\left(U\right)$ follows the distribution function $F\left(\cdot\right)$, where $F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}$ is the generalized inverse.
Note: $F$ need not be invertible everywhere on the real line, but if it is, then the generalized inverse is the same as the inverse in the usual case. We only need it to be invertible on the range of F(x), [0,1].

Proof of the theorem:
The generalized inverse satisfies the following:

$P(X\leq x)$

$= P(F^{-1}(U)\leq x)$ (since $X= F^{-1}(U)$ by the inverse method)
$= P((F(F^{-1}(U))\leq F(x))$ (since $F$ is monotonically increasing)
$= P(U\leq F(x))$ (since $P(U\leq a)= a$ for $U \sim U(0,1), a \in [0,1]$,
$= F(x) , \text{ where } 0 \leq F(x) \leq 1$

This is the c.d.f. of X.

That is $F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)$

Finally, $P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)$, since $U$ is uniform on the unit interval.

This completes the proof.

Therefore, in order to generate a random variable X~F, it can generate U according to U(0,1) and then make the transformation x=$F^{-1}(U)$

Note that we can apply the inverse on both sides in the proof of the inverse transform only if the pdf of X is monotonic. A monotonic function is one that is either increasing for all x, or decreasing for all x. Of course, this holds true for all CDFs, since they are monotonic by definition.

In short, what the theorem tells us is that we can use a random number $U from U(0,1)$ to randomly sample a point on the CDF of X, then apply the inverse of the CDF to map the given probability to its domain, which gives us the random variable X.

Example 1 - Exponential: $f(x) = \lambda e^{-\lambda x}$
Calculate the CDF:
$F(x)= \int_0^x f(t) dt = \int_0^x \lambda e ^{-\lambda t}\ dt$ $= \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, | \underset{0}{x}$ $= -e^{-\lambda x} + e^0 =1 - e^{- \lambda x}$
Solve the inverse:
$y=1-e^{- \lambda x} \Rightarrow 1-y=e^{- \lambda x} \Rightarrow x=-\frac {ln(1-y)}{\lambda}$
$y=-\frac {ln(1-x)}{\lambda} \Rightarrow F^{-1}(x)=-\frac {ln(1-x)}{\lambda}$
Note that 1 − U is also uniform on (0, 1) and thus −log(1 − U) has the same distribution as −logU.
Steps:
Step 1: Draw U ~U[0,1];
Step 2: $x=\frac{-ln(U)}{\lambda}$

EXAMPLE 2 Normal distribution G(y)=P[Y<=y)

     =P[-sqr (y) < z < sqr (y))
=integrate from -sqr(z) to Sqr(z) 1/sqr(2pi) e ^(-z^2/2) dz
= 2 integrate from 0 to sqr(y)  1/sqr(2pi) e ^(-z^2/2) dz


its the cdf of Y=z^2

pdf g(y)= G'(y) pdf pf x^2 (1)

MatLab Code:

>>u=rand(1,1000);
>>hist(u)       # this will generate a fairly uniform diagram


#let λ=2 in this example; however, you can make another value for λ
>>x=(-log(1-u))/2;
>>size(x)       #1000 in size
>>figure
>>hist(x)       #exponential


Example 2 - Continuous Distribution:

$f(x) = \dfrac {\lambda } {2}e^{-\lambda \left| x-\theta \right| } for -\infty \lt X \lt \infty , \lambda \gt 0$

Calculate the CDF:

$F(x)= \frac{1}{2} e^{-\lambda (\theta - x)} , for \ x \le \theta$
$F(x) = 1 - \frac{1}{2} e^{-\lambda (x - \theta)}, for \ x \gt \theta$

Solve for the inverse:

$F^{-1}(x)= \theta + ln(2y)/\lambda, for \ 0 \le y \le 0.5$
$F^{-1}(x)= \theta - ln(2(1-y))/\lambda, for \ 0.5 \lt y \le 1$

Algorithm:
Steps:
Step 1: Draw U ~ U[0, 1];
Step 2: Compute $X = F^-1(U)$ i.e. $X = \theta + \frac {1}{\lambda} ln(2U)$ for U < 0.5 else $X = \theta -\frac {1}{\lambda} ln(2(1-U))$

Example 3 - $F(x) = x^5$:
Given a CDF of X: $F(x) = x^5$, transform U~U[0,1].
Sol: Let $y=x^5$, solve for x: $x=y^\frac {1}{5}$. Therefore, $F^{-1} (x) = x^\frac {1}{5}$
Hence, to obtain a value of x from F(x), we first set 'u' as an uniform distribution, then obtain the inverse function of F(x), and set $x= u^\frac{1}{5}$

Algorithm:
Steps:
Step 1: Draw U ~ rand[0, 1];
Step 2: X=U^(1/5);

Example 4 - BETA(1,β):
Given u~U[0,1], generate x from BETA(1,β)
Solution: $F(x)= 1-(1-x)^\beta$, $u= 1-(1-x)^\beta$
Solve for x: $(1-x)^\beta = 1-u$, $1-x = (1-u)^\frac {1}{\beta}$, $x = 1-(1-u)^\frac {1}{\beta}$
let β=3, use Matlab to construct N=1000 observations from Beta(1,3)
MatLab Code:

>> u = rand(1,1000);
x = 1-(1-u)^(1/3);
>> hist(x,50)
>> mean(x)


Example 5 - Estimating $\pi$:
Let's use rand() and Monte Carlo Method to estimate $\pi$
N= total number of points
Nc = total number of points inside the circle
Prob[(x,y) lies in the circle=$\frac {Area(circle)}{Area(square)}$
If we take square of size 2, circle will have area =$\pi (\frac {2}{2})^2 =\pi$.
Thus $\pi= 4(\frac {N_c}{N})$

  For example, UNIF(a,b)
$y = F(x) = (x - a)/ (b - a)$
$x = (b - a ) * y + a$
$X = a + ( b - a) * U$
where U is UNIF(0,1)


Limitations:
1. This method is flawed since not all functions are invertible or monotonic: generalized inverse is hard to work on.
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.

We learned how to prove the transformation from cdf to inverse cdf,and use the uniform distribution to obtain a value of x from F(x). We can also use uniform distribution in inverse method to determine other distributions. The probability of getting a point for a circle over the triangle is a closed uniform distribution, each point in the circle and over the triangle is almost the same. Then, we can look at the graph to determine what kind of distribution the graph resembles.

#### Probability Distribution Function Tool in MATLAB

disttool         #shows different distributions


This command allows users to explore different types of distribution and see how the changes affect the parameters on the plot of either a CDF or PDF.

change the value of mu and sigma can change the graph skew side.

## Class 3 - Tuesday, May 14

### Recall the Inverse Transform Method

Let U~Unif(0,1),then the random variable X = F-1(u) has distribution F.
To sample X with CDF F(x),

$1) U~ \sim~ Unif [0,1]$ 2) X = F-1(u)

Note: CDF of a U(a,b) random variable is:

$F(x)= \begin{cases} 0 & \text{for }x \lt a \\[8pt] \frac{x-a}{b-a} & \text{for }a \le x \lt b \\[8pt] 1 & \text{for }x \ge b \end{cases}$

Thus, for $U$ ~ $U(0,1)$, we have $P(U\leq 1) = 1$ and $P(U\leq 1/2) = 1/2$.
More generally, we see that $P(U\leq a) = a$.
For this reason, we had $P(U\leq F(x)) = F(x)$.

Reminder:
This is only for uniform distribution $U~ \sim~ Unif [0,1]$
$P (U \le 1) = 1$
$P (U \le 0.5) = 0.5$
$P (U \le a) = a$

$P(U\leq a)=a$

Note that on a single point there is no mass probability (i.e. $u$ <= 0.5, is the same as $u$ < 0.5) More formally, this is saying that $P(X = x) = F(x)- \lim_{s \to x^-}F(x)$ , which equals zero for any continuous random variable

#### Limitations of the Inverse Transform Method

Though this method is very easy to use and apply, it does have a major disadvantage/limitation:

• We need to find the inverse cdf $F^{-1}(\cdot)$. In some cases the inverse function does not exist, or is difficult to find because it requires a closed form expression for F(x).

For example, it is too difficult to find the inverse cdf of the Gaussian distribution, so we must find another method to sample from the Gaussian distribution.

In conclusion, we need to find another way of sampling from more complicated distributions

### Discrete Case

The same technique can be used for discrete case. We want to generate a discrete random variable x, that has probability mass function:

\begin{align}P(X = x_i) &{}= p_i \end{align}
$x_0 \leq x_1 \leq x_2 \dots \leq x_n$
$\sum p_i = 1$

Algorithm for applying Inverse Transformation Method in Discrete Case (Procedure):
1. Define a probability mass function for $x_{i}$ where i = 1,....,k. Note: k could grow infinitely.
2. Generate a uniform random number U, $U~ \sim~ Unif [0,1]$
3. If $U\leq p_{o}$, deliver $X = x_{o}$
4. Else, if $U\leq p_{o} + p_{1}$, deliver $X = x_{1}$
5. Repeat the process again till we reached to $U\leq p_{o} + p_{1} + ......+ p_{k}$, deliver $X = x_{k}$

Note that after generating a random U, the value of X can be determined by finding the interval $[F(x_{j-1}),F(x_{j})]$ in which U lies.

In summary: Generate a discrete r.v.x that has pmf:

  P(X=xi)=Pi,    x0<x1<x2<...


1. Draw U~U(0,1);
2. If F(x(i-1))<U<F(xi), x=xi.

Example 3.0:
Generate a random variable from the following probability function:

 x -2 -1 0 1 2 f(x) 0.1 0.5 0.07 0.03 0.3

1. Gen U~U(0,1)
2. If U < 0.5 then output -1
else if U < 0.8 then output 2
else if U < 0.9 then output -2
else if U < 0.97 then output 0 else output 1

Example 3.1 (from class): (Coin Flipping Example)
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1.

We can define the U function so that:

If $U\leq 0.5$, then X = 0

and if $0.5 \lt U\leq 1$, then X =1.

This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.

$U~ \sim~ Unif [0,1]$

\begin{align} P(X = 0) &{}= 0.5\\ P(X = 1) &{}= 0.5\\ \end{align}

$x = \begin{cases} 0, & \text{if } U\leq 0.5 \\ 1, & \text{if } 0.5 \lt U \leq 1 \end{cases}$

• Code
>>for ii=1:1000
u=rand;
if u<0.5
x(ii)=0;
else
x(ii)=1;
end
end
>>hist(x)


Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.

Example 3.2 (From class):

Suppose we have the following discrete distribution:

\begin{align} P(X = 0) &{}= 0.3 \\ P(X = 1) &{}= 0.2 \\ P(X = 2) &{}= 0.5 \end{align}

The cumulative distribution function (cdf) for this distribution is then:

$F(x) = \begin{cases} 0, & \text{if } x \lt 0 \\ 0.3, & \text{if } x \lt 1 \\ 0.5, & \text{if } x \lt 2 \\ 1, & \text{if } x \ge 2 \end{cases}$

Then we can generate numbers from this distribution like this, given $U \sim~ Unif[0, 1]$:

$x = \begin{cases} 0, & \text{if } U\leq 0.3 \\ 1, & \text{if } 0.3 \lt U \leq 0.5 \\ 2, & \text{if } 0.5 \lt U\leq 1 \end{cases}$

"Procedure"
1. Draw U~u (0,1)
2. if U<=0.3 deliver x=0
3. else if 0.3<U<=0.5 deliver x=1
4. else 0.5<U<=1 deliver x=2

Can you find a faster way to run this algorithm? Consider:

$x = \begin{cases} 2, & \text{if } U\leq 0.5 \\ 1, & \text{if } 0.5 \lt U \leq 0.7 \\ 0, & \text{if } 0.7 \lt U\leq 1 \end{cases}$

The logic for this is that U is most likely to fall into the largest range. Thus by putting the largest range (in this case x >= 0.5) we can improve the run time of this algorithm. Could this algorithm be improved further using the same logic?

• Code (as shown in class)

Use Editor window to edit the code

>>close all
>>clear all
>>for ii=1:1000
u=rand;
if u<=0.3
x(ii)=0;
elseif u<=0.5
x(ii)=1;
else
x(ii)=2;
end
end
>>size(x)
>>hist(x)


The algorithm above generates a vector (1,1000) containing 0's ,1's and 2's in differing proportions. Due to the criteria for accepting 0, 1 or 2 into the vector we get proportions of 0,1 &2 that correspond to their respective probabilities. So plotting the histogram (frequency of 0,1&2) doesn't give us the pmf but a frequency histogram that shows the proportions of each, which looks identical to the pmf.

Example 3.3: Generating a random variable from pdf

$f_{x}(x) = \begin{cases} 2x, & \text{if } 0\leq x \leq 1 \\ 0, & \text{if } otherwise \end{cases}$
$F_{x}(x) = \begin{cases} 0, & \text{if } x \lt 0 \\ \int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\ 1, & \text{if } x \gt 1 \end{cases}$
\begin{align} U = x^{2}, X = F^{-1}x(U)= U^{\frac{1}{2}}\end{align}

Example 3.4: Generating a Bernoulli random variable

\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}
$F(x) = \begin{cases} 1-p, & \text{if } x \lt 1 \\ 1, & \text{if } x \ge 1 \end{cases}$

1. Draw $U~ \sim~ Unif [0,1]$
2. $X = \begin{cases} 0, & \text{if } 0 \lt U \lt 1-p \\ 1, & \text{if } 1-p \le U \lt 1 \end{cases}$

Example 3.5: Generating Binomial(n,p) Random Variable
$use p\left( x=i+1\right) =\dfrac {n-i} {i+1}\dfrac {p} {1-p}p\left( x=i\right)$

Step 1: Generate a random number $U$.
Step 2: $c = \frac {p}{(1-p)}$, $i = 0$, $pr = (1-p)^n$, $F = pr$
Step 3: If U<F, set X = i and stop,
Step 4: $pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1$
Step 5: Go to step 3

• Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.
• Note: Another method by seeing the Binomial as a sum of n independent Bernoulli random variables, U1, ..., Un. Then set X equal to the number of Ui that are less than or equal to p. To use this method, n random numbers are needed and n comparisons need to be done. On the other hand, the inverse transformation method is simpler because only one random variable needs to be generated and it makes 1 + np comparisons.

Step 1: Generate n uniform numbers U1 ... Un.
Step 2: X = $\sum U_i \lt = p$ where P is the probability of success.

Example 3.6: Generating a Poisson random variable

"Let X ~ Poi(u). Write an algorithm to generate X. The PDF of a poisson is:

\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}

We know that

\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}

The ratio is \begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align} Therefore, \begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}

Algorithm:
1) Generate U ~ U(0,1)
2) \begin{align} X = 0 \end{align}

  \begin{align} F = P(X = 0) = e^{-u}*u^0/{0!} = e^{-u} = p \end{align}


3) If U<F, output x

  Else, \begin{align} p = (u/(x+1))^p \end{align}
\begin{align} F = F + p \end{align}
\begin{align} x = x + 1 \end{align}


4) Go to 1"

Acknowledgements: This is an example from Stat 340 Winter 2013

Example 3.7: Generating Geometric Distribution:

Consider Geo(p) where p is the probability of success, and define random variable X such that X is the total number of trials required to achieve the first success. x=1,2,3..... We have pmf: $P(X=x_i) = \, p (1-p)^{x_{i}-1}$ We have CDF: $F(x)=P(X \leq x)=1-P(X\gt x) = 1-(1-p)^x$, P(X>x) means we get at least x failures before we observe the first success. Now consider the inverse transform:

$x = \begin{cases} 1, & \text{if } U\leq p \\ 2, & \text{if } p \lt U \leq 1-(1-p)^2 \\ 3, & \text{if } 1-(1-p)^2 \lt U\leq 1-(1-p)^3 \\ .... k, & \text{if } 1-(1-p)^{k-1} \lt U\leq 1-(1-p)^k .... \end{cases}$

Note: Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach)

General Procedure
1. Draw U ~ U(0,1)
2. If $U \leq P_{0}$ deliver $x = x_{0}$
3. Else if $U \leq P_{0} + P_{1}$ deliver $x = x_{1}$
4. Else if $U \leq P_{0} + P_{1} + P_{2}$ deliver $x = x_{2}$
...

  Else if $U \leq P_{0} + ... + P_{k}$ deliver $x = x_{k}$


===Inverse Transform Algorithm for Generating a Binomial(n,p) Random Variable(from textbook)===
step 1: Generate a random number U
step 2: c=p/(1-p),i=0, pr=(1-p)n, F=pr.
step 3: If U<F, set X=i and stop.
step 4: pr =[c(n-i)/(i+1)]pr, F=F+pr, i=i+1.
step 5: Go to step 3.

Problems
Though this method is very easy to use and apply, it does have a major disadvantage/limitation: We need to find the inverse cdf F^{-1}(\cdot) . In some cases the inverse function does not exist, or is difficult to find because it requires a closed form expression for F(x). For example, it is too difficult to find the inverse cdf of the Gaussian distribution, so we must find another method to sample from the Gaussian distribution. In conclusion, we need to find another way of sampling from more complicated distributions Flipping a coin is a discrete case of uniform distribution, and the code below shows an example of flipping a coin 1000 times; the result is close to the expected value 0.5.
Example 2, as another discrete distribution, shows that we can sample from parts like 0,1 and 2, and the probability of each part or each trial is the same.
Example 3 uses inverse method to figure out the probability range of each random varible.

## Summary of Inverse Transform Method

Problem:generate types of distribution.

Plan:

Continuous case:

1. find CDF F
2. find the inverse F-1
3. Generate a list of uniformly distributed number {x}
4. {F-1(x)} is what we want

Matlab Instruction

>>u=rand(1,1000);
>>hist(u)
>>x=(-log(1-u))/2;
>>size(x)
>>figure
>>hist(x)


Discrete case:

1. generate a list of uniformly distributed number {u}
2. di=xi if$X=x_i,$ if $F(x_{i-1})\lt U\leq F(x_i)$
3. {di=xi} is what we want

Matlab Instruction

>>for ii=1:1000
u=rand;
if u<0.5
x(ii)=0;
else
x(ii)=1;
end
end
>>hist(x)
`

### Generalized Inverse-Transform Method

Valid for any CDF F(x): return X=min{x:F(x)$\leq$ U}, where U~U(0,1)

1. Continues, possibly with flat spots (i.e. not strictly increasing)

2. Discrete

3. Mixed continues discrete

Inverse transform method preserves monotonicity and correlation

which helps in

1. Variance reduction methods ...

2. Generating truncated distributions ...

3. Order statistics ...

### Acceptance-Rejection Method

Although the inverse transformation method does allow us to change our uniform distribution, it has two limits;

1. Not all functions have inverse functions (ie, the range of x and y have limit and do not fix the inverse functions)
2. For some distributions, such as Gaussian, it is too difficult to find the inverse

To generate random samples for these functions, we will use different methods, such as the Acceptance-Rejection Method. This method is more efficient than the inverse transform method. The basic idea is to find an a