<hr />
<div>== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== Midterm ===<br />
Monday June 17 2013 from 2:30-3:30<br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
{| class="wikitable"<br />
|-<br />
! TA<br />
! Day<br />
! Time<br />
! Location<br />
|-<br />
| Lu Cheng<br />
| Monday<br />
| 3:30-5:30 pm<br />
| M3 3108, space 2<br />
|-<br />
| Han ShengSun<br />
| Tuesday<br />
| 4:00-6:00 pm<br />
| M3 3108, space 2<br />
|-<br />
| Yizhou Fang<br />
| Wednesday<br />
| 1:00-3:00 pm<br />
| M3 3108, space 1<br />
|-<br />
| Huan Cheng<br />
| Thursday<br />
| 3:00-5:00 pm<br />
| M3 3111, space 1<br />
|-<br />
| Wu Lin<br />
| Friday<br />
| 11:00-1:00 pm<br />
| M3 3108, space 1<br />
|}<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which takes in this input X and identifies which 'class (Y)' it belongs to (discrete case) <br /><br />
i.e., given a value of x, we predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm, as a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification, but in the continuous case: y is continuous rather than discrete <br /><br />
3. Clustering: Use common features of objects in the same class or group to form clusters (in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature Extraction, Manifold Learning): Used when we have data in a high-dimensional space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly to their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important as the Quest id will be used to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission; cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on the honour system, although there will be random verifications. If you are caught claiming a contribution you did not make, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- You can submit your contributions multiple times.<br /><br />
- You will be able to edit the response right after submitting.<br /><br />
- Send an email to uwstat340@gmail.com to make changes to an old response.<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are deterministic, the sequence of values they produce has the appearance of independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial because they are easy to generate and to reproduce.<br />
<br />
When an experiment is repeated many times, the aggregate results will be close to the expected values, which makes the process look deterministic; for each individual trial, however, the outcome is random. In this sense the results resemble pseudo random numbers.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
If y = ax + b with <math>0 \leq b < a</math>, then <math>b:=y \mod a</math>. <br /><br />
4.2 = 3 * 1.1 + 0.9, so<br /><br />
0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2, so<br /><br />
2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1, so<br /><br />
1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can also tell us whether one integer divides another with no remainder: n is divisible by m exactly when n mod m = 0. In the relation n = mq + r, the quantities n, m, q and r are all integers, with r smaller than m.<br />
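The relation n = mq + r is easy to check by machine. A quick illustration (Python is used here only for the sketch; the course itself uses Matlab, whose mod function behaves the same way for non-negative arguments):<br />

```python
# Division algorithm: for n >= 0 and m > 0 there are unique q, r
# with n = m*q + r and 0 <= r < m.
q, r = divmod(30, 7)        # quotient and remainder in one call
assert (q, r) == (4, 2)
assert 30 == 7 * q + r

# The mod operation keeps only the remainder.
print(30 % 7)               # -> 2
print(25 % 3)               # -> 1
```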
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math> (<math>\mod m</math> means taking the remainder after division by m). Given a '''seed''', an initial value <math>x_0 \in \N</math>, we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about the '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required, such as serious Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation explores the range of possible outcomes by repeated random sampling, so a generator that is not sufficiently random can bias its results.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{0}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding <math>x_0</math>, this example generates a series of ones. In general, excluding <math>x_0</math>, the algorithm above will always generate a constant series of the same number less than m (namely <math>x_0 \mod m</math>). Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math> (a little tip: <math>(a\cdot b) \bmod c = \big((a \bmod c)\cdot(b \bmod c)\big) \bmod c</math>)<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
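The two-integer cycle above can be reproduced in a few lines. A sketch in Python (the course uses Matlab; this is only for illustration):<br />

```python
a, b, m = 2, 1, 3
x = 10                       # the seed x_0
seq = []
for _ in range(6):
    x = (a * x + b) % m      # x_{k+1} = (a*x_k + b) mod m
    seq.append(x)
print(seq)                   # the values alternate: [0, 1, 0, 1, 0, 1]
```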
<br />
If we choose the numbers properly, we can get a sequence of "random" numbers. However, how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least <math>m</math> should be a very '''large''', preferably prime number: the larger <math>m</math> is, the longer the sequence can run before repeating. This is easy to explore in Matlab, where the command rand() generates random numbers uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math>, the values recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> should be '''large and prime''').<br /> 
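Those parameters can be dropped straight into the recursion. Below is a minimal Python sketch of the Park-Miller generator (b = 0), normalizing by m so the outputs fall in (0,1); the function name is ours, not Matlab's:<br />

```python
def park_miller(seed, n):
    """Lehmer generator: x_{k+1} = 7^5 * x_k mod (2^31 - 1)."""
    a, m = 7 ** 5, 2 ** 31 - 1
    x, out = seed, []
    for _ in range(n):
        x = (a * x) % m
        out.append(x / m)    # normalize to the interval (0, 1)
    return out

u = park_miller(seed=1, n=5)
print(u)                     # same seed always gives the same sequence
```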
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) generates a histogram of the distribution of x. Run it after the code to check the actual sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
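The claim that the first 30 values form a permutation of the integers 1 to 30, after which the sequence repeats, can be verified directly (a Python sketch for illustration):<br />

```python
a, m = 13, 31
x, seq = 1, []               # seed x_0 = 1
for _ in range(31):
    seq.append(x)
    x = (a * x) % m

assert sorted(seq[:30]) == list(range(1, 31))   # a permutation of 1..30
assert seq[30] == seq[0]                        # the cycle restarts
print(seq[:5])               # [1, 13, 14, 27, 10]
```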
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1. Why, in the example above, is the range 1 to 30 and not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2. Will the number 31 ever appear? Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
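The hand computation can be checked mechanically (the middle column above shows the unreduced values 5x<sub>n-1</sub>+7, which give the same remainders). A Python sketch:<br />

```python
x = 3                        # the seed x_0
values = []
for _ in range(10):
    x = (5 * x + 7) % 200    # reduce mod 200 at every step
    values.append(x)
print(values)                # [22, 117, 192, 167, 42, 17, 92, 67, 142, 117]
```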
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', the results are approximately numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to sampling from a uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose parameters that produce good uniform sequences. Many programs give you the option to choose the seed; sometimes the seed is chosen automatically, for example by the CPU clock.<br /><br />
<br />
<br />
<br />
<br />
In this part we learned how to use code to work out the relationship between two integers under division and their remainder. When we apply the generator over a range such as (1:1000), the graph of the resulting distribution looks like a uniform distribution.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>find integers <i>a</i>, <i>b</i>, <i>m</i> (large prime) and <i>x<sub>0</sub></i> (the seed).</li><br />
<li><math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than the uniform distribution, such as the exponential distribution and the normal distribution. However, to easily use this method for generating pseudorandom numbers, the probability distribution used must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation <math>x=F^{-1}(U)</math> <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf of X is strictly monotonic (hence invertible); otherwise the generalized inverse must be used. A monotonic function is one that is either non-decreasing for all x, or non-increasing for all x.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U < F, set X = i and stop.<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3.<br><br />
Note: These steps can be found in Simulation, 5th ed., by Sheldon Ross.<br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\, dt</math><br /><br />
<math> = \Big[-e^{-\lambda t}\Big]_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
Set <math> y=1-e^{- \lambda x} </math> and solve for x:<br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-y)}{\lambda}</math><br /><br />
<math>F^{-1}(u)=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<br />
The sampling procedure is therefore:<br /><br />
Step 1: Draw U ~ U[0,1];<br /><br />
Step 2: <math> x=\frac{-\ln(1-U)}{\lambda} </math> <br /><br />
<br />
'''Example''': <br />
If U is uniform on [0,1], then <math> X= a + (b-a)U </math> is uniform on [a, b] <br /><br />
<math> x=\frac{-\ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math>, since 1-U is also uniform on [0,1] <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x), setting<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
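A quick sanity check for Example 2: plugging x = u<sup>1/5</sup> back into F(x) = x<sup>5</sup> must recover u (a Python sketch; 0.7 is an arbitrary illustrative draw):<br />

```python
u = 0.7                          # a value that could have come from Unif(0,1)
x = u ** (1 / 5)                 # inverse transform for F(x) = x^5
assert abs(x ** 5 - u) < 1e-12   # F(x) recovers u
print(x)
```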
<br />
'''Matlab Code''':<br />
<br />
For this exponential distribution, we will let lambda be 2.<br />
<pre style="font-size:16px"><br />
% Set up the parameters.<br />
lam = 2;<br />
n = 1000;<br />
% Generate the random variables.<br />
uni = rand(1,n);<br />
X = -log(uni)/lam;<br />
% Get the values to draw the theoretical curve.<br />
x = 0:.1:5;<br />
% This is a function in the Statistics Toolbox.<br />
y = exppdf(x,1/2);<br />
% Get the information for the histogram.<br />
[N, h] = hist(X,10);<br />
% Change bar heights to make it correspond to the theoretical density.<br />
N = N/(h(2)-h(1))/n;<br />
% Do the plots.<br />
bar(h,N,1,'w')<br />
hold on<br />
plot(x,y)<br />
hold off<br />
xlabel('X')<br />
ylabel('f(x) - Exponential')<br />
</pre><br />
[[File:Exponential.jpg]]<br />
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
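As in Example 2, the solution can be checked by substituting back into the cdf (a Python sketch; beta = 2 and u = 0.5 are arbitrary illustrative values):<br />

```python
beta = 2.0
u = 0.5
x = 1 - (1 - u) ** (1 / beta)    # inverse transform for BETA(1, beta)
F = 1 - (1 - x) ** beta          # cdf evaluated at the generated x
assert abs(F - u) < 1e-12        # the cdf recovers u
print(x)
```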
<br />
'''Example 4 - Estimating <math>\pi</math>''':<br />
Let's use rand() and the Monte Carlo Method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{area of circle}}{\text{area of square}}</math><br /><br />
If we take a square of side 2 with an inscribed circle of radius 1, the circle has area <math>\pi</math> while the square has area 4.<br /><br />
Thus <math>\pi \approx 4 \cdot (Nc/N)</math><br /><br />
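The same estimate can be sketched in Python (for illustration; the course's Matlab version follows below):<br />

```python
import random

random.seed(1)               # fixed seed for reproducibility
N, Nc = 100_000, 0
for _ in range(N):
    # a point uniform on the square [0, 2] x [0, 2]
    x, y = 2 * random.random(), 2 * random.random()
    # count it if it falls inside the circle of radius 1 centred at (1, 1)
    if (x - 1) ** 2 + (y - 1) ** 2 <= 1:
        Nc += 1

est = 4 * Nc / N
print(est)                   # close to pi
```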
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use the functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) % will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
% let lambda = 2 in this example; however, you can choose another value for lambda<br />
>>x=(-log(1-u))/2;<br />
>>size(x) % 1000 in size <br />
>>figure<br />
>>hist(x) % exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. Not every cdf is invertible or strictly monotonic, and the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDFs and/or integrals are not easy to compute, such as that of the Gaussian distribution.<br /><br />
<br />
We learned how to derive the inverse of a cdf and use the uniform distribution to obtain a value of x from F(x); the uniform distribution together with the inverse method lets us generate other distributions. In the Monte Carlo example, the points are uniform over the square, so every small region is equally likely to be hit and the chance of landing inside the circle is proportional to its area. We can also look at the histogram of a sample to judge what kind of distribution it follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool #shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma shifts and rescales the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P\big(F(F^{-1}(U))\leq F(x)\big)</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for the uniform distribution <math> U \sim Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f. function <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases this function does not exist in closed form<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse cdf function, making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable x that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying the Inverse Transformation Method in the Discrete Case (Procedure):<br><br />
1. Define a probability mass function for <math>x_{i}</math>, where i = 1,...,k. Note: k could be infinite. <br><br />
2. Generate a uniform random number U, <math> U \sim Unif [0,1] </math><br><br />
3. If <math>U\leq p_{0}</math>, deliver <math>X = x_{0}</math><br><br />
4. Else, if <math>U\leq p_{0} + p_{1} </math>, deliver <math>X = x_{1}</math><br><br />
5. Continue in this way: for the first k with <math>U\leq p_{0} + p_{1} + \cdots + p_{k}</math>, deliver <math>X = x_{k}</math><br><br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define X in terms of U so that: <br />
<br />
If U <= 0.5, then X = 0,<br />
<br />
and if 0.5 < U <= 1, then X = 1. <br />
<br />
This gives probability 0.5 to heads, making it a valid generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of the semicolon in Matlab: Matlab suppresses printing of a line's result when the line ends in a semicolon; without the semicolon, the result is printed.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } 0 \leq x < 1 \\<br />
0.5, & \text{if } 1 \leq x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{otherwise}<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \quad X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
1-p, & \text{if } 0 \leq x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) Initialize <math>\begin{align} x = 0 \end{align}</math>, <math>\begin{align} p = P(X = 0) = e^{-u} \end{align}</math>, <math>\begin{align} F = p \end{align}</math> <br><br />
3) If U < F, output x <br><br />
Else, <math>\begin{align} p = \frac {u}{x+1} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
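The Poisson algorithm above can be sketched in Python (rather than the course's Matlab; the function name is ours). It accumulates the CDF term by term using the recursion P(x+1) = (u/(x+1)) P(x) until it passes the uniform draw:

```python
import math
import random

def poisson_inverse_transform(lam):
    """Inverse-transform sample from Poisson(lam): accumulate the CDF
    term by term using P(x+1) = lam/(x+1) * P(x) until it exceeds U."""
    u = random.random()
    x = 0
    p = math.exp(-lam)   # P(X = 0)
    F = p                # running CDF
    while u >= F:
        p *= lam / (x + 1)
        F += p
        x += 1
    return x
```

Since the search starts at x = 0 and stops at the first x with U < F(x), each call consumes exactly one uniform number.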
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X as the number of trials needed to obtain the first success, x = 1, 2, 3, .... The pmf is:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i-1}</math><br />
The CDF is:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) is the probability of observing at least x failures before the first success.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
\vdots \\<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k \\<br />
\vdots<br />
\end{cases}</math><br />
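The case-by-case rule above collapses into a closed form, since F(x) = 1-(1-p)^x is easy to invert: X is the smallest integer with F(X) >= U, i.e. X = ceil(ln(1-U)/ln(1-p)). A Python sketch (function name ours; the course uses Matlab):

```python
import math
import random

def geometric_inverse_transform(p):
    """Sample X ~ Geo(p), the trial number of the first success,
    as the smallest integer x with 1 - (1-p)^x >= U."""
    u = random.random()
    return math.ceil(math.log(1 - u) / math.log(1 - p))
```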
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
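The general procedure above can be sketched in Python (the course uses Matlab; names are ours), walking the cumulative sums P<sub>0</sub>, P<sub>0</sub>+P<sub>1</sub>, ... until they exceed the uniform draw:

```python
import random

def discrete_inverse_transform(xs, ps):
    """Deliver x_k for the smallest k with U <= p_0 + ... + p_k."""
    u = random.random()
    F = 0.0
    for x, p in zip(xs, ps):
        F += p
        if u <= F:
            return x
    return xs[-1]  # guard against floating-point round-off in sum(ps)
```

For the in-class example, discrete_inverse_transform([0, 1, 2], [0.3, 0.2, 0.5]) returns 0, 1, 2 with the given probabilities.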
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math>.<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
<br />
In summary: flipping a coin is a discrete uniform example. The code flips the coin 1000 times, and the observed frequency is close to the expected value (0.5). Example 2 is another discrete distribution; it splits [0,1] into three parts for the values 0, 1, 2, each with its given probability. Example 3 uses the inverse method to work out the probability range corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b> generate samples from a given distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed numbers {u}</li><br />
<li>{F<sup>-1</sup>(u)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>deliver x<sub>i</sub> if <math> F(x_{i-1})<U\leq F(x_i) </math></li><br />
<li>the delivered {x<sub>i</sub>} are what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transform method allows us to turn a uniform sample into a sample from another distribution, it has two limitations:<br />
# Not every c.d.f. has an inverse in closed form<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples from these distributions, we use other methods, such as the '''Acceptance-Rejection Method'''. Its efficiency depends on how closely the proposal matches the target.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (In practise, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) rather than <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all x would force g and f to be the same function: both are pdfs integrating to 1, so for distinct f and g we cannot have <math>g(x) \geq f(x)</math> &forall;x. <br><br />
<br />
Also, sampling from <math>\,c g(x)</math> always over-generates relative to what we need, so we require a way to thin the samples down to the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will reject more often (since the acceptance probability <math>\frac{f(x)}{c g(x)}</math> will be close to zero), rendering the algorithm inefficient. <br />
<br />
<br><br />
'''Note:''' <br><br />
1. Values around x<sub>1</sub> are sampled more often under cg(x) than under f(x), giving more samples than we actually need there; since <math>\frac{f(y)}{\, c g(y)}</math> is small in that region, the acceptance-rejection step must reject more to keep the correct proportion. So in the region around x<sub>1</sub>, we accept less and reject more. <br><br />
2. Around x<sub>2</sub>, the number of samples drawn and the number we need are much closer, so in that region we accept more; there g(x) and f(x) are comparable.<br><br />
3. The constant c is needed to adjust the height of g(x) so that it lies above f(x). It is best to keep the number of rejections small for maximum efficiency. <br> <br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, the target function lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those points are guaranteed to fall in the part of the area under c g(y) that lies under f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math> for <math>U \sim U(0,1)</math>. Thus, comparing u with <math>\frac{f(y)}{\, c g(y)}</math> accepts y with exactly that probability. At points where cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
e.g. at X<sub>1</sub>, there is a low probability of accepting the point, since f(x) is much smaller than cg(x),<br><br />
while at X<sub>2</sub>, there is a high probability of accepting the point.<br />
<br />
Note: Since <math>U \leq 1</math> always, any point y where <math>c\leq \frac{f(y)}{g(y)}</math> (equivalently <math>\frac{f(y)}{cg(y)}\geq 1</math>) is accepted with certainty.<br />
<br />
<br />
This establishes the relationship between cg(x) and f(x), why it must hold, and how it is used to reject some candidate points. From the graph we can see where candidates are likely to be accepted or rejected: in the example, points near x<sub>1</sub> are mostly rejected while points near x<sub>2</sub> are mostly accepted.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
(to be updated later)<br><br />
<br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ (Bayes' rule)}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int P(accepted|s)\,P(s)\,ds\\<br />
&=\int \frac{f(s)}{cg(s)}g(s)\,ds\\<br />
&=\frac{1}{c} \int f(s)\, ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be small; otherwise the expected amount of work per accepted sample (which equals c) becomes huge.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)=\binom{2}{x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c>=f(x)/g(x)</math><br/><br />
We need <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform draw to generate y from DU[0,2])<br/><br />
3. If <math>y=0</math> and <math>v<1/2</math>, output 0;<br/><br />
if <math>y=2</math> and <math>v<1/2</math>, output 2;<br/><br />
else if <math>y=1</math>, output 1;<br/><br />
otherwise return to step 1.<br/><br />
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the pmf we wish to generate from, but cannot generate directly by the inverse transform method.<br/><br />
Let <math>g(x)</math> be the helper (proposal) pmf.<br/><br />
Choose c so that <math>cg(x)\geq f(x)</math>.<br/><br />
Since we generate y from <math>g(x)</math>,<br/><br />
<math>Pr(\text{select } y)=g(y)</math><br/><br />
<math>Pr(\text{output } y|\text{selected } y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(\text{output } y)=\sum_i Pr(\text{output } y_i|\text{selected } y_i)Pr(\text{select } y_i)=\sum_i \frac{f(y_i)}{cg(y_i)}g(y_i)=\frac{1}{c}</math> <br/><br />
The number of attempts until the first success is geometric with probability of success 1/c.<br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
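The claim E(X) = c can be checked empirically. Below is a small Python sketch (names ours) of a generic A-R sampler that also counts proposals; for f(x) = 2x with g = U(0,1) and c = 2, the average count should be close to 2:

```python
import random

def ar_sample_with_count(f, g_sample, g_pdf, c):
    """Acceptance-rejection sampler that also reports the number of
    proposals drawn; by the argument above, this count averages c."""
    attempts = 0
    while True:
        attempts += 1
        y = g_sample()
        if random.random() <= f(y) / (c * g_pdf(y)):
            return y, attempts
```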
<br />
The proof uses conditional probability to show that, conditional on acceptance, the output follows exactly the pdf of the original target f.<br />
The example shows how to choose the constant c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U<sub>1</sub> and U<sub>2</sub>.<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub> and stop;<br />
otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
We differentiate f(x)/g(x) and find its maximum, which gives the best constant c for the acceptance-rejection method.<br />
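A Python sketch of this simulation procedure (function name ours); each iteration draws the candidate U<sub>1</sub> and the test variable U<sub>2</sub>, and accepts U<sub>1</sub> with probability f(U<sub>1</sub>)/(c g(U<sub>1</sub>)) = (256/27)U<sub>1</sub>(1-U<sub>1</sub>)<sup>3</sup>:

```python
import random

def sample_beta_2_4():
    """A-R sample from f(x) = 20x(1-x)^3 on (0,1), using g = U(0,1)
    and c = 135/64, so f(y)/(c g(y)) = (256/27) y (1-y)^3."""
    while True:
        u1 = random.random()   # candidate from g
        u2 = random.random()   # acceptance test
        if u2 <= (256 / 27) * u1 * (1 - u1) ** 3:
            return u1
```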
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (the density of U[0,0.5])<br />
<br />
Let <math>g(\cdot)</math> be <math>U[0,1]</math> distributed, so <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example showing why we reject some candidates when using the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area between cg(x) and f(x) as small as possible.<br />
Because g(·) is uniform on (0,1), g(x) = 1, so c = max f(x) = 3.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
An example showing how to work out c and <math>f(x)/(c\,g(x))</math>.<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first need a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The graph of f(x) must lie under the graph of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and the target distribution is <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower where the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> at or below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it makes no sense to choose <math>c</math> arbitrarily large; we need <math>c \cdot g(x)</math> to fit <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be negative.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
And it means c has to be greater or equal to <math>\frac{f(x)}{g(x)}</math>. So the smallest possible c that satisfies the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math> <br />. If c is made to be too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
*Recall that the acceptance rate is 1/c (not the rejection rate). <br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>; if no suitable c exists, return to step 1 and choose a different g(x).<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1 (This is the general procedure, not the way to find c.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where the Beta(a,b) pdf is <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}, \; 0\leq x\leq 1</math><br><br />
<br />
where &Gamma;(n)=(n-1)! if n is a positive integer, and<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distribution with pdf f(x), we need to draw approximately <math>n\cdot c</math> points from the proposal distribution g(x) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{2y}{2\cdot 1}</math>, deliver x=y</li><br />
<li>else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: number of accepted samples<br />
>>jj=1; % jj: number of generated samples<br />
>>while ii<=1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason a for loop is not used is that we need to continue looping until we get 1000 successful samples. Since some samples are rejected along the way, we do not know in advance how many y's will be generated.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation; this is possible only when f and g are the same distribution.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g = U(-1,1), so g(x)=1/2 on [-1,1].<br />
<br />
Let y ~ g. We need<br />
<math> cg(x)\geq f(x) \;\Leftrightarrow\;<br />
\frac{c}{2} \geq \frac{3}{4} (1-x^2) \;\Leftrightarrow\;<br />
c \geq \max_x \frac{3}{2} (1-x^2) = \frac{3}{2} </math> (the maximum is attained at x = 0)<br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}} = \frac{1-y^2}{2}</math>, then x=y; '''note that''' <math>\frac{\frac{3}{4}(1-y^2)}{\frac{3}{2}}</math> comes from <math>\frac{f(y)}{c g(y)}</math><br />
:5: else: return to '''step 1''' <br />
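The process above can be sketched in Python (function name ours); step 3 maps U1 ~ U(0,1) onto U(-1,1), and step 4 accepts with probability (1-y^2)/2:

```python
import random

def sample_f():
    """A-R sample from f(x) = (3/4)(1 - x^2) on [-1, 1],
    with proposal g = U(-1, 1) and c = 3/2."""
    while True:
        y = 2 * random.random() - 1       # step 3: y ~ U(-1, 1)
        u2 = random.random()
        if u2 <= (1 - y * y) / 2:         # step 4: f(y) / (c g(y))
            return y                      # accept
```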
<br />
----<br />
'''Use the Inverse Method for the Example <math>f(x)=2x</math>'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period "." makes an operator element-wise, applying the operation to each element of a vector or matrix. In the example above, U.^0.5 takes the square root of every element of U. Without the period, ^ and * are matrix operations: U^0.5 is the matrix square root B satisfying <math>B*B=U</math>. For example, for vectors a=[1 2 3] and b=[2 3 4], a.*b=[2 6 12], but a*b gives an error since the matrix dimensions do not agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2}</math> (attained at x=1).<br />
Use the inverse method to sample from <math>g(x)</math>:<br />
<math>G(x)=x^2</math>, so<br />
generate <math>U</math> from <math>U(0,1)</math> and set <math>y=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. Set <math>y=\sqrt{U_1}</math>. If <math>U_2 \leq \frac{f(y)}{c\,g(y)} = \frac{3y^2}{3y} = y = \sqrt{U_1}</math>, accept <math>y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
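A quick way to see why the g(x)=2x proposal is more efficient is to estimate both acceptance rates empirically. A Python sketch (the function and variable names are ours, not from the notes):

```python
import random

def acceptance_rate(trials, proposal, seed=2):
    # Empirical acceptance rate of accept-reject for f(x) = 3x^2 on (0,1).
    # "uniform": g(x) = 1,  c = 3,   accept u1 when u2 <= u1^2
    # "linear":  g(x) = 2x, c = 3/2, accept y = sqrt(u1) when u2 <= y
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        u1, u2 = rng.random(), rng.random()
        if proposal == "uniform":
            accepted += u2 <= u1 ** 2
        else:
            y = u1 ** 0.5            # inverse-CDF draw from g(x) = 2x
            accepted += u2 <= y      # f(y)/(c*g(y)) = 3y^2/(3y) = y
    return accepted / trials

r_uniform = acceptance_rate(100000, "uniform")   # close to 1/c = 1/3
r_linear = acceptance_rate(100000, "linear")     # close to 1/c = 2/3
```

The observed rates are close to 1/3 and 2/3 respectively, so the linear proposal accepts twice as often.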
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method can be computationally inefficient, depending on the rejection rate. We may have to sample many points before<br> <br />
we get 1000 accepted points. In the in-class example with <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 candidate points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
<br />
<br />
The constant c can also be found numerically, by evaluating f(x)/g(x) over a grid of x values and taking the maximum.<br />
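As a sanity check on the derivation above, c can be computed numerically by maximizing f(x)/g(x) over a grid. A Python sketch (the grid range and step are arbitrary choices):

```python
import math

# f: density of |Z| for Z ~ N(0,1); g: Exp(1) proposal.
def f(x): return 2 / math.sqrt(2 * math.pi) * math.exp(-x * x / 2)
def g(x): return math.exp(-x)

grid = [i * 0.001 for i in range(1, 10000)]   # points in (0, 10)
c_numeric = max(f(x) / g(x) for x in grid)
c_exact = math.sqrt(2 * math.e / math.pi)     # analytic maximum, at x = 1
```

The numeric maximum agrees with the analytic value sqrt(2e/pi) = 1.3155 to several decimal places.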
<br />
<p style="font-weight:bold;font-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
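The three steps above in Python (variable and function names are illustrative, not from the notes):

```python
import random

def unif_ab(a, b, n, seed=3):
    # If U ~ U(0,1), then Y = (b - a)*U + a ~ U(a, b).
    rng = random.Random(seed)
    return [(b - a) * rng.random() + a for _ in range(n)]

ys = unif_ab(-1, 1, 10000)
m = sum(ys) / len(ys)     # near the U(-1,1) mean of 0
```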
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; therefore Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>\sqrt{R^2-x^2}</math>, which is largest at <math>x=0</math>.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2/(\pi R)}{1/(2R)}=\frac{4}{\pi}</math>. Note: this also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We will accept the points with limit f(x)/[cg(x)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = y </math><br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
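Putting the whole semicircular example together, a Python sketch of the sampler (the course uses MATLAB; names here are ours):

```python
import random, math

def sample_semicircle(n, R=1.0, seed=4):
    # Accept-reject for f(x) = 2/(pi R^2) * sqrt(R^2 - x^2) on [-R, R]
    # with proposal U(-R, R) and c = 4/pi, so the acceptance ratio is
    # f(y)/(c*g(y)) = sqrt(1 - (2U - 1)^2).
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        u1 = rng.random()
        if u1 <= math.sqrt(1 - (2 * u - 1) ** 2):
            out.append(R * (2 * u - 1))   # y = R(2U - 1)
    return out

zs = sample_semicircle(10000)
mean_z = sum(zs) / len(zs)    # near 0 by symmetry
```

About pi/4 of the candidates are accepted, matching the 1/c computed above.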
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a*e^{-a*x}</math>to generate random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)} = \frac {e^{-1}}{a*(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\lim_{x \to \infty}\frac {f(x)}{g(x)} = 0</math> (for <math>0<a<1</math>)<br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate <math>u, v \sim \text{unif}(0,1)</math> <br/><br />
2. Generate y from g. Since g is exponential with rate <math>a=\frac{1}{2}</math>, let <math>y=-2\ln(u)</math> <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
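The whole procedure, with a = 1/2 and c = 4/e, can be sketched in Python (names are illustrative; note f(x) = x e^{-x} is the Gamma(2,1) density, so the sample mean should be near 2):

```python
import random, math

def sample_xe(n, seed=5):
    # Accept-reject for f(x) = x e^{-x}, x > 0, using the proposal
    # g(x) = (1/2) e^{-x/2} (exponential with a = 1/2) and c = 4/e.
    rng = random.Random(seed)
    c = 4 / math.e
    out = []
    while len(out) < n:
        u = 1.0 - rng.random()            # u in (0, 1], avoids log(0)
        v = rng.random()
        y = -2 * math.log(u)              # inverse-CDF draw from g
        fy = y * math.exp(-y)
        gy = 0.5 * math.exp(-y / 2)
        if v < fy / (c * gy):
            out.append(y)
    return out

ws = sample_xe(10000)
mean_w = sum(ws) / len(ws)                # f has mean 2
```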
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take the derivative of h(x) with respect to x, set it equal to zero, and solve for the critical point x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (If c<sub>1</sub> is a value, then we can ignore this step.) Since we want the smallest value of c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (i.e. k = unknown parameter) to get the value of k. <br />Then we substitute k to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>.)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For the two examples above, we chose a proposal distribution g that we could sample from,<br />
found <math>c=\max\frac {f(y)}{g(y)} </math>,<br />
and accepted y whenever <math>v<\frac {f(y)}{c\cdot g(y)}</math>.<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse CDF cannot be computed or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated, at least up to a normalizing constant. <br/><br />
3) When we can find a constant c with <math>f(x)\leq c\cdot g(x)</math> for all x. <br/><br />
4) When uniform draws are available.<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
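This interpretation is easy to verify by simulation: each iteration of the algorithm accepts with probability 1/c, so the observed acceptance fraction should be close to 1/c. A Python sketch with c = 1.5 (the numbers are purely illustrative):

```python
import random

rng = random.Random(7)
c = 1.5
N = 90000
# Each accept-reject iteration succeeds with probability 1/c,
# so the number of accepted points in N iterations is Binomial(N, 1/c).
accepted = sum(rng.random() < 1 / c for _ in range(N))
rate = accepted / N        # close to 1/1.5 = 0.667
```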
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example in the last lecture. The following code generates the random variable required by that question.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % Note: R is a constant which we can change;<br />
% e.g. if we set R=4 we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % Note for beginner programmers: this step increments<br />
% the ii value for the next time through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tips: hist(x,y) where y is the number of bars in the graph.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} <= c < ∞.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: The U is independent from Y in Steps 2 and 3 above.<br />
The constant c is an indicator of the rejection rate: the acceptance rate is 1/c.<br />
<br />
In this acceptance-rejection example for a pmf, the proposal is the discrete uniform distribution on the 5 values (1,2,3,4,5), so g(x) = 0.2 for every x.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % This is a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution on the integers <math>1,2,3,...,k</math>. If this function is not built into your MATLAB, a simple transformation of rand does the same thing, e.g. y = ceil(k*rand). <br />
<br />
The acceptance rate is <math>\frac {1}{c}</math>, so the lower the c, the more efficient the algorithm. Theoretically, c = 1 is the best case because all samples would be accepted; however, this would only be true when the proposal and target distributions are exactly the same, which would never happen in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>\frac {1}{1.5}=\frac {2}{3}</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 random values drawn from f(x); the more values we generate, the closer the empirical frequencies get to the stated probabilities.<br />
<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
g(x)= 1/3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1,y~g<br /><br />
2,u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x \geq 0</math><br>(Poisson distribution)<br />
The first few values of p<sub>x</sub> are: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j, else go to step 1.<br />
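The three steps above in Python (the course uses MATLAB; here c is recomputed exactly as the maximum of the ratio, which occurs at j = 3 and j = 4):

```python
import random, math

def sample_poisson3(n, seed=8):
    # Accept-reject for the Poisson(3) pmf p_j = e^{-3} 3^j / j!
    # with geometric proposal g(j) = p(1-p)^j, p = 0.25.
    # The ratio p_j/g(j) = 4 e^{-3} 4^j / j! peaks at j = 3, 4 (~2.12).
    rng = random.Random(seed)
    c = 4 * math.exp(-3) * 4 ** 4 / math.factorial(4)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()                   # in (0, 1]
        u2 = rng.random()
        j = int(math.log(u1) / math.log(0.75))    # geometric draw
        pj = math.exp(-3) * 3 ** j / math.factorial(j)
        gj = 0.25 * 0.75 ** j
        if u2 < pj / (c * gj):
            out.append(j)
    return out

ks = sample_poisson3(20000)
mean_k = sum(ks) / len(ks)     # Poisson(3) has mean 3
```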
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) such that it is hypergeometrically distributed: given 10 white balls and 5 red balls, we select 3 balls without replacement; let X be the number of red balls selected. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which is '''c = 1.1127'''.<br />
<br />
The maximum of the ratio occurs at X=1, where f(1)=0.4945 and g(1)=0.4444, so c = 0.4945/0.4444 = 1.1127.<br />
Note that c must be large enough that c*g(x) lies above f(x) at every point; taking the maximum of the ratio f(x)/g(x) guarantees this while keeping c, and hence the rejection rate, as small as possible.<br />
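The ratios above are easy to recompute. A Python sketch using exact binomial coefficients (`math.comb`):

```python
from math import comb

# f: hypergeometric pmf (10 white, 5 red, draw 3; X = number of red)
f = [comb(5, x) * comb(10, 3 - x) / comb(15, 3) for x in range(4)]
# g: Binomial(3, 1/3) pmf
g = [comb(3, x) * (1 / 3) ** x * (2 / 3) ** (3 - x) for x in range(4)]

ratios = [fx / gx for fx, gx in zip(f, g)]
c = max(ratios)            # about 1.1127, attained at x = 1
```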
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, c*g(x) must be raised to the peak of f to cover the whole of f. Thus c will be very large and 1/c will be small.<br />
The higher the rejection rate, the more points are rejected.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="margin-bottom:10px;border:10px solid red;background: yellow">A good example for understanding the pros and cons of the AR method: the AR method is inefficient for sampling distributions with a high, narrow peak, because c will be huge,<br><br />
which makes the acceptance rate low and the sampling very time-consuming. </div><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall:<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that<br />
p(j)/q(j) <= c for all j such that p(j) > 0.<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \mathbb{N}^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with hazard rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda) </math>, and <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
<br />
Side notes: if <math> X\sim~ Gamma(a,\lambda)</math> and <math> Y\sim~ Gamma(b,\lambda)</math> are independent gamma random variables, then <math>\frac{X}{X+Y}</math> has a <math> Beta(a,b)</math> distribution.<br />
<br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1 </math>~Exp(<math>\lambda </math>)<br />
<math>x_2 </math>~Exp(<math> \lambda </math>)<br />
...<br />
<math>x_t </math>~Exp(<math> \lambda </math>)<br />
<math>x_1+x_2+...+x_t \sim~ Gamma(t,\lambda)</math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); <br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generates a 20x1000 matrix: 1000 values<br />
% for each of the t=20 exponentials, all generated by rand<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) behaves the same as log(u) since u~U(0,1)<br />
>>xx = sum(x); % sum(x) sums each column; size(xx) can help you verify<br />
>>size(x) % check the size of x if we forget it (the answer is 20 1000)<br />
>>hist(x(1,:)) % the histogram of the first exponential sample<br />
>>hist(xx)<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
<br />
<br />
size(x) and size(u) are both 20x1000 matrices.<br />
Since u~unif(0, 1) implies u and 1-u have the same distribution, we can substitute u for 1-u to simplify the expression.<br />
Alternatively, the following command does the same thing as the previous commands.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code in one line;<br />
% we can use either log(u) or log(1-u) here since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
rand(20,1000) generates a matrix with 20 rows of 1000 numbers each.<br />
This code generalizes the approach: we sum t independent draws x<sub>i</sub> row-wise in the matrix, and the histogram of the column sums shows the resulting Gamma distribution.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/sin(\theta)= x_{1}/cos(\theta)</math> <br /><br />
<math> tan(\theta)=x_{2}/x_{1} \rightarrow \theta=tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
A point at distance R from the origin at angle <math>\theta</math> has Cartesian coordinates <math>x=R\cos(\theta)</math> and <math>y=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,j)'' returns the (i,j)th entry <br /><br />
:*: ''sum(X,1)'' or ''sum(X)'' sums down each column of X; the output is a row vector of the column sums. <br /><br />
:*: ''sum(X,2)'' sums across each row of X, returning a column vector. <br /><br />
:*: ''rand(r,c)'' will generate random numbers in r rows and c columns <br /><br />
:*: The dot operator (.), placed before an operator such as ^, *, or /, applies that operation to every element of a vector or matrix (.^, .*, ./). It is not needed for + and - (adding a constant c to a matrix A is simply A+c), nor for functions such as log that already act element-wise.<br><br />
:*: Matlab processes loops very slowly, while it is fast with matrices and vectors, so it is preferable to use the dot operator on matrices of random numbers rather than loops when possible.<br><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1. On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1), i.e. the Standard Normal Distribution, then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<br />
<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math> (this general form is rarely needed directly in this course, since we can standardize)<br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup> and <math> \tan(\theta) = \frac{y}{x} </math><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\frac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into two density functions, Exponential and Uniform, so <math>d</math> and <math>\theta</math> are independent<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'')<br />
:<math>=\;- \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br /><br />
More intuitively, the integrand <math>x\phi(x)</math> is an odd function, i.e. h(-x) = -h(x), and the integral of an odd function over a support symmetric about 0 (here, from negative infinity to infinity) is 0.<br /><br />
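A quick numerical check of this argument (a plain Riemann sum in Python; the grid range and step are arbitrary choices):

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

N = 16000
h = 16 / N                                          # grid step on [-8, 8]
grid = [-8 + i * h for i in range(N + 1)]
mean_integral = h * sum(x * phi(x) for x in grid)   # odd integrand: ~0
total_mass = h * sum(phi(x) for x in grid)          # density: ~1
```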
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited. Inefficient methods such as inverting the Gaussian CDF, summing uniform random variables, and acceptance-rejection were used. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This new technique had an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup>=Z<sub>1</sub><sup>2</sup>+Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> (exponential with rate 1/2, i.e. mean 2) <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Alternatively,<br> <math> x =\cos(2\pi U_2)\sqrt{-2\ln U_1}\, </math> and<br> <math> y =\sin(2\pi U_2)\sqrt{-2\ln U_1}\, </math><br /><br />
<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
Note:<br>the first graph is hist(tet) and it is approximately uniform on [0, 2&pi;].<br>The second one is hist(d) and it is approximately exponential with mean 2 (not uniform).<br>The third one is hist(x) and it is approximately a standard normal distribution.<br>The last one is hist(y) and it is also approximately a standard normal distribution.<br />
<br />
Attention: there is a "dot" before "*" (as in d.^0.5.*cos(tet)) because d and tet are vectors, so the multiplication must be done elementwise. <br><br />
<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
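The same procedure can be sketched in Python using only the standard library (a translation of the MATLAB session above; function and variable names are ours):

```python
import math
import random

random.seed(1)

def box_muller(n):
    """Generate n pairs of independent standard normal samples."""
    xs, ys = [], []
    for _ in range(n):
        u1, u2 = random.random(), random.random()
        d = -2.0 * math.log(u1)        # R^2 ~ exponential with mean 2
        theta = 2.0 * math.pi * u2     # angle ~ Unif[0, 2*pi)
        r = math.sqrt(d)
        xs.append(r * math.cos(theta))
        ys.append(r * math.sin(theta))
    return xs, ys

x, y = box_muller(10000)
mean_x = sum(x) / len(x)
var_x = sum(v * v for v in x) / len(x) - mean_x ** 2
```

The sample mean and variance of x should be close to 0 and 1, mirroring the histograms above.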
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn generates random samples from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient, because they require computing sine and cosine functions. A way around this time-consuming step is an indirect computation of the sine and cosine of a random angle (as in the Marsaglia polar method), as opposed to a direct computation which generates U and then computes the sine and cosine of 2&pi;U. <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method exactly, we can approximate the inverse CDF using different functions. One such method is '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) random variables and subtract 6 (which is 12*E(u<sub>i</sub>)), we get approximately a standard normal random variable.<br /><br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
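Method 2 above (the central limit theorem approximation) is easy to try numerically; here is a minimal Python sketch (names and sample size are ours):

```python
import random

random.seed(2)

def clt_normal():
    # Sum of 12 Unif(0,1) variables has mean 6 and variance 12 * (1/12) = 1,
    # so subtracting 6 gives an approximate N(0,1) sample.
    return sum(random.random() for _ in range(12)) - 6.0

samples = [clt_normal() for _ in range(10000)]
mean_s = sum(samples) / len(samples)
var_s = sum(s * s for s in samples) / len(samples) - mean_s ** 2
```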
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histogram, the added constant is the parameter that shifts the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent Uniform(0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\,\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\,\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint density is given by <br />
<math>f_{X_1,X_2}(x_1,x_2) = f_{U_1,U_2}\left(g_1^{-1}(x_1,x_2),\,g_2^{-1}(x_1,x_2)\right)\left|J\right|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
<math>J = \begin{vmatrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2 \\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{vmatrix}</math><br />
where <br />
<math>u_1 = g_1^{-1}(x_1,x_2), \quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
<math>u_1 = e^{-(x_1^2+x_2^2)/2}, \quad u_2 = \frac{1}{2\pi}\tan^{-1}\left(\frac{x_2}{x_1}\right)</math><br />
<br />
Finally we get<br />
<math>f(x_1,x_2) = \frac{1}{2\pi}e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution is the normal distribution with arbitrary mean and variance: its density is the standard normal density scaled by the standard deviation and translated by the mean. The pdf of the general normal distribution is <br />
<math>f(x) = \frac{1}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} </math><br />
<br />
A special case of the normal distribution is the standard normal distribution, for which the variance is 1 and the mean is zero. If X is a general normal random variable, then Z = (X − μ)/σ has a standard normal distribution.<br />
<br />
If Z ~ N(0,1) and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma*0 = \mu </math> and <math>Var(X) = 0 +\sigma^2*1 = \sigma^2</math><br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); <-generate variable from standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2]; <-produce a vector<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,Id) and X= <math>\underline{\mu} + \Sigma^{\frac{1}{2}} \,Z </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
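The transformation X = μ + Σ<sup>1/2</sup>Z for d = 2 can be sketched in Python, taking Σ<sup>1/2</sup> to be a Cholesky factor (the mean and covariance values below are purely illustrative):

```python
import math
import random

random.seed(3)

# Illustrative parameters: mean vector and covariance matrix.
mu = [1.0, -1.0]
sigma = [[2.0, 0.6],
         [0.6, 1.0]]

# Cholesky factor L of the 2x2 covariance matrix (sigma = L L^T).
l11 = math.sqrt(sigma[0][0])
l21 = sigma[1][0] / l11
l22 = math.sqrt(sigma[1][1] - l21 ** 2)

def mvn_sample():
    # Z ~ N(0, I_2) via Box-Muller, then X = mu + L Z.
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    z1, z2 = r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)

xs = [mvn_sample() for _ in range(20000)]
mean1 = sum(p[0] for p in xs) / len(xs)
mean2 = sum(p[1] for p in xs) / len(xs)
```

The sample means of the two coordinates should be close to the entries of μ.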
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution that describes an event with only two possible outcomes, i.e. success or failure. If the trial succeeds, X takes value 1 with success probability p; otherwise X takes value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pdf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes; so the Bernoulli can also use the probability mass function of the binomial distribution with the variate x taking only the values 0 and 1.<br />
<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is in the coin flip example we discussed in a previous lecture. In this example we set p = 1/2 and this allows for 50% of points to be heads or tails.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: For Binomial distribution, we can consider it as a set of n Bernoulli add together.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: We can also regard the Bernoulli Distribution as either a conditional distribution or <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
When doing elementwise operations between vectors (such as *, /, ^), put a dot before the operator so the operation is applied to every element; for multiplication by a scalar the dot is optional. <br />
Example: let V be a vector with dimension 2*4 and suppose you want each element multiplied by 3. <br />
The Matlab code is 3*V (or equivalently 3.*V)<br />
<br />
These are some examples of using code to generate samples from common distributions.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
Note that the material in this lecture will not be on the exam; it was only to supplement what we have learned.<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
<br />
The inverse method is universal in the sense that we can potentially sample from any distribution where we can find the inverse of the cumulative distribution function.<br />
<br />
Procedure:<br />
<br />
1.Generate U~Unif [0, 1)<br><br />
2.set <math>x=F^{-1}(u)</math><br><br />
3.X~f(x)<br><br />
<br />
'''Example 1'''<br><br />
<br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br><br />
<br />
x~exp(<math>\lambda</math>)<br><br />
<br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P((X_1)>y) P((X_2)>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate U~ U(0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math> (since 1-U is also Unif(0,1), ln(u) can replace ln(1-u))<br><br />
<br />
If we generalize this example from two independent particles to n independent particles we will have:<br><br />
<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> ...<br> <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br>.<br />
<br />
And the algorithm using the inverse-transform method as follows:<br />
<br />
step1: Generate U~U(0,1)<br />
<br />
Step2: <math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
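The generalized algorithm above can be sketched in Python (standard library only; the rates and sample size are illustrative choices of ours):

```python
import math
import random

random.seed(4)

def min_exp_sample(lams):
    """Inverse-transform sample of min(X_1,...,X_n), X_i ~ Exp(lam_i)."""
    u = random.random()
    # min of independent exponentials is Exp(sum of the rates)
    return -math.log(1 - u) / sum(lams)

lams = [1.0, 2.0]           # illustrative rates lambda_1, lambda_2
ys = [min_exp_sample(lams) for _ in range(20000)]
mean_y = sum(ys) / len(ys)  # theory: 1 / (lam1 + lam2) = 1/3
```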
<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
<br>where a>0 and a is a real number<br />
What is the distribution of X?<br><br />
<br />
'''Solution:'''<br><br />
<br />
We can find a form for the cumulative distribution function of X by isolating U, since U~Unif[0,1) takes values in the range of F(X) uniformly. It then remains to differentiate the resulting form with respect to X to obtain the probability density function.<br />
<br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math>, for <math>0\leq x \leq a</math><br><br />
[[File:Example_2_diagram.jpg]]<br />
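A quick numerical check of this example (Python sketch; the choice a = 2 is arbitrary): the derived density f(x) = (2/a)(1 - x/a) on [0, a] has mean a/3, which the sample mean should approximate.

```python
import math
import random

random.seed(5)

a = 2.0  # arbitrary positive constant
# X = a (1 - sqrt(1 - U)) with U ~ Unif[0, 1)
xs = [a * (1 - math.sqrt(1 - random.random())) for _ in range(20000)]
mean_x = sum(xs) / len(xs)  # theoretical mean: a / 3
```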
<br />
'''Example 3'''<br><br />
<br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. Generate values from X.<br><br />
<br />
'''Solution:'''<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Observe from above that the values of X for n = 20 are close to 1. This is because we can view X (with CDF x<sup>n</sup>) as the maximum of n independent Unif(0,1) random variables, and the maximum is likely to be close to 1 as n increases. This observation is the motivation for Method 2 below.<br><br />
<br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result we can see that in this example, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x, so the CDF of the maximum is x<sup>n</sup>).<br />
Method 2: generate x by taking a sample of n independent U~Unif(0, 1) and letting x be the maximum of the n samples. However, the inverse-transform solution given above requires generating only one uniform random number instead of n of them, so it is a more efficient method.<br />
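The two methods can be compared directly (Python sketch; the sample sizes are ours). Both sample means should approach E[X] = n/(n+1):

```python
import random

random.seed(6)

n = 20
N = 20000

# Method 1 (inverse transform): one uniform per sample, x = u^(1/n).
inv = [random.random() ** (1.0 / n) for _ in range(N)]

# Method 2: maximum of n uniforms, so n uniforms per sample.
mx = [max(random.random() for _ in range(n)) for _ in range(N)]

mean_inv = sum(inv) / N
mean_mx = sum(mx) / N   # both estimate E[X] = n / (n + 1)
```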
<br><br />
<br />
In general, for independent X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>, we can derive the pdf and cdf of both Y = max(X<sub>1</sub>, ... , X<sub>n</sub>) and Y = min(X<sub>1</sub>, ... , X<sub>n</sub>) in this way.<br />
<br />
'''Example 4 (New)'''<br><br />
Now we consider an example similar to Example 1, but taking the maximum instead of the minimum.<br />
<br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br><br><br />
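A Python sketch of the algorithm for Z = max(X<sub>1</sub>, X<sub>2</sub>) (the rate λ = 1 is an arbitrary choice). For two i.i.d. Exp(λ) variables, E[max(X<sub>1</sub>, X<sub>2</sub>)] = 3/(2λ), which gives a numerical check:

```python
import math
import random

random.seed(7)

lam = 1.0  # illustrative rate

def max_exp_sample():
    # Inverse transform for F_Z(z) = (1 - e^{-lam z})^2.
    u = random.random()
    return -math.log(1 - math.sqrt(u)) / lam

zs = [max_exp_sample() for _ in range(20000)]
mean_z = sum(zs) / len(zs)  # theory: 3 / (2 * lam) = 1.5
```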
<br />
'''Discrete Case:'''<br />
<font size="3"><br />
u~unif(0,1)<br><br />
x <- 0, S <- P<sub>0</sub><br><br />
while u > S<br><br />
x <- x + 1<br><br />
S <- S + P<sub>x</sub><br><br />
Return x<br></font><br />
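The discrete search can be sketched in Python (the pmf used here is an illustrative one on {0, 1, 2}; at each step the cumulative probability is increased by P(X = x)):

```python
import random

random.seed(8)

def discrete_inverse(pmf):
    """Sample from a discrete distribution given pmf[x] = P(X = x), x = 0, 1, ..."""
    u = random.random()
    x, s = 0, pmf[0]
    while u > s:          # walk up the CDF until it first exceeds u
        x += 1
        s += pmf[x]
    return x

pmf = [0.2, 0.5, 0.3]     # illustrative pmf on {0, 1, 2}
samples = [discrete_inverse(pmf) for _ in range(20000)]
freq1 = samples.count(1) / len(samples)  # should be near P(X = 1) = 0.5
```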
<br />
===Decomposition Method===<br />
The CDF, F, is a composition if <math>F_{X}(x)</math> can be written as:<br />
<br />
<math>F_{X}(x) = \sum_{i=1}^{n}p_{i}F_{X_{i}}(x)</math> where<br />
<br />
1) p<sub>i</sub> > 0<br />
<br />
2) <math>\sum_{i=1}^{n}</math>p<sub>i</sub> = 1.<br />
<br />
3) <math>F_{X_{i}}(x)</math> is a CDF<br />
<br />
The general algorithm to generate random variables from a composition CDF is:<br />
<br />
1) Generate U, V ~ <math>U(0,1)</math><br />
<br />
2) If u < p<sub>1</sub>, x = <math>F_{X_{1}}^{-1}(v)</math><br />
<br />
3) Else if u < p<sub>1</sub>+p<sub>2</sub>, x = <math>F_{X_{2}}^{-1}(v)</math><br />
<br />
4) ....<br />
<br />
<b>Explanation</b><br><br />
Each random variable that is a part of X contributes <math>p_{i}*F_{X_{i}}(x)</math> to <math>F_{X}(x)</math> every time.<br />
From a sampling point of view, that is equivalent to contributing <math>F_{X_{i}}(x)</math> <math>p_{i}</math> of the time. The logic of this is similar to that of the Accept-Reject Method, but instead of rejecting a value depending on the value u takes, we instead decide which distribution to sample it from.<br />
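The general composition algorithm can be sketched as follows (Python; the two-component mixture at the bottom is an illustrative choice of ours, each component sampled through its own inverse CDF):

```python
import random

random.seed(9)

def decomposition_sample(weights, inverses):
    """Sample from F = sum_i p_i F_i, given weights p_i and each F_i^{-1}."""
    u, v = random.random(), random.random()
    cum = 0.0
    for p, inv in zip(weights, inverses):
        cum += p
        if u < cum:          # u selects the component, v feeds its inverse CDF
            return inv(v)
    return inverses[-1](v)   # guard against floating-point rounding

# Illustrative mixture on [0, 1]: F(x) = 0.5 x + 0.5 x^2
weights = [0.5, 0.5]
inverses = [lambda v: v, lambda v: v ** 0.5]
xs = [decomposition_sample(weights, inverses) for _ in range(20000)]
mean_x = sum(xs) / len(xs)  # theory: 0.5 * 1/2 + 0.5 * 2/3 = 7/12
```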
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup> <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
divided f(x) to two pdf of x1 and x2, with uniform distribution, of two range of uniform.<br />
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> (note that each component density has its own support: <math>e^{-x}</math> on x &ge; 0, 4x on 0 &le; x &le; <math>\frac{1}{\sqrt{2}}</math> since F<sub>x2</sub> = <math>2x^{2}</math>, and <math>\frac{1}{3}</math> on 0 &le; x &le; 3) <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i > p_j </math> for <math> i < j </math> <br> <br><br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u< p_1 + p_2 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
In summary, we divide f(x) into the components f<sub>x1</sub>, f<sub>x2</sub> and f<sub>x3</sub>, invert each component's CDF, and use U~U(0,1) to decide which component to sample from.<br />
<br />
=== Example of Decomposition Method ===<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
Setting U = F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup> and solving for x directly is difficult, which motivates decomposing F<sub>x</sub> as below.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
'''Algorithm:'''<br />
<br />
Generate U ~ Unif [0,1)<br />
<br />
Generate V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x = v<br />
<br />
else if u<2/3, x = v<sup>1/2</sup><br />
<br />
else x = v<sup>1/3</sup><br><br />
<br />
<br />
'''Matlab Code:''' <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from an unknown distribution using an easy distribution. The disadvantage is that it may reject many points, which is inefficient.<br />
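A standard illustration of this idea (Python sketch): to sample uniformly from the unit disk B, sample uniformly from the enclosing square A = [-1, 1]^2 and throw away points outside B.

```python
import random

random.seed(10)

def unit_disk_sample():
    # Sample uniformly in the square [-1, 1]^2; accept only points in the disk.
    while True:
        x = 2 * random.random() - 1
        y = 2 * random.random() - 1
        if x * x + y * y <= 1:
            return x, y

pts = [unit_disk_sample() for _ in range(5000)]
mean_x = sum(x for x, _ in pts) / len(pts)  # near 0 by symmetry
```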
<br />
In the decomposition method, the CDF is split into partial CDFs, each of which is inverted on its own piece, and a uniform random variable decides which piece to use.<br />
<br />
=== Practice Example from Lecture 7 ===<br />
<br />
Let X1, X2 denote the lifetime of 2 independent particles, X1~exp(<math>\lambda_{1}</math>), X2~exp(<math>\lambda_{2}</math>)<br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then, } 1-F_Y(y)=P(\min(x_{1},x_{2}) \geq y)=e^{-(\lambda_{1}+\lambda_{2})y}, \quad F_Y(y)=1-e^{-(\lambda_{1}+\lambda_{2})y}</math><br />
<math>u \sim unif[0,1), \quad u = F_Y(y) \Rightarrow y = -\frac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
<br />
===Question 2===<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a Beta distribution, where b is the normalizing constant chosen so that <math>\int _{0}^{1}b\,x^{n}(1-x)^{n}dx=1</math><br />
<br />
U<sub>1</sub>~Unif[0,1)<br />
<br />
<br />
U<sub>2</sub>~Unif[0,1)<br />
<br />
The density is maximized at x = 0.5, with maximum value <math>b(1/4)^n</math>.<br />
So, taking g(x) = 1 on (0,1), <math>c=b(1/4)^n</math><br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math><br />
2. If <math>U_2 \leq \frac{b\,U_1^n(1-U_1)^n}{b(1/4)^n}=(4U_1(1-U_1))^n</math><br />
then <math>X=U_1</math><br />
Else return to step 1.<br />
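A sketch of this accept-reject scheme for the special case n = 1, where f(x) = 6x(1 - x) is the Beta(2, 2) density with mean 1/2 (Python; the sample size is ours):

```python
import random

random.seed(11)

n = 1  # for n = 1, f(x) = 6 x (1 - x), i.e. Beta(2, 2)

def beta_ar_sample():
    while True:
        u1, u2 = random.random(), random.random()
        # Accept u1 with probability f(u1) / (c g(u1)) = (4 u1 (1 - u1))^n.
        if u2 <= (4 * u1 * (1 - u1)) ** n:
            return u1

xs = [beta_ar_sample() for _ in range(20000)]
mean_x = sum(xs) / len(xs)  # Beta(2, 2) has mean 1/2
```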
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
<br />
==Class 8 - Thursday, May 30, 2013==<br />
<br />
In this lecture, we will discuss algorithms to generate 3 well-known distributions: Binomial, Geometric and Poisson. For each of these distributions, we will first state its general understanding, probability mass function, expectation and variance. Then, we will derive one or more algorithms to sample from each of these distributions, and implement the algorithms in Matlab. <br /><br />
<br />
'''Bernoulli distribution'''<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution, where n = 1. X ~ B(1, p) has the same meaning as X ~ Bern(p). B(n, p), is the distribution of the sum of n independent Bernoulli trials, Bern(p), each with the same probability p. <br />
<br />
Algorithm: <br />
<br />
1. Generate u~Unif(0,1) <br><br />
2. If u <= p, then x = 1 <br><br />
Else x = 0 <br />
<br />
===The Binomial Distribution===<br />
<br />
If X~Bin(n,p), then its pmf is of form:<br />
f(x)=(nCx) p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
Or f(x) = <math>(n!/x!(n-x)!)</math> p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
<br />
mean (x) = E(x) = np; variance = np(1-p)<br />
<br />
Generate n uniform random numbers <math>U_1,...,U_n</math> and let X be the number of <math>U_i</math> that are less than or equal to p.<br />
The logic behind this algorithm is that the Binomial Distribution is simply a Bernoulli Trial, with a probability of success of p, repeated n times. Thus, we can sample from the distribution by sampling from n Bernoulli. The sum of these n bernoulli trials will represent one binomial sampling. Thus, in the below example, we are sampling 1000 realizations from 20 Bernoulli random variables. By summing up the rows of the 20 by 1000 matrix that is produced, we are summing up the 20 bernoulli outcomes to produce one binomial sampling. We have 1000 rows, which means we have realizations from 1000 binomial random variables when this sum is done (the output of the sum is a 1 by 1000 sized vector).<br /><br />
MATLAB tips: to sample from a Binomial(N,P) distribution we can use binornd(N,P), where N is the number of trials and P is the probability of success. Comparisons produce logical vectors: if a=[2 3 4], then a<3 produces [1 0 0], and a==3 produces [0 1 0]. So we can use a comparison to count how many uniforms are less than or equal to p.<br /><br />
<br />
Procedure for Bernoulli <br />
U~Unif(0,1)<br />
if U <= p<br />
x = 1<br />
else <br />
x = 0<br />
<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>a=[3 5 8];<br />
>>a<5<br />
ans= 1 0 0<br />
<br />
>>rand(20,1000)<br />
>>rand(20,1000)<0.4<br />
>>A = sum(rand(20,1000)<0.4)<br />
>>hist(A)<br />
>>mean(A)<br />
Note: sum adds the entries of the matrix along each column by default, i.e. sum(A) here sums the 20 Bernoulli outcomes in each column<br />
<br />
>>sum(sum(rand(20,1000)<0.4)>8)/1000<br />
This is an estimate of Pr[X>8].<br />
<br />
</pre><br />
<br />
[[File:Binomial_example.jpg|300px]]<br />
<br />
Remark: comparisons such as a<3 produce logical (0/1) vectors, which are useful for extracting the values we want from a matrix.<br />
<br />
===The Geometric Distribution===<br />
<br />
x=1, f(x)=p <br />
x=2, f(x)=p(1-p)<br />
x=3, f(x)=p(1-p)^2<br />
<br />
Generally speaking, if X~G(p) then its pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>*p, x=1,2,...<br /><br />
The random variable X is the number of trials required until the first success in a series of independent''' Bernoulli trials'''.<br /><br />
<br />
<br />
<br />
Other properties<br />
<br />
<br />
Probability mass function : P(X=k) = p(1-p)^(k-1)<br />
<br />
Tail probability : P(X>n) = (1-p)^n<br />
<br />
<br />
<span style="background:#F5F5DC"><br />
<br />
Mean of x = 1/p<br />
Var(x) = (1-p)/p^2<br />
<br />
There are two ways to look at a geometric distribution.<br />
<br />
<b>1st Method</b><br />
<br />
We look at the number of trials required to get the first success, including the trial on which we succeed. This convention will be used in our course. <br />
<br />
pdf is of form f(x)=>(1-p)<sup>(x-1)</sup>*(p), x = 1, 2, 3, ...<br />
<br />
<b>2nd Method</b><br />
<br />
This involves modeling the number of failures before the first success. This does not include the trial in which we succeeded. <br />
<br />
pdf is of form f(x)=> ((1-p)^x)*p , x = 0, 1, 2, ....<br />
<br />
</span><br />
<br />
<br />
If Y~Exp(<math>\lambda</math>) then X=floor(Y)+1 is geometric.<br /><br />
Choose <math>e^{-\lambda}=1-p</math>. Then X ~ Geo(p) <br /><br />
<br />
P (X > x) = (1-p)<sup>x</sup>(because first x trials are not successful) <br/><br />
<br />
'''Proof''' <br/><br />
<br />
P(X>x) = P( floor(Y) + 1 > x) = P(floor (Y) > x-1) = P(Y>= x) = e<sup>(-<math>\lambda</math> * x)</sup> <br><br />
<br />
Since p = 1- e<sup>-<math>\lambda</math></sup>, i.e. <math>\lambda</math>= <math>-log(1-p)</math>, then <br><br />
<br />
P(X>x) = e<sup>(-<math>\lambda</math> * x)</sup> = e<sup>log(1-p)*x</sup> = (1-p)<sup>x</sup> <br/><br />
<br />
Note that floor(Y) > x implies Y >= x+1 <br/><br />
<br />
This shows how the exponential distribution can be used to obtain P(X>x)=(1-p)<sup>x</sup>.<br />
<br />
<br><br />
Suppose X has the exponential distribution with rate parameter <math> \lambda > 0 </math> <br><br />
the <math>\left \lfloor X \right \rfloor </math> and <math>\left \lceil X \right \rceil </math> have geometric distribution on <math> \mathcal{N} </math> and <math> \mathcal{N}_{+} </math> respectively each with success probability <math> 1-e^ {- \lambda} </math> <br><br />
<br />
Proof: <br><br />
<math>\text{For } n \in \mathcal{N} </math><br /><br />
<br />
<math>\begin{align}<br />
P(\left \lfloor X \right \rfloor = n)&{}= P( n \leq X < n+1) \\<br />
&{}= F( n+1) - F(n) \\<br />
\text{By algebra and simplification:} \\<br />
P(\left \lfloor X \right \rfloor = n)&{}= (e^ {-\lambda})^n \cdot (1 - e^ {-\lambda}) \\<br />
&{}= Geo (1 - e^ {-\lambda}) \\<br />
<br />
\text{Proof of ceiling part follows immediately.} \\<br />
\end{align}</math> <br /><br />
<br />
<br />
<br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p) </math><br /><br />
2) Generate a <math>Y \sim Exp(\lambda )</math> <br /><br />
3) We can then let <math>X = \left \lfloor Y \right \rfloor + 1, where X\sim Geo(p)</math> <br /><br />
note: <math>\left \lfloor Y \right \rfloor >2 -> Y>=3</math><br /><br />
<math> \left \lfloor Y \right \rfloor >5 -> Y>=6</math><br /><br />
<br /><br />
<br />
<math>\left \lfloor Y \right \rfloor>x </math> -> Y>= X+1 <br /><br />
<br />
<math>P(Y>=x)</math><br /><br />
Y ~ Exp (<math>\lambda</math>)<br /><br />
pdf of Y : <math>\lambda e^{-\lambda y}</math><br /><br />
cdf of Y : <math>F(y)=1-e^{-\lambda y}</math><br /><br />
cdf <math>P(Y<x)=1-e^{-\lambda x}</math><br /><br />
<math>P(Y>=x)=1-(1-e^{-\lambda x})=e^{-\lambda x}</math><br /><br />
<math> e^{-\lambda}=1-p \Rightarrow \lambda=-log(1-p)</math><br /><br />
<math>P(Y>=x)=e^{-\lambda x}=e^{log(1-p)x}=(1-p)^x</math><br /><br />
<math>E[X]=1/p </math><br /><br />
<math>Var(X)= (1-p)/p^2</math><br /><br />
P(X>x)<br /><br />
=P(floor(y)+1>x)<br /><br />
=P(floor(y)>x-1)<br /><br />
=P(y>=x)<br />
<br />
We use e<sup>-<math>\lambda</math></sup>=(1-p) to express the mean and variance in terms of p.<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>p=0.4;<br />
>>l=-log(1-p);<br />
>>u=rand(1,1000);<br />
>>y=(-1/l)*log(u);<br />
>>x=floor(y)+1;<br />
>>hist(x)<br />
<br />
'''Note:'''<br />
mean(x)~E[X]=> 1/p<br />
Var(x)~V[X]=> (1-p)/p^2<br />
<br />
</pre><br />
<br />
[[File:Geometric_example.jpg|300px]]<br />
<br />
===Poisson Distribution===<br />
If <math>\displaystyle X \sim \text{Poi}(\lambda)</math>, its pdf is of the form <math>\displaystyle \, f(x) = \frac{e^{-\lambda}\lambda^x}{x!}</math> , where <math>\displaystyle \lambda </math> is the rate parameter.<br /><br />
<br />
Understanding of Poisson distribution:<br />
<br />
If customers come to bank over time, its rate is <math>\lambda</math> per unit of time <br />
X(t) = # of customer in [0,t] ~ Pois<math>(\lambda*t)</math><br />
<br />
Its mean and variance are<br /><br />
<math>\displaystyle E[X]=\lambda</math><br /><br />
<math>\displaystyle Var[X]=\lambda</math><br /><br />
<br />
A Poisson random variable X can be interpreted as the maximal number of i.i.d. exponential variables (with parameter <math>\lambda</math>) whose sum does not exceed 1.<br /><br />
The traditional understanding of the Poisson distribution as the total number of events in a specific interval can be understood here since the above definition simply describes the Poisson as the sum of waiting times for n events in an interval of length 1.<br />
<br /><br />
<br /><br />
<math>\displaystyle\text{Let } Y_j \sim \text{Exp}(\lambda), U_j \sim \text{Unif}(0,1)</math><br><br />
<math>Y_j = -\frac{1}{\lambda}log(U_j) \text{ from Inverse Transform Method}</math><br><br><br />
<br />
<math>\begin{align} <br />
X &= max \{ n: \sum_{j=1}^{n} Y_j \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} - \frac{1}{\lambda}log(U_j) \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} log(U_j) > -\lambda \} \\<br />
&= max \{ n: log(\prod_{j=1}^{n} U_j) > -\lambda \} \\<br />
&= max \{ n: \prod_{j=1}^{n} U_j > e^{-\lambda} \} \\<br />
\end{align}</math><br><br /><br />
<br />
Note: From above, we can use Logarithm Rules <math>log(a)+log(b)=log(ab)</math> to generate the result.<br><br /><br />
'''Algorithm:''' <br /><br />
1) Set n=1, a=1 <br /><br />
2) Generate <math>U_n \sim U(0,1)</math>; set <math>a=aU_n </math> <br /><br />
3) If <math>a \geq e^{-\lambda}</math>, then n=n+1, and go to Step 2. Else, x=n-1 <br /><br />
<br />
The inverse method can likewise be used to verify the mean and variance of the Poisson distribution.<br />
<br />
===MATLAB Code for generating Poisson Distribution===<br />
<pre><br />
>>l=2; <br />
>>for ii=1:1000<br />
n=1;<br />
a=1;<br />
u=rand;<br />
a=a*u;<br />
while a>exp(-l)<br />
n=n+1;<br />
u=rand;<br />
a=a*u;<br />
end<br />
x(ii)=n-1;<br />
end<br />
>>hist(x)<br />
>>sum(x==1)/1000 % Probability of x=1<br />
>>sum(x>3)/1000 % Probability of x>3<br />
</pre><br />
<br />
[[File:Poisson_example.jpg|300px]]<br />
<br />
<br />
<span style="background:#F5F5DC"><br />
EXAMPLE for geometric distribution: Consider the case of rolling a die: </span><br />
<br />
X = the number of rolls that it takes for the number 5 to appear. <br />
<br />
We have X ~ Geo(1/6), <math>f(x)=(1/6)*(5/6)^{x-1}</math>, x=1,2,3.... <br />
<br />
Now, let <math>Y \sim Exp(\lambda)</math> => X=floor(Y)+1 <br />
<br />
Let <math>e^{-\lambda}=5/6</math> <br />
<br />
<math>P(X>x) = P(Y \geq x)</math> (from the class notes) <br />
<br />
We have <math>e^{-\lambda x} = (5/6)^x</math> <br />
<br />
Algorithm: let <math>\lambda = -\log(5/6)</math> <br />
<br />
1) Generate Y ~ Exp(<math>\lambda</math>) <br />
<br />
2) Set X = floor(Y)+1, to generate X <br />
<br />
<math> E[X]=6,\; Var[X]=(5/6)/(1/6)^2 = 30 </math><br />
<br />
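The die example above can be checked numerically. Below is a short sketch in Python (illustrative only; the course code is in MATLAB), using the exponential method with <math>e^{-\lambda}=5/6</math>:

```python
import math
import random

random.seed(1)

lam = -math.log(5 / 6)   # e^{-lambda} = 5/6  =>  lambda = -log(5/6)
n = 100_000

# Y ~ Exp(lambda) by inverse transform; X = floor(Y) + 1 is Geo(1/6)
xs = [math.floor(-math.log(random.random()) / lam) + 1 for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean, var)   # should be close to E[X] = 6 and Var[X] = 30
```

The sample mean and variance should come out close to the theoretical values 6 and 30.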
<br />
<span style="background:#F5F5DC">GENERATING NEGATIVE BINOMIAL RV USING GEOMETRIC RV'S</span><br />
<br />
Property of negative binomial Random Variable: <br/><br />
<br />
The negative binomial random variable is a sum of r independent geometric random variables.<br/><br />
<br />
Using this property we can formulate the following algorithm:<br/><br />
<br />
Step 1: Generate r geometric rv's each with probability p using the procedure presented above.<br/><br />
Step 2: Take the sum of these r geometric rv's. This RV follows NB(r,p)<br/><br />
<br />
Remark: in Step 1, each geometric variable can be generated with the exponential method above: take <math>e^{-\lambda}=1-p</math> (e.g. 5/6 in the die example), generate Y ~ Exp(<math>\lambda</math>), and set x = floor(Y)+1.<br />
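The two steps can be sketched in Python (an illustrative translation; the function names are our own):

```python
import math
import random

random.seed(1)

def geometric(p):
    # One Geo(p) draw via the exponential method: e^{-lam} = 1-p, X = floor(Y)+1
    lam = -math.log(1 - p)
    return math.floor(-math.log(random.random()) / lam) + 1

def negative_binomial(r, p):
    # Steps 1 and 2: sum r independent Geo(p) draws to obtain one NB(r, p) draw
    return sum(geometric(p) for _ in range(r))

samples = [negative_binomial(5, 0.4) for _ in range(50_000)]
mean = sum(samples) / len(samples)
print(mean)   # E[NB(r, p)] = r/p = 12.5
```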
<br />
=== Another way to generate random variable from poisson distribution ===<br />
<br/><br />
Note: <math>P(X=x)=e^{-\lambda}\lambda^x/x!</math><br/><br />
<math>P(X=x+1)= e^{-\lambda}\lambda^{x+1}/(x+1)!</math> <br/><br />
The ratio is: <math>p(x+1)/p(x)=\lambda/(x+1)</math> <br/><br />
Therefore, <math>p(x+1)=\lambda/(x+1)*p(x)</math> <br/><br />
Algorithm: <br/><br />
1. Set x=0<br/><br />
2. <math>p=F=P(X=0)=e^{-\lambda}</math> <br/><br />
3. Generate U~Unif(0,1) <br/><br />
4. If U<F, output x<br/><br />
Else <br/><br />
<math>p=\frac{\lambda}{x+1}\,p</math><br/><br />
F=F+p<br/><br />
x= x+1<br/><br />
Go to 4<br />
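A Python sketch of this recursive method (illustrative; <math>\lambda=2</math> matches the earlier MATLAB example, and note that U is drawn once per Poisson draw, after which the cdf is accumulated):

```python
import math
import random

random.seed(1)

def poisson(lam):
    # Inverse transform using the recursion p(x+1) = lam/(x+1) * p(x)
    x = 0
    p = math.exp(-lam)   # P(X = 0)
    F = p                # running cdf
    u = random.random()
    while u >= F:        # accumulate the cdf until U < F
        p *= lam / (x + 1)
        F += p
        x += 1
    return x

samples = [poisson(2) for _ in range(50_000)]
mean = sum(samples) / len(samples)
print(mean)   # E[X] = lambda = 2
```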
<br />
<br />
1. set n =1, a = 1<br />
<br />
2. set U<sub>n</sub>~U(0,1), a = a*U<sub>n</sub><br />
<br />
3. if <math>a > e^{-\lambda}</math>, then n = n+1, go to step 2,<br />
<br />
else x = n-1<br />
<br />
In summary: first find the ratio of P(X=k+1) to P(X=k), compute F at x=0, then generate U from the uniform distribution and accumulate the cdf until U < F.</div>
<hr />
<div><br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case; here y is continuous rather than discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly to their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important as the Quest id will be used to identify the students who make contributions.
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After making the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions in multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers'''; numbers that seem random but are actually deterministic. Although the pseudo random numbers are deterministic, these numbers have a sequence of value and all of them have the appearances of being independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to the ease to generate and manipulate.<br />
<br />
When a test is repeated many times, the aggregated results will be close to the expected values, which makes the experiment look deterministic; however, the result of each individual trial is random.<br />
In this sense the output resembles pseudo random numbers.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
If y = ax + b, then <math>b:=y \mod a</math>. <br /><br />
4.2 = 3 * 1.1 + 0.9, so 0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2, so 2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1, so 1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation tells us whether one integer divides another with no remainder. The two integers are related by n = mq + r, where m, q, r, n are all integers and r is smaller than m.<br />
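The examples above can be checked directly; in Python, for instance, the % operator computes the remainder for both integers and floats:

```python
# n = m*q + r with 0 <= r < m
assert 30 % 7 == 2          # 30 = 4*7 + 2
assert 25 % 3 == 1          # 25 = 8*3 + 1

# Floating-point case: 4.2 = 3*1.1 + 0.9 (up to rounding error)
r = 4.2 % 1.1
print(r)                    # approximately 0.9
assert abs(r - 0.9) < 1e-9
```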
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math>. (<math>\mod m</math> means taking the remainder after division by m.) Given a '''seed''', i.e. an initial value <math>x_0 \in \N</math>, we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{0}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math>(a little tip: (a*b)mod c = (a mod c)*(b mod c))<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we could get a sequence of "random" numbers. However, how do we find the value of <math>a,b,</math> and <math>m</math>? At the very least <math>m</math> should be a very '''large''', preferably prime number. The larger <math>m</math> is, the higher possibility people get a sequence of "random" numbers. This is easier to solve in Matlab. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1)). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (Important part is that <math>m</math> should be '''large and prime''')<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad <math>a</math> and <math>b</math>, the histogram may not looks uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a graph about the distribution. Use it after run the code to check the real sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been a research about how to choose uniform sequence. Many programs give you the options to choose the seed. Sometimes the seed is chosen by CPU.<br /><br />
<br />
<br />
<br />
<br />
In this part we learned how to use code to work out the relationship between integer division and its quotient and remainder. When the generator is run over a range of values such as 1:1000, the histogram of the output looks approximately like a uniform distribution.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>find integers: <i>a, b, m</i> (large prime), and <i>x<sub>0</sub></i> (the seed).</li><br />
<li><math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than the uniform distribution, such as the exponential and normal distributions. However, to use this method easily for generating pseudorandom numbers, the probability distribution in question must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation <math>x= F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf F is strictly increasing; otherwise the generalized inverse must be used. (A cdf is always non-decreasing; strict monotonicity is what guarantees an ordinary inverse exists.)<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3<br>
*Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.<br />
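The steps above can be sketched in Python (an illustrative translation of the pseudocode from Ross):

```python
import random

random.seed(1)

def binomial(n, p):
    # Inverse transform using pr(i+1) = c*(n-i)/(i+1) * pr(i), with c = p/(1-p)
    u = random.random()
    c = p / (1 - p)
    i = 0
    pr = (1 - p) ** n    # P(X = 0)
    F = pr
    while u >= F:        # Step 3: stop when U < F
        pr *= c * (n - i) / (i + 1)
        F += pr
        i += 1
    return i

samples = [binomial(10, 0.3) for _ in range(50_000)]
mean = sum(samples) / len(samples)
print(mean)   # E[X] = n*p = 3
```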
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = -e^{-\lambda t}\,\Big|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
Set <math> y=1-e^{- \lambda x} </math> and solve for x:<br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-y)}{\lambda}</math><br /><br />
Hence <math>F^{-1}(u)=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
<br />
'''Example''': <br />
<math> X= a + (b-a)U</math>, where U ~ U(0,1), is uniform on [a, b] <br /><br />
<math> x=\frac{-ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math> <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a U(0,1) distribution, then apply the inverse function of F(x):<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
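Example 2 can be checked numerically (a brief Python sketch; for F(x)=x<sup>5</sup> on [0,1] the density is 5x<sup>4</sup>, so E[X]=5/6):

```python
import random

random.seed(1)

n = 100_000
# Inverse transform: F(x) = x^5  =>  x = u^(1/5)
xs = [random.random() ** (1 / 5) for _ in range(n)]

mean = sum(xs) / n
print(mean)   # E[X] = 5/6 ≈ 0.833
```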
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
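A quick Python check of Example 3 (β=3 is an arbitrary illustrative choice; E[Beta(1,β)] = 1/(1+β)):

```python
import random

random.seed(1)

beta = 3.0
n = 100_000
# Inverse transform for Beta(1, beta): x = 1 - (1-u)^(1/beta)
xs = [1 - (1 - random.random()) ** (1 / beta) for _ in range(n)]

mean = sum(xs) / n
print(mean)   # E[X] = 1/(1 + beta) = 0.25
```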
<br />
'''Example 4-Estimating pi''':<br />
Let's use rand() and the Monte Carlo Method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = Area of circle / Area of square<br /><br />
If we take a square of side length 2, the inscribed circle (radius 1) has area <math>\pi</math>.<br /><br />
Thus <math>\pi \approx 4(Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) #will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
#let λ=2 in this example; however, you can make another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) #1000 in size <br />
>>figure<br />
>>hist(x) #exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method is flawed since not all functions are invertible or monotonic: generalized inverse is hard to work on.<br /><br />
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /><br />
<br />
We learned how to derive the inverse of a cdf and use the uniform distribution to obtain a value of x from F(x).<br />
The uniform distribution together with the inverse method can thus be used to generate other distributions.<br />
In the Monte Carlo example, a point drawn uniformly from the square is equally likely to land anywhere in it, so the proportion of points falling inside the circle estimates the ratio of the areas.<br />
We can also look at the histogram of the generated sample to judge what kind of distribution it follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool #shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma shifts and rescales the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note:''' the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for uniform distribution <math> U~ \sim~ Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse CDF <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases a closed-form inverse does not exist.<br />
<br />
2. For many distributions, such as the Gaussian, the inverse CDF is too difficult to compute, making this method inefficient.<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable X that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case (Procedure):<br><br />
1. Define a probability mass function for <math>x_{i}</math>, where i = 0, 1, ..., k. Note: k could be infinite. <br><br />
2. Generate a uniform random number U, <math> U \sim Unif [0,1] </math><br><br />
3. If <math>U\leq p_{0}</math>, deliver <math>X = x_{0}</math><br><br />
4. Else, if <math>U\leq p_{0} + p_{1} </math>, deliver <math>X = x_{1}</math><br><br />
5. Continue in this way until <math>U\leq p_{0} + p_{1} + \dots + p_{k}</math>, and deliver <math>X = x_{k}</math><br><br />
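The stepwise procedure above can be sketched in Python (an illustrative translation, not part of the course's MATLAB code):<br />

```python
import random

def discrete_inverse_transform(xs, ps):
    """Sample one value with P(X = xs[i]) = ps[i] by scanning the CDF."""
    u = random.random()          # step 2: U ~ Unif(0,1)
    cumulative = 0.0
    for x, p in zip(xs, ps):
        cumulative += p          # p_0 + p_1 + ... + p_i
        if u <= cumulative:      # steps 3-5: deliver the first x_i whose CDF reaches U
            return x
    return xs[-1]                # guard against floating-point round-off

# e.g. P(X=0)=0.3, P(X=1)=0.2, P(X=2)=0.5
samples = [discrete_inverse_transform([0, 1, 2], [0.3, 0.2, 0.5])
           for _ in range(10000)]
```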
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can partition the range of U so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of the semi-colon in Matlab: Matlab suppresses the printed output of a line ending in a semi-colon, and prints the result otherwise.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } 0 \le x < 1 \\<br />
0.5, & \text{if } 1 \le x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
'''Procedure:'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If U ≤ 0.3, deliver x = 0<br /><br />
3. Else if 0.3 < U ≤ 0.5, deliver x = 1<br /><br />
4. Else (0.5 < U ≤ 1), deliver x = 2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2} \Rightarrow X = F^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
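As a quick check, this derivation can be sketched in Python (illustrative only): since F(x)=x<sup>2</sup> on [0,1], samples are the square roots of uniform draws.<br />

```python
import random

# Inverse transform for f(x) = 2x on [0,1]:
# F(x) = x^2, so X = F^{-1}(U) = sqrt(U).
samples = [random.random() ** 0.5 for _ in range(10000)]

# E[X] = integral of x * 2x dx over [0,1] = 2/3
mean = sum(samples) / len(samples)
```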
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
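The two steps above amount to a one-line sampler; a minimal Python sketch:<br />

```python
import random

def bernoulli(p):
    """Inverse transform for Bernoulli(p): X = 1 if U <= p, else 0."""
    return 1 if random.random() <= p else 0

samples = [bernoulli(0.3) for _ in range(10000)]
```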
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(<math>\lambda</math>). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-\lambda} \lambda^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-\lambda} \lambda^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = \frac {\lambda}{x+1} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {\lambda}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) Set <math>\begin{align} x = 0 \end{align}</math>, <math>\begin{align} p = P(X = 0) = e^{-\lambda} \end{align}</math>, <math>\begin{align} F = p \end{align}</math> <br><br />
3) If U < F, output x <br><br />
Else, <math>\begin{align} p = \frac {\lambda}{x+1} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
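The ratio between consecutive Poisson probabilities lets the CDF be accumulated on the fly; a Python sketch of this algorithm (illustrative, with the rate written as lam):<br />

```python
import math
import random

def poisson(lam):
    """Inverse transform for Poisson(lam) via P_{x+1} = P_x * lam/(x+1)."""
    u = random.random()       # step 1
    x = 0
    p = math.exp(-lam)        # step 2: P(X = 0)
    F = p
    while u >= F:             # step 3: walk up the CDF until it passes U
        p *= lam / (x + 1)
        F += p
        x += 1
    return x                  # output x once U < F

samples = [poisson(2.0) for _ in range(10000)]
```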
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X as the number of trials until the first success, x=1,2,3,.... We have pmf:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) is the probability that the first x trials are all failures.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
\vdots \\<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k \\<br />
\vdots<br />
\end{cases}</math><br />
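Because the geometric CDF has a closed form, the case analysis above collapses to a single formula: X is the smallest k with 1-(1-p)<sup>k</sup> ≥ U, i.e. X = ⌈log(1-U)/log(1-p)⌉. A Python sketch:<br />

```python
import math
import random

def geometric(p):
    """Inverse transform for Geo(p): smallest k with 1-(1-p)^k >= U."""
    u = random.random()
    k = math.ceil(math.log(1.0 - u) / math.log(1.0 - p))
    return max(k, 1)   # guard: u = 0 would give k = 0

samples = [geometric(0.25) for _ in range(10000)]
```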
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math>.<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse of <math> F(x)</math>.<br />
<br />
Flipping a coin is a discrete case of the uniform distribution; in the code the coin is flipped 1000 times, and the observed proportion of heads is close to the expected value (0.5).<br />
The second example is another discrete distribution: the uniform draw is split among the three values 0, 1 and 2 according to their probabilities.<br />
The third example uses the inverse method to work out the range of U corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b> generate samples from a given distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed numbers {u}</li><br />
<li>{F<sup>-1</sup>(u)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>set d<sub>i</sub>=x<sub>i</sub> if <math> F(x_{i-1})<u_i \leq F(x_i) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transform method does allow us to transform the uniform distribution into others, it has two limitations:<br />
# Not all CDFs have closed-form inverse functions<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples from such distributions, we will use other methods, such as the '''Acceptance-Rejection Method''', which works even when the inverse CDF is unavailable.<br />
<br />
Suppose we want to draw a random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (in practice, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for the Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
{{Cleanup|reason= Do not write <math>c*g(x)</math>. Instead write <math>c \times g(x)</math> or <math>\,c g(x)</math><br />
}}<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) as opposed to <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x could hold if and only if g and f were the same function. This is because both pdfs integrate to 1, so neither can strictly dominate the other everywhere. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This would render our algorithm inefficient. <br />
<br />
<br><br />
'''Note:''' <br><br />
1. Values around x<sub>1</sub> will be sampled more often under cg(x) than under f(x), so more samples are proposed there than we actually need. Since <math>\frac{f(y)}{\, c g(y)}</math> is small there, the acceptance-rejection step discards the excess: in the region around x<sub>1</sub>, we accept less and reject more. <br><br />
2. Around x<sub>2</sub>, the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. There, g(x) and f(x) are comparable.<br><br />
3. The constant c is needed to adjust the height of g(x) to ensure that cg(x) lies above f(x). At the same time, it is best to keep the number of rejected samples small for maximum efficiency. <br> <br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function f lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those sample points are guaranteed to fall under the part of the area of c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
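The three-step procedure above can be sketched generically in Python (illustrative; the target f(x)=2x with a Unif(0,1) proposal and c=2 is just an example choice):<br />

```python
import random

def accept_reject(f, g_sample, g_pdf, c):
    """Acceptance-rejection: f = target pdf, g_sample/g_pdf = proposal, c*g >= f."""
    while True:
        y = g_sample()                    # step 1: draw Y ~ g
        u = random.random()               # step 2: draw U ~ Unif(0,1)
        if u <= f(y) / (c * g_pdf(y)):    # step 3: accept with prob f(y)/(c g(y))
            return y

# Example: target f(x) = 2x on [0,1], proposal Unif(0,1), c = 2.
samples = [accept_reject(lambda x: 2 * x, random.random, lambda x: 1.0, 2.0)
           for _ in range(5000)]
```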
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: Since U takes values in (0,1), the acceptance condition only makes sense if the ratio <math>\frac{f(y)}{c\,g(y)}</math> never exceeds 1; this is exactly what the requirement <math>c \geq \frac{f(y)}{g(y)}</math> on the constant c guarantees.<br />
<br />
<br />
This section introduced the relationship between cg(x) and f(x), proved why it must hold, and showed how the rule is used to reject some proposed points.<br />
It also showed how to read the graph to judge where proposed values of x are likely to be rejected or accepted:<br />
in the example, x<sub>1</sub> is a point with a low acceptance probability and x<sub>2</sub> a point with a high acceptance probability.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
(to be updated later)<br><br />
<br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int_y\ P(accepted|y)P(y)\\<br />
&=\int_y\ \frac{f(s)}{cg(s)}g(s)ds\\<br />
&=\frac{1}{c} \int_y\ f(s) ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different from g(y), it is less likely that a point will be accepted, as the ratio above will be very small and it will be difficult for u to fall below this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)=\binom{2}{x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c \geq f(x)/g(x)</math> for all x,<br/><br />
we take <math>c=3/2</math> (the maximum of the ratios above)<br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>y=0</math> and <math>v<1/2</math>, output 0<br/><br />
If <math>y=2</math> and <math>v<1/2</math>, output 2<br/><br />
Else if <math>y=1</math>, output 1; otherwise return to step 1<br/><br />
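A Python sketch of this discrete example (illustrative, not course code):<br />

```python
import random

def binomial_2_half():
    """A-R for Bi(2, 0.5) with proposal DU[0,2] and c = 3/2."""
    f = {0: 0.25, 1: 0.5, 2: 0.25}     # target pmf
    g = 1.0 / 3.0                      # proposal pmf on {0, 1, 2}
    c = 1.5
    while True:
        y = int(3 * random.random())   # y = floor(3u), discrete uniform on {0,1,2}
        v = random.random()
        if v <= f[y] / (c * g):        # acceptance ratio: 1/2, 1, 1/2 for y = 0, 1, 2
            return y

samples = [binomial_2_half() for _ in range(10000)]
```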
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the pdf we wish to generate from, but which we cannot sample directly by the inverse transform method.<br/><br />
Let <math>g(x)</math> be the helper (proposal) pdf.<br/><br />
Let <math>cg(x)\geq f(x)</math><br/><br />
Since we generate y from <math>g(x)</math>,<br/><br />
<math>Pr(\text{select } y)=g(y)</math><br/><br />
<math>Pr(\text{output } y|\text{selected } y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(\text{output})=\sum_y Pr(\text{output } y|\text{selected } y)\,Pr(\text{select } y)=\sum_y \frac{f(y)}{cg(y)}\,g(y)=\frac{1}{c}</math> <br/><br />
The number of trials until the first output follows a geometric distribution with probability of success 1/c.<br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
The conditional-probability argument above shows that, conditioned on acceptance, the output has exactly the target pdf f(x).<br />
The example shows how to choose the constant c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
Here the derivative is used to justify the acceptance-rejection setup:<br />
we find the maximum of f(x)/g(x),<br />
which gives the best (smallest valid) constant c.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X \sim U[0,0.5] </math><br />
<br />
So the target density is <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (and 0 elsewhere)<br />
<br />
Let <math>g(\cdot)</math> be <math>U[0,1]</math> distributed, so <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(\cdot)</math>, which is <math>U[0,1/2]</math><br />
<br />
An example showing why the acceptance-rejection method rejects some proposed points.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over the interval (0,1)<br /><br />
Therefore:<br /><br />
<math>c = \max(f(x)/g(x))= 3</math><br /> <br />
<br />
The best constant c is the maximum of f(x)/g(x); this choice makes the area between f(x) and cg(x) as small as possible.<br />
Because g(.) is uniform on (0,1), g(x) = 1 on the whole interval.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
An example showing how to work out c and f(x)/(cg(x)).<br />
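A Python sketch of this example, also tracking how many proposals are needed (on average about c = 3 per accepted sample; illustrative code):<br />

```python
import random

# A-R for f(x) = 3x^2 on (0,1) with proposal g = Unif(0,1) and c = 3.
accepted = []
trials = 0
while len(accepted) < 5000:
    y = random.random()        # draw Y ~ g
    u = random.random()        # draw U ~ Unif(0,1)
    trials += 1
    if u <= y ** 2:            # f(y)/(c g(y)) = 3y^2 / 3 = y^2
        accepted.append(y)
```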
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first need to find a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The curve of f(x) must lie under the curve of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower when the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is large, and vice versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> at or below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it makes no sense to choose <math>c</math> arbitrarily large; we need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be negative.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
And it means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math> for all x. So the smallest possible c that satisfies the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
*Recall that the acceptance rate is 1/c (not the rejection rate). <br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(y)}{c \cdot g(y)}</math> then X=Y; else return to step 1 (this is the sampling procedure, not the way to find c)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma;(n)=(n-1)! if n is a positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}\,dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we can observe that the area under f(x)=2x is half of the area under the pdf of UNIF(0,1). This is why, in order to sample 1000 points from f(x), we need to sample approximately 2000 points from UNIF(0,1).<br />
In general, if we want to sample n points from a distribution with pdf f(x), we need to draw approximately <math>n\cdot c</math> points from the proposal distribution (g(x)) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)} = y</math>, set x=y</li><br />
<li>else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: number of accepted samples<br />
>>jj=1; % jj: number of generated samples<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason a for loop is not used is that we must continue looping until we obtain 1000 accepted samples. Since some candidates are rejected along the way, we do not know in advance how many values of y will need to be generated.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g = U(-1,1), so g(x) = 1/2 on [-1,1].<br />
<br />
Draw y ~ g. We need c such that<br />
<math> c\, g(x)\geq f(x) \;\Rightarrow\; \frac{c}{2} \geq \frac{3}{4} (1-x^2) \;\Rightarrow\; c=\max_x \, 2\cdot\frac{3}{4} (1-x^2) = \frac{3}{2} </math> (the maximum is attained at x = 0)<br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U_2 \leq \frac{f(y)}{c\, g(y)} = \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}\cdot\frac{1}{2}} = 1-y^2</math>, then set x=y ('''note that''' the denominator is <math>c\, g(y)</math>, with <math>g(y)=\frac{1}{2}</math>)<br />
:5: else: return to '''step 1''' <br />
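The process above can be sketched in Python (the notes otherwise use MATLAB; the function name <code>sample_f</code> is an illustrative choice), using the acceptance probability <math>\frac{f(y)}{c\, g(y)} = 1-y^2</math>:<br />

```python
import random

def sample_f(n):
    """Acceptance-rejection sampling from f(x) = (3/4)(1 - x^2) on [-1, 1],
    using the U(-1, 1) proposal g(x) = 1/2 with c = 3/2."""
    samples = []
    while len(samples) < n:
        u1 = random.random()
        u2 = random.random()
        y = 2 * u1 - 1                # a draw from the U(-1, 1) proposal
        if u2 <= 1 - y ** 2:          # f(y) / (c * g(y)) = 1 - y^2
            samples.append(y)
    return samples

xs = sample_f(10000)
print(sum(xs) / len(xs))              # close to 0, since f is symmetric
```

Since 1/c = 2/3, about two thirds of the candidate draws are accepted.<br />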
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
Periods ("."), meaning "element-wise", specify that an operation is applied to each element of a vector or matrix. In the example above, u.^0.5 takes the square root of every element of u. Without the period, ^ and * are interpreted as matrix operations. For example, if a = [1 2 3] and b = [2 3 4] are 1 x 3 vectors, then a.*b = [2 6 12] (the element-wise product), but a*b gives an error since the matrix dimensions must agree (a*b' would compute the inner product instead).<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math> (at x = 1).<br />
Use the inverse method to sample from <math>g(x)</math><br />
<math>G(x)=x^2</math>.<br />
Generate <math>U</math> from <math>U(0,1)</math> and set <math>x=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math>, and set <math>y=\sqrt{U_1}</math> (a draw from g)<br><br />
2. If <math>U_2 \leq \frac{f(y)}{c\, g(y)} = \frac{3y^2}{\frac{3}{2}\cdot 2y} = y = \sqrt{U_1}</math>, accept <math>y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
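The efficiency gain from the better proposal can be checked empirically. The Python sketch below (illustrative, not from the notes) counts how many candidate draws each proposal needs to produce n accepted samples; this should be about 3n for g(x)=1 (c=3) and 1.5n for g(x)=2x (c=3/2):<br />

```python
import random

def ar_sample(n, draw_y, accept_prob):
    """Generic acceptance-rejection loop; returns the samples and the
    number of candidate draws that were needed."""
    samples, tries = [], 0
    while len(samples) < n:
        y = draw_y()
        tries += 1
        if random.random() <= accept_prob(y):
            samples.append(y)
    return samples, tries

n = 5000
# Proposal 1: g(x) = 1 on (0,1), c = 3, accept prob f/(cg) = y^2
_, tries1 = ar_sample(n, random.random, lambda y: y * y)
# Proposal 2: g(x) = 2x on (0,1), c = 3/2; sample y = sqrt(u), accept prob y
_, tries2 = ar_sample(n, lambda: random.random() ** 0.5, lambda y: y)

print(tries1 / n)   # roughly c = 3
print(tries2 / n)   # roughly c = 1.5
```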
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class with <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim~ N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
<math>f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad x \geq 0</math><br />
<br />
<math>g(x) = e^{-x}, \quad x \geq 0</math><br />
<br />
Take <math>h(x) = \frac{f(x)}{g(x)}</math> and solve <math>h'(x) = 0</math> to find the x at which h(x) is maximized. <br />
<br />
Hence x = 1 maximizes h(x), so <math>c = h(1) = \sqrt{2e/\pi}</math><br />
<br />
Thus <math>\frac{f(y)}{c\, g(y)} = e^{-(y-1)^2/2}</math><br />
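A Python sketch of this sampler (illustrative names; attaching a random sign to <math>\vert Z \vert</math> recovers Z ~ N(0,1)):<br />

```python
import math, random

def sample_halfnormal():
    """A-R sampling of |Z|, Z ~ N(0,1), using the Exp(1) proposal
    g(x) = e^{-x} with c = sqrt(2e/pi)."""
    while True:
        y = -math.log(random.random())          # y ~ Exp(1) by inverse transform
        u = random.random()
        if u <= math.exp(-(y - 1) ** 2 / 2):    # f(y) / (c g(y))
            return y

def sample_normal():
    """Attach a random sign to |Z| to recover Z ~ N(0,1)."""
    z = sample_halfnormal()
    return z if random.random() < 0.5 else -z

zs = [sample_normal() for _ in range(20000)]
print(sum(zs) / len(zs))                        # near 0
print(sum(z * z for z in zs) / len(zs))         # near 1
```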
<br />
<br />
Exercise: the constant c between f(x) and g(x) can also be computed numerically in code.<br />
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
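A minimal Python sketch of this transformation (the bounds a = -3, b = 5 are arbitrary illustrative choices):<br />

```python
import random

def unif(a, b):
    """Transform U(0,1) into U(a,b) via Y = (b - a)U + a."""
    return (b - a) * random.random() + a

ys = [unif(-3, 5) for _ in range(10000)]
print(min(ys), max(ys))          # all values fall inside (-3, 5)
```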
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; then <math>Y \sim U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>\sqrt{R^2-x^2}</math>, which is maximized at x = 0.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2}{\pi R}\cdot 2R=\frac{4}{\pi}</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a candidate point y with probability f(y)/[cg(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}</math>, set <math>x = y</math>;<br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a e^{-ax}, \; 0<a<1,</math> as the proposal to generate the random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>c\, g(x)\geq f(x)</math> <br/><br />
<math>c\geq \frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a}\, e^{-(1-a)x}</math> <br/><br />
take the derivative with respect to x, and set it to 0 to find the maximum, <br/><br />
<math>\frac{1}{a}\, e^{-(1-a)x} - \frac{x}{a}\, e^{-(1-a)x} (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
At this critical point, <math>\frac {f(x)}{g(x)} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
Checking the endpoints: <math>\frac {f(0)}{g(0)} = 0</math><br/><br />
and <math>\lim_{x\to\infty}\frac {f(x)}{g(x)} = 0</math>, so the critical point is indeed the maximum.<br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u, v ~ Unif(0,1) <br/><br />
2. Generate y from g; since g is exponential with rate a = 1/2 (mean 2), let y = -2ln(u) <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
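The procedure can be sketched in Python (illustrative; note that <math>f(x)=xe^{-x}</math> is the Gamma(2,1) density, which has mean 2):<br />

```python
import math, random

def sample_gamma2():
    """A-R sampling from f(x) = x e^{-x} (the Gamma(2,1) density), using
    the Exp(1/2) proposal g(x) = (1/2) e^{-x/2} with c = 4/e."""
    while True:
        u, v = random.random(), random.random()
        y = -2 * math.log(u)                          # y ~ Exp(rate 1/2)
        if v <= (math.e / 2) * y * math.exp(-y / 2):  # f(y) / (c g(y))
            return y

xs = [sample_gamma2() for _ in range(20000)]
print(sum(xs) / len(xs))    # Gamma(2,1) has mean 2
```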
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take the derivative of h(x) with respect to x, set it to zero, and solve to get x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) to obtain the value (or a function) of c, denoted c<sub>1</sub>;<br /><br />
3. Check the endpoints of the support of x by substituting them into h(x);<br /><br />
4. (If c<sub>1</sub> is already a number, skip this step.) Since we want the smallest c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So take the derivative of c<sub>1</sub> with respect to the unknown parameter (say k) and solve for k. <br />Then substitute k back to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>.)<br /><br />
5. Take the maximum value of h(x) found above to be the value of c.<br /><br />
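When the calculus is messy, steps 1-3 can be double-checked numerically. A crude Python grid search (illustrative; shown here for the earlier example f(x) = (3/4)(1-x²) with a U(-1,1) proposal) is:<br />

```python
def find_c(f, g, lo, hi, steps=100000):
    """Approximate c = max f(x)/g(x) by evaluating the ratio on a grid."""
    best = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        best = max(best, f(x) / g(x))
    return best

f = lambda x: 0.75 * (1 - x * x)    # target density on [-1, 1]
g = lambda x: 0.5                   # U(-1,1) proposal density
print(find_c(f, g, -1.0, 1.0))      # the analytic answer is 3/2
```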
<br />
For the two examples above, we sample y from the proposal distribution g(y),<br />
figure out <math>c=max\frac {f(y)}{g(y)} </math>,<br />
and output y if <math>v<\frac {f(y)}{c\cdot g(y)}</math>.<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse CDF cannot be computed or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated, at least up to a normalizing constant. <br/><br />
3) When a constant c with <math>f(x)\leq c\cdot g(x)</math> can be found for an easy-to-sample g(x).<br/><br />
4) When uniform draws are available.<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the semicircular-density example from the last lecture. The following code will generate the random variable required in that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % Note: R is a constant which we can change;<br />
% e.g. if we set R=4 we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % Note (for beginner programmers): this step increases<br />
% the ii value for the next pass through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tips: hist(x,y) where y is the number of bars in the graph.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
The histogram above shows the sampled values of x using 20 bins.<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} <= c < ∞.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: The U in Steps 2 and 3 above is independent of Y.<br />
The constant c is an indicator of the rejection rate.<br />
<br />
In this acceptance-rejection example for a pmf, the proposal g is the discrete uniform distribution on the 5 values 1, 2, 3, 4, 5, which assigns the same probability to each, so g(x) = 0.2.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; #This a vector holding the values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution on the integers <math>1,2,3,...,k</math>. If this function is not built in to your MATLAB, a simple transformation of rand works the same way, e.g. y = ceil(k*rand).<br />
<br />
The acceptance rate is <math>\frac {1}{c}</math>, so the lower the c, the more efficient the algorithm. Theoretically, c equals 1 is the best case because all samples would be accepted; however it would only be true when the proposal and target distributions are exactly the same, which would never happen in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>\frac {1}{1.5}=\frac {2}{3}</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 sampled values of f(x); as more values are generated, the empirical frequencies get closer to the specified probabilities.<br />
<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
g(x)= 1/3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1. y~g<br /><br />
2. u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=\frac{e^{-3}3^{x}}{x!}, \; x \geq 0</math><br>(the Poisson(3) distribution)<br />
Try the first few p_{x}'s: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j, else go to step 1.<br />
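The three steps can be sketched in Python (illustrative; the bound c = 2.13 is taken slightly above the maximum ratio 2.12, so the acceptance probability never exceeds 1):<br />

```python
import math, random

def sample_poisson3():
    """A-R sampling from Poisson(3) using a Geometric(p=0.25) proposal
    (counting failures before the first success)."""
    c = 2.13                                      # slightly above max p_j/g(j) ~= 2.12
    while True:
        u1, u2 = random.random(), random.random()
        j = int(math.log(u1) / math.log(0.75))    # inverse transform for the geometric
        pj = math.exp(-3) * 3 ** j / math.factorial(j)
        gj = 0.25 * 0.75 ** j
        if u2 < pj / (c * gj):
            return j

xs = [sample_poisson3() for _ in range(20000)]
print(sum(xs) / len(xs))    # Poisson(3) has mean 3
```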
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) such that it is hypergeometically distributed, given 10 white balls, 5 red balls, and select 3 balls, let X be the number of red ball selected, without replacement. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which is '''c = 1.1127'''<br />
<br />
The maximum ratio happens to occur at X = 1, where f(1) = 0.4945 and g(1) = 0.4444, giving c = 1.1127. In general, c must be the maximum of f(x)/g(x) over all x, not the ratio of the individual maxima of f and g; taking the ratio of the maxima works here only because both pmfs attain their maximum at the same point. With c chosen this way, <math>c\cdot g(x)</math> covers f(x) at every point, and any larger c would only increase the rejection rate.<br />
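These calculations, and the resulting sampler, can be sketched in Python (illustrative; the binomial proposal is drawn as a sum of three Bernoulli(1/3) trials):<br />

```python
import math, random

def comb(n, k):
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Target: hypergeometric (10 white, 5 red, draw 3 without replacement;
# X = number of red balls selected)
f = [comb(10, 3 - k) * comb(5, k) / comb(15, 3) for k in range(4)]
# Proposal: Bin(3, 1/3)
g = [comb(3, k) * (1 / 3) ** k * (2 / 3) ** (3 - k) for k in range(4)]

c = max(fk / gk for fk, gk in zip(f, g))
print(c)                 # approximately 1.11, attained at X = 1

def sample_hyper():
    while True:
        # Draw y from Bin(3, 1/3) as a sum of three Bernoulli(1/3) trials
        y = sum(random.random() < 1 / 3 for _ in range(3))
        if random.random() <= f[y] / (c * g[y]):
            return y

xs = [sample_hyper() for _ in range(10000)]
print(sum(xs) / len(xs))  # near E[X] = 3 * (5/15) = 1
```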
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, we need to move c*g(x) to the peak of f to cover the whole f. Thus c will be very large and 1/c will be small.<br />
The higher the rejection rate, more points will be rejected.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="margin-bottom:10px;border:10px solid red;background: yellow">A good example for understanding the pros and cons of the A-R method: the method becomes impractical when the target distribution has a high, narrow peak, because c must then be huge,<br><br />
which makes the acceptance rate 1/c very low and the sampling very time-consuming. </div><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall:<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that<br />
p(j)/q(j) <= c for all j such that p(j) > 0.<br />
We then have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that the techniques in this section are distribution-specific, unlike the general acceptance-rejection method. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda)</math>, and note that <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
<br />
Side note: if <math> X\sim~ Gamma(a,\lambda)</math> and <math> Y\sim~ Gamma(b,\lambda)</math> are independent gamma random variables, then <math>\frac{X}{X+Y}</math> has a <math> Beta(a,b)</math> distribution.<br />
<br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1 </math>~Exp(<math>\lambda </math>)<br />
<math>x_2 </math>~Exp(<math> \lambda </math>)<br />
...<br />
<math>x_t </math>~Exp(<math> \lambda </math>)<br />
<math>x_1+x_2+...+x_t</math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); <br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generates a 20x1000 matrix<br />
% (i.e. 1000 numbers for each X_i, with t=20);<br />
% all the elements are generated by rand<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) behaves the same as log(u) since u~U(0,1)<br />
>>xx = sum(x); % sum(x) sums the entries in each column,<br />
% giving a 1x1000 row vector<br />
>>size(sum(x)) % check the size (the answer is 1 1000; size(x) is 20 1000)<br />
>>hist(x(1,:)) % the histogram of the first exponential sample <br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
<br />
<br />
size(x) and size(u) are both 20x1000 matrices.<br />
Since u~Unif(0,1) implies that u and 1-u have the same distribution, we can substitute u for 1-u to simplify the expression.<br />
Alternatively, the following command does the same thing as the previous commands.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code in one line;<br />
% here we can use either log(u) or log(1-u) since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
rand(20,1000) generates a matrix with 20 rows and 1000 numbers in each row.<br />
This vectorized approach generalizes: to simulate a sum of independent random variables, generate each <math>x_i</math> as a row of a matrix and sum down the columns; the histogram of the column sums then shows the distribution of the sum.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/sin(\theta)= x_{1}/cos(\theta)</math> <br /><br />
<math> tan(\theta)=x_{2}/x_{1} \rightarrow \theta=tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
Conversely, a point at distance R from the origin at angle <math>\theta</math> has Cartesian coordinates <math>x_{1}=R\cos(\theta)</math>, <math>x_{2}=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,i)'' returns the (i,i)th entry <br/ ><br />
:*: ''sum(X,1)'' or simply ''sum(X)'' sums down each column; the output is a row vector containing the column sums. <br /><br />
:*: ''sum(X,2)'' sums across each row, returning a column vector. <br/ ><br />
:*: ''rand(r,c)'' generates uniform random numbers in r rows and c columns <br /><br />
:*: The dot operator (.), placed before an operator such as *, /, or ^, applies that operation element-wise to vectors or matrices. For example, A.^2 squares each element of A, whereas A^2 computes the matrix product A*A. The dot is not required for + and -, or for functions that already act element-wise (such as log).<br><br />
:*: Matlab processes loops very slowly but is fast with matrices and vectors, so it is preferable to operate on matrices of random numbers with the dot operator rather than with loops whenever possible.<br><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1. On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1), i.e. the Standard Normal Distribution, then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<br />
<table><br />
<tr><br />
<td><div onmouseover="document.getElementById('woyun').style.visibility='visible'"<br />
onmouseout="document.getElementById('woyun').style.visibility='hidden'"><br />
<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math><br />
</div><br />
</td><br />
<td><br />
<div id="woyun" style="<br />
<br />
visibility:hidden;<br />
width:100px;<br />
height:100px;<br />
background:#FFFFAD;<br />
position:relative;<br />
animation:movement infinite;<br />
animation-duration:2s;<br />
animation-direction:alternate;<br />
<br />
<br />
/* Safari and Chrome */<br />
-webkit-animation:movement infinite;<br />
-webkit-animation-duration:2s;<br />
-webkit-animation-direction:alternate; <br />
<br />
<br />
@keyframes movement<br />
{<br />
from {left:0px;}<br />
to {left:200px;}<br />
}<br />
<br />
@-webkit-keyframes movement /* Safari and Chrome */<br />
{<br />
from {left:0px;}<br />
to {left:200px;}<br />
}"<br />
>which is almost useless in this course</div><br />
</td><br />
</tr><br />
</table><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup> and <math> \tan(\theta) = \frac{y}{x} </math><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\frac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factorizes into the product of an Exponential density and a Uniform density, so <math>d</math> and <math>\theta</math> are independent<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx.</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'')<br />
:<math>=\;- \int_{-\infty}^{\infty} \phi'(x)\, dx.</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br /><br />
More intuitively, the integrand <math>x\phi(x)</math> is an odd function (g(-x) = -g(x)), so when the support runs from negative infinity to infinity, the integral is 0.<br /><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Early pseudorandom approaches to generating normal random variables were limited to inefficient methods such as numerically inverting the Gaussian CDF, summing uniform random variables, and acceptance-rejection. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This technique had an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup> = Z<sub>1</sub><sup>2</sup> + Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>, Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> (rate 1/2, i.e. mean 2) <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Alternatively,<br> <math> x =\cos(2\pi U_2)\sqrt{-2\ln U_1}\, </math> and<br> <math> y =\sin(2\pi U_2)\sqrt{-2\ln U_1}\, </math><br /><br />
<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
Note:<br>the first graph is hist(tet) and it is a uniform distribution.<br>The second one is hist(d) and it is an exponential distribution.<br>The third one is hist(x) and it is a normal distribution.<br>The last one is hist(y) and it is also a normal distribution.<br />
<br />
Attention: there is a "dot" before "*" because d and tet are vectors, so the multiplication must be elementwise. <br><br />
<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean is still 2, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient. The reason is the need to compute sine and cosine functions. A way around this time-consuming difficulty is an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation, which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method directly, we can approximate the inverse using different functions. One method is '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) random variables and subtract 6 (which is 12*E(u<sub>i</sub>) = 6), we get approximately a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''', which is known to be faster than the Box-Muller transformation; a version of this algorithm is used for the randn function in Matlab.<br /><br />
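The central limit theorem method (method 2 above) is easy to check numerically. Below is a small Python/NumPy sketch (a supplement, not from the lecture, whose code is in Matlab): summing 12 uniforms and subtracting 6 gives samples whose mean is near 0 and variance near 1.<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum of 12 Unif(0,1) variables has mean 12*(1/2) = 6 and
# variance 12*(1/12) = 1, so subtracting 6 gives approx. N(0,1).
z = rng.random((10000, 12)).sum(axis=1) - 6

print(z.mean(), z.var())   # close to 0 and 1
```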
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histogram, the additive constant is the parameter that affects the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent Uniform(0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\,\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\,\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by <br />
:<math>f(x_1, x_2) = f_{U_1, U_2}\big(g_1^{-1}(x_1,x_2),\, g_2^{-1}(x_1,x_2)\big) \left| J \right|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
:<math>J = \begin{vmatrix} \frac{\partial u_1}{\partial x_1} & \frac{\partial u_1}{\partial x_2} \\ \frac{\partial u_2}{\partial x_1} & \frac{\partial u_2}{\partial x_2} \end{vmatrix}</math><br />
where <br />
:<math>u_1 = g_1^{-1}(x_1, x_2), \quad u_2 = g_2^{-1}(x_1, x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
:<math>u_1 = e^{-(x_1^2 + x_2^2)/2}</math><br />
:<math>u_2 = \frac{1}{2\pi}\tan^{-1}\left(\frac{x_2}{x_1}\right)</math><br />
<br />
Finally we get<br />
:<math>f(x_1, x_2) = \frac{1}{2\pi}\, e^{-(x_1^2 + x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution is the standard normal rescaled by the standard deviation and translated by the mean. Its pdf is <br />
<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}</math><br />
<br />
The standard normal distribution is the special case with mean 0 and variance 1. If X is a general normal deviate, then Z = (X − μ)/σ has a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma \cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2 \cdot 1 = \sigma^2</math><br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); <-generate variable from standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2]; <-produce a vector<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,Id) and X= <math>\underline{\mu} + \Sigma^{\frac{1}{2}} \,Z </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
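As a supplementary sketch (the lecture's code is in Matlab), the multivariate case can be illustrated in Python/NumPy by using the Cholesky factor of Σ as the square root <math>\Sigma^{\frac{1}{2}}</math>; the mean vector and covariance matrix below are example values chosen here.<br />

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])        # example mean vector (chosen here)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])    # example covariance matrix (chosen here)

L = np.linalg.cholesky(Sigma)     # L @ L.T = Sigma, one valid square root
Z = rng.standard_normal((2, 5000))   # Z ~ N(0, I_2)
X = mu[:, None] + L @ Z              # X ~ N(mu, Sigma)

print(X.mean(axis=1))   # close to mu
print(np.cov(X))        # close to Sigma
```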
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution that describes an event with only two possible outcomes, i.e. success or failure. The event takes value 1 with success probability p, and value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pdf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes, so the Bernoulli pmf is the binomial pmf with x restricted to 0 and 1.<br />
<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example from a previous lecture: setting p = 1/2 makes roughly 50% of the draws heads and 50% tails.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: a Binomial random variable can be regarded as the sum of n independent Bernoulli random variables.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: the Bernoulli pmf can be written compactly as <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
For elementwise operations on vectors or matrices, put a dot before the operator (.*, ./, .^) so the operation is applied to every element. <br />
Example: if V is a 2-by-4 matrix and you want to square each element, the Matlab code is V.^2. <br />
(Multiplication by a scalar, such as 3*V, does not need the dot.)<br />
<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
Note that the material in this lecture will not be on the exam; it was only to supplement what we have learned.<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
<br />
The inverse method is universal in the sense that we can potentially sample from any distribution where we can find the inverse of the cumulative distribution function.<br />
<br />
Procedure:<br />
<br />
1.Generate U~Unif [0, 1)<br><br />
2.set <math>x=F^{-1}(u)</math><br><br />
3.X~f(x)<br><br />
<br />
'''Example 1'''<br><br />
<br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br><br />
<br />
x~exp(<math>\lambda</math>)<br><br />
<br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P((X_1)>y) P((X_2)>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate U~ U(0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math> (equivalent to using <math>ln(1-u)</math>, since <math>1-U</math> is also Unif(0,1))<br><br />
<br />
If we generalize this example from two independent particles to n independent particles we will have:<br><br />
<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> ...<br> <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br>.<br />
<br />
And the algorithm using the inverse-transform method as follows:<br />
<br />
step1: Generate U~U(0,1)<br />
<br />
Step2: <math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
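The algorithm for the minimum of independent exponentials can be checked against direct simulation. Below is a Python/NumPy sketch (a supplement, not from the lecture; the rates λ<sub>1</sub> = 0.5 and λ<sub>2</sub> = 1.5 are example values chosen here).<br />

```python
import numpy as np

rng = np.random.default_rng(2)
lam1, lam2 = 0.5, 1.5     # example rates (chosen here)
n = 100000

# Inverse transform: F_Y(y) = 1 - exp(-(lam1+lam2)*y)
u = rng.random(n)
y = -np.log(1 - u) / (lam1 + lam2)

# Direct simulation of Y = min(X1, X2) for comparison
x1 = rng.exponential(1 / lam1, n)   # numpy takes scale = 1/rate
x2 = rng.exponential(1 / lam2, n)
y_direct = np.minimum(x1, x2)

print(y.mean(), y_direct.mean())   # both close to 1/(lam1+lam2) = 0.5
```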
<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
<br>where a>0 and a is a real number<br />
What is the distribution of X?<br><br />
<br />
'''Solution:'''<br><br />
<br />
We can find a form for the cumulative distribution function of X by isolating u, since U~Unif[0,1) takes values in the range of F(X) uniformly. It then remains to differentiate the resulting form with respect to x to obtain the probability density function.<br />
<br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=F(x)=\, {\frac {x}{a}} (2-\frac {x}{a})</math>, for <math>0\leq x\leq a</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
[[File:Example_2_diagram.jpg]]<br />
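A quick numerical check of Example 2 in Python/NumPy (a supplement, not from the lecture; a = 2 is an example value chosen here): the pdf f(x) = (2/a)(1 − x/a) on [0, a] has mean a/3, which the sample mean should match.<br />

```python
import numpy as np

rng = np.random.default_rng(3)
a = 2.0                       # example value of a (chosen here)

u = rng.random(100000)
x = a * (1 - np.sqrt(1 - u))  # X = a(1 - sqrt(1 - U))

# f(x) = (2/a)(1 - x/a) on [0, a] has mean a/3
print(x.mean())   # close to a/3
```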
<br />
'''Example 3'''<br><br />
<br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. Generate values from X.<br><br />
<br />
'''Solution:'''<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Observe from above that the values of X for n = 20 are close to 1. This is because we can view X as the maximum of n independent Unif(0,1) random variables, and the maximum is increasingly likely to be close to 1 as n increases. This observation motivates method 2 below.<br><br />
<br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result, we see that in this example F<sub>X</sub>(x) = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1, since for U~Unif(0, 1), F<sub>U</sub>(x) = x.<br />
Method 2: generate X by taking a sample of n independent U~Unif(0, 1) and letting x be the max of the n samples. However, the inverse-transform solution given above requires generating only one uniform random number instead of n of them, so it is the more efficient method.<br />
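The two methods can be compared numerically. Below is a Python/NumPy sketch (a supplement, not from the lecture), with n = 20 as in the example; since f(x) = n x<sup>n-1</sup>, E[X] = n/(n+1).<br />

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
m = 50000

# Method 1 (inverse transform): one uniform per sample
x_inv = rng.random(m) ** (1 / n)

# Method 2: maximum of n independent uniforms per sample
x_max = rng.random((m, n)).max(axis=1)

# F(x) = x^n gives f(x) = n*x^(n-1), so E[X] = n/(n+1)
print(x_inv.mean(), x_max.mean())   # both close to 20/21
```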
<br><br />
<br />
In general, the same approach gives the pdf and cdf of Y = max(X<sub>1</sub>, ..., X<sub>n</sub>) and Y = min(X<sub>1</sub>, ..., X<sub>n</sub>) whenever the X<sub>i</sub> are independent.<br />
<br />
'''Example 4 (New)'''<br><br />
Now we consider an example similar to Example 1, but using the maximum instead of the minimum.<br />
<br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br><br><br />
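A numerical check of Example 4 in Python/NumPy (a supplement, not from the lecture; λ = 1 is an example value chosen here). For two iid Exp(λ) lifetimes, E[max(X<sub>1</sub>, X<sub>2</sub>)] = 3/(2λ).<br />

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 1.0       # example rate (chosen here)
n = 100000

# Inverse transform: F_Z(z) = (1 - exp(-lam*z))^2
u = rng.random(n)
z = -np.log(1 - np.sqrt(u)) / lam

# Direct simulation of Z = max(X1, X2) for comparison
z_direct = np.maximum(rng.exponential(1 / lam, n),
                      rng.exponential(1 / lam, n))

print(z.mean(), z_direct.mean())   # both close to 3/(2*lam)
```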
<br />
'''Discrete Case:'''<br />
<font size="3"><br />
u~unif(0,1)<br><br />
x <- 0, S <- P<sub>0</sub><br><br />
while u > S<br><br />
x <- x + 1<br><br />
S <- S + P<sub>x</sub><br><br />
Return x<br></font><br />
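The discrete inverse-transform pseudocode above can be sketched in Python (a supplement, not from the lecture; the pmf on {0, 1, 2} is an example chosen here).<br />

```python
import numpy as np

rng = np.random.default_rng(6)
p = [0.2, 0.5, 0.3]   # example pmf on {0, 1, 2} (chosen here)

def sample_discrete(u, p):
    """Walk up the CDF until it exceeds u (discrete inverse transform)."""
    x, s = 0, p[0]
    while u > s:
        x += 1
        s += p[x]
    return x

draws = [sample_discrete(rng.random(), p) for _ in range(50000)]
print(np.bincount(draws) / len(draws))   # close to [0.2, 0.5, 0.3]
```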
<br />
===Decomposition Method===<br />
The CDF, F, is a composition if <math>F_{X}(x)</math> can be written as:<br />
<br />
<math>F_{X}(x) = \sum_{i=1}^{n}p_{i}F_{X_{i}}(x)</math> where<br />
<br />
1) p<sub>i</sub> > 0<br />
<br />
2) <math>\sum_{i=1}^{n}</math>p<sub>i</sub> = 1.<br />
<br />
3) <math>F_{X_{i}}(x)</math> is a CDF<br />
<br />
The general algorithm to generate random variables from a composition CDF is:<br />
<br />
1) Generate u, v ~ <math>U(0,1)</math><br />
<br />
2) If u < p<sub>1</sub>, x = <math>F_{X_{1}}^{-1}(v)</math><br />
<br />
3) Else if u < p<sub>1</sub>+p<sub>2</sub>, x = <math>F_{X_{2}}^{-1}(v)</math><br />
<br />
4) ....<br />
<br />
<b>Explanation</b><br><br />
Each random variable that is a part of X contributes <math>p_{i}*F_{X_{i}}(x)</math> to <math>F_{X}(x)</math> every time.<br />
From a sampling point of view, that is equivalent to contributing <math>F_{X_{i}}(x)</math> <math>p_{i}</math> of the time. The logic of this is similar to that of the Accept-Reject Method, but instead of rejecting a value depending on the value u takes, we instead decide which distribution to sample it from.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup> <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
Here f(x) is decomposed into two pdfs: with probability 5/6 we sample from the uniform component f<sub>x1</sub>, and with probability 1/6 from f<sub>x2</sub>.<br />
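A Python/NumPy sketch of Example 1 (a supplement, not from the lecture). The inverse CDF of f<sub>x2</sub> is derived here: F<sub>x2</sub>(x) = ((x−1)<sup>5</sup> + 1)/2, so x = 1 + (2v−1)<sup>1/5</sup>.<br />

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100000

u = rng.random(n)
v = rng.random(n)

# With probability 5/6 sample from f_x1 = 1/2 on [0,2] (uniform);
# otherwise from f_x2 = (5/2)(x-1)^4 on [0,2], whose CDF
# F_x2(x) = ((x-1)^5 + 1)/2 inverts to x = 1 + (2v-1)^(1/5).
x = np.where(u < 5 / 6,
             2 * v,
             1 + np.sign(2 * v - 1) * np.abs(2 * v - 1) ** (1 / 5))

# f is symmetric about x = 1, so the sample mean should be ~1
print(x.mean())
```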
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i > p_j </math> for <math> i < j </math> <br> <br><br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u< p_1 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i<n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
In other words, we split f(x) into the components f<sub>x1</sub>, f<sub>x2</sub> and f<sub>x3</sub>, pick one according to u ~ U(0,1), and invert the chosen component's CDF.<br />
<br />
=== Example of Decomposition Method ===<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
let U =F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, solve for x.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
'''Algorithm:'''<br />
<br />
Generate U ~ Unif [0,1)<br />
<br />
Generate V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x = v<br />
<br />
else if u<2/3, x = v<sup>1/2</sup><br />
<br />
else x = v<sup>1/3</sup><br><br />
<br />
<br />
'''Matlab Code:''' <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from an unknown distribution using an easy one. The disadvantage is that it may reject many points, which is inefficient.<br />
<br />
<br />
=== Practice Example from Lecture 7 ===<br />
<br />
Let X1, X2 denote the lifetime of 2 independent particles, X1~exp(<math>\lambda_{1}</math>), X2~exp(<math>\lambda_{2}</math>)<br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then, } 1-F_Y(y)=P(min(x_{1},x_{2})> y)=e^{-(\lambda_{1}+\lambda_{2})y}, \quad F_Y(y)=1-e^{-(\lambda_{1}+\lambda_{2}) y}</math><br />
<math>u \sim unif[0,1), \quad u = F(y) \Rightarrow y = -\frac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
<br />
===Question 2===<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a beta distribution, with b the normalizing constant: <math>b = 1/\int _{0}^{1}x^{n}(1-x)^{n}\,dx</math><br />
<br />
U<sub>1</sub>~Unif[0,1)<br />
<br />
<br />
U<sub>2</sub>~Unif[0,1)<br />
<br />
<br />
<br />
<br />
The pdf is maximized at x = 0.5 with value <math>b(1/4)^n</math>.<br />
So, <math>c=b(1/4)^n</math><br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math>.<br />
2. If <math>U_2 \leq \frac{b\,U_1^n(1-U_1)^n}{b\,(1/4)^n}=(4U_1(1-U_1))^n</math>,<br />
then X = U<sub>1</sub>;<br />
else return to step 1.<br />
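A Python sketch of this acceptance-rejection algorithm (a supplement, not from the lecture; n = 1 is chosen here, making the target Beta(2,2) with mean 1/2). Note that b cancels in the acceptance ratio, so the normalizing constant is never needed.<br />

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1                 # example exponent (chosen here); target is Beta(2,2)
n_target = 20000

samples = []
while len(samples) < n_target:
    u1, u2 = rng.random(), rng.random()
    # Accept u1 with probability f(u1)/c = (4*u1*(1-u1))^n;
    # the constant b cancels in the ratio.
    if u2 <= (4 * u1 * (1 - u1)) ** n:
        samples.append(u1)

x = np.array(samples)
print(x.mean())   # close to 1/2 for n = 1
```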
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
<br />
==Class 8 - Thursday, May 30, 2013==<br />
<br />
In this lecture, we will discuss algorithms to generate 3 well-known distributions: Binomial, Geometric and Poisson. For each of these distributions, we will first state its general understanding, probability mass function, expectation and variance. Then, we will derive one or more algorithms to sample from each of these distributions, and implement the algorithms on Matlab. <br /><br />
<br />
'''Bernoulli distribution'''<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution, where n = 1. X ~ B(1, p) has the same meaning as X ~ Bern(p). B(n, p), is the distribution of the sum of n independent Bernoulli trials, Bern(p), each with the same probability p. <br />
<br />
Algorithm: <br />
<br />
1. Generate u~Unif(0,1) <br><br />
2. If u <= p, then x = 1 <br><br />
Else x = 0 <br />
<br />
===The Binomial Distribution===<br />
<br />
If X~Bin(n,p), then its pmf is of form:<br />
f(x)=(nCx) p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
Or f(x) = <math>(n!/x!(n-x)!)</math> p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
<br />
mean (x) = E(x) = np; variance = np(1-p)<br />
<br />
Generate n uniform random numbers <math>U_1,...,U_n</math> and let X be the number of <math>U_i</math> that are less than or equal to p.<br />
The logic behind this algorithm is that the Binomial Distribution is simply n independent Bernoulli trials, each with probability of success p. Thus, we can sample from the distribution by sampling n Bernoulli random variables and summing them; each such sum is one binomial sample. In the example below, rand(20,1000)<0.4 produces a 20-by-1000 matrix of Bernoulli outcomes; summing down each column adds 20 Bernoulli outcomes into one binomial sample, and since there are 1000 columns, the result is a 1-by-1000 vector of realizations of 1000 binomial random variables.<br /><br />
MATLAB tips: binornd(n,p) generates Binomial(n,p) random numbers directly; n is the number of trials and p is the probability of success. Logical comparisons produce 0/1 arrays: for a=[2 3 4], a<3 produces [1 0 0], and a==3 produces [0 1 0]. We can use this to count how many uniform draws are less than or equal to p.<br /><br />
<br />
Procedure for Bernoulli <br />
U~Unif(0,1)<br />
if U <= p<br />
x = 1<br />
else <br />
x = 0<br />
<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>a=[3 5 8];<br />
>>a<5<br />
ans= 1 0 0<br />
<br />
>>rand(20,1000)<br />
>>rand(20,1000)<0.4<br />
>>A = sum(rand(20,1000)<0.4)<br />
>>hist(A)<br />
>>mean(A)<br />
Note: sum() adds down each column by default, so A is a 1-by-1000 vector of binomial samples<br />
<br />
>>sum(sum(rand(20,1000)<0.4)>8)/1000<br />
This is an estimate of Pr[X>8].<br />
<br />
</pre><br />
<br />
[[File:Binomial_example.jpg|300px]]<br />
<br />
<br />
===The Geometric Distribution===<br />
<br />
x=1, f(x)=p <br />
x=2, f(x)=p(1-p)<br />
x=3, f(x)=p(1-p)<sup>2</sup><br />
<br />
Generally speaking, if X~G(p) then its pdf is of the form f(x)=(1-p)<sup>(x-1)</sup>p, x=1,2,...<br />
The random variable X is the number of trials required until the first success in a series of independent''' Bernoulli trials'''.<br /><br />
<br />
<br />
<br />
Other properties<br />
<br />
<br />
Probability mass function : P(X=k) = p(1-p)<sup>k-1</sup><br />
<br />
Tail probability : P(X>n) = (1-p)<sup>n</sup><br />
<br />
<br />
<span style="background:#F5F5DC"><br />
<br />
Mean of x = 1/p<br />
Var(x) = (1-p)/p^2<br />
<br />
There are two ways to look at a geometric distribution.<br />
<br />
<b>1st Method</b><br />
<br />
We count the number of trials up to and including the first success. This will be used in our course. <br />
<br />
pdf is of the form f(x) = (1-p)<sup>x-1</sup>p, x = 1, 2, 3, ...<br />
<br />
<b>2nd Method</b><br />
<br />
This involves modeling the number of failures before the first success. This does not include the trial in which we succeeded. <br />
<br />
pdf is of the form f(x) = (1-p)<sup>x</sup>p, x = 0, 1, 2, ....<br />
<br />
</span><br />
<br />
<br />
If Y~Exp(<math>\lambda</math>) then X=floor(Y)+1 is geometric.<br /><br />
Choose <math>\lambda</math> such that <math>e^{-\lambda}=1-p</math>. Then X ~ Geo(p) <br /><br />
<br />
P (X > x) = (1-p)<sup>x</sup>(because first x trials are not successful) <br/><br />
<br />
'''Proof''' <br/><br />
<br />
P(X>x) = P(floor(Y) + 1 > x) = P(floor(Y) > x-1) = P(Y >= x) = e<sup>(-<math>\lambda</math> x)</sup> <br><br />
<br />
Since p = 1- e<sup>-<math>\lambda</math></sup>, i.e. <math>\lambda</math> = <math>-log(1-p)</math>, then <br><br />
<br />
P(X>x) = e<sup>(-<math>\lambda</math> * x)</sup> = e<sup>log(1-p)*x</sup> = (1-p)<sup>x</sup> <br/><br />
<br />
Note that floor(Y)>X -> Y >= X+1 <br/><br />
<br />
This shows how the exponential distribution can be used to derive P(X>x)=(1-p)<sup>x</sup>.<br />
<br />
<br><br />
Suppose X has the exponential distribution with rate parameter <math> \lambda > 0 </math>. <br><br />
Then <math>\left \lfloor X \right \rfloor </math> and <math>\left \lceil X \right \rceil </math> have geometric distributions on <math> \mathcal{N} </math> and <math> \mathcal{N}_{+} </math> respectively, each with success probability <math> 1-e^ {- \lambda} </math>. <br><br />
<br />
Proof: <br><br />
<math>\text{For } n \in \mathcal{N} </math><br /><br />
<br />
<math>\begin{align}<br />
P(\left \lfloor X \right \rfloor = n)&{}= P( n \leq X < n+1) \\<br />
&{}= F( n+1) - F(n) \\<br />
&{}= (1 - e^{-\lambda (n+1)}) - (1 - e^{-\lambda n}) \\<br />
&{}= (e^ {-\lambda})^n \cdot (1 - e^ {-\lambda}) \\<br />
\end{align}</math> <br /><br />
which is the pmf of <math>Geo (1 - e^ {-\lambda})</math>. The proof of the ceiling part follows immediately.<br />
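As a quick numerical cross-check of this result, here is a short Python sketch (Python rather than the course's MATLAB; the rate <code>lam = 0.7</code> and the function names are illustrative choices, not from the notes):<br />

```python
import math
import random

# Empirical check: floor of an Exp(lam) variable is geometric on {0,1,2,...}
# with success probability 1 - e^{-lam}. lam = 0.7 is an arbitrary choice.
random.seed(1)
lam = 0.7
samples = [math.floor(random.expovariate(lam)) for _ in range(200000)]

def empirical_pmf(n):
    return sum(1 for s in samples if s == n) / len(samples)

def theoretical_pmf(n):
    # P(floor(X) = n) = (e^{-lam})^n * (1 - e^{-lam})
    return math.exp(-lam) ** n * (1 - math.exp(-lam))
```

The empirical and theoretical pmfs agree to within sampling error.<br />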
<br />
<br />
<br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p) </math><br /><br />
2) Generate a <math>Y \sim Exp(\lambda )</math> <br /><br />
3) We can then let <math>X = \left \lfloor Y \right \rfloor + 1</math>, where <math>X\sim Geo(p)</math> <br /><br />
note: <math>\left \lfloor Y \right \rfloor >2 -> Y>=3</math><br /><br />
<math> \left \lfloor Y \right \rfloor >5 -> Y>=6</math><br /><br />
<br /><br />
<br />
<math>\left \lfloor Y \right \rfloor > x \Rightarrow Y \geq x+1 </math> <br /><br />
<br />
To compute <math>P(Y \geq x)</math>:<br /><br />
Y ~ Exp (<math>\lambda</math>)<br /><br />
pdf of Y : <math>f(y)=\lambda e^{-\lambda y}</math><br /><br />
cdf of Y : <math>F(y)=1-e^{-\lambda y}</math><br /><br />
cdf <math>P(Y<x)=1-e^{-\lambda x}</math><br /><br />
<math>P(Y \geq x)=1-(1-e^{-\lambda x})=e^{-\lambda x}</math><br /><br />
<math> e^{-\lambda}=1-p \Rightarrow \lambda = -\log(1-p)</math><br /><br />
<math>P(Y \geq x)=e^{-\lambda x}=e^{\log(1-p)x}=(1-p)^x</math><br /><br />
<math>E[X]=1/p </math><br /><br />
<math>Var[X]= (1-p)/p^2</math><br /><br />
P(X>x)<br /><br />
=P(floor(Y)+1>x)<br /><br />
=P(floor(Y)>x-1)<br /><br />
=P(Y>=x)<br />
<br />
Use <math>e^{-\lambda}=1-p</math> to work out the mean and variance above.<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>p=0.4;<br />
>>l=-log(1-p);<br />
>>u=rand(1,1000);<br />
>>y=(-1/l)*log(u);<br />
>>x=floor(y)+1;<br />
>>hist(x)<br />
<br />
% Note:<br />
% mean(x) is approximately E[X] = 1/p<br />
% var(x) is approximately Var[X] = (1-p)/p^2<br />
<br />
</pre><br />
<br />
[[File:Geometric_example.jpg|300px]]<br />
<br />
===Poisson Distribution===<br />
If <math>\displaystyle X \sim \text{Poi}(\lambda)</math>, its pmf is of the form <math>\displaystyle \, f(x) = \frac{e^{-\lambda}\lambda^x}{x!}</math> , where <math>\displaystyle \lambda </math> is the rate parameter.<br /><br />
<br />
Understanding of the Poisson distribution:<br />
<br />
If customers arrive at a bank at rate <math>\lambda</math> per unit of time, then<br />
X(t) = # of customers in [0,t] ~ Pois<math>(\lambda t)</math><br />
<br />
Its mean and variance are<br /><br />
<math>\displaystyle E[X]=\lambda</math><br /><br />
<math>\displaystyle Var[X]=\lambda</math><br /><br />
<br />
A Poisson random variable X can be interpreted as the maximal number of i.i.d. exponential random variables (with parameter <math>\lambda</math>) whose sum does not exceed 1.<br /><br />
The traditional understanding of the Poisson distribution as the total number of events in a specific interval fits here, since the definition above counts how many exponential waiting times fit in an interval of length 1.<br />
<br /><br />
<br /><br />
<math>\displaystyle\text{Let } Y_j \sim \text{Exp}(\lambda), U_j \sim \text{Unif}(0,1)</math><br><br />
<math>Y_j = -\frac{1}{\lambda}log(U_j) \text{ from Inverse Transform Method}</math><br><br><br />
<br />
<math>\begin{align} <br />
X &= max \{ n: \sum_{j=1}^{n} Y_j \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} - \frac{1}{\lambda}log(U_j) \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} log(U_j) > -\lambda \} \\<br />
&= max \{ n: log(\prod_{j=1}^{n} U_j) > -\lambda \} \\<br />
&= max \{ n: \prod_{j=1}^{n} U_j > e^{-\lambda} \} \\<br />
\end{align}</math><br><br /><br />
<br />
Note: From above, we can use Logarithm Rules <math>log(a)+log(b)=log(ab)</math> to generate the result.<br><br /><br />
'''Algorithm:''' <br /><br />
1) Set n=1, a=1 <br /><br />
2) Generate <math>U_n ~ U(0,1), a=aU_n </math> <br /><br />
3) If <math>a >= e^{-\lambda}</math> , then n=n+1, and go to Step 2. Else, x=n-1 <br /><br />
<br />
Using the inverse method in this way, one can also verify the mean and variance of the Poisson distribution.<br />
<br />
===MATLAB Code for generating Poisson Distribution===<br />
<pre><br />
>>l=2; <br />
>>for ii=1:1000<br />
n=1;<br />
a=1;<br />
u=rand;<br />
a=a*u;<br />
while a>exp(-l)<br />
n=n+1;<br />
u=rand;<br />
a=a*u;<br />
end<br />
x(ii)=n-1;<br />
end<br />
>>hist(x)<br />
>>sum(x==1)/1000 % proportion of x==1, estimates P(X=1)<br />
>>sum(x>3)/1000 % estimates P(X>3)<br />
</pre><br />
<br />
[[File:Poisson_example.jpg|300px]]<br />
<br />
<br />
<span style="background:#F5F5DC"><br />
EXAMPLE for geometric distribution: Consider the case of rolling a die: </span><br />
<br />
X = the number of rolls that it takes for the number 5 to appear. <br />
<br />
We have X ~ Geo(1/6), <math>f(x)=(1/6)\cdot(5/6)^{x-1}</math>, x=1,2,3.... <br />
<br />
Now, let <math>Y \sim Exp(\lambda)</math> => X = floor(Y) + 1 <br />
<br />
Let <math>e^{-\lambda}=5/6</math> <br />
<br />
<math>P(X>x) = P(Y \geq x)</math> (from the class notes) <br />
<br />
We have <math>e^{-\lambda x} = (5/6)^x</math> <br />
<br />
Algorithm: let <math>\lambda = -\log(5/6)</math> <br />
<br />
1) Generate <math>Y \sim Exp(\lambda)</math> <br />
<br />
2) Set X = floor(Y)+1, to generate X <br />
<br />
<math> E[X]=6,\; Var[X]=(5/6)/(1/6)^2 = 30 </math><br />
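The die example can be simulated directly; below is a minimal Python sketch of the same floor-of-exponential algorithm (Python in place of the course's MATLAB; the sample size is an arbitrary choice):<br />

```python
import math
import random

# Die example: p = 1/6, choose lam so that e^{-lam} = 5/6, then X = floor(Y)+1.
random.seed(0)
lam = -math.log(5 / 6)
xs = [math.floor(random.expovariate(lam)) + 1 for _ in range(100000)]

mean_x = sum(xs) / len(xs)                            # near E[X] = 6
var_x = sum((x - mean_x) ** 2 for x in xs) / len(xs)  # near Var[X] = 30
```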
<br />
<br />
<span style="background:#F5F5DC">GENERATING NEGATIVE BINOMIAL RV USING GEOMETRIC RV'S</span><br />
<br />
Property of negative binomial Random Variable: <br/><br />
<br />
The negative binomial random variable is a sum of r independent geometric random variables.<br/><br />
<br />
Using this property we can formulate the following algorithm:<br/><br />
<br />
Step 1: Generate r geometric rv's each with probability p using the procedure presented above.<br/><br />
Step 2: Take the sum of these r geometric rv's. This RV follows NB(r,p)<br/><br />
<br />
Remark on Steps 1 and 2: each geometric rv is generated as floor(Y)+1 with <math>e^{-\lambda}=1-p</math>, as above, and the r generated values are then summed.<br />
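The two steps above can be sketched in Python (a hedged illustration, not course code; here NB(r,p) counts total trials because each geometric counts trials up to and including a success, and <code>r=3, p=0.5</code> are arbitrary):<br />

```python
import math
import random

random.seed(2)

def geometric(p):
    # Geo(p): trials up to and including the first success, via floor(Y)+1.
    lam = -math.log(1 - p)
    return math.floor(random.expovariate(lam)) + 1

def negative_binomial(r, p):
    # Step 1 and Step 2: sum r independent Geo(p) random variables.
    return sum(geometric(p) for _ in range(r))

samples = [negative_binomial(3, 0.5) for _ in range(50000)]
mean_nb = sum(samples) / len(samples)  # theoretical mean is r/p = 6
```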
<br />
=== Another way to generate random variable from poisson distribution ===<br />
<br/><br />
Note: <math>P(X=x)=e^{-\lambda}\lambda^x/x!</math><br/><br />
<math>P(X=x+1)= e^{-\lambda}\lambda^{x+1}/(x+1)!</math> <br/><br />
The ratio is: <math>p(x+1)/p(x)=\lambda/(x+1)</math> <br/><br />
Therefore, <math>p(x+1)=\frac{\lambda}{x+1}\, p(x)</math> <br/><br />
Algorithm: <br/><br />
1. Set x=0, <math>p=e^{-\lambda}</math><br/><br />
2. <math>F=P(X=0)=e^{-\lambda}</math> <br/><br />
3. Generate U~Unif(0,1) <br/><br />
If U<F, output x<br/><br />
Else<br/><br />
<math>p=\frac{\lambda}{x+1}\, p</math><br/><br />
F=F+p<br/><br />
x= x+1<br/><br />
Go to 3<br />
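The recursive inverse-transform algorithm above translates directly into Python (a sketch rather than course code; the function name and <code>lam = 2.0</code> are illustrative):<br />

```python
import math
import random

random.seed(3)

def poisson_ratio(lam):
    # Inverse transform for Poisson(lam) using p(x+1) = lam/(x+1) * p(x).
    x = 0
    p = math.exp(-lam)  # P(X = 0)
    F = p               # running cdf
    u = random.random()
    while u >= F:
        p = lam / (x + 1) * p  # recursion gives the next pmf value
        F += p
        x += 1
    return x

samples = [poisson_ratio(2.0) for _ in range(100000)]
mean_est = sum(samples) / len(samples)  # E[X] = lam = 2
```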
<br />
<br />
1. set n =1, a = 1<br />
<br />
2. generate U<sub>n</sub>~U(0,1), a = a*U<sub>n</sub><br />
<br />
3. if <math>a > e^{-\lambda}</math>, then n = n+1, go to step 2,<br />
<br />
else x = n-1<br />
<br />
Firstly, find the ratio of P(X=k+1) to P(X=k), start from F(0)=P(X=0), and compare against a generated uniform.</div>
<hr />
<div>== Introduction, Class 1 - Tuesday, May 7 ==<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case except y is discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly to their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your quest account as the login name and your uwaterloo email as the registered email. This is important as the quest id will be used to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are deterministic, the sequence of values has the appearance of independent uniform random variables. Being deterministic, pseudo random numbers are valuable because they are easy to generate and to reproduce.<br />
<br />
When an experiment is repeated many times, the aggregate results approach their expected values, which makes the process look deterministic; each individual trial, however, is still random.<br />
Pseudo random numbers behave in a similar way.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
if y = ax + b, then <math>b:=y \mod a</math>. <br /><br />
For example, 4.2 = 3 * 1.1 + 0.9, so<br /><br />
0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
More examples:<br /><br />
30 = 4 * 7 + 2, so<br /><br />
2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1, so<br /><br />
1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation tells us whether one integer divides another with no remainder. The two integers are related by n = mq + r, where m, q, r, n are all integers and r is smaller than m.<br />
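The division algorithm and the worked examples above can be checked with a tiny Python sketch (purely illustrative):<br />

```python
# Division algorithm: n = m*q + r with 0 <= r < m.
n, m = 30, 7
q, r = divmod(n, m)  # quotient and remainder in one call

identity_holds = (n == m * q + r)
```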
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math> (<math>\mod m</math> means taking the remainder after division by m). Given an initial value <math>x_0 \in \N</math> called the '''seed''', we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method refers to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about the '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required, such as Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation explores the range of possible outcomes by repeated random sampling, so it needs higher-quality randomness than this method provides.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{k}=10,\,m=3</math><br //><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math> (a little tip: (a*b) mod c = ((a mod c)*(b mod c)) mod c)<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
If we choose the numbers properly, we can get a sequence of "random" numbers. However, how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a very '''large''' number, preferably prime; the larger <math>m</math> is, the more the output resembles a sequence of "random" numbers. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> is '''large and prime''').<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a graph of the distribution. Use it after running the code to check the sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
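The textbook example can likewise be verified in a few lines of Python (an illustrative sketch, not course code):<br />

```python
def textbook_seq(x0, n):
    # x_n = (5*x_{n-1} + 7) mod 200
    xs, x = [], x0
    for _ in range(n):
        x = (5 * x + 7) % 200
        xs.append(x)
    return xs

vals = textbook_seq(3, 10)  # x_1 through x_10
```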
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose parameters that yield good uniform sequences. Many programs give you the option to choose the seed; sometimes the seed is chosen from the system clock.<br /><br />
<br />
<br />
<br />
<br />
In this part we learned how integer division relates two integers through a quotient and remainder, and that when the congruential algorithm is run over a range such as 1:1000, the histogram of the output<br />
resembles a uniform distribution.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>find integers <i>a, b, m</i> (large prime) and <i>x<sub>0</sub></i> (the seed).</li><br />
<li><math>x_{k+1}=(ax_{k}+b)</math> mod m</li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating random variables from distributions other than the uniform, such as the exponential and normal distributions. However, to easily use this method to generate pseudorandom numbers, the probability distribution used must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation x=<math> F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf of X is continuous and strictly monotonic (hence invertible). A strictly monotonic function is one that is either increasing for all x, or decreasing for all x.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3<br><br />
*Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.<br />
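The five steps above can be sketched in Python (a hedged translation of the algorithm, not code from the notes; <code>n=10, p=0.3</code> are arbitrary):<br />

```python
import random

random.seed(4)

def binomial_inverse(n, p):
    # Inverse transform for Binomial(n, p) following Steps 1-5 above.
    u = random.random()          # Step 1
    c = p / (1 - p)              # Step 2
    i = 0
    pr = (1 - p) ** n            # P(X = 0)
    F = pr
    while u >= F:                # Step 3 fails: move to the next value
        pr = c * (n - i) / (i + 1) * pr  # Step 4: P(X=i+1) from P(X=i)
        F += pr
        i += 1
    return i

samples = [binomial_inverse(10, 0.3) for _ in range(50000)]
mean_est = sum(samples) / len(samples)  # E[X] = n*p = 3
```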
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\, dt</math><br /><br />
<math> = \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, \Big|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
Set <math> y=1-e^{- \lambda x} </math> and solve for x:<br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-y)}{\lambda}</math><br /><br />
Swapping the roles of x and y gives<br /><br />
<math>F^{-1}(x)=-\frac {\ln(1-x)}{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
<br />
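These two steps are easy to mirror in Python (an illustrative sketch alongside the MATLAB in these notes; <code>lam = 2.0</code> is an arbitrary rate):<br />

```python
import math
import random

random.seed(5)
lam = 2.0  # rate parameter (illustrative choice)
# Step 1: draw U ~ U[0,1]; Step 2: x = -ln(1-U)/lam.
xs = [-math.log(1 - random.random()) / lam for _ in range(100000)]
mean_est = sum(xs) / len(xs)  # E[X] = 1/lam = 0.5
```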
'''Example''': <br />
<math> X= a + (b-a)U</math> is uniform on [a, b], where U is uniform on [0, 1] <br /><br />
<math> x=\frac{-ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math> <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x), and set<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
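Example 2 can be checked with a short Python sketch (illustrative only; the expected mean 5/6 follows from the density 5x<sup>4</sup> on [0,1]):<br />

```python
import random

random.seed(6)
# F(x) = x^5 on [0,1], so F^{-1}(u) = u^(1/5).
xs = [random.random() ** (1 / 5) for _ in range(100000)]
mean_est = sum(xs) / len(xs)  # density is 5x^4, so E[X] = 5/6
```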
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
<br />
'''Example 4 - Estimating <math>\pi</math>''':<br />
Let's use rand() and the Monte Carlo method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2, the inscribed circle has radius 1 and hence area <math>\pi</math>, while the square has area 4, so this probability is <math>\pi/4</math>.<br /><br />
Thus <math>\pi \approx 4 \cdot (Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) % will generate a fairly uniform histogram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
% let λ=2 in this example; you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) % 1000 in size <br />
>>figure<br />
>>hist(x) % exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. Not every CDF is invertible: F must be strictly increasing on its support for <math>F^{-1}</math> to exist, and a generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDFs and/or their inverses are not easy to compute, e.g. for the Gaussian distribution.<br /><br />
<br />
We showed how to invert a CDF and use a uniform random variable to obtain a value of x from F(x).<br />
The same inverse method lets us generate samples from other distributions using the uniform distribution.<br />
In the <math>\pi</math> example, points are drawn uniformly over the square, so every location in the square (and hence in the circle) is equally likely;<br />
we can also inspect the histogram to judge what kind of distribution the samples follow.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool #shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma changes the location and spread of the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by definition of <math>X</math>)<br /><br />
<math>= P(F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> (since <math>U</math> is uniform on [0,1])<br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for the uniform distribution <math> U \sim Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f. <math> F^{-1}(\cdot) </math> and make sure F is monotonically increasing; in some cases a closed-form inverse does not exist.<br />
<br />
2. For many distributions, such as the Gaussian, the inverse CDF is too difficult to find, making this method inefficient.<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable X that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case (Procedure):<br><br />
1. Define a probability mass function for <math>x_{i}</math>, where i = 1,....,k. Note: k could be infinite. <br><br />
2. Generate a uniform random number U, <math> U \sim Unif [0,1] </math><br><br />
3. If <math>U\leq p_{0}</math>, deliver <math>X = x_{0}</math><br><br />
4. Else, if <math>U\leq p_{0} + p_{1} </math>, deliver <math>X = x_{1}</math><br><br />
5. Repeat until <math>U\leq p_{0} + p_{1} + \cdots + p_{k}</math>; deliver <math>X = x_{k}</math><br><br />
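The procedure above can be sketched generically. The notes use MATLAB; here is a Python version (the function name and the example pmf {0.3, 0.2, 0.5} are our own illustrative choices):

```python
import random

def discrete_inverse_transform(values, probs, rng):
    """Return the first value x_j whose cumulative probability reaches U."""
    u = rng.random()
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if u <= cumulative:
            return x
    return values[-1]  # guard against floating-point round-off in the cumulative sum

rng = random.Random(1)
draws = [discrete_inverse_transform([0, 1, 2], [0.3, 0.2, 0.5], rng)
         for _ in range(100_000)]
freq0 = draws.count(0) / len(draws)  # empirical frequency of 0, close to 0.3
```

The empirical frequencies of 0, 1, 2 should approach the prescribed probabilities as the number of draws grows.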
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U \sim Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
'''Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If U <= 0.3, deliver x = 0<br /><br />
3. Else if 0.3 < U <= 0.5, deliver x = 1<br /><br />
4. Else (0.5 < U <= 1), deliver x = 2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, X = F^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
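A minimal sketch of this example (in Python rather than the course's MATLAB; variable names are ours): since F(x) = x² on [0,1], we set X = √U.

```python
import random

rng = random.Random(0)
# F(x) = x^2 on [0,1], so F^{-1}(u) = sqrt(u)
xs = [rng.random() ** 0.5 for _ in range(100_000)]
sample_mean = sum(xs) / len(xs)  # true mean is E[X] = ∫ x·2x dx = 2/3
```

The histogram of these draws should rise linearly, matching f(x) = 2x.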
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U \sim Unif [0,1] </math><br>
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The PMF of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) Set <math>\begin{align} x = 0 \end{align}</math><br />
<math>\begin{align} p = F = P(X = 0) = \frac{e^{-u} u^0}{0!} = e^{-u} \end{align}</math><br />
3) If U<F, output x <br><br />
Else, <math>\begin{align} p = \frac{u}{x+1} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
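The algorithm above translates directly into code using the recursion <math>P_{x+1} = \frac{u}{x+1} P_x</math>. A Python sketch (the function name and the test value u = 3 are our own choices; the notes use MATLAB):

```python
import math
import random

def poisson_inverse_transform(mu, rng):
    """Sequential inverse transform using the recursion p_{x+1} = p_x * mu / (x+1)."""
    u = rng.random()
    x = 0
    p = math.exp(-mu)   # P(X = 0)
    F = p               # running CDF
    while u >= F:       # step 3: stop as soon as U < F
        p = p * mu / (x + 1)
        x += 1
        F += p
    return x

rng = random.Random(42)
draws = [poisson_inverse_transform(3.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # a Poisson(3) sample mean should sit near 3
```

Because the CDF is accumulated term by term, no factorials or large powers ever need to be computed directly.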
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p) where p is the probability of success, and define the random variable X as the number of trials up to and including the first success, so x = 1, 2, 3,..... We have pmf:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) is the probability of at least x failures before the first success.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
....<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k <br />
....<br />
\end{cases}</math><br />
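The case-by-case rule above collapses to a closed form: X is the smallest k with <math>1-(1-p)^k \geq U</math>, i.e. <math>X = \lceil \ln(1-U)/\ln(1-p) \rceil</math>. A Python sketch (the function name and the test value p = 0.25 are ours, for illustration):

```python
import math
import random

def geometric_inverse_transform(p, rng):
    """Smallest k with 1-(1-p)^k >= U; dividing by the negative ln(1-p) flips the inequality."""
    u = rng.random()
    # max(1, ...) guards the measure-zero case u == 0.0, where ceil would give 0
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))

rng = random.Random(7)
draws = [geometric_inverse_transform(0.25, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # a Geo(0.25) sample mean should sit near 1/p = 4
```

Unlike the sequential search, this costs the same regardless of how large the generated value is.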
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
Flipping a coin is a discrete case of the uniform distribution; in the code the coin is flipped 1000 times, and the observed frequency is close to the expected value (0.5).<br />
The second example is another discrete distribution: it shows how a single uniform draw can be split into three outcomes, 0, 1 and 2, each with its prescribed probability.<br />
Example 3 uses the inverse method to work out the range of U that corresponds to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b> generate samples from a given distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed number {x}</li><br />
<li>{F<sup>-1</sup>(x)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>set d<sub>i</sub>=x<sub>j</sub> if <math> F(x_{j-1})<u_i \leq F(x_j) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transform method does allow us to transform the uniform distribution into others, it has two limitations:<br />
# Not all CDFs have closed-form inverses (F must be invertible on its support)<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples from these distributions, we will use other methods, such as the '''Acceptance-Rejection Method'''. When the inverse transform is unavailable or expensive, this method can be the more practical choice, though its efficiency depends on the choice of proposal distribution.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (In practise, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) rather than <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x would be possible only if g and f were the same function: both densities integrate to 1, so neither can lie above the other everywhere. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This will render our algorithm inefficient. <br />
<br />
<br><br />
'''Note:''' <br><br />
1. Values around x<sub>1</sub> are sampled more often under cg(x) than needed under f(x): <math>\frac{f(y)}{\, c g(y)}</math> is small there, so in the region around x<sub>1</sub> we accept few points and reject many. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. There, cg(x) and f(x) are comparable.<br><br />
3. The constant c is needed because we need to adjust the height of g(x) to ensure that it is above f(x). <br> <br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those points are guaranteed to fall in the region under c g(y) that also lies under f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: U takes values in (0,1), so the condition <math>u\leq \frac{f(y)}{cg(y)}</math> defines a valid acceptance probability only when <math>f(y) \leq c g(y)</math>; this is exactly the requirement <math>c\geq \frac{f(y)}{g(y)}</math> used to choose c.<br />
<br />
<br />
This introduces the relationship between cg(x) and f(x), proves why the acceptance probability is f(y)/(cg(y)), and shows how this rule is used to reject candidate points.<br />
The graph also shows where acceptance is likely for each candidate value:<br />
in the example, x<sub>1</sub> lies in a region with a low acceptance probability, while x<sub>2</sub> lies in a region with a high acceptance probability.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
<br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int_y\ P(accepted|y)P(y)dy\\<br />
&=\int_y\ \frac{f(s)}{cg(s)}g(s)ds\\<br />
&=\frac{1}{c} \int_y\ f(s) ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise, the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)={2 \choose x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c \geq f(x)/g(x)</math><br/><br />
We need <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>(y=0)</math> and <math>(v<1/2)</math>, output 0 <br/><br />
If <math>(y=2) </math> and <math>(v<1/2)</math>, output 2 <br/><br />
Else if <math>y=1</math>, output 1; otherwise return to step 1<br/><br />
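The discrete algorithm above can be sketched in code. A Python version (the notes use MATLAB; the function name is ours, and the acceptance ratios f(y)/(cg(y)) = 1/2, 1, 1/2 come from the table):

```python
import random

def binomial_2_half_ar(rng):
    """Propose Y ~ DU{0,1,2}; accept with probability f(y)/(c*g(y)), c = 3/2."""
    f = {0: 0.25, 1: 0.5, 2: 0.25}
    c, g = 1.5, 1.0 / 3.0
    while True:
        y = int(3 * rng.random())           # Y uniform on {0, 1, 2}
        if rng.random() <= f[y] / (c * g):  # acceptance ratios: 1/2, 1, 1/2
            return y

rng = random.Random(3)
draws = [binomial_2_half_ar(rng) for _ in range(100_000)]
freq1 = draws.count(1) / len(draws)  # empirical frequency of 1, close to 0.5
```

The empirical frequencies should approach 1/4, 1/2, 1/4, the Bi(2,0.5) pmf.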
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let <math>g(x)</math> be the helper function <br/><br />
Let <math>cg(x)\geq f(x)</math><br/><br />
Since we need to generate y from <math>g(x)</math>,<br/><br />
<math>Pr(\text{select } y)=g(y)</math><br/><br />
<math>Pr(\text{output } y \mid \text{selected } y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(\text{output})=\sum_i Pr(\text{output } y_i \mid \text{selected } y_i)\,Pr(\text{select } y_i)=\sum_i \frac{f(y_i)}{cg(y_i)}\,g(y_i)=\frac{1}{c}</math> <br/><br />
The number of iterations until the first acceptance is geometric with success probability 1/c.<br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
We used conditional probability to show that, conditional on acceptance, the output has exactly the target pdf f(y).<br />
The example shows how to choose the constant c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
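This procedure can be sketched in code. A Python version of the simulation (the notes use MATLAB; the function name is ours, and the ratio 256/27 · y(1-y)³ is f(y)/(cg(y)) from the derivation above):

```python
import random

def beta_2_4_ar(rng):
    """Accept-reject for f(x) = 20x(1-x)^3 with g = U(0,1) and c = 135/64."""
    while True:
        y = rng.random()                              # candidate Y from g
        u = rng.random()
        if u <= (256.0 / 27.0) * y * (1.0 - y) ** 3:  # f(y)/(c*g(y))
            return y

rng = random.Random(5)
draws = [beta_2_4_ar(rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # Beta(2,4) has mean 2/(2+4) = 1/3
```

On average each output consumes c = 135/64 ≈ 2.11 candidate pairs, matching the expected-trials result.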
<br />
This example uses the derivative to find the maximum of f(x)/g(x),<br />
which gives the best (smallest) constant c for the acceptance-rejection method.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So the target density is <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (and 0 elsewhere)<br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed, so <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1)} = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1)} = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example showing why the acceptance-rejection method rejects some candidate points.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area between <math>c g(x)</math> and <math>f(x)</math> as small as possible.<br />
Because g(.) is uniform on (0,1), g(x) = 1, so <math>c = \max(3x^2) = 3</math>.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
an example to show how to figure out c and f(x)/c*g(x).<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first need a proposal distribution <math>g(x)</math> that is easy to sample from. <br> The curve of f(x) must lie under the curve of cg(x).<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is smaller where the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice-versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> at or below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it does not make sense to choose <math>c</math> arbitrarily large: we need <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be negative.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
And it means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math>. So the smallest possible c that satisfies the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
*Recall that the acceptance rate is 1/c (not the rejection rate). <br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that <math> c \cdot g(x) \geq f(x) </math> for all x; if no suitable c exists (or c is very large), return to step 1 and choose a different g(x).<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(x)}{c \cdot g(x)}</math> then X=Y; else return to step 1 (This is not the way to find C. This is the general procedure.)<br />
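The general procedure above can be sketched in code. The following is a hedged illustration (our own Python, not part of the course notes; the names <code>accept_reject</code>, <code>sample_g</code>, etc. are ours), applied to the f(x)=2x example discussed below:

```python
import random

def accept_reject(f, g, sample_g, c, n):
    """Generate n draws from density f via accept-reject with proposal g.

    sample_g() draws from g; c must satisfy c*g(x) >= f(x) for all x.
    """
    samples = []
    while len(samples) < n:
        y = sample_g()                 # step 1: Y ~ g
        u = random.random()            # step 2: U ~ Unif(0,1)
        if u <= f(y) / (c * g(y)):     # step 3: accept with probability f/(c*g)
            samples.append(y)
    return samples

# The class example: f(x) = 2x on (0,1), proposal g = Unif(0,1), c = 2
xs = accept_reject(lambda x: 2 * x, lambda x: 1.0, random.random, 2.0, 5000)
```

Note that the accepted draws have mean approximately 2/3, the mean of the Beta(2,1) density 2x.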
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of the Beta distribution: <math>Beta(a,b)</math> has pdf<br />
<math>f(x)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
where <math>\Gamma(n)=(n-1)!</math> if n is a positive integer, and in general<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}\,dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g is the Unif(0,1) density, which has height 1 and therefore only covers the lower half of f(x) (which reaches height 2 on this interval). Multiplying by c = 2 raises <math>c\cdot g</math> to height 2 on the y-axis, so it covers f(x) entirely.<br />
<br />
Comment:<br />
From the picture above, observe that the area under f(x)=2x (which is 1) is half of the area under <math>c\cdot g(x)=2</math> (which is 2). This is why, in order to obtain 1000 accepted points from f(x), we need to generate approximately 2000 points from UNIF(0,1).<br />
In general, to sample n points from a distribution with pdf f(x), we need to generate approximately <math>n\cdot c</math> points from the proposal distribution g(x) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{2y}{2\cdot 1}=y</math>, set x=y</li><br />
<li>else go to 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: counts numbers that are accepted<br />
>>jj=1; % jj: counts numbers that are generated<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000, since the acceptance rate is 1/c = 1/2<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason a '''for''' loop is not used is that we must continue looping until we have 1000 accepted samples; since some proposals are rejected along the way, we do not know in advance how many y values need to be generated.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use the A-R method to generate a random variable<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let <math>g=U(-1,1)</math>, so <math>g(x)=\frac{1}{2}</math> for <math>-1\leq x\leq 1</math><br />
<br />
Let <math>y \sim g</math>. We need<br />
:<math> c\cdot g(x)\geq f(x) \;\Rightarrow\; \frac{c}{2} \geq \frac{3}{4} (1-x^2) \;\Rightarrow\; c \geq \frac{3}{2}(1-x^2)</math><br />
:<math> c=\max_{x} \frac{3}{2}(1-x^2) = \frac{3}{2}</math> (the maximum occurs at x=0)<br />
<br />
The process:<br />
<br />
:1: Draw <math>U_1 \sim U(0,1)</math> <br><br />
:2: Draw <math>U_2 \sim U(0,1)</math> <br><br />
:3: let <math> y = 2U_1 - 1 </math> (so <math>y \sim U(-1,1)</math>)<br />
:4: if <math>U_2 \leq \frac{f(y)}{c\cdot g(y)} = \frac{\frac{3}{4}(1-y^2)}{\frac{3}{2}\cdot\frac{1}{2}} = 1-y^2</math>, then x=y ('''note that''' the acceptance threshold comes from f(y)/(c·g(y)), with g(y)=1/2)<br />
:5: else: return to '''step 1''' <br />
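The five steps can be sketched in Python (our own illustrative code, not from the notes). With c = 3/2 and g(y) = 1/2, the acceptance threshold is f(y)/(c·g(y)) = (3/4)(1−y²)/((3/2)·(1/2)) = 1−y²:

```python
import random

def sample_f():
    """Sample from f(x) = (3/4)(1 - x^2) on [-1, 1] by accept-reject."""
    while True:
        u1 = random.random()
        u2 = random.random()
        y = 2 * u1 - 1                 # transform U(0,1) into U(-1,1)
        if u2 <= 1 - y * y:            # f(y) / (c*g(y)) with c = 3/2, g = 1/2
            return y

xs = [sample_f() for _ in range(5000)]
```

The sample mean should be near 0 and the sample second moment near 1/5, matching the target density.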
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period ".", meaning "element-wise", tells Matlab to apply an operation to each element of a vector or matrix. In the example above, u.^0.5 takes the square root of every element of u. For example, for the vectors a=[1 2 3] and b=[2 3 4], a.*b=[2 6 12] (the element-by-element product), but a*b gives an error since the matrix dimensions do not agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm:<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max_{0<x<1} \frac {3x^2}{2x} = \max_{0<x<1} \frac {3x}{2} = \frac {3}{2} </math> (the maximum occurs at x=1).<br />
Use the inverse method to sample from <math>g(x)</math>: since <math>G(x)=x^2</math>, generate <math>U_1</math> from <math>U(0,1)</math> and set <math>y=\sqrt{U_1}</math>.<br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math>, and set <math>y=\sqrt{U_1}</math><br><br />
2. If <math>U_2 \leq \frac{f(y)}{c\cdot g(y)} = \frac{3y^2}{\frac{3}{2}\cdot 2y} = y = \sqrt{U_1}</math>, accept <math>y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
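The efficiency gain can be checked empirically. In this sketch (our own Python, not from the notes), the expected number of trials per accepted sample is approximately c for each proposal: about 3 for the uniform proposal and about 1.5 for g(x)=2x:

```python
import random

def ar_uniform():
    """f(x)=3x^2 on (0,1) with proposal g=Unif(0,1), c=3: accept prob 1/3."""
    trials = 0
    while True:
        trials += 1
        u1, u2 = random.random(), random.random()
        if u2 <= u1 ** 2:                  # f/(c*g) = x^2
            return u1, trials

def ar_linear():
    """Same target with proposal g(x)=2x (sampled as sqrt(U)), c=3/2."""
    trials = 0
    while True:
        trials += 1
        y = random.random() ** 0.5         # inverse-CDF draw from g
        if random.random() <= y:           # f/(c*g) = y
            return y, trials

n = 4000
mean_trials_uniform = sum(ar_uniform()[1] for _ in range(n)) / n  # near c = 3
mean_trials_linear = sum(ar_linear()[1] for _ in range(n)) / n    # near c = 1.5
```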
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class relating the <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim~ N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
<br />
<br />
We can also use code to approximate the constant c between f(x) and g(x) numerically, by maximizing the ratio f(x)/g(x) over a grid.<br />
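As a sketch of that idea (our own Python, with a grid search standing in for the calculus), we approximate c numerically and then sample |Z| with the exponential proposal described above:

```python
import math
import random

f = lambda x: math.sqrt(2 / math.pi) * math.exp(-x * x / 2)  # half-normal pdf
g = lambda x: math.exp(-x)                                   # Exp(1) pdf

# Approximate c = max f/g on a grid; analytically c = sqrt(2e/pi), attained at x = 1
c = max(f(x) / g(x) for x in [i / 1000 for i in range(1, 5000)])

def sample_abs_normal():
    """Draw |Z| for Z ~ N(0,1) via accept-reject with an Exp(1) proposal."""
    while True:
        y = -math.log(1.0 - random.random())     # inverse-CDF draw from Exp(1)
        if random.random() <= math.exp(-(y - 1) ** 2 / 2):   # f/(c*g)
            return y

zs = [sample_abs_normal() for _ in range(5000)]
```

The grid maximum agrees with the analytic value <math>\sqrt{2e/\pi}\approx 1.3155</math>, and the sample mean is near <math>\sqrt{2/\pi}\approx 0.798</math>, the mean of |Z|.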
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
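A minimal sketch of this transformation (our own Python; the endpoint values a = −2, b = 3 are just example choices):

```python
import random

def unif(a, b):
    """Transform a U(0,1) draw into a U(a, b) draw via Y = (b-a)U + a."""
    return (b - a) * random.random() + a

ys = [unif(-2.0, 3.0) for _ in range(10000)]
```

All draws land in [a, b], and the sample mean is near the midpoint (a+b)/2.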
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math> Let <math> Y= 2RU-R=R(2U-1)</math>, therefore Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since <math>c=\max \frac{f(x)}{g(x)}</math>, where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{4}{\pi}</math>. (Note: this also means that the probability of accepting a point is <math>\frac{\pi}{4}</math>.)<br />
<br />
We accept a proposed point y with probability f(y)/[c·g(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = y </math><br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br />
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x e^{-x}, x>0 </math> (this is the <math>Gamma(2,1)</math> density) <br/><br />
Use <math>g(x)=a e^{-ax}</math> to generate the random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)}\Big|_{x=\frac{1}{1-a}} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\lim_{x\to\infty}\frac {f(x)}{g(x)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u, v ~ unif(0,1) independently <br/><br />
2. Generate y from g; since g is exponential with rate a = 1/2, let <math>y=-2\ln(u)</math> <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
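The procedure can be sketched as follows (our own Python, not from the notes). With a = 1/2, the proposal is exponential with rate 1/2, sampled as y = −2 ln(u), and the acceptance threshold works out to f(y)/(c·g(y)) = (e/2)·y·e^(−y/2):

```python
import math
import random

C = 4 / math.e        # optimal constant for the Exp(1/2) proposal, derived above

def sample_gamma21():
    """Sample from f(x) = x*exp(-x), i.e. Gamma(2,1), via accept-reject."""
    while True:
        u = 1.0 - random.random()                 # avoid log(0)
        y = -2 * math.log(u)                      # draw from Exp(1/2)
        # f(y) / (C * g(y)) = (e/2) * y * exp(-y/2), which peaks at 1 when y = 2
        if random.random() <= (math.e / 2) * y * math.exp(-y / 2):
            return y

xs = [sample_gamma21() for _ in range(5000)]
```

The accepted draws have mean near 2, the mean of Gamma(2,1).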
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take derivative of h(x) with respect to x, get x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (If c<sub>1</sub> is a value, we can ignore this step.) Since we want the smallest c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we choose the unknown parameter to minimize c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (say k) and solve for k. <br />Then we substitute k back in to get the value of c<sub>1</sub>. (Double-check that <math>c_1 \geq 1</math>.)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
In the two examples above, we chose a proposal distribution that we can sample from (the uniform),<br />
computed <math>c=\max\frac {f(y)}{g(y)} </math>,<br />
and accepted y whenever <math>v<\frac {f(y)}{c\cdot g(y)}</math>.<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse cdf cannot be computed or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated, at least up to a normalizing constant. <br/><br />
3) When a constant c with <math>f(x)\leq c\cdot g(x)</math> can be found. <br/><br />
4) When uniform draws are available.<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example from the last lecture (the semicircular density). The following code generates the random variable required by that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % R is a constant we can change; <br />
 % e.g. R=4 would give a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
 ii = ii + 1; % increment ii for the next pass <br />
 % through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tip: hist(x,y) plots a histogram of the values in x using y bars.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} <= c < ∞.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: U is independent of Y in Steps 2 and 3 above.<br />
The constant c is an indicator of the rejection rate: the larger c is, the more proposals are rejected.<br />
<br />
In the acceptance-rejection method for a pmf, the proposal here is the discrete uniform distribution over the five values 1,2,3,4,5, so g(x) = 0.2 for each value.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % p is a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution on the integers <math>1,2,3,...,k</math>. If this function is not built into your MATLAB, the same draw can be obtained by a simple transformation of rand, e.g. ceil(k*rand). <br />
<br />
The acceptance rate is <math>\frac {1}{c}</math>, so the lower the c, the more efficient the algorithm. Theoretically, c = 1 is the best case, since all samples would be accepted; however, this happens only when the proposal and target distributions are identical, which never occurs in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>\frac {1}{1.5}=\frac {2}{3}</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 random values drawn from f(x); as more values are generated, the empirical frequencies approach the specified probabilities.<br />
<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
g(x)= 1/3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1,y~g<br /><br />
2,u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=\frac{e^{-3}3^{x}}{x!}, \; x\geq 0</math><br>(Poisson distribution with mean 3)<br />
The first few <math>p_{x}</math> values are: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j; else go to step 1.<br />
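These steps can be sketched in Python (our own code, not from the notes). Note that c must be at least the exact maximum of <math>p_j/g(j)</math>, which is <math>(256/6)e^{-3}\approx 2.1242</math> (at j=3 and j=4), so we take c = 2.125:

```python
import math
import random

P = 0.25                                    # geometric parameter
C = 2.125                                   # just above max p_j/g(j) ~= 2.1242

def pois_pmf(j, lam=3.0):
    return math.exp(-lam) * lam ** j / math.factorial(j)

def geo_pmf(j):
    return P * (1 - P) ** j                 # failures-before-success geometric

def sample_poisson3():
    """Sample from Poisson(3) by accept-reject with a Geometric(0.25) proposal."""
    while True:
        u1, u2 = random.random(), random.random()
        j = int(math.log(1 - u1) / math.log(1 - P))   # inverse-CDF geometric draw
        if u2 < pois_pmf(j) / (C * geo_pmf(j)):
            return j

xs = [sample_poisson3() for _ in range(5000)]
```

The accepted values are nonnegative integers with sample mean near 3.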
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) such that it is hypergeometically distributed, given 10 white balls, 5 red balls, and select 3 balls, let X be the number of red ball selected, without replacement. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which gives '''c = 1.1127'''<br />
<br />
Here the ratio f(x)/g(x) happens to be maximized at X=1, where f(1)=0.4945 and g(1)=0.4444, giving c = 0.4945/0.4444 = 1.1127.<br />
In general, though, c must be computed as the maximum of the ratio f(x)/g(x), not as (max f)/(max g); the two agree here only because both maxima occur at the same point. If <math>c\cdot g(x)</math> failed to cover f(x) somewhere, we would have to increase c, which lowers the acceptance rate.<br />
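These ratios are easy to verify numerically. The following sketch (our own Python, using the standard library's math.comb) recomputes the two pmfs and the constant c:

```python
from math import comb

def hyper_pmf(x):
    """P(X = x red) when drawing 3 balls from 10 white and 5 red, no replacement."""
    return comb(5, x) * comb(10, 3 - x) / comb(15, 3)

def binom_pmf(x, n=3, p=1/3):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

ratios = [hyper_pmf(x) / binom_pmf(x) for x in range(4)]
c = max(ratios)                  # the ratio is maximized at x = 1
```

The hypergeometric pmf sums to 1, and the maximum ratio, about 1.1126, occurs at x = 1, matching the values tabulated above up to rounding.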
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, we need to move c*g(x) to the peak of f to cover the whole f. Thus c will be very large and 1/c will be small.<br />
The higher the rejection rate, more points will be rejected.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="margin-bottom:10px;border:10px solid red;background: yellow">This is a good example for understanding the pros and cons of the AR method: the method is nearly useless when the target distribution has a high, narrow peak, because c must then be huge,<br><br />
which makes the acceptance rate low and the sampling very time-consuming. </div><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall that,<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that <br />
p(j)/q(j)<=c for all j such that p(j)>0<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda)</math>, and note that <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
<br />
Side note: if <math> X\sim~ Gamma(a,\lambda)</math> and <math> Y\sim~ Gamma(b,\lambda)</math> are independent, then <math>\frac{X}{X+Y}</math> follows a <math> Beta(a,b)</math> distribution.<br />
<br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
:<math> x_1 \sim~ Exp(\lambda)</math><br />
:<math> x_2 \sim~ Exp(\lambda)</math><br />
:<math>\vdots</math><br />
:<math> x_t \sim~ Exp(\lambda)</math><br />
:<math> x_1+x_2+\dots+x_t \sim~ Gamma(t,\lambda)</math><br />
<br />
<pre style="font-size:16px"><br />
>>lambda=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/lambda)*log(u); % inverse transform: x ~ Exp(lambda)<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generate a 20x1000 matrix: 1000 draws <br />
 % for each X_i with t=20<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) has the same distribution as log(u) since u~U(0,1)<br />
>>xx = sum(x); % sum(x) sums the elements in each column, <br />
 % giving a 1x1000 row vector<br />
>>size(xx) % verify the size (the answer is 1 1000)<br />
>>hist(x(1,:)) % histogram of the first exponential sample <br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
<br />
<br />
Both x and u are 20x1000 matrices.<br />
Since u~unif(0,1) implies that u and 1-u have the same distribution, we can substitute u for 1-u to simplify the expression.<br />
Alternatively, the following command does the same thing as the previous commands.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code in one line; <br />
 % either log(u) or log(1-u) works since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
In rand(20,1000), 20 is the number of rows and 1000 is the number of entries in each row.<br />
The same matrix approach generalizes to sums of independent random variables <math>x_i</math> in higher dimensions; the resulting distribution can then be checked with a histogram.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/sin(\theta)= x_{1}/cos(\theta)</math> <br /><br />
<math> tan(\theta)=x_{2}/x_{1} \rightarrow \theta=tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
If a point lies on a line of length R from the origin at angle <math>\theta</math>, then <math>x=R\cos(\theta)</math> and <math>y=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,j)'' returns the (i,j)th entry <br/ ><br />
:*: ''sum(X,1)'' or ''sum(X)'' sums along dimension 1 (down the rows), returning a row vector of column sums. <br /><br />
:*: ''sum(X,2)'' sums along dimension 2 (across the columns), returning a column vector of row sums. <br/ ><br />
:*: ''rand(r,c)'' generates a matrix of uniform random numbers with r rows and c columns <br /><br />
:*: The dot operator (.), placed before an operator such as *, /, or ^, applies that operation element-wise to every element of a vector or matrix. For example, A.*B multiplies A and B element by element, while A*B is matrix multiplication. The dot is not required for adding a scalar (A+c) or for functions such as log, which already act element-wise.<br><br />
:*: Matlab processes loops slowly but is fast with matrices and vectors, so it is preferable to vectorize operations on matrices of random numbers rather than use loops where possible.<br><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1. On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1) i.e. Standard Normal Distribution - then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math>f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }</math> (which is almost useless in this course)<br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method directly, since the normal CDF F(x) (and hence its inverse) has no closed form. So we will use the joint probability function of two independent standard normal random variables, expressed in polar coordinates, to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup> and <math> \tan(\theta) = \frac{y}{x} </math><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> since X and Y are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\frac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into the product of two density functions, an Exponential and a Uniform, so d and <math>\theta</math> are independent<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x''),<br />
:<math>=\;- \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br /><br />
More intuitively, the integrand <math>x\phi(x)</math> is an odd function (g(-x) = -g(x)) and the support is symmetric about 0, so the positive and negative halves of the integral cancel and the integral is 0.<br /><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Early pseudorandom approaches to generating normal random variables were limited: inefficient methods such as numerically inverting the Gaussian CDF, summing uniform random variables, and acceptance-rejection were used. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This technique had an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup> = Z<sub>1</sub><sup>2</sup> + Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Alternatively,<br> <math> x =\cos(2\pi U_2)\sqrt{-2\ln U_1}\, </math> and<br> <math> y =\sin(2\pi U_2)\sqrt{-2\ln U_1}\, </math><br /><br />
<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
Note:<br>the first graph is hist(tet) and it is a uniform distribution.<br>The second one is hist(d) and it is an exponential distribution.<br>The third one is hist(x) and it is a normal distribution.<br>The last one is hist(y) and it is also a normal distribution.<br />
<br />
Attention: the "dot" in d.^0.5 (and in ".*" when multiplying vectors) is needed because d and tet are vectors, so the operations must be done elementwise. <br><br />
<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient. The reason for this is the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is an indirect computation of the sine and cosine of a random angle (as opposed to the direct computation, which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method exactly, we can approximate the inverse CDF using different functions. One such method is '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) random variables and subtract 6 (which is 12·E(U<sub>i</sub>)), we get approximately a standard normal distribution (the variance is exactly 12·Var(U<sub>i</sub>) = 1).<br /><br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
In the histograms above, the added constant is the parameter that shifts the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent Uniform(0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2lnU_{1}}*cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2lnU_{1}}*sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by <br />
<math>f(x_1, x_2) = f_{u_1,u_2}(g_1^{-1}(x_1,x_2),\, g_2^{-1}(x_1,x_2)) \cdot |J|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
<math>J = \begin{vmatrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2 \\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{vmatrix}</math><br />
where <br />
<math>u_1 = g_1^{-1}(x_1,x_2), \quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
<math>u_1 = e^{-(x_1^2+x_2^2)/2}, \quad u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
<math>f(x_1,x_2) = \frac{1}{2\pi}e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution generalizes the standard normal: its shape is scaled by the standard deviation and translated by the mean. The pdf of the general normal distribution is <br />
<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}</math><br />
<br />
The special case of the normal distribution is the standard normal distribution, in which the variance is 1 and the mean is zero. If X is a general normal deviate, then Z = (X − μ)/σ has a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then set <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma \cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2 \cdot 1 = \sigma^2</math><br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); % generate a sample from the standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2]; % stack into a 2-by-1000 matrix<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,Id) and X= <math>\underline{\mu} + \Sigma^{\frac{1}{2}} \,Z </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
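As a sketch of this affine transformation in Python (rather than Matlab; the mean vector and covariance matrix below are illustrative choices), a Cholesky factor of <math>\Sigma</math> can play the role of <math>\Sigma^{\frac{1}{2}}</math>:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])          # illustrative mean vector
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])      # illustrative covariance (positive definite)

# L satisfies L @ L.T == Sigma, so it plays the role of Sigma^(1/2)
L = np.linalg.cholesky(Sigma)

Z = rng.standard_normal((2, 100000))  # columns are iid N(0, I_2) vectors
X = mu[:, None] + L @ Z               # each column is ~ N(mu, Sigma)

sample_mean = X.mean(axis=1)
sample_cov = np.cov(X)
```

With this many samples, the sample mean and covariance should land close to mu and Sigma.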
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution that describes an event with only two possible results, i.e. success or failure. If the event succeeds, X takes value 1 with success probability p; otherwise X takes value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pdf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes; so the Bernoulli pmf is the binomial pmf with x restricted to 0 and 1.<br />
<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example we discussed in a previous lecture. In that example we set p = 1/2, so each flip comes up heads with probability 50%.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: the Binomial distribution can be viewed as the sum of n independent Bernoulli trials.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: the Bernoulli pmf can be written compactly as <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
When doing operations on vectors, always put a dot before the operator if you want the operation to be done to every element in the vector. <br />
example: Let V be a vector with dimension 2*4 whose every element you want multiplied by 3. <br />
The Matlab code is 3.*V<br />
<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
Note that the material in this lecture will not be on the exam; it was only to supplement what we have learned.<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
<br />
The inverse method is universal in the sense that we can potentially sample from any distribution where we can find the inverse of the cumulative distribution function.<br />
<br />
Procedure:<br />
<br />
1.Generate U~Unif [0, 1)<br><br />
2.set <math>x=F^{-1}(u)</math><br><br />
3.X~f(x)<br><br />
<br />
'''Example 1'''<br><br />
<br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br><br />
<br />
x~exp(<math>\lambda</math>)<br><br />
<br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P((X_1)>y) P((X_2)>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate U~ U(0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math><br><br />
<br />
If we generalize this example from two independent particles to n independent particles we will have:<br><br />
<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> ...<br> <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br>.<br />
<br />
And the algorithm using the inverse-transform method as follows:<br />
<br />
step1: Generate U~U(0,1)<br />
<br />
Step2: <math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
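The generalized algorithm above can be sketched in Python (the rates below are hypothetical values for <math>\lambda_i</math>), with a direct simulation of the minimum as a sanity check:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([0.5, 1.0, 1.5])   # hypothetical rates lambda_1..lambda_3
n = 100000

# Inverse-transform method: one uniform per sample of Y = min(X_1, ..., X_3)
u = rng.random(n)
y = -np.log(1.0 - u) / lam.sum()

# Direct simulation for comparison: minimum of independent exponentials
direct = rng.exponential(1.0 / lam[:, None], size=(3, n)).min(axis=0)
```

Both estimates of E[Y] should be near <math>1/\sum\lambda_i</math>, confirming that the minimum of independent exponentials is exponential with the summed rate.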
<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
<br>where a>0 and a is a real number<br />
What is the distribution of X?<br><br />
<br />
'''Solution:'''<br><br />
<br />
We can find a form for the cumulative distribution function of X by isolating U, since U~Unif[0,1) takes values in the range of F(X) uniformly. It then remains to differentiate the resulting form with respect to x to obtain the probability density function.<br />
<br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
[[File:Example_2_diagram.jpg]]<br />
<br />
'''Example 3'''<br><br />
<br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. Generate values from X.<br><br />
<br />
'''Solution:'''<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Observe from above that the values of X for n = 20 are close to 1; this is because we can view X as the maximum of n independent Unif(0,1) random variables, which is increasingly likely to be close to 1 as n increases. This observation is the motivation for Method 2 below.<br><br />
<br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result, we can see that in this example F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x on [0,1], so the CDF of the maximum is x<sup>n</sup>).<br />
Method 2: generate X by drawing n independent samples of U~Unif(0, 1) and taking their maximum as x. However, the solution given above using the inverse-transform method only requires generating one uniform random number instead of n of them, so it is a more efficient method.<br />
<br><br />
<br />
In general, the same reasoning gives the pdf and cdf of Y = max(X<sub>1</sub>, ... , X<sub>n</sub>) or Y = min(X<sub>1</sub>, ... , X<sub>n</sub>), provided the X<sub>i</sub> are independent.<br />
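Both methods from Example 3 can be sketched in Python (with n = 20 as in the example); the inverse-transform version needs only one uniform per sample:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20        # exponent in F(x) = x^n
m = 100000    # number of samples

# Method 1 (inverse transform): one uniform per sample, x = u^(1/n)
x_inv = rng.random(m) ** (1.0 / n)

# Method 2: the maximum of n independent Unif(0,1) per sample
x_max = rng.random((n, m)).max(axis=0)
```

Both samples should have mean near n/(n+1) = 20/21 ≈ 0.952, but Method 1 uses 20 times fewer uniforms.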
<br />
'''Example 4 (New)'''<br><br />
This example is similar to Example 1, but uses the maximum instead of the minimum.<br />
<br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br><br><br />
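The two-step sampler above can be sketched in Python (the rate λ below is a hypothetical value), with a direct simulation of the maximum as a sanity check:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0     # hypothetical rate lambda
n = 100000

# Inverse transform: Z = -(1/lambda) * log(1 - sqrt(U))
u = rng.random(n)
z = -np.log(1.0 - np.sqrt(u)) / lam

# Direct simulation: maximum of two independent Exp(lambda) lifetimes
direct = np.maximum(rng.exponential(1.0 / lam, n),
                    rng.exponential(1.0 / lam, n))
```

E[max(X1, X2)] for two independent Exp(λ) variables is (1 + 1/2)/λ, i.e. 0.75 here, and both samples should agree with it.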
<br />
'''Discrete Case:'''<br />
<font size="3"><br />
u~unif(0,1)<br><br />
x <- 0, S <- P<sub>0</sub><br><br />
while u > S<br><br />
x <- x + 1<br><br />
S <- S + P<sub>x</sub><br><br />
Return x<br></font><br />
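The discrete inverse-transform loop can be sketched in Python using a small hypothetical pmf on {0, 1, 2}:

```python
import numpy as np

rng = np.random.default_rng(4)
p = [0.2, 0.5, 0.3]   # hypothetical pmf: P(X=0), P(X=1), P(X=2)

def sample_discrete(p, u):
    """Walk the cumulative sums until u <= s, then return the current x."""
    x, s = 0, p[0]
    while u > s:
        x += 1
        s += p[x]
    return x

samples = [sample_discrete(p, rng.random()) for _ in range(100000)]
freqs = np.bincount(samples, minlength=3) / len(samples)
```

The empirical frequencies should match the pmf, since u lands in the interval [P(X<x), P(X<=x)) with probability P(X=x).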
<br />
===Decomposition Method===<br />
The CDF, F, is a composition if <math>F_{X}(x)</math> can be written as:<br />
<br />
<math>F_{X}(x) = \sum_{i=1}^{n}p_{i}F_{X_{i}}(x)</math> where<br />
<br />
1) p<sub>i</sub> > 0<br />
<br />
2) <math>\sum_{i=1}^{n}</math>p<sub>i</sub> = 1.<br />
<br />
3) <math>F_{X_{i}}(x)</math> is a CDF<br />
<br />
The general algorithm to generate random variables from a composition CDF is:<br />
<br />
1) Generate U, V ~ <math>U(0,1)</math><br />
<br />
2) If u < p<sub>1</sub>, x = <math>F_{X_{1}}^{-1}(v)</math><br />
<br />
3) Else if u < p<sub>1</sub>+p<sub>2</sub>, x = <math>F_{X_{2}}^{-1}(v)</math><br />
<br />
4) ....<br />
<br />
<b>Explanation</b><br><br />
Each random variable that is a part of X contributes <math>p_{i}*F_{X_{i}}(x)</math> to <math>F_{X}(x)</math> every time.<br />
From a sampling point of view, that is equivalent to contributing <math>F_{X_{i}}(x)</math> <math>p_{i}</math> of the time. The logic of this is similar to that of the Accept-Reject Method, but instead of rejecting a value depending on the value u takes, we instead decide which distribution to sample it from.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup> <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
Here f(x) is decomposed into two component pdfs: f<sub>x1</sub>, a uniform density on [0,2], and f<sub>x2</sub>, a polynomial density, with mixing weights 5/6 and 1/6.<br />
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange the <math> {p_i} </math> such that <math> p_i \geq p_j </math> for <math> i < j </math> <br> <br><br />
Then generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u< p_1 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
In other words, f(x) is split into the component pdfs f<sub>x1</sub>, f<sub>x2</sub>, f<sub>x3</sub>; a single U~U(0,1) selects which component to sample from (by inversion).<br />
<br />
=== Example of Decomposition Method ===<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
let U =F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, solve for x.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
'''Algorithm:'''<br />
<br />
Generate U ~ Unif [0,1)<br />
<br />
Generate V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x = v<br />
<br />
else if u<2/3, x = v<sup>1/2</sup><br />
<br />
else x = v<sup>1/3</sup><br><br />
<br />
<br />
'''Matlab Code:''' <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample an unknown distribution using an easy distribution. The disadvantage is that it may reject many points, which is inefficient.<br />
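As a Python sketch, take A to be the square [-1,1]<sup>2</sup> and B the unit disk: sample uniformly from A and keep only the points that land in B.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100000

# Sample uniformly from A = the square [-1, 1]^2
pts = rng.uniform(-1.0, 1.0, size=(2, n))

# Keep only the points inside B = the unit disk
inside = pts[0] ** 2 + pts[1] ** 2 <= 1.0
disk = pts[:, inside]          # a uniform sample from the disk

accept_rate = inside.mean()    # approximately area(B)/area(A) = pi/4
```

The acceptance rate equals the area ratio of the two shapes, which is exactly the inefficiency the comment above describes: the smaller B is relative to A, the more points are thrown away.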
<br />
<br />
=== Practice Example from Lecture 7 ===<br />
<br />
Let X1, X2 denote the lifetime of 2 independent particles, X1~exp(<math>\lambda_{1}</math>), X2~exp(<math>\lambda_{2}</math>)<br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then } 1-F_Y(y)=P(min(x_{1},x_{2}) \geq y)=e^{(-(\lambda_{1}+\lambda_{2})y)}, \quad F_Y(y)=1-e^{(-(\lambda_{1}+\lambda_{2}) y)}</math><br /><br />
<math>u \sim unif[0,1), \quad u = F_Y(y) \Rightarrow y = -\frac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
<br />
===Question 2===<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a Beta distribution; here b is the normalizing constant, <math>b = 1/\int _{0}^{1}x^{n}(1-x)^{n}dx</math><br />
<br />
The density is maximized at x = 1/2, where it takes the value <math>b(1/4)^n</math>. With a Unif(0,1) proposal g, we can therefore take <math>c=b(1/4)^n</math>.<br />
<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math><br />
2. If <math>U_2\leq \frac{b(U_1)^n(1-U_1)^n}{b(1/4)^n}=(4U_1(1-U_1))^n</math><br />
then <math>X=U_1</math><br />
Else return to step 1.<br />
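This acceptance-rejection step can be sketched in Python (n = 3 is a hypothetical choice; note that the normalizing constant b cancels in the acceptance ratio, so it never needs to be computed):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3                  # hypothetical exponent in f(x) = b * x^n * (1-x)^n

samples = []
while len(samples) < 50000:
    u1, u2 = rng.random(), rng.random()
    # Acceptance ratio f(u1) / (c * g(u1)) with g = Unif(0,1), c = b * (1/4)^n
    if u2 <= (4.0 * u1 * (1.0 - u1)) ** n:
        samples.append(u1)
samples = np.array(samples)
```

The resulting distribution is Beta(n+1, n+1); for n = 3 its mean is 1/2 and its variance is 1/36, which the sample should reproduce.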
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
<br />
==Class 8 - Thursday, May 30, 2013==<br />
<br />
In this lecture, we will discuss algorithms to generate 3 well-known distributions: Binomial, Geometric and Poisson. For each of these distributions, we will first state its general understanding, probability mass function, expectation and variance. Then, we will derive one or more algorithms to sample from each of these distributions, and implement the algorithms on Matlab. <br \><br />
<br />
'''Bernoulli distribution'''<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution with n = 1: X ~ B(1, p) has the same meaning as X ~ Bern(p). B(n, p) is the distribution of the sum of n independent Bernoulli trials, each Bern(p) with the same probability p. <br />
<br />
Algorithm: <br />
<br />
1. Generate u~Unif(0,1) <br><br />
2. If u <= p, then x = 1 <br><br />
Else x = 0 <br />
<br />
===The Binomial Distribution===<br />
<br />
If X~Bin(n,p), then its pmf is of form:<br />
f(x)=(nCx) p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
Or f(x) = <math>(n!/x!(n-x)!)</math> p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
<br />
mean (x) = E(x) = np; variance = np(1-p)<br />
<br />
Generate n uniform random numbers <math>U_1,...,U_n</math> and let X be the number of <math>U_i</math> that are less than or equal to p.<br />
The logic behind this algorithm is that the Binomial Distribution is simply n repeated Bernoulli trials, each with probability of success p. Thus, we can sample from the distribution by sampling n Bernoulli trials and summing them; the sum represents one binomial sample. In the example below, we sample 1000 realizations of a Binomial(20, 0.4) random variable: summing each column of the 20-by-1000 matrix of Bernoulli outcomes produces one binomial sample per column, giving realizations of 1000 binomial random variables (the output of the sum is a 1-by-1000 vector).<br /><br />
MATLAB tips: to sample from a Binomial(N,P) directly, we can use binornd(N,P), where N is the number of trials and P is the probability of success. If a=[2 3 4], then a<3 produces [1 0 0], and a==3 produces [0 1 0]. We can use this to count the entries less than or equal to p.<br /><br />
<br />
Procedure for Bernoulli <br />
U~Unif(0,1)<br />
if U <= p<br />
x = 1<br />
else <br />
x = 0<br />
<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>a=[3 5 8];<br />
>>a<5<br />
ans= 1 0 0<br />
<br />
>>rand(20,1000)<br />
>>rand(20,1000)<0.4<br />
>>A = sum(rand(20,1000)<0.4)<br />
>>hist(A)<br />
>>mean(A)<br />
Note: sum applied to a matrix sums each column by default, so A is a 1 by 1000 row vector<br />
<br />
>>sum(sum(rand(20,1000)<0.4)>8)/1000<br />
This is an estimate of Pr[X>8].<br />
<br />
</pre><br />
<br />
[[File:Binomial_example.jpg|300px]]<br />
<br />
remark: logical comparisons such as a<3 or a==3 return 0/1 vectors, which is useful for selecting or counting matrix entries that satisfy a condition.<br />
<br />
===The Geometric Distribution===<br />
<br />
x=1, f(x)=p <br />
x=2, f(x)=p(1-p)<br />
x=3, f(x)=p(1-p)^2<br />
<br />
Generally speaking, if X~Geo(p), then its pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>p, x=1,2,...<br /><br />
The random variable X is the number of trials required until the first success in a series of independent''' Bernoulli trials'''.<br /><br />
<br />
<br />
<br />
Other properties<br />
<br />
<br />
Probability mass function : P(X=k) = p(1-p)^(k-1)<br />
<br />
Tail probability : P(X>n) = (1-p)^n<br />
<br />
<br />
<span style="background:#F5F5DC"><br />
<br />
Mean of x = 1/p<br />
Var(x) = (1-p)/p^2<br />
<br />
There are two ways to look at a geometric distribution.<br />
<br />
<b>1st Method</b><br />
<br />
We look at the number of trials up to and including the first success. This will be used in our course. <br />
<br />
pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>p, x = 1, 2, 3, ...<br />
<br />
<b>2nd Method</b><br />
<br />
This involves modeling the number of failures before the first success. This does not include the trial in which we succeeded. <br />
<br />
pmf is of the form f(x)=(1-p)<sup>x</sup>p , x = 0, 1, 2, ....<br />
<br />
</span><br />
<br />
<br />
If Y~Exp(<math>\lambda</math>) then X=floor(Y)+1 is geometric.<br /><br />
Choose <math>\lambda</math> so that e^(-<math>\lambda</math>)=1-p. Then X ~ Geo(p) <br /><br />
<br />
P (X > x) = (1-p)<sup>x</sup>(because first x trials are not successful) <br/><br />
<br />
'''Proof''' <br/><br />
<br />
P(X>x) = P( floor(Y) + 1 > x) = P(floor(Y) > x-1) = P(Y >= x) = e<sup>(-<math>\lambda</math> * x)</sup> <br><br />
<br />
Since p = 1- e<sup>-<math>\lambda</math></sup>, i.e. <math>\lambda</math>= <math>-log(1-p)</math>, then <br><br />
<br />
P(X>x) = e<sup>(-<math>\lambda</math> * x)</sup> = e<sup>log(1-p)*x</sup> = (1-p)<sup>x</sup> <br/><br />
<br />
Note that for integer x, floor(Y) > x - 1 is equivalent to Y >= x (and floor(Y) > x to Y >= x+1) <br/><br />
<br />
This shows how the exponential distribution can be used to obtain P(X>x)=(1-p)^x.<br />
<br />
<br><br />
Suppose X has the exponential distribution with rate parameter <math> \lambda > 0 </math>. <br><br />
Then <math>\left \lfloor X \right \rfloor </math> and <math>\left \lceil X \right \rceil </math> have geometric distributions on <math> \mathcal{N} </math> and <math> \mathcal{N}_{+} </math> respectively, each with success probability <math> 1-e^ {- \lambda} </math> <br><br />
<br />
Proof: <br><br />
<math>\text{For } n \in \mathcal{N} </math><br /><br />
<br />
<math>\begin{align}<br />
P(\left \lfloor X \right \rfloor = n)&{}= P( n \leq X < n+1) \\<br />
&{}= F( n+1) - F(n) \\<br />
&{}= (1 - e^{-\lambda (n+1)}) - (1 - e^{-\lambda n}) \\<br />
&{}= (e^ {-\lambda})^n \cdot (1 - e^ {-\lambda}) \\<br />
\end{align}</math> <br /><br />
which is the pmf of a Geo<math>(1 - e^ {-\lambda})</math> random variable; the proof of the ceiling part follows immediately. <br /><br />
<br />
<br />
<br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p) </math><br /><br />
2) Generate a <math>Y \sim Exp(\lambda )</math> <br /><br />
3) We can then let <math>X = \left \lfloor Y \right \rfloor + 1, where X\sim Geo(p)</math> <br /><br />
note: <math>\left \lfloor Y \right \rfloor >2 -> Y>=3</math><br /><br />
<math> \left \lfloor Y \right \rfloor >5 -> Y>=6</math><br /><br />
<br /><br />
<br />
<math>\left \lfloor Y \right \rfloor>x </math> -> Y>= X+1 <br /><br />
<br />
<math>P(Y>=x)</math>, where Y ~ Exp (<math>\lambda</math>):<br /><br />
pdf of Y : <math>f(y)=\lambda e^{-\lambda y}</math><br /><br />
cdf of Y : <math>F(y)=1-e^{-\lambda y}</math><br /><br />
cdf <math>P(Y<x)=1-e^{-\lambda x}</math><br /><br />
<math>P(Y>=x)=1-(1-e^{-\lambda x})=e^{-\lambda x}</math><br /><br />
<math> e^{-\lambda}=1-p \Rightarrow \lambda=-log(1-p)</math><br /><br />
<math>P(Y>=x)=e^{-\lambda x}=e^{log(1-p)x}=(1-p)^x</math><br /><br />
<math>E[X]=1/p </math><br /><br />
<math>Var(X)= (1-p)/p^2</math><br /><br />
P(X>x)<br /><br />
=P(floor(Y)+1>x)<br /><br />
=P(floor(Y)>x-1)<br /><br />
=P(Y>=x)<br />
<br />
Use e^(-<math>\lambda</math>)=(1-p) to determine <math>\lambda</math>; the sample mean and variance can then be checked against 1/p and (1-p)/p^2.<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>p=0.4;<br />
>>l=-log(1-p);<br />
>>u=rand(1,1000);<br />
>>y=(-1/l)*log(u);<br />
>>x=floor(y)+1;<br />
>>hist(x)<br />
<br />
'''Note:'''<br />
mean(x)~E[X]=> 1/p<br />
Var(x)~V[X]=> (1-p)/p^2<br />
<br />
</pre><br />
<br />
[[File:Geometric_example.jpg|300px]]<br />
<br />
===Poisson Distribution===<br />
If <math>\displaystyle X \sim \text{Poi}(\lambda)</math>, its pmf is of the form <math>\displaystyle \, f(x) = \frac{e^{-\lambda}\lambda^x}{x!}</math> , where <math>\displaystyle \lambda </math> is the rate parameter.<br /><br />
<br />
Understanding of Poisson distribution:<br />
<br />
If customers arrive at a bank at rate <math>\lambda</math> per unit of time, then <br />
X(t) = # of customers arriving in [0,t] ~ Poi<math>(\lambda t)</math><br />
<br />
Its mean and variance are<br /><br />
<math>\displaystyle E[X]=\lambda</math><br /><br />
<math>\displaystyle Var[X]=\lambda</math><br /><br />
<br />
A Poisson random variable X (with rate <math>\lambda</math>) can be interpreted as the maximal number of i.i.d. exponential variables (with parameter <math>\lambda</math>) whose sum does not exceed 1.<br /><br />
The traditional understanding of the Poisson distribution as the total number of events in a specific interval can be understood here since the above definition simply describes the Poisson as the sum of waiting times for n events in an interval of length 1.<br />
<br /><br />
<br /><br />
<math>\displaystyle\text{Let } Y_j \sim \text{Exp}(\lambda), U_j \sim \text{Unif}(0,1)</math><br><br />
<math>Y_j = -\frac{1}{\lambda}log(U_j) \text{ from Inverse Transform Method}</math><br><br><br />
<br />
<math>\begin{align} <br />
X &= max \{ n: \sum_{j=1}^{n} Y_j \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} - \frac{1}{\lambda}log(U_j) \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} log(U_j) > -\lambda \} \\<br />
&= max \{ n: log(\prod_{j=1}^{n} U_j) > -\lambda \} \\<br />
&= max \{ n: \prod_{j=1}^{n} U_j > e^{-\lambda} \} \\<br />
\end{align}</math><br><br /><br />
<br />
Note: From above, we can use Logarithm Rules <math>log(a)+log(b)=log(ab)</math> to generate the result.<br><br /><br />
'''Algorithm:''' <br /><br />
1) Set n=1, a=1 <br /><br />
2) Generate <math>U_n ~ U(0,1), a=aU_n </math> <br /><br />
3) If <math>a >= e^{-\lambda}</math> , then n=n+1, and go to Step 2. Else, x=n-1 <br /><br />
<br />
The uniforms are converted to exponentials via the inverse transform method, which is what justifies this construction.<br />
<br />
===MATLAB Code for generating Poisson Distribution===<br />
<pre><br />
>>l=2; <br />
>>for ii=1:1000<br />
n=1;<br />
a=1;<br />
u=rand;<br />
a=a*u;<br />
while a>exp(-l)<br />
n=n+1;<br />
u=rand;<br />
a=a*u;<br />
end<br />
x(ii)=n-1;<br />
end<br />
>>hist(x)<br />
>>sum(x==1)/1000 % estimate of P(X = 1)<br />
>>sum(x>3)/1000 % estimate of P(X > 3)<br />
</pre><br />
<br />
[[File:Poisson_example.jpg|300px]]<br />
<br />
<br />
<span style="background:#F5F5DC"><br />
EXAMPLE for geometric distribution: Consider the case of rolling a die: </span><br />
<br />
X=the number of rolls that it takes for the number 5 to appear. <br />
<br />
We have X ~Geo(1/6), <math>f(x)=(1/6)*(5/6)^{x-1}</math>, x=1,2,3.... <br />
<br />
Now, let <math>Y \sim Exp(\lambda)</math> => X=floor(Y) +1 <br />
<br />
Let <math>e^{-\lambda}=5/6</math> <br />
<br />
<math>P(X>x) = P(Y>=x)</math> (from the class notes) <br />
<br />
We have <math>e^{-\lambda *x} = (5/6)^x</math> <br />
<br />
Algorithm: let <math>\lambda = -\log(5/6)</math> <br />
<br />
1) Generate <math>Y \sim Exp(\lambda)</math> <br />
<br />
2) Set X= floor(Y)+1, to generate X <br />
<br />
<math> E[x]=6, Var[X]=5/6 /(1/6^2) = 30 </math><br />
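This example can be checked numerically. Below is a Python sketch (illustrative only — the course uses MATLAB) that generates X = floor(Y) + 1 with Y ~ Exp(λ), λ = -log(5/6), and compares the sample mean and variance against E[X] = 6 and Var[X] = 30:

```python
import math
import random

random.seed(0)
p = 1 / 6                          # probability of rolling a 5
lam = -math.log(1 - p)             # choose lambda so that e^{-lambda} = 5/6
n = 200000
xs = [math.floor(random.expovariate(lam)) + 1 for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean, var)                   # roughly 6 and 30
```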
<br />
<br />
<span style="background:#F5F5DC">GENERATING NEGATIVE BINOMIAL RV USING GEOMETRIC RV'S</span><br />
<br />
Property of negative binomial Random Variable: <br/><br />
<br />
The negative binomial random variable is a sum of r independent geometric random variables.<br/><br />
<br />
Using this property we can formulate the following algorithm:<br/><br />
<br />
Step 1: Generate r geometric rv's each with probability p using the procedure presented above.<br/><br />
Step 2: Take the sum of these r geometric rv's. This RV follows NB(r,p)<br/><br />
<br />
Remark: in Step 1, each geometric rv can be generated via the floor construction above, choosing e^(-<math>\lambda</math>)=1-p and setting X = floor(Y)+1.<br />
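The two steps can be sketched as follows (an illustrative Python version; the geometric draws reuse the floor-of-exponential construction from the previous section, and the function names are ours):

```python
import math
import random

def geometric(p):
    """One Geo(p) draw via X = floor(Y) + 1, where Y ~ Exp(-log(1-p))."""
    lam = -math.log(1 - p)
    return math.floor(random.expovariate(lam)) + 1

def negative_binomial(r, p):
    """Steps 1 and 2: sum r independent Geo(p) draws to get one NB(r, p) sample."""
    return sum(geometric(p) for _ in range(r))

random.seed(0)
xs = [negative_binomial(5, 0.4) for _ in range(50000)]
print(sum(xs) / len(xs))           # near E[NB(5, 0.4)] = r/p = 12.5
```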
<br />
=== Another way to generate random variable from poisson distribution ===<br />
<br/><br />
Note: <math>P(X=x)=e^{-\lambda}\lambda^x/x!</math><br/><br />
<math>P(X=x+1)= e^{-\lambda}\lambda^{x+1}/(x+1)!</math> <br/><br />
The ratio is: <math>p(x+1)/p(x)=\lambda/(x+1)</math> <br/><br />
Therefore, <math>p(x+1)=\lambda/(x+1)*p(x)</math> <br/><br />
Algorithm: <br/><br />
1. Generate U~Unif(0,1) <br/><br />
2. Set x=0, <math>p=P(X=0)=e^{-\lambda}</math>, F=p <br/><br />
3. If U<F, output x and stop<br/><br />
Else <br/><br />
<math>p=\lambda/(x+1)*p</math><br/><br />
F=F+p<br/><br />
x= x+1<br/><br />
Go to 3<br />
(Note: U is generated only once; inside the loop only the running cdf F is updated.)<br />
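A Python sketch of this recursive inverse transform (the course uses MATLAB; note that the uniform U is drawn only once, and only the running cdf F is updated inside the loop):

```python
import math
import random

def poisson(lam):
    """Inverse transform for Poi(lam) using p(x+1) = lam/(x+1) * p(x)."""
    u = random.random()            # one uniform draw
    x = 0
    p = math.exp(-lam)             # p = P(X = 0)
    F = p                          # running cdf
    while u >= F:
        p = lam / (x + 1) * p      # recursive pmf update
        F += p
        x += 1
    return x

random.seed(0)
xs = [poisson(2.0) for _ in range(100000)]
print(sum(xs) / len(xs))           # near lambda = 2
```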
<br />
<br />
In summary: find the ratio of P(X=x+1) to P(X=x), start from F = P(X=0), and compare a single uniform against the running cdf.</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=17721stat340s132013-06-04T04:18:25Z<p>Ysyap: /* Acceptance-Rejection Method */</p>
<hr />
<div>=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm, to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case, i.e. y is continuous <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature Extraction, Manifold Learning): Used when we have data in a high-dimensional space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly at their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important as the Quest id will be used to identify the students who make contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After a student has made the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions in multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a die and flipping a coin are not truly random but are '''deterministic''', since the results can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are deterministic, they form sequences whose values have the appearance of independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to their ease of generation and manipulation.<br />
<br />
When an experiment is repeated many times, the aggregate results approach their expected values, which makes the process look deterministic; each individual trial, however, is random. Pseudo random numbers behave similarly.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
If y = ax + b (with 0 <= b < a), then <math>b:=y \mod a</math>. This also works for reals: <br /><br />
4.2 = 3 * 1.1 + 0.9, so 0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2, so 2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1, so 1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can tell us whether one integer divides another exactly (remainder 0). In general n = mq + r, where n, m, q, r are all integers with 0 <= r < m.<br />
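A quick Python check of the division algorithm above (the helper name is ours):

```python
def div_mod(n, m):
    """Return (q, r) with n == m*q + r and 0 <= r < m."""
    q, r = divmod(n, m)            # Python's built-in quotient/remainder
    return q, r

print(div_mod(30, 7))   # (4, 2): 30 = 4*7 + 2, so 2 = 30 mod 7
print(div_mod(25, 3))   # (8, 1): 25 = 8*3 + 1, so 1 = 25 mod 3
```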
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math> ( <math>\mod m</math> means taking the remainder after division by m). Given a '''seed''', i.e. an initial value <math>x_0 \in \N</math>, we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{k}=10,\,m=3</math><br //><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math> (a little tip: (a*b) mod c = ((a mod c)*(b mod c)) mod c)<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we could get a sequence of "random" numbers. However, how do we find the value of <math>a,b,</math> and <math>m</math>? At the very least <math>m</math> should be a very '''large''', preferably prime number. The larger <math>m</math> is, the longer the period of the sequence can be before it repeats. This is easier to handle in Matlab: the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> should be '''large and prime''').<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a histogram of the sample. Use it after running the code to check the empirical distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose parameters that produce uniform-looking sequences. Many programs give you the option to choose the seed; sometimes the seed is chosen by the CPU.<br /><br />
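The recurrence, together with the normalization to [0,1), can be sketched in a few lines of Python (an illustration only — MATLAB's rand uses a different, higher-quality generator; the default parameters below are the 1988 Park–Miller values mentioned above):

```python
def lcg(seed, a=7**5, b=0, m=2**31 - 1):
    """Generate x_{k+1} = (a*x_k + b) mod m, normalized to [0, 1)."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x / m

gen = lcg(seed=1)
print([next(gen) for _ in range(3)])   # the same seed always reproduces this sequence
```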
<br />
<br />
<br />
<br />
This part shows the relationship between integer division and remainders, and that iterating the recurrence over a range such as (1:1000) produces a histogram resembling a uniform distribution.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>Choose integers <i>a</i>, <i>b</i>, <i>m</i> (large prime) and a seed <i>x<sub>0</sub></i>.</li><br />
<li><math>x_{k+1}=(ax_{k}+b)</math> mod m</li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distributions other than the uniform distribution, such as the exponential and normal distributions. However, to use this method easily for generating pseudorandom numbers, the probability distribution in question must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
If we want to generate the value of a discrete random variable X, we must generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation <math> x=F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf of X is strictly monotonic (hence invertible); otherwise the generalized inverse must be used. A monotonic function is one that is either increasing for all x, or decreasing for all x.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3<br><br />
*Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.<br />
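The steps above can be sketched in Python (an illustrative translation of the pseudocode, not course code; the function name is ours):

```python
import random

def binomial_itm(n, p, u=None):
    """Inverse transform sample from Binomial(n, p)."""
    if u is None:
        u = random.random()  # Step 1: U ~ U(0,1)
    c = p / (1 - p)          # Step 2
    i = 0
    pr = (1 - p) ** n        # P(X = 0)
    F = pr                   # running cdf
    while u >= F:            # Steps 3-5: advance until U < F
        pr = c * (n - i) / (i + 1) * pr
        F += pr
        i += 1
    return i
```

The recursion updates the pmf in place, so no factorials are ever computed.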
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = -e^{-\lambda t}\, \Big|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> u=1-e^{- \lambda x} </math><br /><br />
<math> 1-u=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<math>F^{-1}(u)=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
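These two steps can be sketched in Python (an illustrative sketch; the function name is ours):

```python
import math
import random

def exponential_itm(lam, u=None):
    # Step 1: draw U ~ U(0,1); Step 2: x = -ln(1 - U) / lambda
    if u is None:
        u = random.random()
    return -math.log(1.0 - u) / lam
```

Applying the cdf <math>F(x)=1-e^{-\lambda x}</math> to the output recovers the uniform draw, which is a quick sanity check.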
<br />
'''Example''': <br />
<math> X= a + (b-a)U</math> is uniform on [a, b] <br /><br />
<math> x=\frac{-ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math> <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first set u as an uniform distribution, then obtain the inverse function of F(x), and set<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
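A quick Python sketch of this example (the function name is ours):

```python
import random

def sample_f(u=None):
    # F(x) = x^5 on [0,1]  =>  F^{-1}(u) = u^(1/5)
    if u is None:
        u = random.random()
    return u ** (1.0 / 5.0)
```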
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
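The solution can be sketched in Python (illustrative; the function name is ours):

```python
import random

def beta_1_b(beta, u=None):
    # F(x) = 1 - (1-x)^beta  =>  F^{-1}(u) = 1 - (1-u)^(1/beta)
    if u is None:
        u = random.random()
    return 1.0 - (1.0 - u) ** (1.0 / beta)
```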
<br />
'''Example 4-Estimating pi''':<br />
Let's use rand() and the Monte Carlo method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2, the inscribed circle has radius 1 and area <math>\pi</math>, while the square has area 4.<br /><br />
Thus <math>\pi \approx 4 \cdot \frac{Nc}{N}</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) % will generate a fairly uniform histogram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
% let lambda = 2 in this example; however, you can choose another value for lambda<br />
>>x=(-log(1-u))/2;<br />
>>size(x) % 1000 in size <br />
>>figure<br />
>>hist(x) % exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. This method is limited, since not all cdfs are invertible or monotonic, and the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical, since some cdfs and/or the integrals defining them are not easy to compute, as with the Gaussian distribution.<br /><br />
<br />
We learned how to prove that applying the inverse cdf to a uniform random variable yields a draw from F, and how to use a uniform sample to obtain a value of x from F(x).<br />
We can also use the uniform distribution in the inverse method to generate other distributions.<br />
In the Monte Carlo example, the points are generated uniformly over the square, so each point is equally likely to land anywhere in it,<br />
and we can look at the histogram of a generated sample to judge what kind of distribution it follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool % shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma changes the location and spread of the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by definition of <math>X</math>)<br /><br />
<math>= P(F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for uniform distribution <math> U~ \sim~ Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
LIMITATIONS OF THE INVERSE TRANSFORM METHOD<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases a closed form for this function does not exist.<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse cdf, making this method inefficient.<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable X that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case (Procedure):<br><br />
1. Define a probability mass function for <math>x_{i}</math>, where i = 0,....,k. Note: k could grow infinitely. <br><br />
2. Generate a uniform random number U, <math> U~ \sim~ Unif [0,1] </math><br><br />
3. If <math>U\leq p_{0}</math>, deliver <math>X = x_{0}</math><br><br />
4. Else, if <math>U\leq p_{0} + p_{1} </math>, deliver <math>X = x_{1}</math><br><br />
5. Repeat until we reach <math>U\leq p_{0} + p_{1} + ......+ p_{k}</math>; deliver <math>X = x_{k}</math><br><br />
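The steps above can be sketched in Python (an illustrative sketch, assuming the pmf is given as two lists; names are ours):

```python
import random

def discrete_itm(xs, ps, u=None):
    # xs: the values x_0..x_k; ps: their probabilities (summing to 1).
    # Return the first x_k whose cumulative probability reaches u.
    if u is None:
        u = random.random()
    F = 0.0
    for x, p in zip(xs, ps):
        F += p
        if u <= F:
            return x
    return xs[-1]  # guard against floating-point round-off
```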
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of the semi-colon in Matlab: Matlab will not print out the result of a line that ends in a semi-colon; without one, the result is printed.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = X^{2}, \; X = F^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(<math>\mu</math>). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-\mu} \mu^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-\mu} \mu^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = \frac {\mu}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {\mu}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) <math>\begin{align} X = 0 \end{align}</math><br />
<math>\begin{align} F = P(X = 0) = e^{-\mu}\mu^0/{0!} = e^{-\mu} = p \end{align}</math><br />
3) If U<F, deliver X <br><br />
Else, <math>\begin{align} p = \, {\frac {\mu}{X+1}} p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} X = X + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
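The algorithm above can be sketched in Python (illustrative; the function name is ours):

```python
import math
import random

def poisson_itm(mu, u=None):
    # Inverse transform for Poisson(mu) using the pmf recursion
    # P_{x+1} = mu/(x+1) * P_x, starting from P_0 = exp(-mu).
    if u is None:
        u = random.random()
    x = 0
    p = math.exp(-mu)  # P(X = 0)
    F = p              # running cdf
    while u >= F:
        p = mu / (x + 1) * p
        F += p
        x += 1
    return x
```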
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X as the number of trials until the first success, so x=1,2,3..... We have pmf:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>; here P(X>x) means the first x trials are all failures.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
....<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k <br />
....<br />
\end{cases}</math><br />
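Since the cdf has the closed form <math>1-(1-p)^x</math>, the case analysis above collapses to a ceiling formula; a Python sketch (function name ours):

```python
import math
import random

def geometric_itm(p, u=None):
    # Return the smallest integer x with 1 - (1-p)^x >= u.
    if u is None:
        u = random.random()
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))
```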
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
Flipping a coin is a discrete case of the uniform distribution; in the code the coin is flipped 1000 times, and the observed proportion of heads is close to the expected value (0.5).<br />
The second example is another discrete distribution: it shows a discrete variable taking three values, 0, 1 and 2, each with its own probability.<br />
The geometric example uses the inverse method to work out the range of U that delivers each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b>generate types of distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed number {x}</li><br />
<li>{F<sup>-1</sup>(x)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>d<sub>i</sub>=x<sub>i</sub> if<math> X=x_i, </math> if <math> F(x_{i-1})<U\leq F(x_i) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transformation method does allow us to change our uniform distribution, it has two limits;<br />
# Not all cdfs have closed-form inverse functions (i.e., F may not be invertible, and the generalized inverse can be hard to work with)<br />
# For some distributions, such as Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''. This method can be more practical than the inverse transform method when the inverse cdf is unavailable or costly to compute.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (In practise, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
{{Cleanup|reason= Do not write <math>c*g(x)</math>. Instead write <math>c \times g(x)</math> or <math>\,c g(x)</math><br />
}}<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) as opposed to <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x could hold only if g and f were the same function. This is because both pdfs integrate to 1, so a g that dominated a different f would have to integrate to more than 1. <br><br />
<br />
Also remember that sampling from <math>\,c g(x)</math> proposes points more often than f requires; thus we need a way to thin them down to the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This would render our algorithm inefficient. <br />
<br />
<br><br />
'''Note:''' <br><br />
1. Values around x<sub>1</sub> are sampled more often under cg(x) than under f(x), so there are more samples than we actually need. Where <math>\frac{f(y)}{\, c g(y)}</math> is small, the acceptance-rejection step discards most of these points: in the region around x<sub>1</sub>, we accept less and reject more. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. As a result, g(x) and f(x) are comparable there.<br />
3. The constant c is needed because we need to adjust the height of g(x) to ensure that it is above f(x). <br> <br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function lies under the proposed function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those sample points are guaranteed to fall in the part of the area under c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
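The procedure can be sketched generically in Python (an illustrative sketch; the function names and the target <math>f(x)=3x^2</math> used in the usage line are our own choices):

```python
import random

def accept_reject(f, sample_g, g, c):
    # Step 1: draw y ~ g; Step 2: draw u ~ U(0,1) independently;
    # Step 3: accept y when u <= f(y) / (c*g(y)), else repeat.
    while True:
        y = sample_g()
        u = random.random()
        if u <= f(y) / (c * g(y)):
            return y

# Hypothetical target f(x) = 3x^2 on (0,1), proposal g = U(0,1), c = 3:
x = accept_reject(lambda t: 3 * t * t, random.random, lambda t: 1.0, 3.0)
```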
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: u is drawn from the uniform distribution on (0,1), so the comparison <math>u\leq \frac{f(y)}{cg(y)}</math> accepts y with probability exactly <math>\frac{f(y)}{cg(y)}</math>; the constant c scales this acceptance probability.<br />
<br />
<br />
This section introduces the relationship between cg(x) and f(x), proves why the acceptance probability takes this form, and shows how the rule rejects some candidate points.<br />
We can also read off the graph where candidates are likely to be accepted or rejected:<br />
in the example, x<sub>1</sub> is a point with a low chance of acceptance and x<sub>2</sub> a point with a high chance of acceptance.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
<br />
<br />
We want to show that <math>P(y|accepted)=f(y)</math>.<br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int_y\ P(accepted|y)P(y)\\<br />
&=\int_y\ \frac{f(s)}{cg(s)}g(s)ds\\<br />
&=\frac{1}{c} \int_y\ f(s) ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious con is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)={2\choose x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c>=f(x)/g(x)</math><br/><br />
We need <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v~U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3*u \rfloor</math> (This uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>(y=0)</math> and <math>(v<1/2), output=0</math> <br/><br />
If <math>(y=2) </math> and <math>(v<1/2), output=2 </math><br/><br />
Else if <math>y=1, output=1</math><br/><br />
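A Python sketch of this algorithm (illustrative; the function name is ours):

```python
import random

def bi_2_half_ar():
    # Target f = (1/4, 1/2, 1/4) on {0,1,2}; proposal g = DU[0,2];
    # c = 3/2, so the acceptance probabilities f/(c*g) are (1/2, 1, 1/2).
    accept = {0: 0.5, 1: 1.0, 2: 0.5}
    while True:
        y = int(3 * random.random())  # y = floor(3u) ~ DU[0,2]
        v = random.random()
        if v < accept[y]:
            return y
```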
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but cannot sample directly with the inverse transform method.<br/><br />
Let <math>g(x)</math> be the helper function.<br/><br />
Let <math>cg(x)\geq f(x)</math><br/><br />
Since we generate y from <math>g(x)</math>,<br/><br />
<math>Pr(select\ y)=g(y)</math><br/><br />
<math>Pr(output\ y|selected\ y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(output\ y)=Pr(output\ y_1|selected\ y_1)Pr(select\ y_1)+ \dots + Pr(output\ y_n|selected\ y_n)Pr(select\ y_n)=1/c</math> <br/><br />
Since we are asking for the expected number of iterations until the first success, this follows a geometric distribution with probability of success 1/c.<br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
The proof uses conditional probability to show that, conditional on acceptance, the output has the pdf of the original target.<br />
The example shows how to choose the constant c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
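The two-step simulation can be sketched in Python (illustrative; the function name is ours):

```python
import random

def beta_2_4_ar():
    # f(x) = 20x(1-x)^3 on (0,1), g = U(0,1), c = 135/64,
    # so f(x)/(c*g(x)) = (256/27) * x * (1-x)^3.
    while True:
        u1 = random.random()
        u2 = random.random()
        if u2 < (256.0 / 27.0) * u1 * (1.0 - u1) ** 3:
            return u1
```

On average the loop body runs c = 135/64 times per returned sample.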
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
This example uses the derivative to find the local maximum of f(x)/g(x),<br />
which gives the best (smallest) constant c for the acceptance-rejection method.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math>, the density of <math> U[0,0.5] </math><br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed, so <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example showing why we reject some cases when using the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area between <math>f(x)</math> and <math>cg(x)</math> small.<br />
Because <math>g(\cdot)</math> is uniform on (0,1), <math>g(x) = 1</math>, so <math>\max(g(x)) = 1</math>.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
An example showing how to work out c and <math>f(x)/(c \, g(x))</math>.<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first need to find a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The graph of f(x) must lie under that of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*Chance of acceptance is less if the distance between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice-versa, <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must to choose the constant <math> C </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it makes no sense to choose <math>c</math> arbitrarily large; we choose <math>c</math> so that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*Since <math>f</math> and <math>g</math> are both densities, the constant c is always at least 1 (in particular, it cannot be negative).<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
This means c must be greater than or equal to <math>\frac{f(x)}{g(x)}</math> for all x, so the smallest possible c satisfying the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of accepting generated values will be small, and the algorithm loses its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low. (The efficiency of this method is <math>\left ( \frac{1}{c} \right )</math>.)<br>
*It is easy to show that the expected number of trials per acceptance is c. Thus, the smaller c is, the lower the rejection rate and the better the algorithm:<br>
*Recall that the acceptance rate (not the rejection rate) is 1/c.<br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1. (This procedure generates X; it is not how c is found.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where <math>\Gamma(n)=(n-1)!</math> if n is a positive integer, and in general<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}\,dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we can observe that the area under f(x)=2x (which is 1) is half of the area under c&middot;g(x)=2 (which is 2). This is why, in order to obtain 1000 accepted points from f(x), we need to sample approximately 2000 points from UNIF(0,1).<br />
In general, to sample n points from a distribution with pdf f(x), we need to draw approximately <math>n\cdot c</math> points from the proposal distribution g(x) in total. <br>
<b>Steps</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>If <math>u \leq \frac{2\cdot y}{2\cdot 1}</math>, set x=y</li><br />
<li>Else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: number of accepted samples<br />
>>jj=1; % jj: number of generated samples<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use the A-R method to generate random numbers.<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g = U(-1,1), so g(x) = 1/2.<br />
<br />
Let y ~ g. We need<br />
<math>\begin{align}
c\, g(x) &\geq f(x)\\
\frac{c}{2} &\geq \frac{3}{4} (1-x^2)\\
c &= \max_x \; 2\cdot\frac{3}{4} (1-x^2) = \frac{3}{2}
\end{align}</math><br />
(the maximum is attained at x = 0)<br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}\cdot\frac{1}{2}} = 1-y^2</math>, then x=y. '''Note that''' <math>\frac{3}{4}(1-y^2)/(\frac{3}{2}\cdot\frac{1}{2})</math> comes from <math>f(y)/(c\, g(y))</math> with <math>g(y)=\frac{1}{2}</math>.<br />
:5: else: return to '''step 1''' <br />
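The five steps above can be turned into a short simulation. This is our own sketch, written in Python for illustration rather than in the course's MATLAB; the function name <code>sample_parabolic</code> is ours.<br />

```python
import random

def sample_parabolic(n, seed=1):
    """Acceptance-rejection for f(x) = (3/4)(1 - x^2) on [-1, 1],
    proposal g = Unif(-1, 1) (so g(x) = 1/2), c = 3/2,
    giving acceptance ratio f(y)/(c*g(y)) = 1 - y^2."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = 2.0 * rng.random() - 1.0   # step 3: y = 2*U1 - 1 ~ Unif(-1, 1)
        u2 = rng.random()
        if u2 <= 1.0 - y * y:          # step 4: accept
            out.append(y)
    return out

vals = sample_parabolic(5000)
mean = sum(vals) / len(vals)           # f is symmetric, so the true mean is 0
```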
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
The period ".", meaning "element-wise", tells MATLAB to apply an operation to each element of a vector or matrix. In the example above, U.^0.5 takes the square root of every element of U. Without the period, ^ denotes a matrix power, which is only defined for square matrices. Similarly, for the 1&times;3 vectors a = [1 2 3] and b = [2 3 4], the element-wise product a.*b = [2 6 12] works, but a*b produces an error because the inner matrix dimensions do not agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max_{0<x<1} \frac {3x^2}{2x} = \max_{0<x<1} \frac {3x}{2} = \frac{3}{2} </math>.<br />
Use the inverse method to sample from <math>g(x)</math>:<br />
<math>G(x)=x^2</math>, so generate <math>U</math> from <math>U(0,1)</math> and set <math>y=\sqrt{u}</math>.<br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math> and set <math>y=\sqrt{U_1}</math><br>
2. If <math>U_2 \leq \frac{f(y)}{c\, g(y)} = \sqrt{U_1}</math>, accept <math>y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
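As a rough check of the efficiency gain from the better proposal, here is a sketch of ours (Python used for illustration; the function name is our own) that counts how many proposals each scheme needs:<br />

```python
import random

def trials_needed(n_target=2000, seed=2):
    """Count proposals needed to accept n_target points from f(x) = 3x^2 on (0,1)
    under two proposals: g = Unif(0,1) with c = 3 (accept prob y^2),
    and g(x) = 2x with c = 3/2 (accept prob y, where y = sqrt(U))."""
    rng = random.Random(seed)
    counts = {}
    t, accepted = 0, 0
    while accepted < n_target:          # uniform proposal
        t += 1
        y = rng.random()
        if rng.random() <= y * y:
            accepted += 1
    counts["uniform"] = t
    t, accepted = 0, 0
    while accepted < n_target:          # g(x) = 2x, sampled by the inverse method
        t += 1
        y = rng.random() ** 0.5
        if rng.random() <= y:
            accepted += 1
    counts["g_2x"] = t
    return counts

counts = trials_needed()
```

The expected trial counts are c&middot;n: about 6000 for the uniform proposal (c = 3) versus about 3000 for g(x) = 2x (c = 3/2).<br />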
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class relating the <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, \; Z \sim N(0,1) </math><br>
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
<math>f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \; x \geq 0</math><br />
<br />
<math>g(x) = e^{-x}, \; x \geq 0</math><br />
<br />
Take h(x) = f(x)/g(x) and solve h'(x) = 0 to find the x at which h(x) is maximized. <br />
<br />
Hence x = 1 maximizes h(x) <math>\Rightarrow c = \sqrt{2e/\pi}</math><br />
<br />
Thus <math>f(y)/(c\, g(y)) = e^{-(y-1)^2/2}</math><br />
<br />
<br />
This example shows how to work out the constant c analytically from f(x) and g(x).<br />
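The half-normal scheme above can be sketched end-to-end: sample |Z| by acceptance-rejection against Exp(1), then attach a random sign to get Z ~ N(0,1). This is our own illustrative Python translation, not the course's code.<br />

```python
import math
import random

def sample_normal_ar(n, seed=3):
    """Acceptance-rejection for |Z|: f(x) = sqrt(2/pi)*exp(-x^2/2), x >= 0,
    proposal g(x) = exp(-x) (Exp(1)), c = sqrt(2e/pi),
    so f(y)/(c*g(y)) = exp(-(y-1)^2/2). A random sign then gives Z ~ N(0,1)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = -math.log(rng.random())              # Y ~ Exp(1) by the inverse method
        if rng.random() <= math.exp(-(y - 1.0) ** 2 / 2.0):
            sign = 1.0 if rng.random() < 0.5 else -1.0
            out.append(sign * y)
    return out

z = sample_normal_ar(5000)
m = sum(z) / len(z)                  # should be close to 0
v = sum(x * x for x in z) / len(z)   # should be close to 1
```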
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
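The three steps above amount to one line of code. A minimal sketch (ours, in Python for illustration; the helper name <code>unif_ab</code> is our own):<br />

```python
import random

def unif_ab(a, b, rng):
    """Transform a U(0,1) draw into a U(a,b) draw via Y = (b-a)*U + a."""
    return (b - a) * rng.random() + a

rng = random.Random(4)
draws = [unif_ab(-3.0, 5.0, rng) for _ in range(10000)]
mean = sum(draws) / len(draws)   # the true mean is (a+b)/2 = 1
```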
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; therefore Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{4}{\pi}</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a proposed point with probability f(y)/[c&middot;g(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = y </math><br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{1-(2U-1)^2}</Math><br>
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br>
<Math>\ U_{1}^2 - 1 \leq -(2U - 1)^2</Math><br>
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
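Putting the semicircular example together, the whole procedure can be sketched as follows (our own illustrative Python; the function name is ours):<br />

```python
import random

def sample_semicircle(n, R=1.0, seed=5):
    """Acceptance-rejection for the semicircular density
    f(x) = (2/(pi R^2)) sqrt(R^2 - x^2) on [-R, R],
    proposal Unif(-R, R), c = 4/pi,
    acceptance ratio f(y)/(c*g(y)) = sqrt(1 - (2U-1)^2)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        u1 = rng.random()
        # same accept condition, written in its squared form
        if u1 * u1 <= 1.0 - (2.0 * u - 1.0) ** 2:
            out.append(R * (2.0 * u - 1.0))      # y = R(2U - 1)
    return out

xs = sample_semicircle(5000)
mean = sum(xs) / len(xs)   # the density is symmetric, so the true mean is 0
```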
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x e^{-x},\; x>0 </math> <br/>
Use <math>g(x)=a e^{-ax}</math> to generate the random variable <br/>
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)}\bigg|_{x=\frac{1}{1-a}} = \frac {e^{-1}}{a(1-a)} </math><br/>
<math>\lim_{x\to 0}\frac {f(x)}{g(x)} = 0</math><br/>
<math>\lim_{x\to\infty}\frac {f(x)}{g(x)} = 0</math><br/>
<br/>
therefore, <b><math>c= \frac {e^{-1}}{a(1-a)}</math></b><br/>
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u, v ~ Unif(0,1) <br/>
2. Generate y from g; since g is exponential with rate a = 1/2, let <math>y=-2\ln(u)</math> <br/>
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
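The procedure above can be written out as a short simulation. This is our own sketch, in Python for illustration (the course uses MATLAB); the function name is ours.<br />

```python
import math
import random

def sample_xexp(n, seed=6):
    """Acceptance-rejection for f(x) = x*exp(-x), x > 0 (a Gamma(2,1) density),
    with proposal g(x) = (1/2)*exp(-x/2) (Exp with rate a = 1/2) and c = 4/e."""
    rng = random.Random(seed)
    c = 4.0 / math.e
    out = []
    while len(out) < n:
        y = -2.0 * math.log(rng.random())        # Y ~ Exp(1/2) by the inverse method
        ratio = (y * math.exp(-y)) / (c * 0.5 * math.exp(-y / 2.0))
        if rng.random() < ratio:                 # v < f(y)/(c*g(y))
            out.append(y)
    return out

ys = sample_xexp(5000)
mean = sum(ys) / len(ys)   # the mean of Gamma(2,1) is 2
```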
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>; then we have the following:<br />
1. First, take the derivative of h(x) with respect to x, set it to zero, and solve for x<sub>1</sub>;<br />
2. Plug x<sub>1</sub> into h(x) to get the value (or a function) of c, denoted c<sub>1</sub>;<br />
3. Check the endpoints of x by substituting them into h(x);<br />
4. (If c<sub>1</sub> is already a value, this step can be skipped.) Since we want the smallest c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (say k) and solve for k, <br />then substitute k back to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>.)<br />
5. Pick the maximum of h(x) over x<sub>1</sub> and the endpoints to be the value of c.<br />
<br />
In the two examples above, we chose an easy-to-sample proposal distribution g,<br />
found <math>c=\max\frac {f(y)}{g(y)} </math>,<br />
and output y whenever <math>v<\frac {f(y)}{c\cdot g(y)}</math>.<br />
<br />
<br />
'''Summary of when to use the Acceptance-Rejection Method''' <br/>
1) When the inverse cdf cannot be computed or is too difficult to compute. <br/>
2) When f(x) can be evaluated at least up to a normalizing constant. <br/>
3) When a constant c with <math>f(x)\leq c\cdot g(x)</math> can be found.<br/>
4) When uniform draws are available.<br/>
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate, which is 1/c.<br />
<br />
For instance, if c=1.5, then 66.7% of the proposed points will be accepted (1/1.5&asymp;0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example from the last lecture. The following code generates the random variable required by that question.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % Note: R is a constant which we can change;<br />
% e.g. if we set R=4 we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % Note (for beginner programmers): this step increases<br />
% ii for the next pass through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tips: hist(x,y) where y is the number of bars in the graph.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
The histogram above plots the variable x using y bars.<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate a discrete random variable X with pmf f(x)=P(X=x). Suppose we can already easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} &le; c &lt; &infin;.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: U is independent of Y in Steps 2 and 3 above.<br />
~The constant c is an indicator of the rejection rate.<br />
<br />
In the acceptance-rejection method for a pmf, the uniform proposal assigns the same probability to each of the 5 values (1,2,3,4,5), so g(x) = 0.2.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution on the integers <math>1,2,3,...,k</math>. If this function is not built into your MATLAB, a simple transformation of rand, such as y = ceil(k*rand), behaves like unidrnd(k). <br />
<br />
The acceptance rate is <math>\frac {1}{c}</math>, so the lower the c, the more efficient the algorithm. Theoretically, c = 1 is the best case because all samples would be accepted; however, this is only true when the proposal and target distributions are exactly the same, which never happens in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>\frac {1}{1.5}=\frac {2}{3}</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 random values from f(x); with more random values, the empirical frequencies approach the stated pmf values.<br />
<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
g(x)= 1/3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1,y~g<br /><br />
2,u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x>=0</math><br>(poisson distribution)<br />
Try the first few p_{x}'s: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j; else go to step 1.<br />
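The three steps above can be sketched as a simulation (our own illustrative Python, with c computed numerically rather than read off the table):<br />

```python
import math
import random

def sample_poisson3(n, seed=7):
    """Acceptance-rejection for Poisson(3), p_j = exp(-3)*3^j/j!,
    with geometric proposal g(j) = p(1-p)^j, p = 0.25.
    c is computed as max_j p_j/g(j) (about 2.12)."""
    p = 0.25
    pmf = lambda j: math.exp(-3) * 3.0 ** j / math.factorial(j)
    g = lambda j: p * (1 - p) ** j
    c = max(pmf(j) / g(j) for j in range(50))
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1, u2 = rng.random(), rng.random()
        j = int(math.log(u1) / math.log(1 - p))  # geometric draw on {0, 1, 2, ...}
        if u2 < pmf(j) / (c * g(j)):
            out.append(j)
    return out

js = sample_poisson3(5000)
mean = sum(js) / len(js)   # the mean of Poisson(3) is 3
```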
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) such that it is hypergeometically distributed, given 10 white balls, 5 red balls, and select 3 balls, let X be the number of red ball selected, without replacement. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which gives '''c = 1.1127'''.<br />
<br />
The maximum of f(x)/g(x) occurs at X=1, where f = 0.4945 and g = 0.4444, giving c = 0.4945/0.4444 = 1.1127.<br />
With this c, <math>c\, g(x) \geq f(x)</math> at every point; any smaller c would fail to cover f(x) somewhere, while a larger c would only increase the rejection rate.<br />
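The computation of c for this example can be checked directly. A sketch of ours (Python for illustration; the function name is our own):<br />

```python
from math import comb

def rejection_constant():
    """Compute c = max_j f(j)/g(j) for the hypergeometric f
    (10 white, 5 red, draw 3 without replacement; X = number of red)
    against the Bin(3, 1/3) proposal g."""
    f = [comb(5, j) * comb(10, 3 - j) / comb(15, 3) for j in range(4)]
    g = [comb(3, j) * (1 / 3) ** j * (2 / 3) ** (3 - j) for j in range(4)]
    ratios = [fj / gj for fj, gj in zip(f, g)]
    c = max(ratios)
    return c, ratios.index(c)

c, argmax_j = rejection_constant()   # c is about 1.1127, attained at j = 1
```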
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, c&middot;g(x) must be raised up to the peak of f to cover all of f, so c will be very large and 1/c will be small.<br />
The higher the rejection rate, the more points are rejected.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In an earlier example, 1/c=1/1.5&asymp;66.67%, so around 67% of generated points will be accepted.<br><br />
<div style="margin-bottom:10px;border:10px solid red;background: yellow">This is a good example for understanding the pros and cons of the A-R method: the method is ill-suited to target distributions with a high, narrow peak relative to the proposal, because c will be huge,<br><br />
which makes the acceptance rate low and the sampling very time-consuming.</div><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall: suppose we have an efficient method for simulating a random variable with probability mass function {q(j), j&ge;0}. We can use this as the basis for simulating from the distribution with mass function {p(j), j&ge;0} by first simulating a random variable Y with mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that<br />
p(j)/q(j) &le; c for all j such that p(j) &gt; 0.<br />
We then have the following technique, called the acceptance-rejection method, for simulating a random variable X with mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda)</math>, and note that <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
<br />
Side note: if <math> X\sim~ Gamma(a,\lambda)</math> and <math> Y\sim~ Gamma(b,\lambda)</math> are independent gamma random variables, then <math>\frac{X}{X+Y}</math> follows a <math> Beta(a,b)</math> distribution.<br />
<br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1, x_2, \dots, x_t \sim~ Exp(\lambda) </math> independently, and<br />
<math> x_1+x_2+...+x_t \sim~ Gamma(t, \lambda)</math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1; % rate lambda = 1<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); % inverse-transform sample from Exp(1)<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generate a 20x1000 matrix<br />
% (i.e. 1000 numbers for each X_i, with t=20);<br />
% all elements are generated by rand<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) has the same distribution as log(u) since u~U(0,1)<br />
>>xx = sum(x); % sum(x) sums the elements in each column;<br />
% size(xx) can help you verify<br />
>>size(x) % check the size of x if we forget it<br />
% (the answer is 20 1000)<br />
>>hist(x(1,:)) % histogram of the first exponential sample<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
<br />
<br />
size(x) and size(u) are both 20&times;1000 matrices.<br />
Since u~Unif(0,1) implies that u and 1-u have the same distribution, we can substitute u for 1-u to simplify the expression.<br />
Alternatively, the following command does the same thing as the previous commands.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code in one line;<br />
% here we can use either log(u) or log(1-u) since U~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
In rand(20,1000), 20 is the number of rows and 1000 is the number of entries per row.<br />
This code illustrates how sums of independent one-dimensional samples (here, 20 independent Exp(&lambda;) draws per column) generalize to multidimensional sampling; the conclusion is shown by the histogram.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/\sin(\theta)= x_{1}/\cos(\theta)</math> <br />
<math> \tan(\theta)=x_{2}/x_{1} \rightarrow \theta=\tan^{-1}(x_{2}/x_{1})</math> <br />
<br />
<br />
A point at distance R from the origin along a ray at angle <math>\theta</math> has Cartesian coordinates <math>x=R\cos(\theta)</math>, <math>y=R\sin(\theta)</math>.<br />
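The two conversions can be written as a pair of helper functions (our own illustrative Python; the function names are ours):<br />

```python
import math

def to_polar(x1, x2):
    """Cartesian -> polar: R = sqrt(x1^2 + x2^2), theta = atan2(x2, x1)."""
    return math.hypot(x1, x2), math.atan2(x2, x1)

def to_cartesian(r, theta):
    """Polar -> Cartesian: x1 = R*cos(theta), x2 = R*sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

r, th = to_polar(3.0, 4.0)
x1, x2 = to_cartesian(r, th)   # recovers the original point
```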
<br />
=== '''Matlab''' ===<br />
<br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,j)'' returns the (i,j)th entry <br/ >
:*: ''sum(X,1)'' or simply ''sum(X)'' sums over the rows of X (i.e. down each column); the output is a row vector of the column sums. <br />
:*: ''sum(X,2)'' sums over the columns of X, returning a column vector of the row sums. <br/ >
:*: ''rand(r,c)'' will generate random numbers in r rows and c columns <br />
:*: The dot operator (.), when placed before operators such as ^, *, and /, applies the operation element-wise to a vector or matrix: use A.*B or A.^2 rather than A*B or A^2 when element-wise behaviour is wanted. Adding a scalar c to a matrix A needs no dot (A+c is already element-wise), and the dot is not required for functions that already operate element-wise (such as log).<br>
:*: MATLAB processes loops very slowly but is fast with matrices and vectors, so it is preferable to use element-wise operations on matrices of random numbers rather than loops whenever possible.<br>
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1. On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1) i.e. Standard Normal Distribution - then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<br />
<table><br />
<tr><br />
<td><div onmouseover="document.getElementById('woyun').style.visibility='visible'"<br />
onmouseout="document.getElementById('woyun').style.visibility='hidden'"><br />
<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math><br />
</div><br />
</td><br />
<td><br />
<div id="woyun" style="<br />
<br />
visibility:hidden;<br />
width:100px;<br />
height:100px;<br />
background:#FFFFAD;<br />
position:relative;<br />
animation:movement infinite;<br />
animation-duration:2s;<br />
animation-direction:alternate;<br />
<br />
<br />
/* Safari and Chrome */<br />
-webkit-animation:movement infinite;<br />
-webkit-animation-duration:2s;<br />
-webkit-animation-direction:alternate; <br />
<br />
<br />
@keyframes movement<br />
{<br />
from {left:0px;}<br />
to {left:200px;}<br />
}<br />
<br />
@-webkit-keyframes movement /* Safari and Chrome */<br />
{<br />
from {left:0px;}<br />
to {left:200px;}<br />
}"<br />
>which is almost useless in this course</div><br />
</td><br />
</tr><br />
</table><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup> and <math> \tan(\theta) = \frac{y}{x} </math><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\frac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into the product of two densities, an Exponential and a Uniform, so <math>d</math> and <math>\theta</math> are independent:<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = \frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math>, i.e. <math>d \sim Exp(1/2)</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x\, \phi(x)\, dx</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'')<br />
:<math>=\;- \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br /><br />
More intuitively, the integrand <math>x\phi(x)</math> is an odd function (g(x)+g(-x)=0), and the integral of an odd function over a support that is symmetric about 0 (here, the whole real line) is 0.<br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited to inefficient methods such as numerically inverting the Gaussian CDF, summing uniform random variables, and acceptance-rejection. In 1958, George Box and Mervin Muller of Princeton University proposed a new method whose ease of use and accuracy became ever more valuable as computers grew more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup>=Z<sub>1</sub><sup>2</sup>+Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ d = R^2 \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> (rate 1/2, i.e. mean 2, matching the density <math>\frac{1}{2}e^{-d/2}</math> above) <br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Alternatively,<br> <math> x =\cos(2\pi U_2)\sqrt{-2\ln U_1}\, </math> and<br> <math> y =\sin(2\pi U_2)\sqrt{-2\ln U_1}\, </math><br /><br />
<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
Note:<br>the first graph is hist(tet) and it is a uniform distribution.<br>The second one is hist(d) and it is an exponential distribution.<br>The third one is hist(x) and it is a normal distribution.<br>The last one is hist(y) and it is also a normal distribution.<br />
<br />
Attention:There is a "dot" between sqrt(d) and "*". It is because d and tet are vectors. <br><br />
<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
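For readers who prefer Python, here is a minimal sketch of the Box-Muller steps above (standard library only, outside the course's Matlab; the seed and sample size are arbitrary):

```python
import math
import random

random.seed(1)

def box_muller():
    """Generate one pair of independent N(0,1) variates."""
    u1 = 1.0 - random.random()      # in (0, 1]; avoids log(0)
    u2 = random.random()
    d = -2.0 * math.log(u1)         # R^2, exponential with mean 2
    theta = 2.0 * math.pi * u2      # angle, uniform on [0, 2*pi)
    r = math.sqrt(d)
    return r * math.cos(theta), r * math.sin(theta)

samples = [box_muller() for _ in range(5000)]
xs = [p[0] for p in samples]
mean = sum(xs) / len(xs)                            # near 0
var = sum((x - mean) ** 2 for x in xs) / len(xs)    # near 1
```

The sample mean and variance of the first coordinate should be close to 0 and 1, as the histograms above suggest.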
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient. The reason for this is the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is by an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation, which generates U and then computes the sine and cosine of 2πU). <br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method directly, we can approximate the inverse normal CDF using different functions. One method is '''rational approximation'''.<br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) variables and subtract 6 (which is 12·E(u<sub>i</sub>) = 12·(1/2)), we get approximately a standard normal distribution.<br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
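The central-limit approximation in item 2 can be sketched as follows (a Python illustration, not part of the course code; the seed and sample size are arbitrary):

```python
import random

random.seed(2)
# Sum of 12 Unif(0,1) draws has mean 12 * (1/2) = 6 and
# variance 12 * (1/12) = 1, so subtracting 6 gives approximately N(0,1).
z = [sum(random.random() for _ in range(12)) - 6.0 for _ in range(5000)]
mean = sum(z) / len(z)                            # near 0
var = sum((v - mean) ** 2 for v in z) / len(z)    # near 1
```

This is only an approximation: the support is bounded in [-6, 6], so the tails are too light compared to a true normal.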
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histograms above, the added constant is the parameter that shifts the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent Uniform(0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\,\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\,\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by<br />
:<math>f_{X_1,X_2}(x_1,x_2) = f_{U_1,U_2}\left(g_1^{-1}(x_1,x_2),\, g_2^{-1}(x_1,x_2)\right)\left| J \right|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
:<math>J=\left|\begin{matrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2 \\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{matrix}\right|</math><br />
where<br />
:<math>u_1 = g_1^{-1}(x_1,x_2), \quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
:<math>u_1 = e^{-(x_1^2+x_2^2)/2}</math><br />
:<math>u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
:<math>f(x_1,x_2) = \frac{1}{2\pi}\, e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution is the standard normal scaled by the standard deviation and translated by the mean. Its pdf is <br />
<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}</math><br />
<br />
The standard normal distribution is the special case with mean zero and variance 1. If X is a general normal deviate, then Z = (X − μ)/σ has a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma \cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2 \cdot 1 = \sigma^2</math><br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); % generate 1000 draws from a standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2]; % stack into a 2-by-1000 matrix<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,Id) and X= <math>\underline{\mu} + \Sigma^{\frac{1}{2}} \,Z </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
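The transformation <math>X = \mu + \Sigma^{1/2} Z</math> can be illustrated in two dimensions. The sketch below is in Python rather than the course's Matlab; the mean vector and covariance matrix are made-up example values, and the Cholesky factor of Σ is computed by hand:

```python
import math
import random

random.seed(3)

# Target: X ~ N(mu, Sigma) in 2D, via X = mu + A Z where A A^T = Sigma.
mu = (1.0, -2.0)
sigma = [[4.0, 1.2],   # assumed example covariance matrix
         [1.2, 1.0]]
a11 = math.sqrt(sigma[0][0])              # 2.0
a21 = sigma[1][0] / a11                   # 0.6
a22 = math.sqrt(sigma[1][1] - a21 ** 2)   # 0.8

def std_normal():
    # Box-Muller, one variate per call (the partner is discarded for brevity).
    u1, u2 = 1.0 - random.random(), random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

xs = []
for _ in range(5000):
    z1, z2 = std_normal(), std_normal()
    xs.append((mu[0] + a11 * z1, mu[1] + a21 * z1 + a22 * z2))

mean1 = sum(p[0] for p in xs) / len(xs)   # near mu[0] = 1
mean2 = sum(p[1] for p in xs) / len(xs)   # near mu[1] = -2
```

Since A A<sup>T</sup> = Σ (check: 2.0·0.6 = 1.2 and 0.6² + 0.8² = 1.0), the sample means and covariances should approach μ and Σ.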
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution describing an experiment with only two possible outcomes, success or failure. The random variable takes the value 1 (success) with probability p and the value 0 (failure) with probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pdf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is the special case of the binomial distribution in which the variate x has only two outcomes, 0 and 1; equivalently, the Bernoulli pmf is the binomial pmf with n = 1.<br />
<br />
Let x<sub>1</sub>, x<sub>2</sub> denote the lifetimes of 2 independent particles, x<sub>1</sub>~exp(λ), x<sub>2</sub>~exp(λ);<br />
we are interested in y=min(x<sub>1</sub>,x<sub>2</sub>)<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example discussed in a previous lecture: there we set p = 1/2, so heads and tails each occur with probability 50%.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: the Binomial distribution can be viewed as the sum of n independent Bernoulli random variables.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: the Bernoulli pmf can be written compactly as <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
When doing element-wise operations between two arrays of the same size, put a dot before the operator (.*, ./, .^). For multiplying an array by a scalar the dot is optional.<br />
Example: let V be a 2-by-4 matrix whose every element you want multiplied by 3.<br />
The Matlab code is 3*V (or, equivalently, 3.*V)<br />
<br />
More examples of generating distributions with code appear in the following lectures.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
Note that the material in this lecture will not be on the exam; it was only to supplement what we have learned.<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
<br />
The inverse method is universal in the sense that we can potentially sample from any distribution where we can find the inverse of the cumulative distribution function.<br />
<br />
Procedure:<br />
<br />
1. Generate U~Unif [0, 1)<br><br />
2. Set <math>x=F^{-1}(u)</math><br><br />
3. Then X~f(x)<br><br />
<br />
'''Example 1'''<br><br />
<br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br><br />
<br />
x~exp(<math>\lambda</math>)<br><br />
<br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P((X_1)>y) P((X_2)>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate U~ U(0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math> (here ln(u) may be used in place of ln(1-u), since U and 1-U have the same distribution)<br><br />
<br />
If we generalize this example from two independent particles to n independent particles we will have:<br><br />
<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> ...<br> <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br>.<br />
<br />
And the algorithm using the inverse-transform method as follows:<br />
<br />
step1: Generate U~U(0,1)<br />
<br />
Step2: <math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
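A quick numerical check of this algorithm (a Python sketch with an arbitrary seed; the rates λ<sub>1</sub>, λ<sub>2</sub> are assumed example values): the minimum of two independent exponentials is Exp(λ<sub>1</sub>+λ<sub>2</sub>), with mean 1/(λ<sub>1</sub>+λ<sub>2</sub>).

```python
import math
import random

random.seed(4)
lam1, lam2 = 0.5, 1.5   # assumed example rates
rate = lam1 + lam2

# Inverse-transform sampler for Y = min(X1, X2) ~ Exp(lam1 + lam2):
# y = -ln(1 - u) / (lam1 + lam2) for u ~ Unif(0, 1).
ys = [-math.log(1.0 - random.random()) / rate for _ in range(5000)]
mean = sum(ys) / len(ys)   # should be near 1 / (lam1 + lam2) = 0.5
```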
<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
<br>where a>0 and a is a real number<br />
What is the distribution of X?<br><br />
<br />
'''Solution:'''<br><br />
<br />
We can find a form for the cumulative distribution function of X by isolating U, since U~Unif[0,1) takes values in the range of F(X) uniformly. It then remains to differentiate the resulting form with respect to x to obtain the probability density function.<br />
<br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
[[File:Example_2_diagram.jpg]]<br />
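A numerical sanity check of Example 2 (a Python sketch; a = 2 is an assumed example value): the density <math>\frac{2}{a}(1-\frac{x}{a})</math> is triangular on [0, a] with mean a/3, so the sample mean should land near a/3.

```python
import math
import random

random.seed(5)
a = 2.0   # assumed example value, a > 0

# Sample X = a * (1 - sqrt(1 - u)); its density is (2/a)(1 - x/a) on [0, a],
# a triangular density whose mean is a/3.
xs = [a * (1.0 - math.sqrt(1.0 - random.random())) for _ in range(5000)]
mean = sum(xs) / len(xs)   # should be near a/3
```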
<br />
'''Example 3'''<br><br />
<br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. Generate values from X.<br><br />
<br />
'''Solution:'''<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.975<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Observe from above that the values of X for n = 20 are close to 1. This is because we can view a variable with CDF x<sup>n</sup> as the maximum of n independent Unif(0,1) random variables, and the maximum is increasingly likely to be close to 1 as n increases. This observation is the motivation for method 2 below.<br><br />
<br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result, in this example F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the maximum of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x, so the cdf of the maximum is x<sup>n</sup>).<br />
Method 2: Generate a sample of n independent U~Unif(0, 1) and take the maximum of the n samples as x. However, the inverse-transform solution given above requires generating only one uniform random number instead of n of them, so it is the more efficient method.<br />
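The two methods can be compared directly (a Python sketch with an arbitrary seed); a variable with CDF x<sup>n</sup> has mean n/(n+1), so both samples should have means near 20/21:

```python
import random

random.seed(6)
n = 20

# Method 1 (inverse transform): x = u ** (1/n), one uniform per draw.
inv = [random.random() ** (1.0 / n) for _ in range(5000)]
# Method 2: x = max of n independent uniforms, n uniforms per draw.
mx = [max(random.random() for _ in range(n)) for _ in range(5000)]

# Both samples come from F(x) = x^n, whose mean is n / (n + 1).
mean_inv = sum(inv) / len(inv)
mean_max = sum(mx) / len(mx)
```

Both estimates agree, but Method 1 uses 20 times fewer uniform draws.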
<br><br />
<br />
More generally, the same results give the pdf and cdf of Y = max(X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) and Y = min(X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) whenever the X<sub>i</sub> are independent.<br />
<br />
'''Example 4 (New)'''<br><br />
This is an example similar to Example 1, but using the maximum instead of the minimum.<br />
<br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, which generates the random variable Z.<br><br><br />
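A numerical check of this sampler (a Python sketch; λ = 1 is an assumed example value): the maximum of two independent Exp(λ) lifetimes has mean 3/(2λ), since max = min + (an independent Exp(λ) gap), giving 1/(2λ) + 1/λ.

```python
import math
import random

random.seed(7)
lam = 1.0   # assumed example rate

# Inverse transform for Z = max(X1, X2): F(z) = (1 - exp(-lam*z))^2,
# so F^{-1}(u) = -(1/lam) * ln(1 - sqrt(u)).
zs = [-math.log(1.0 - math.sqrt(random.random())) / lam for _ in range(5000)]
mean = sum(zs) / len(zs)   # E[max of two Exp(1)] = 3/2
```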
<br />
'''Discrete Case:'''<br />
<font size="3"><br />
u~unif(0,1)<br><br />
x <- 0, S <- P<sub>0</sub><br><br />
while u > S<br><br />
x <- x + 1<br><br />
S <- S + P<sub>x</sub><br><br />
Return x<br></font><br />
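The discrete pseudocode above can be written out as a generic sampler (a Python sketch; the pmf used here is a made-up example):

```python
import random

random.seed(8)
pmf = {0: 0.5, 1: 0.3, 2: 0.2}   # assumed example pmf

def sample_discrete(pmf):
    """Discrete inverse transform: accumulate the CDF until it exceeds u."""
    u = random.random()
    s = 0.0
    for x in sorted(pmf):
        s += pmf[x]
        if u < s:
            return x
    return max(pmf)   # guard against floating-point round-off

draws = [sample_discrete(pmf) for _ in range(10000)]
freq0 = draws.count(0) / len(draws)   # should be near P(X = 0) = 0.5
```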
<br />
===Decomposition Method===<br />
The CDF, F, is a composition if <math>F_{X}(x)</math> can be written as:<br />
<br />
<math>F_{X}(x) = \sum_{i=1}^{n}p_{i}F_{X_{i}}(x)</math> where<br />
<br />
1) p<sub>i</sub> > 0<br />
<br />
2) <math>\sum_{i=1}^{n}</math>p<sub>i</sub> = 1.<br />
<br />
3) <math>F_{X_{i}}(x)</math> is a CDF<br />
<br />
The general algorithm to generate random variables from a composition CDF is:<br />
<br />
1) Generate U, V ~ <math>U(0,1)</math><br />
<br />
2) If u < p<sub>1</sub>, x = <math>F_{X_{1}}^{-1}(v)</math><br />
<br />
3) Else if u < p<sub>1</sub>+p<sub>2</sub>, x = <math>F_{X_{2}}^{-1}(v)</math><br />
<br />
4) ....<br />
<br />
<b>Explanation</b><br><br />
Each random variable that is a part of X contributes <math>p_{i}*F_{X_{i}}(x)</math> to <math>F_{X}(x)</math> every time.<br />
From a sampling point of view, that is equivalent to contributing <math>F_{X_{i}}(x)</math> <math>p_{i}</math> of the time. The logic of this is similar to that of the Accept-Reject Method, but instead of rejecting a value depending on the value u takes, we instead decide which distribution to sample it from.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = (5/12)(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12 + (5/12)(x-1)<sup>4</sup> = (5/6)*(1/2) + (1/6)*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup> <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
Here f(x) is decomposed into two component pdfs, f<sub>x1</sub> (uniform on (0,2)) and f<sub>x2</sub>, each sampled over its own range.<br />
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-U)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3). Note that each component density has its own support: f<sub>x1</sub> on [0,∞), f<sub>x2</sub> on [0,1/&radic;2], and f<sub>x3</sub> on [0,3]. <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i \geq p_j </math> for <math> i < j </math> (largest weights first, so the most likely components are tested first) <br> <br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u < p_1 + p_2 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
In short: we split f(x) into component pdfs over their respective ranges, pick a component using U~U(0,1), and sample from it by inverting that component's CDF.<br />
<br />
=== Example of Decomposition Method ===<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
Setting U = F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup> and solving for x directly would require the root of a cubic, so we decompose instead.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
'''Algorithm:'''<br />
<br />
Generate U ~ Unif [0,1)<br />
<br />
Generate V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x = v<br />
<br />
else if u<2/3, x = v<sup>1/2</sup><br />
<br />
else x = v<sup>1/3</sup><br><br />
<br />
<br />
'''Matlab Code:''' <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
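The same algorithm, repeated over many draws as a sanity check (a Python sketch with an arbitrary seed); the pdf is f(x) = (1 + 2x + 3x²)/3 on [0,1], with exact mean (1/3)(1/2 + 2/3 + 3/4) = 23/36:

```python
import random

random.seed(9)

def sample_decomposed():
    """F(x) = (x + x^2 + x^3)/3: pick a component with probability 1/3 each,
    then invert that component's CDF."""
    u, v = random.random(), random.random()
    if u < 1.0 / 3.0:
        return v                  # F1(x) = x    -> x = v
    elif u < 2.0 / 3.0:
        return v ** 0.5           # F2(x) = x^2  -> x = sqrt(v)
    else:
        return v ** (1.0 / 3.0)   # F3(x) = x^3  -> x = v^(1/3)

xs = [sample_decomposed() for _ in range(10000)]
mean = sum(xs) / len(xs)   # exact mean is 23/36, about 0.639
```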
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from an unknown distribution using an easy distribution. The disadvantage is that it may reject many points, which is inefficient.<br />
<br />
By contrast, in the decomposition method each component CDF is inverted separately over its own range, with the component chosen according to a uniform draw.<br />
<br />
=== Practice Example from Lecture 7 ===<br />
<br />
Let X1, X2 denote the lifetime of 2 independent particles, X1~exp(<math>\lambda_{1}</math>), X2~exp(<math>\lambda_{2}</math>)<br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then } 1-F_Y(y)=P(\min(x_{1},x_{2}) > y)=e^{-(\lambda_{1}+\lambda_{2})y}, \quad F_Y(y)=1-e^{-(\lambda_{1}+\lambda_{2})y}</math><br />
<math>u \sim unif[0,1), \quad u = F_Y(y) \Rightarrow y = -\frac{1}{\lambda_{1}+\lambda_{2}}\ln(1-u)</math><br />
<br />
===Question 2===<br />
<br />
Use the Acceptance-Rejection Method to sample from <math>f_X(x)=b \cdot x^n(1-x)^n</math>, <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a Beta distribution; the constant <math>b</math> is chosen so that <math>\int _{0}^{1}b \cdot x^{n}(1-x)^{n}\,dx=1</math><br />
<br />
Use U<sub>1</sub>~Unif[0,1) as the proposal and U<sub>2</sub>~Unif[0,1) for the accept/reject decision.<br />
<br />
The density is maximized at x = 0.5, where it takes the value <math>b(1/4)^n</math>, so take <math>c=b(1/4)^n</math>.<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math><br />
2. If <math>U_2 \leq \frac{b\,U_1^n(1-U_1)^n}{b(1/4)^n}=(4U_1(1-U_1))^n</math><br />
then X = U<sub>1</sub><br />
Else return to step 1.<br />
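A sketch of this accept-reject algorithm in Python (n = 2 is an assumed example value; for n = 2 the target is the Beta(3,3) density, which is symmetric about 1/2, so the sample mean should be near 0.5):

```python
import random

random.seed(10)
n = 2   # assumed example exponent

def sample_ar():
    """Accept-reject for f(x) proportional to x^n (1-x)^n on (0,1),
    with a Unif(0,1) proposal; accept when u2 <= f(u1)/(c*g(u1))."""
    while True:
        u1, u2 = random.random(), random.random()
        if u2 <= (4.0 * u1 * (1.0 - u1)) ** n:
            return u1

xs = [sample_ar() for _ in range(5000)]
mean = sum(xs) / len(xs)   # symmetric about 1/2
```

Note that the unknown normalizing constant b cancels in the acceptance ratio, which is one of the method's practical advantages.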
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
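Steps 1-4 can be sketched in Python (illustrative; the function and the fair-die pmf are our own hypothetical example, not from the notes):

```python
import random

def sample_discrete(pmf, a, b, u=None):
    # Inverse transform for a discrete X on {a, ..., b}:
    # walk the cumulative sum s until it first reaches u (steps 1-4 above).
    if u is None:
        u = random.random()
    x, s = a, pmf(a)
    while u > s and x < b:
        x += 1
        s += pmf(x)
    return x

# Hypothetical example: a fair six-sided die, P(X = x) = 1/6 for x = 1..6.
die_pmf = lambda x: 1.0 / 6.0
```
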
<br />
==Class 8 - Thursday, May 30, 2013==<br />
<br />
In this lecture, we discuss algorithms to generate three well-known distributions: Binomial, Geometric and Poisson. For each distribution, we first give a general description together with its probability mass function, expectation and variance. Then, we derive one or more algorithms to sample from it and implement them in Matlab. <br />
<br />
'''Bernoulli distribution'''<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution, where n = 1. X ~ B(1, p) has the same meaning as X ~ Bern(p). B(n, p), is the distribution of the sum of n independent Bernoulli trials, Bern(p), each with the same probability p. <br />
<br />
Algorithm: <br />
<br />
1. Generate u~Unif(0,1) <br><br />
2. If u <= p, then x = 1 <br><br />
Else x = 0 <br />
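The two-step procedure above is small enough to check directly; a minimal Python sketch (Python is used for illustration and the function name is ours):

```python
import random

def bernoulli(p, u=None):
    # Inverse transform for Bern(p): x = 1 iff u <= p.
    if u is None:
        u = random.random()
    return 1 if u <= p else 0

random.seed(3)
xs = [bernoulli(0.3) for _ in range(100000)]
frac_ones = sum(xs) / len(xs)  # should be close to p = 0.3
```
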
<br />
===The Binomial Distribution===<br />
<br />
If X~Bin(n,p), then its pmf is of form:<br />
f(x)=(nCx) p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
Or f(x) = <math>(n!/x!(n-x)!)</math> p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
<br />
mean (x) = E(x) = np; variance = np(1-p)<br />
<br />
Generate n uniform random numbers <math>U_1,\ldots,U_n</math> and let X be the number of <math>U_i</math> that are less than or equal to p.
The logic behind this algorithm is that the Binomial distribution is simply n repeated Bernoulli trials, each with probability of success p. Thus, we can sample from the distribution by sampling n Bernoulli trials and summing the outcomes; the sum represents one binomial sample. In the example below, we draw 1000 realizations, each built from 20 Bernoulli random variables. Summing each column of the 20 by 1000 matrix adds up 20 Bernoulli outcomes to produce one binomial sample; with 1000 columns, the output of the sum is a 1 by 1000 vector of realizations from 1000 binomial random variables.<br />
MATLAB tips: binornd(N,P) draws directly from a Binomial(N,P) distribution, where N is the number of trials and P the probability of success. For a vector a=[2 3 4], the expression a<3 produces [1 0 0], and a==3 produces [0 1 0]; such logical comparisons let us count the uniforms that are less than or equal to p.<br />
<br />
Procedure for Bernoulli <br />
U~Unif(0,1)<br />
if U <= p<br />
x = 1<br />
else <br />
x = 0<br />
<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>a=[3 5 8];<br />
>>a<5<br />
ans= 1 0 0<br />
<br />
>>rand(20,1000)<br />
>>rand(20,1000)<0.4<br />
>>A = sum(rand(20,1000)<0.4)<br />
>>hist(A)<br />
>>mean(A)<br />
Note: sum operates down each column by default (equivalent to sum(...,1))
<br />
>>sum(sum(rand(20,1000)<0.4)>8)/1000<br />
This is an estimate of Pr[X>8].<br />
<br />
</pre><br />
<br />
[[File:Binomial_example.jpg|300px]]<br />
<br />
Remark: logical comparisons such as a<3 or a==3 return 0-1 vectors, which are useful for picking values out of a matrix.
<br />
===The Geometric Distribution===<br />
<br />
x=1, f(x)=p <br />
x=2, f(x)=p(1-p) <br />
x=3, f(x)=p(1-p)<sup>2</sup> <br />
<br />
Generally speaking, if X~Geo(p) then its pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>p, x=1,2,...<br />
The random variable X is the number of trials required until the first success in a series of independent''' Bernoulli trials'''.<br /><br />
<br />
<br />
<br />
Other properties<br />
<br />
<br />
Probability mass function : P(X=k) = p(1-p)<sup>k-1</sup>
<br />
Tail probability : P(X>n) = (1-p)^n<br />
<br />
<br />
<span style="background:#F5F5DC"><br />
<br />
Mean of x = 1/p<br />
Var(x) = (1-p)/p^2<br />
<br />
There are two ways to look at a geometric distribution.<br />
<br />
<b>1st Method</b><br />
<br />
We look at the number of trials before the first success. This includes the last trial in which you succeeded. This will be used in our course. <br />
<br />
pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>p, x = 1, 2, 3, ...
<br />
<b>2nd Method</b><br />
<br />
This involves modeling the failure before the first success. This does not include the last trial in which we succeeded. <br />
<br />
pmf is of the form f(x)=(1-p)<sup>x</sup>p, x = 0, 1, 2, ....
<br />
</span><br />
<br />
<br />
If Y~Exp(<math>\lambda</math>) then X=floor(Y)+1 is geometric.<br />
Choose <math>e^{-\lambda}=1-p</math>; then X ~ Geo(p) <br />
<br />
P (X > x) = (1-p)<sup>x</sup>(because first x trials are not successful) <br/><br />
<br />
'''Proof''' <br/><br />
<br />
P(X>x) = P(floor(Y) + 1 > x) = P(floor(Y) > x-1) = P(Y >= x) = e<sup>(-<math>\lambda</math> x)</sup> <br>
<br />
Since p = 1- e<sup>-<math>\lambda</math></sup>, i.e. <math>\lambda</math> = <math>-\log(1-p)</math>, then <br>
<br />
P(X>x) = e<sup>(-<math>\lambda</math> * x)</sup> = e<sup>log(1-p)*x</sup> = (1-p)<sup>x</sup> <br/><br />
<br />
Note that <math>\left \lfloor Y \right \rfloor > x \Rightarrow Y \geq x+1</math> <br/>
<br />
Proof of how the exponential distribution gives P(X>x)=(1-p)<sup>x</sup>:
<br />
<br><br />
Suppose X has the exponential distribution with rate parameter <math> \lambda > 0 </math> <br><br />
then <math>\left \lfloor X \right \rfloor </math> and <math>\left \lceil X \right \rceil </math> have geometric distributions on <math> \mathcal{N} </math> and <math> \mathcal{N}_{+} </math> respectively, each with success probability <math> 1-e^ {- \lambda} </math> <br>
<br />
Proof: <br><br />
<math>\text{For } n \in \mathcal{N} </math><br />
<br />
<math>\begin{align}<br />
P(\left \lfloor X \right \rfloor = n)&{}= P( n \leq X < n+1) \\<br />
&{}= F( n+1) - F(n) \\<br />
\text{and, by algebra and simplification:} \\<br />
P(\left \lfloor X \right \rfloor = n)&{}= (e^ {-\lambda})^n \cdot (1 - e^ {-\lambda}) \\<br />
\end{align}</math> <br /><br />
which is the pmf of <math>Geo(1 - e^ {-\lambda})</math>; the proof of the ceiling part follows immediately. <br />
<br />
<br />
<br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p) </math><br /><br />
2) Generate a <math>Y \sim Exp(\lambda )</math> <br /><br />
3) We can then let <math>X = \left \lfloor Y \right \rfloor + 1, where X\sim Geo(p)</math> <br /><br />
note: <math>\left \lfloor Y \right \rfloor >2 \Leftrightarrow Y\geq 3</math><br />
<math> \left \lfloor Y \right \rfloor >5 \Leftrightarrow Y\geq 6</math><br />
<br /><br />
<br />
<math>\left \lfloor Y \right \rfloor > x \Rightarrow Y \geq x+1 </math> <br />
<br />
<math>P(X>x)=P(Y \geq x)</math>, where Y ~ Exp(<math>\lambda</math>)<br /><br />
pdf of Y : <math>f(y)=\lambda e^{-\lambda y}</math><br /><br />
cdf of Y : <math>F(y)=P(Y<y)=1-e^{-\lambda y}</math><br /><br />
<math>P(Y\geq x)=1-(1-e^{-\lambda x})=e^{-\lambda x}</math><br /><br />
<math> e^{-\lambda}=1-p \Rightarrow \lambda=-\log(1-p)</math><br /><br />
<math>P(Y\geq x)=e^{-\lambda x}=e^{\log(1-p)x}=(1-p)^x</math><br /><br />
<math>E[X]=1/p </math><br /><br />
<math>Var(X)= (1-p)/p^2</math><br /><br />
P(X>x)<br /><br />
=P(floor(Y)+1>x)<br /><br />
=P(floor(Y)>x-1)<br /><br />
=P(Y>=x)<br />
<br />
Use <math>e^{-\lambda}=1-p</math> to recover p, and hence the mean and variance.
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>p=0.4;<br />
>>l=-log(1-p);<br />
>>u=rand(1,1000);<br />
>>y=(-1/l)*log(u);<br />
>>x=floor(y)+1;<br />
>>hist(x)<br />
<br />
% Note: mean(x) should be close to E[X] = 1/p
% and var(x) close to Var[X] = (1-p)/p^2
<br />
</pre><br />
<br />
[[File:Geometric_example.jpg|300px]]<br />
<br />
===Poisson Distribution===<br />
If <math>\displaystyle X \sim \text{Poi}(\lambda)</math>, its pmf is of the form <math>\displaystyle \, f(x) = \frac{e^{-\lambda}\lambda^x}{x!}</math> , where <math>\displaystyle \lambda </math> is the rate parameter.<br />
<br />
Understanding of Poisson distribution:<br />
<br />
If customers arrive at a bank at rate <math>\lambda</math> per unit of time, then <br />
X(t) = # of customers in [0,t] ~ Poi<math>(\lambda t)</math><br />
<br />
Its mean and variance are<br /><br />
<math>\displaystyle E[X]=\lambda</math><br /><br />
<math>\displaystyle Var[X]=\lambda</math><br /><br />
<br />
A Poisson random variable X can be interpreted as the maximal number of i.i.d. exponential variables (with parameter <math>\lambda</math>) whose sum does not exceed 1.<br />
The traditional understanding of the Poisson distribution as the total number of events in a specific interval can be understood here since the above definition simply describes the Poisson as the sum of waiting times for n events in an interval of length 1.<br />
<br /><br />
<br /><br />
<math>\displaystyle\text{Let } Y_j \sim \text{Exp}(\lambda), U_j \sim \text{Unif}(0,1)</math><br><br />
<math>Y_j = -\frac{1}{\lambda}log(U_j) \text{ from Inverse Transform Method}</math><br><br><br />
<br />
<math>\begin{align} <br />
X &= max \{ n: \sum_{j=1}^{n} Y_j \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} - \frac{1}{\lambda}log(U_j) \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} log(U_j) > -\lambda \} \\<br />
&= max \{ n: log(\prod_{j=1}^{n} U_j) > -\lambda \} \\<br />
&= max \{ n: \prod_{j=1}^{n} U_j > e^{-\lambda} \} \\<br />
\end{align}</math><br><br /><br />
<br />
Note: From above, we can use Logarithm Rules <math>log(a)+log(b)=log(ab)</math> to generate the result.<br><br /><br />
'''Algorithm:''' <br /><br />
1) Set n=1, a=1 <br /><br />
2) Generate <math>U_n \sim U(0,1)</math>, set <math>a=aU_n </math> <br /><br />
3) If <math>a >= e^{-\lambda}</math> , then n=n+1, and go to Step 2. Else, x=n-1 <br /><br />
<br />
Note: the exponential inter-arrival times used above are themselves generated by the inverse transform method.
<br />
===MATLAB Code for generating Poisson Distribution===<br />
<pre><br />
>>l=2; <br />
>>for ii=1:1000<br />
n=1;<br />
a=1;<br />
u=rand;<br />
a=a*u;<br />
while a>exp(-l)<br />
n=n+1;<br />
u=rand;<br />
a=a*u;<br />
end<br />
x(ii)=n-1;<br />
end<br />
>>hist(x)<br />
>>sum(x==1)/1000 % estimate of P(X=1)
>>sum(x>3)/1000 % estimate of P(X>3)
</pre><br />
<br />
[[File:Poisson_example.jpg|300px]]<br />
<br />
<br />
<span style="background:#F5F5DC"><br />
EXAMPLE for geometric distribution: Consider the case of rolling a die: </span><br />
<br />
X=the number of rolls that it takes for the number 5 to appear. <br />
<br />
We have X ~ Geo(1/6), <math>f(x)=(1/6)*(5/6)^{x-1}</math>, x=1,2,3.... <br />
<br />
Now, let <math>Y \sim Exp(\lambda)</math>; then X = floor(Y) + 1 <br />
<br />
Let <math>e^{-\lambda}=5/6</math> <br />
<br />
<math>P(X>x) = P(Y>=x)</math> (from the class notes) <br />
<br />
We have <math>e^{-\lambda *x} = (5/6)^x</math> <br />
<br />
Algorithm: let <math>\lambda = -\log(5/6)</math> <br />
<br />
1) Generate Y, exponentially distributed with rate <math>\lambda</math> <br />
<br />
2) Set X= floor(Y)+1, to generate X <br />
<br />
<math> E[X]=6, \; Var[X]=\frac{5/6}{(1/6)^2} = 30 </math><br />
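The die example can be simulated with the exponential construction; a Python sketch (illustrative; variable names ours, and <math>-\log(U)/\lambda</math> is itself an inverse-transform draw from Exp(<math>\lambda</math>)):

```python
import math
import random

# Die example: X ~ Geo(1/6), generated via Y ~ Exp(lam) with
# e^{-lam} = 5/6, i.e. lam = -log(5/6), and X = floor(Y) + 1.
lam = -math.log(5.0 / 6.0)
random.seed(4)
xs = [math.floor(-math.log(random.random()) / lam) + 1 for _ in range(200000)]
mean = sum(xs) / len(xs)  # E[X] = 1/p = 6
```
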
<br />
<br />
<span style="background:#F5F5DC">GENERATING NEGATIVE BINOMIAL RV USING GEOMETRIC RV'S</span><br />
<br />
Property of negative binomial Random Variable: <br/><br />
<br />
The negative binomial random variable is a sum of r independent geometric random variables.<br/><br />
<br />
Using this property we can formulate the following algorithm:<br/><br />
<br />
Step 1: Generate r geometric rv's each with probability p using the procedure presented above.<br/><br />
Step 2: Take the sum of these r geometric rv's. This RV follows NB(r,p)<br/><br />
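Steps 1 and 2 can be sketched in Python (illustrative; the geometric draws reuse the exponential method from earlier, and the function names are ours):

```python
import math
import random

def geometric(p):
    # Geo(p) via the exponential method: X = floor(Y) + 1, Y ~ Exp(-log(1-p)).
    lam = -math.log(1.0 - p)
    return math.floor(-math.log(random.random()) / lam) + 1

def neg_binomial(r, p):
    # Steps 1 and 2: sum r independent Geo(p) random variables.
    return sum(geometric(p) for _ in range(r))

random.seed(5)
xs = [neg_binomial(3, 0.4) for _ in range(100000)]
mean = sum(xs) / len(xs)  # E[NB(r, p)] = r/p = 3/0.4 = 7.5
```

Each NB(r,p) draw is at least r, since every geometric summand is at least 1.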
<br />
Remark: in step 1, each geometric draw can itself be generated by the exponential method above, with <math>e^{-\lambda}=1-p</math> and <math>X=\lfloor Y \rfloor + 1</math>.
<br />
=== Another way to generate random variable from poisson distribution ===<br />
<br/><br />
Note: <math>P(X=x)=e^{-\lambda}\lambda^x/x!</math><br/><br />
<math>P(X=x+1)= e^{-\lambda}\lambda^{x+1}/(x+1)!</math> <br/>
The ratio is: <math>p(x+1)/p(x)=\lambda/(x+1)</math> <br/><br />
Therefore, <math>p(x+1)=\lambda/(x+1)*p(x)</math> <br/><br />
Algorithm: <br/><br />
1. Set x=0<br/><br />
2. <math>\, p=F=P(X=0)=e^{-\lambda}</math> <br/>
3. Generate U~Unif(0,1) <br/><br />
If U<F, output x<br/><br />
Else <br/>
<math>p=\lambda/(x+1)*p</math><br/><br />
F=F+p<br/><br />
x= x+1<br/><br />
Go to 3<br />
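The ratio-based algorithm can be sketched in Python (illustrative; the function name is ours):

```python
import math
import random

def poisson_inverse(lam, u=None):
    # Inverse transform using the recursion p(x+1) = lam/(x+1) * p(x),
    # starting from p(0) = exp(-lam) and accumulating the CDF F.
    if u is None:
        u = random.random()
    x = 0
    p = math.exp(-lam)
    F = p
    while u >= F:
        p = lam / (x + 1) * p
        F += p
        x += 1
    return x

random.seed(6)
xs = [poisson_inverse(2.0) for _ in range(100000)]
mean = sum(xs) / len(xs)  # E[X] = lam = 2
```
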
<br />
<br />
1. set n =1, a = 1<br />
<br />
2. set U<sub>n</sub>~U(0,1), a = a*U<sub>n</sub><br />
<br />
3. if <math>a > e^{-\lambda}</math>, then n = n+1, go to step 2,<br />
<br />
else x = n-1<br />
<br />
In summary: find the ratio of P(X=x+1) to P(X=x), compute F = P(X=0), and compare the running CDF against a uniform draw.</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=17720stat340s132013-06-04T04:13:12Z<p>Ysyap: /* Acceptance-Rejection Method */</p>
<hr />
<div>== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== Midterm ===<br />
Monday June 17 2013 from 2:30-3:30<br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
{| class="wikitable"<br />
|-<br />
! TA<br />
! Day<br />
! Time<br />
! Location<br />
|-<br />
| Lu Cheng<br />
| Monday<br />
| 3:30-5:30 pm<br />
| M3 3108, space 2<br />
|-<br />
| Han ShengSun<br />
| Tuesday<br />
| 4:00-6:00 pm<br />
| M3 3108, space 2<br />
|-<br />
| Yizhou Fang<br />
| Wednesday<br />
| 1:00-3:00 pm<br />
| M3 3108, space 1<br />
|-<br />
| Huan Cheng<br />
| Thursday<br />
| 3:00-5:00 pm<br />
| M3 3111, space 1<br />
|-<br />
| Wu Lin<br />
| Friday<br />
| 11:00-1:00 pm<br />
| M3 3108, space 1<br />
|}<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e. given a value of x, we can predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same setup as classification, but y is continuous rather than discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly at their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important, as the Quest id will be used to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After requesting an account, wait several hours before logging in with the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions in multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a die and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers'''; numbers that seem random but are actually deterministic. Although the pseudo random numbers are deterministic, these numbers have a sequence of value and all of them have the appearances of being independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to the ease to generate and manipulate.<br />
<br />
When an experiment is repeated many times, the aggregate results will be close to the expected values, which makes the experiment look deterministic; the result of each individual trial, however, is random. This is the behaviour that pseudorandom numbers imitate.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br />
In general, if y = qm + r with 0 &le; r &lt; m, then <math>r:=y \mod m</math>. <br />
4.2 = 3 &times; 1.1 + 0.9, so <br />
0.9 = 4.2 mod 1.1<br />
<br /><br />
For example:<br /><br />
30 = 4 &times; 7 + 2, so<br />
2 = 30 mod 7<br />
25 = 8 &times; 3 + 1, so<br />
1 = 25 mod 3<br />
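These worked examples are instances of the division algorithm; a short Python sketch (the function name is ours; note that Python's % operator agrees with this definition for positive m):

```python
def quotient_remainder(n, m):
    # Division algorithm: n = m*q + r with 0 <= r < m.
    q, r = divmod(n, m)
    return q, r

# e.g. 30 = 4*7 + 2, so 30 mod 7 = 2; 25 = 8*3 + 1, so 25 mod 3 = 1.
```
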
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can determine whether one integer divides another with no remainder. The integers involved must satisfy n = mq + r, where m, q, r and n are all integers and 0 &le; r &lt; m.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math>. (<math>\mod m</math> means taking the remainder after division by m.) Given an initial value <math>x_0 \in \N</math> called the '''seed''', we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{0}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math> (a little tip: (a*b) mod c = ((a mod c)*(b mod c)) mod c)<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we obtain a sequence of "random" numbers. But how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a very '''large''', preferably prime number; the larger <math>m</math> is, the longer the sequence can run before repeating. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> should be '''large and prime''').<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad <math>a</math> and <math>b</math>, the histogram may not looks uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a graph about the distribution. Use it after run the code to check the real sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br />
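The textbook sequence above is easy to reproduce in code; a Python sketch (the function name is ours):

```python
def lcg_sequence(x0, a, b, m, k):
    # x_n = (a*x_{n-1} + b) mod m; returns x_1, ..., x_k.
    xs = []
    x = x0
    for _ in range(k):
        x = (a * x + b) % m
        xs.append(x)
    return xs

# The example: x0 = 3, x_n = (5*x_{n-1} + 7) mod 200.
seq = lcg_sequence(3, 5, 7, 200, 10)
```
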
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been a research about how to choose uniform sequence. Many programs give you the options to choose the seed. Sometimes the seed is chosen by CPU.<br /><br />
<br />
<br />
<br />
<br />
This part showed how to relate two integers through division and remainder in code, and that running the generator over a range such as 1:1000 produces a histogram resembling a uniform distribution.
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>choose integers <i>a</i>, <i>b</i>, <i>m</i> (a large prime) and <i>x<sub>0</sub></i> (the seed)</li><br />
<li>iterate <math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
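The same recurrence can be sketched in Python (an illustrative translation of the Matlab box above, not part of the course notes; it uses the same assumed parameters a=17, b=3, m=31, seed 5):

```python
def lcg(a, b, m, seed, n):
    """Generate n values of the congruential sequence x_{k+1} = (a*x_k + b) mod m."""
    x = seed
    values = []
    for _ in range(n):
        x = (a * x + b) % m
        values.append(x)
    return values

# With a=17, b=3, m=31, seed 5, the first value is (17*5+3) mod 31 = 26,
# matching the Matlab output above.
print(lcg(17, 3, 31, 5, 3))
```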
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating distributions other than the uniform, such as the exponential and normal distributions. However, to use it conveniently for generating pseudorandom numbers, the target distribution must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with a given cdf, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable <math>X \sim F</math>, we can generate <math>U \sim U(0,1)</math> and then apply the transformation <math>x = F^{-1}(U)</math> <br />
<br />
Note that in the proof we can apply the ordinary inverse to both sides only if the cdf <math>F</math> is strictly increasing (and hence invertible); otherwise the generalized inverse defined above must be used.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3<br>
Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.<br />
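The steps above can be sketched in Python (an illustrative translation, not from the notes; the function name is ours):

```python
import random

def binomial_inverse_transform(n, p, u=None):
    """Steps 1-5 above: walk the Binomial(n, p) CDF until it exceeds U."""
    if u is None:
        u = random.random()              # Step 1: generate a random number U
    c = p / (1 - p)                      # Step 2
    i = 0
    pr = (1 - p) ** n                    # P(X = 0)
    F = pr                               # running CDF value
    while u >= F:                        # Step 3: stop when U < F
        pr = c * (n - i) / (i + 1) * pr  # Step 4: recursion for P(X = i+1)
        F += pr
        i += 1
    return i                             # the value X = i
```

For n = 2, p = 0.5 the cut points of the CDF are 0.25 and 0.75, so for example u = 0.5 yields X = 1.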
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = \left[\, -e^{-\lambda t}\, \right]_{0}^{x} </math><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-y)}{\lambda}</math><br />
Interchanging the roles of <math>x</math> and <math>y</math> gives the inverse:<br />
<math>F^{-1}(x)=-\frac {\ln(1-x)}{\lambda}</math><br />
<br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
<br />
'''Example''': <br />
<math> X= a + (b-a)U</math> is uniform on [a, b], where U is uniform on [0, 1] <br />
<math> x=\frac{-\ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math> <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x), and set<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
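This example can be checked with a short Python sketch (illustrative only; the function name `beta_1_b` is ours):

```python
import random

def beta_1_b(beta, u=None):
    """Sample X ~ Beta(1, beta) via the inverse CDF x = 1 - (1-u)^(1/beta)."""
    if u is None:
        u = random.random()
    return 1 - (1 - u) ** (1 / beta)

# With beta = 2 and u = 0.75: x = 1 - 0.25^(1/2) = 0.5
```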
<br />
'''Example 4-Estimating pi''':<br />
Let's use rand() and the Monte Carlo method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2 with an inscribed circle of radius 1, the circle has area <math>\pi</math> and the square has area 4.<br /><br />
Thus <math>\pi \approx 4\,(Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use the following commands:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
%let lambda=2 in this example; however, you can choose another value for lambda<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. Not every cdf is invertible or strictly monotonic, and the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDFs and/or integrals are not easy to compute, as for the Gaussian distribution.<br /><br />
<br />
We learned how to derive the inverse of a cdf and how to use uniform random numbers to obtain a value of x from F(x).<br />
The inverse transform method lets uniform samples be converted into samples from other distributions,<br />
and we can inspect the histogram of the output to judge which distribution it follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool #shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma shifts the curve and changes its spread.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for uniform distribution <math> U~ \sim~ Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
LIMITATIONS OF THE INVERSE TRANSFORM METHOD<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse cdf <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases a closed form for this function does not exist<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse cdf, making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable X that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case (Procedure):<br><br />
1. Define a probability mass function for <math>x_{i}</math> where i = 1,....,k. Note: k could grow infinitely. <br><br />
2. Generate a uniform random number U, <math> U~ \sim~ Unif [0,1] </math><br><br />
3. If <math>U\leq p_{0}</math>, deliver <math>X = x_{0}</math><br><br />
4. Else, if <math>U\leq p_{0} + p_{1} </math>, deliver <math>X = x_{1}</math><br><br />
5. Continue in this way: deliver <math>X = x_{k}</math> once <math>U\leq p_{0} + p_{1} + \dots + p_{k}</math><br><br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, X = F_{x}^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
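As a quick numerical check of this example (an illustrative Python sketch, not part of the notes): drawing X = U<sup>1/2</sup> should give a sample mean near E[X] = ∫<sub>0</sub><sup>1</sup> 2x² dx = 2/3.

```python
import random

random.seed(1)
# Draw X = U^(1/2), the inverse-transform sample for f(x) = 2x on [0, 1]
samples = [random.random() ** 0.5 for _ in range(100000)]
mean = sum(samples) / len(samples)
print(mean)   # close to E[X] = 2/3
```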
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) <math>\begin{align} X = 0 \end{align}</math><br />
<math>\begin{align} F = P(X = 0) = e^{-u} u^0/{0!} = e^{-u} = p \end{align}</math><br />
3) If U<F, output X <br><br />
Else, <math>\begin{align} p = \frac {u}{x+1} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
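A Python sketch of this algorithm (illustrative, not from the notes; `lam` stands for the Poisson mean written u above):

```python
import math
import random

def poisson_inverse_transform(lam, u=None):
    """Walk the Poisson CDF using the recursion P(X = x+1) = lam/(x+1) * P(X = x)."""
    if u is None:
        u = random.random()      # Step 1
    x = 0                        # Step 2
    p = math.exp(-lam)           # P(X = 0)
    F = p
    while u >= F:                # Step 3: output x once U < F
        p = lam / (x + 1) * p
        F += p
        x += 1
    return x
```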
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p) where p is the probability of success, and define the random variable X as the number of trials until the first success, so x = 1, 2, 3, .... We have pmf:<br />
<math>P(X=x) = \, p (1-p)^{x-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) is the probability of observing at least x failures before the first success.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
....<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k <br />
....<br />
\end{cases}</math><br />
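Instead of scanning the cases one by one, the inverse can be written in closed form: solving <math>1-(1-p)^x \geq U</math> for the smallest integer x gives <math>X = \lceil \ln(1-U)/\ln(1-p) \rceil</math>. A Python sketch under that closed form (illustrative; the function name is ours):

```python
import math
import random

def geometric_inverse_transform(p, u=None):
    """Sample X ~ Geo(p) (trials until first success) by inverting F(x) = 1-(1-p)^x."""
    if u is None:
        u = random.random()
    # Smallest integer x with 1-(1-p)^x >= u; max(1, ...) guards the u = 0 edge case
    return max(1, math.ceil(math.log(1 - u) / math.log(1 - p)))
```

For p = 0.5 this reproduces the case table above: u ≤ 0.5 gives 1, 0.5 < u ≤ 0.75 gives 2, and so on.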
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as Gaussian, it is too difficult to find the inverse of <math> F(x) ,</math><br />
Flipping a coin is a discrete case of the uniform distribution; in the code the coin is flipped 1000 times, and the observed proportion of heads is close to the expected value (0.5).<br />
The second example is another discrete distribution: the unit interval is partitioned among the three outcomes 0, 1, 2 according to their probabilities.<br />
The third example uses the inverse method to determine the range of U corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b>generate types of distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed numbers {u}</li><br />
<li>{F<sup>-1</sup>(u)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>for each number u, set d=x<sub>i</sub> if <math> F(x_{i-1})<u\leq F(x_i) </math></li><br />
<li>the resulting list of d values is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transformation method does allow us to change our uniform distribution, it has two limits;<br />
# Not all cdfs have inverse functions that can be written down explicitly<br />
# For some distributions, such as Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''. This method is more efficient than the inverse transform method.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (In practise, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
{{Cleanup|reason= Do not write <math>c*g(x)</math>. Instead write <math>c \times g(x)</math> or <math>\,c g(x)</math><br />
}}<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) as opposed to <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x could hold only if g and f were the same function: both pdfs integrate to 1, so neither can lie strictly above the other everywhere. <br><br />
<br />
Also remember that <math>\,c g(x)</math> lies at or above <math>f(x)</math> everywhere, so proposals are generated more often than needed; we therefore need a rule for accepting the correct proportion of them.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the acceptance-rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This will render our algorithm inefficient. <br />
<br />
<br><br />
'''<br />
Note:''' <br><br />
1. Values around x<sub>1</sub> will be sampled more often under cg(x) than under f(x), so there will be more samples than we actually need. Since <math>\frac{f(y)}{\, c g(y)}</math> is small there, the acceptance-rejection step keeps only the right fraction of these points: in the region around x<sub>1</sub>, we accept less and reject more. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. As a result, g(x) and f(x) are comparable there.<br />
3. The constant c is needed because we need to adjust the height of g(x) to ensure that it is above f(x). <br> <br />
<br />
Another way to understand why the the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math>, is by thinking of areas. From the graph above, we see that the target function in under the proposed function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion or the area under c g(y) that also contains f(y). Therefore we say we accept sample points for which u is less then <math>\frac{f(y)}{\, c g(y)}</math> because then the sample points are guaranteed to fall under the area of c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
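The three steps translate directly into code. A generic Python sketch (names are ours; `g_sample` draws Y ~ g and `g_pdf` evaluates the density g):

```python
import random

def accept_reject(f, g_sample, g_pdf, c):
    """Step 1: draw Y ~ g; Step 2: draw U ~ U(0,1);
    Step 3: accept when u <= f(y)/(c g(y)), else try again."""
    while True:
        y = g_sample()
        u = random.random()
        if u <= f(y) / (c * g_pdf(y)):
            return y

# Hypothetical use: target f(x) = 2x on [0,1], uniform proposal g = 1, c = 2
sample = accept_reject(lambda t: 2 * t, random.random, lambda t: 1.0, 2.0)
```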
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: The acceptance probability <math>\frac{f(y)}{cg(y)}</math> must be at most 1 for every y, so the constant must satisfy <math>c\geq \frac{f(y)}{g(y)}</math> for all y.<br />
<br />
<br />
This section introduced the relationship between cg(x) and f(x), proved why the acceptance probability takes the form f(y)/(cg(y)), and showed how to read the graph to decide where points are likely to be accepted or rejected.<br />
In the example, x<sub>1</sub> marks a region where most proposals are rejected and x<sub>2</sub> a region where most are accepted.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
(to be updated later)<br><br />
<br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int P(accepted|s)P(s)\,ds\\<br />
&=\int \frac{f(s)}{cg(s)}g(s)\,ds\\<br />
&=\frac{1}{c} \int f(s)\, ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The acceptance-rejection method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)=\binom{2}{x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c>=f(x)/g(x)</math><br/><br />
We need <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>y=0</math> and <math>v<1/2</math>, output 0 <br/><br />
If <math>y=2 </math> and <math>v<1/2</math>, output 2 <br/><br />
Else if <math>y=1</math>, output 1; otherwise return to step 1<br/><br />
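A Python sketch of this discrete algorithm (illustrative; the acceptance ratios 1/2, 1, 1/2 come from the table above):

```python
import random

def binomial_ar():
    """Acceptance-rejection for Bi(2, 0.5) with a DU[0,2] proposal and c = 3/2."""
    ratio = {0: 0.5, 1: 1.0, 2: 0.5}   # f(y) / (c g(y)) for each y
    while True:
        u, v = random.random(), random.random()
        y = int(3 * u)                 # y = floor(3u), uniform on {0, 1, 2}
        if v <= ratio[y]:
            return y
```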
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let <math>g(x)</math> be the helper function <br/><br />
Let <math>cg(x)\geq f(x)</math><br/><br />
Since we need to generate y from <math>g(x)</math>,<br/><br />
<math>Pr(select y)=g(y)</math><br/><br />
<math>Pr(output\ y|selected\ y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c</math> <br/><br />
The number of iterations until the first acceptance is geometric with probability of success <math>1/c</math><br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
Using conditional probability, we proved that the distribution of the accepted points matches the target pdf.<br />
The example shows how to choose the constant c for given functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
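The example above can be sketched in a few lines of code. The notes use MATLAB, but here is an equivalent Python sketch (the helper name <code>sample_beta24</code> is ours); it also checks empirically that the average number of passes through step 1) is about c = 135/64:

```python
import random

def sample_beta24():
    """Rejection sampling for f(x) = 20x(1-x)^3 on (0,1) with g(x) = 1 and c = 135/64."""
    trials = 0
    while True:
        trials += 1
        u1 = random.random()  # candidate from g = U(0,1)
        u2 = random.random()  # uniform for the accept/reject test
        # accept if u2 <= f(u1)/(c*g(u1)) = (256/27) * u1 * (1 - u1)^3
        if u2 <= (256 / 27) * u1 * (1 - u1) ** 3:
            return u1, trials

random.seed(1)
n = 100000
samples, total_trials = [], 0
for _ in range(n):
    x, t = sample_beta24()
    samples.append(x)
    total_trials += t
print(total_trials / n)   # close to c = 135/64 ~ 2.11
print(sum(samples) / n)   # close to the Beta(2,4) mean, 1/3
```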
<br />
Here we used the derivative to find the maximum of f(x)/g(x),<br />
which gives the smallest valid constant c.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math>, the density of <math> U[0,0.5] </math><br />
<br />
Let <math>g(\cdot)</math> be <math>U[0,1]</math> distributed, so <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
This example shows why certain candidates are rejected under the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant is <math>c = \max(f(x)/g(x))</math>; this choice makes the area between <math>c\cdot g(x)</math> and <math>f(x)</math> as small as possible.<br />
Because g(.) is uniform on (0,1), g(x) = 1 for all x, so <math>\max(g(x)) = 1</math>.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
This example shows how to find c and <math>f(x)/(c\cdot g(x))</math>.<br />
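As a quick check of this example, a small Python simulation (an illustrative sketch, not from the notes) confirms that the acceptance rate is about 1/c = 1/3:

```python
import random

random.seed(0)
c = 3.0                       # c = max f(x)/g(x) = 3 for f(x) = 3x^2, g(x) = 1 on (0,1)
accepted = []
trials = 0
while len(accepted) < 50000:
    trials += 1
    y = random.random()       # candidate from g = U(0,1)
    u = random.random()
    if u <= y ** 2:           # f(y)/(c*g(y)) = 3y^2 / 3 = y^2
        accepted.append(y)
print(len(accepted) / trials)         # close to 1/c = 1/3
print(sum(accepted) / len(accepted))  # close to E[X] = 3/4 under f
```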
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted as <math>f(x)</math>, we first need a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The graph of <math>f(x)</math> must lie under the graph of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower when the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is large, and vice versa. The constant <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> at or below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it does not make sense to choose <math>c</math> arbitrarily large; we choose <math>c</math> so that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*Since both f and g are densities (each integrating to 1), the constant c must be at least 1, and in particular cannot be negative.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
This means c must be greater than or equal to <math>\frac{f(x)}{g(x)}</math> for every x, so the smallest valid c is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of accepting generated values will be small, and the algorithm loses its efficiency.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
*Recall that the acceptance rate is 1/c (not the rejection rate). <br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1 (This procedure uses c; it is not the way to find c.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
where &Gamma;(n)=(n-1)! if n is a positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows the uniform distribution and only covers the region from 0 to 1 on the y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> covers the entire area under f(x). In this case, c=2, so <math>c\cdot g</math> runs from 0 to 2 on the y-axis, which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distribution with pdf f(x), we need to generate approximately <math>n\cdot c</math> points from the proposal distribution g(x) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)}, x=y</math><br><br />
4.else go to 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: count of accepted numbers<br />
>>jj=1; % jj: count of generated numbers<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason a for loop is not used is that we need to continue looping until we get 1000 successful samples. Since some samples are rejected along the way, we do not know in advance how many values of y we will need to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g be <math>U(-1,1)</math>, so <math>g(x)=\frac{1}{2}</math> on <math>[-1,1]</math><br />
<br />
Let y ~ g.<br />
<math> cg(x)\geq f(x) \Rightarrow c\cdot\frac{1}{2} \geq \frac{3}{4} (1-x^2) \Rightarrow c \geq \frac{3}{2}(1-x^2)</math>, so<br />
<math>c=\max_x \frac{3}{2} (1-x^2) = \frac{3}{2}</math> (attained at x=0)<br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}\cdot\frac{1}{2}} = 1-y^2</math>, then x=y, '''note that''' <math>\frac{3}{4}(1-y^2)\big/\left(\frac{3}{2}\cdot\frac{1}{2}\right)</math> is f(y) / (cg(y))<br />
:5: else: return to '''step 1''' <br />
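The process above accepts a candidate y with probability <math>f(y)/(c\cdot g(y)) = \frac{(3/4)(1-y^2)}{(3/2)(1/2)} = 1-y^2</math>. A Python sketch (MATLAB is used elsewhere in these notes; this is just an illustrative translation):

```python
import random

random.seed(0)
samples = []
trials = 0
while len(samples) < 100000:
    trials += 1
    u1 = random.random()
    u2 = random.random()
    y = 2 * u1 - 1                 # candidate from g = U(-1,1)
    if u2 <= 1 - y ** 2:           # f(y)/(c*g(y)) = (3/4)(1-y^2) / ((3/2)(1/2)) = 1 - y^2
        samples.append(y)
print(len(samples) / trials)       # close to 1/c = 2/3
print(sum(samples) / len(samples)) # close to 0, since f is symmetric about 0
```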
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period ".", meaning "element-wise", tells MATLAB to apply an operation to each element of a vector or matrix. In the example above, U.^0.5 takes the square root of every element of U. Without the period, ^ and * are matrix operations. For example, if a=[1 2 3] and b=[2 3 4] are row vectors, then a.*b=[2 6 12] is the element-wise product, but a*b gives an error because the matrix dimensions must agree (a*b' would instead give the scalar inner product 20).<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max_{0 \leq x \leq 1} \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math> (attained at x=1).<br />
Use the inverse method to sample from <math>g(x)</math><br />
<math>G(x)=x^2</math>.<br />
Generate <math>U</math> from <math>U(0,1)</math> and set <math>x=sqrt(u)</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math>, and set <math>Y=\sqrt{U_1}</math> (a draw from g)<br><br />
2. If <math>U_2 \leq \frac{f(Y)}{c \cdot g(Y)} = \frac{3Y^2}{\frac{3}{2}\cdot 2Y} = Y = \sqrt{U_1}</math>, accept <math>X=Y</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class relating the <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim~ N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
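Putting these pieces together, we can sample <math>\vert Z \vert</math> by acceptance-rejection with an Exp(1) proposal and then attach a random sign to recover Z ~ N(0,1). A Python sketch (the function name is ours):

```python
import math
import random

def sample_standard_normal():
    """A-R for the half-normal f(x) = (2/sqrt(2*pi)) e^{-x^2/2}, x > 0,
    with proposal g(x) = e^{-x} and c = sqrt(2e/pi)."""
    while True:
        y = -math.log(random.random())        # y ~ Exp(1) via inverse transform
        u = random.random()
        if u <= math.exp(-(y - 1) ** 2 / 2):  # f(y)/(c*g(y)) = e^{-(y-1)^2/2}
            # y follows the half-normal; a random sign gives N(0,1)
            return y if random.random() < 0.5 else -y

random.seed(0)
zs = [sample_standard_normal() for _ in range(200000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
print(mean, var)   # close to 0 and 1
```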
<br />
<br />
This example shows how to calculate the constant c relating f(x) and g(x) by maximizing their ratio.<br />
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
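For example, a minimal Python sketch of this transformation (the values of a and b here are arbitrary):

```python
import random

random.seed(0)
a, b = -3.0, 5.0
# step 1: draw U from U(0,1); step 2: Y = (b-a)U + a
ys = [(b - a) * random.random() + a for _ in range(100000)]
print(min(ys), max(ys))     # all values lie in (a, b)
print(sum(ys) / len(ys))    # close to the U(a,b) mean, (a+b)/2 = 1
```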
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; then Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2}{\pi R}\cdot 2R=\frac{4}{\pi}</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a candidate y with probability f(y)/[cg(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = y </math><br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br />
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a e^{-ax}</math> to generate the random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)}\Big|_{x=\frac{1}{1-a}} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\frac {f(x)}{g(x)} \to 0 \text{ as } x \to \infty</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u, v ~ Unif(0,1) <br/><br />
2. Generate y from g; since g is exponential with rate 1/2 (a=1/2), let <math>y=-2\ln(u)</math> <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
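The procedure above, with a = 1/2 and c = 4/e, can be sketched in Python as follows (f here is the Gamma(2,1) density <math>x e^{-x}</math>):

```python
import math
import random

random.seed(0)
samples = []
trials = 0
while len(samples) < 100000:
    trials += 1
    u = random.random()
    v = random.random()
    y = -2 * math.log(u)   # y from g(x) = (1/2) e^{-x/2}, i.e. exponential with rate 1/2
    # f(y)/(c*g(y)) = (y e^{-y}) / ((4/e)(1/2) e^{-y/2}) = (e/2) y e^{-y/2}, peaking at 1 when y = 2
    if v <= (math.e / 2) * y * math.exp(-y / 2):
        samples.append(y)
print(len(samples) / trials)        # close to 1/c = e/4 ~ 0.68
print(sum(samples) / len(samples))  # close to 2, the mean of f(x) = x e^{-x}
```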
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take derivative of h(x) with respect to x, get x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (if c<sub>1</sub> is a value, then we can ignore this step) Since we want the smallest value of c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (i.e. k=unknown parameter) to get the value of k, <br />then substitute k to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>.)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For the two examples above, we chose a proposal distribution g that we can sample from (uniform or exponential),<br />
figured out <math>c=\max\frac {f(y)}{g(y)} </math>,<br />
and output y whenever <math>v<\frac {f(y)}{c\cdot g(y)}</math>.<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse CDF cannot be computed, or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated up to a normalizing constant. <br/><br />
3) A constant c where <math>f(x)\leq c\cdot g(x)</math><br/><br />
4) A uniform draw<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example in the last lecture. The following code will generate the random variable required in that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % Note: R is a constant which we can change;<br />
% e.g. if we changed R=4 then we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % Note: for beginner programmers, this step increases<br />
% the ii value for the next pass through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tips: hist(x,y) where y is the number of bars in the graph.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
This plots a histogram of x using y bars.<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} <= c < ∞.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: The U is independent from y in Step 2 and 3 above.<br />
~The constant c is an indicator of the rejection rate: the acceptance rate is 1/c.<br />
<br />
In this discrete version of the acceptance-rejection method, the proposal is uniform over the 5 values (1,2,3,4,5), so g(x) = 0.2 for each.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution of integers <math>1,2,3,...,k</math> If this function is not built in to your MATLAB then we can do simple transformation on the rand(k) function to make it work like the unidrnd(k) function. <br />
<br />
The acceptance rate is <math>\frac {1}{c}</math>, so the lower the c, the more efficient the algorithm. Theoretically, c equals 1 is the best case because all samples would be accepted; however it would only be true when the proposal and target distributions are exactly the same, which would never happen in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>\frac {1}{1.5}=\frac {2}{3}</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram of the 1000 generated values approximates f(x); more samples bring the empirical frequencies closer to the stated probabilities.<br />
<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
g(x)= 1/3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1,y~g<br /><br />
2,u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x>=0</math><br>(poisson distribution)<br />
Try the first few <math>p_x</math> values: 0.0498, 0.149, 0.224, 0.224, 0.168, 0.101, 0.0504, 0.0216, 0.0081, 0.0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = x<sub>j</sub>, else go to step 1.<br />
<br />
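The three steps above can be sketched in Python (here the constant c = max p<sub>x</sub>/g(x) ≈ 2.12 is computed exactly rather than read off the table):

```python
import math
import random

# c = max of p(x)/g(x) = 4 e^{-3} 4^x / x!, attained at x = 3 and x = 4
c = 4 * math.exp(-3) * 4 ** 4 / math.factorial(4)   # ~ 2.12

def sample_poisson3():
    """A-R for p(x) = e^{-3} 3^x / x! using the geometric proposal g(x) = 0.25 * 0.75^x."""
    while True:
        u1 = random.random()
        u2 = random.random()
        j = int(math.log(u1) / math.log(0.75))      # j ~ Geometric(0.25) on {0, 1, 2, ...}
        g = 0.25 * 0.75 ** j
        p = math.exp(-3) * 3 ** j / math.factorial(j)
        if u2 < p / (c * g):
            return j

random.seed(0)
xs = [sample_poisson3() for _ in range(100000)]
print(sum(xs) / len(xs))   # close to 3, the Poisson(3) mean
```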
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose f(x) is hypergeometric: from 10 white balls and 5 red balls, we select 3 balls without replacement, and X is the number of red balls selected. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which gives '''c=1.1127'''<br />
<br />
Note that c must be the maximum of f(x)/g(x) over all x, here c = 1.1127 (attained at x=1); it is not computed from max f(x) and max g(x) separately, even though 0.4945/0.4444 happens to give the same number in this example.<br />
If c were any smaller, c*g(x) would fail to cover f(x) at some points and the accepted sample would not follow f; a larger c covers f everywhere but increases the rejection rate.<br />
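The computation of c in this example can be verified with a few lines of Python:

```python
from math import comb

# target p(x): hypergeometric (5 red, 10 white, draw 3 without replacement)
p = [comb(5, k) * comb(10, 3 - k) / comb(15, 3) for k in range(4)]
# proposal g(x): Binomial(3, 1/3)
g = [comb(3, k) * (1 / 3) ** k * (2 / 3) ** (3 - k) for k in range(4)]
ratios = [pk / gk for pk, gk in zip(p, g)]
c = max(ratios)                       # maximum of p(x)/g(x) over all x
print([round(r, 3) for r in ratios])  # roughly [0.890, 1.113, 0.989, 0.593]
print(c)                              # roughly 1.113, attained at x = 1
```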
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, we need to move c*g(x) to the peak of f to cover the whole f. Thus c will be very large and 1/c will be small.<br />
The higher the rejection rate, more points will be rejected.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="margin-bottom:10px;border:10px solid red;background: yellow">A good example for understanding the pros and cons of the A-R method: the method performs poorly when the target distribution has a high peak, because c must be huge,<br><br />
which makes the acceptance rate low and the sampling very time-consuming.</div><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall that,<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that <br />
p(j)/q(j)<=c for all j such that p(j)>0<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words <math> X_i\sim~ Exp (\lambda)</math>, and note that <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math> \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
<br />
Side note: if <math> X\sim~ Gamma(a,\lambda)</math> and <math> Y\sim~ Gamma(b,\lambda)</math> are independent gamma random variables, then <math>\frac{X}{X+Y}</math> has a <math> Beta(a,b)</math> distribution.<br />
<br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1 \sim~ Exp(\lambda)</math><br />
<math> x_2 \sim~ Exp(\lambda)</math><br />
<math>\vdots</math><br />
<math> x_t \sim~ Exp(\lambda)</math><br />
<math> x_1+x_2+\dots+x_t \sim~ Gamma(t,\lambda)</math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); <br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>. Then <math>Pr(Y\leq y)=Pr(1-U\leq y)=Pr(U\geq 1-y)=1-Pr(U<1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generates a 20x1000 matrix: 1000 draws for each<br />
                      % of the t=20 exponentials, all produced by rand<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) has the same distribution as log(u) when u~U(0,1)<br />
>>xx = sum(x); % sums the 20 entries of each column; size(xx) is 1 1000<br />
>>size(x) % check the dimensions of x if we forget them (the answer is 20 1000)<br />
>>hist(x(1,:)) % histogram of the first exponential sample<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
<br />
<br />
size(x) and size(u) are both 20-by-1000 matrices.<br />
Since u~Unif(0, 1) implies that u and 1 - u have the same distribution, we can substitute 1-u with u to simplify the expression.<br />
Alternatively, the following commands do the same thing as the previous ones.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code on one line;<br />
% either log(u) or log(1-u) works here since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
In rand(20,1000), the first argument is the number of rows (20) and the second the number of columns (1000).<br />
The same approach generalizes to sums of independent random variables <math>X_i</math> in higher dimensions; the histogram of the column sums confirms the Gamma shape of the result.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/\sin(\theta)= x_{1}/\cos(\theta)</math> <br /><br />
<math> \tan(\theta)=x_{2}/x_{1} \rightarrow \theta=\tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
A point at distance R from the origin along a ray at angle <math>\theta</math> has Cartesian coordinates <math>x=R\cos(\theta)</math>, <math>y=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,j)'' returns the (i,j)th entry <br/ ><br />
:*: ''sum(X,1)'' or ''sum(X)'' sums over the rows of X; the output is a row vector of the sums of each column. <br /><br />
:*: ''sum(X,2)'' is a summation of the columns of X, returning a vector. <br/ ><br />
:*: ''rand(r,c)'' will generate random numbers in r row and c columns <br /><br />
:*: The dot operator (.), when placed before an operator such as *, /, or ^, applies that operation elementwise to a vector or matrix, e.g. A.*B or A.^2. Addition and subtraction are already elementwise, so A+c works directly. The dot operator is also not required for functions such as log, which already act elementwise on arrays.<br><br />
:*: Matlab processes loops very slowly but is fast with matrices and vectors, so it is preferable to use the dot operator and matrices of random numbers rather than loops whenever possible.<br><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1. On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1) i.e. Standard Normal Distribution - then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math>f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }</math><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is the standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup> and <math> \tan(\theta) = \frac{y}{x} </math><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> since X and Y are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\frac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into two density functions, Exponential and Uniform, so <math>d</math> and <math>\theta</math> are independent:<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'')<br />
:<math>=\;- \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br /><br />
More intuitively, the integrand <math>x\phi(x)</math> is an odd function (g(-x) = -g(x)), and the integral of an odd function over a range symmetric about 0, here <math>(-\infty,\infty)</math>, is 0.<br /><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited. Inefficient methods such as numerically inverting the Gaussian cdf, summing uniform random variables, and acceptance-rejection were used. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This technique had an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup> = Z<sub>1</sub><sup>2</sup> + Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> (here <math>Exp(1/2)</math> denotes rate 1/2, i.e. mean 2) <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Alternatively,<br> <math> x =\cos(2\pi U_2)\sqrt{-2\ln U_1}\, </math> and<br> <math> y =\sin(2\pi U_2)\sqrt{-2\ln U_1}\, </math><br /><br />
<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
Note:<br>the first graph is hist(tet) and it is a uniform distribution.<br>The second one is hist(d) and it is an exponential distribution.<br>The third one is hist(x) and it is a normal distribution.<br>The last one is hist(y) and it is also a normal distribution.<br />
<br />
Attention: there is a "dot" before "^" and "*" because d and tet are vectors and the operations must be elementwise. <br><br />
<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient because they require computing sine and cosine. A way around this time-consuming step is an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation, which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method exactly, we can approximate the inverse cdf with different functions. One method is '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) variables and subtract 6 (which is E(u<sub>i</sub>)*12), we get approximately a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
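The central-limit approximation in method 2 can be sketched as follows (a Python illustration of the idea, not the course's MATLAB code):<br />

```python
import random

def approx_std_normal(rng=random.random):
    # Sum of 12 Unif(0,1) draws has mean 6 and variance 12 * (1/12) = 1,
    # so subtracting 6 gives an approximately N(0,1) variable.
    return sum(rng() for _ in range(12)) - 6.0

random.seed(1)
zs = [approx_std_normal() for _ in range(5000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
print(round(mean, 2), round(var, 2))
```

Note that the approximation has bounded support [-6, 6], so extreme tails are not reproduced.<br />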
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histogram, the added constant shifts the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent uniform (0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by <br />
<math>f_{X_1,X_2}(x_1,x_2) = f_{U_1,U_2}\big(g_1^{-1}(x_1,x_2),\,g_2^{-1}(x_1,x_2)\big)\,|J|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<math>J = \left|\begin{matrix} \frac{\partial u_1}{\partial x_1} & \frac{\partial u_1}{\partial x_2} \\ \frac{\partial u_2}{\partial x_1} & \frac{\partial u_2}{\partial x_2} \end{matrix}\right|</math><br />
with <math>u_1 = g_1^{-1}(x_1,x_2)</math> and <math>u_2 = g_2^{-1}(x_1,x_2)</math>.<br />
<br />
Inverting the above transformations, we have<br />
<math>u_1 = e^{-(x_1^2+x_2^2)/2}</math><br />
<math>u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
<math>f(x_1,x_2) = \frac{1}{2\pi}e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution extends the standard normal: its location is shifted by the mean and its spread is scaled by the standard deviation. The pdf of the general normal distribution is <br />
<math>f(x) = \frac{1}{\sigma}\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}</math><br />
<br />
The standard normal distribution is the special case with mean 0 and variance 1. If X is a general normal deviate, then Z = (X − μ)/σ has a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma * Z</math> Since <math>E(x) = \mu +\sigma*0 = \mu </math> and <math>Var(x) = 0 +\sigma^2*1</math><br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); % generate 1000 draws from a standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2]; % stack into a 2x1000 matrix<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,Id) and X= <math>\underline{\mu} + \Sigma^{\frac{1}{2}} \,Z </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
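The transformation <math>X = \underline{\mu} + \Sigma^{\frac{1}{2}}Z</math> can be sketched in code, using a Cholesky factor as the matrix square root (a Python illustration; the 2x2 covariance matrix below is just an illustrative choice, and `cholesky2`/`mvn_sample` are names introduced here):<br />

```python
import math
import random

def cholesky2(s11, s12, s22):
    # Cholesky factor L of a 2x2 covariance matrix, so that L L^T = Sigma
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    return l11, l21, l22

def mvn_sample(mu, l11, l21, l22, rng):
    # X = mu + L Z, where Z is a pair of independent N(0,1) draws
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2

rng = random.Random(2)
l11, l21, l22 = cholesky2(2.0, 0.6, 1.0)   # Sigma = [[2, 0.6], [0.6, 1]]
pts = [mvn_sample((1.0, -1.0), l11, l21, l22, rng) for _ in range(4000)]
mx = sum(p[0] for p in pts) / len(pts)
my = sum(p[1] for p in pts) / len(pts)
print(round(mx, 1), round(my, 1))
```

The sample means should be close to μ = (1, −1), and the sample covariance of the two coordinates close to 0.6.<br />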
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution that describes an event with only two possible results, i.e. success or failure. The variable takes value 1 with success probability p and value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pdf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is the special case of the binomial distribution with n = 1, so the variate x takes only the values 0 and 1 and the binomial probability mass function reduces to the Bernoulli one.<br />
<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example from a previous lecture: with p = 1/2, each flip is equally likely to produce heads or tails.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: the Binomial distribution can be viewed as the sum of n independent Bernoulli trials.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: the Bernoulli probability mass function can be written compactly as <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
When doing elementwise operations between two arrays, put a dot before the operator (e.g. V.*W, V.^2). <br />
Example: let V be a 2-by-4 matrix; to multiply each element by 3, <br />
the Matlab code is 3*V (the dot is optional when one operand is a scalar).<br />
<br />
The code above gives some examples of generating these distributions.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
Note that the material in this lecture will not be on the exam; it was only to supplement what we have learned.<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
<br />
The inverse method is universal in the sense that we can potentially sample from any distribution where we can find the inverse of the cumulative distribution function.<br />
<br />
Procedure:<br />
<br />
1. Generate U~Unif [0, 1)<br><br />
2. Set <math>x=F^{-1}(u)</math><br><br />
3. Then X~f(x)<br><br />
<br />
'''Example 1'''<br><br />
<br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br><br />
<br />
x~exp(<math>\lambda</math>)<br><br />
<br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P((X_1)>y) P((X_2)>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate U~ U(0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math><br><br />
<br />
If we generalize this example from two independent particles to n independent particles we will have:<br><br />
<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> ...<br> <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br>.<br />
<br />
And the algorithm using the inverse-transform method as follows:<br />
<br />
step1: Generate U~U(0,1)<br />
<br />
Step2: <math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
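The n-particle algorithm above amounts to one inverse-transform draw with the combined rate (a Python sketch; `sample_min_lifetime` is an illustrative name):<br />

```python
import math
import random

def sample_min_lifetime(rates, rng=random.random):
    # The min of independent Exp(lambda_i) variables is Exp(sum of lambda_i),
    # so apply the inverse transform with the combined rate.
    total = sum(rates)
    return -math.log(1.0 - rng()) / total

random.seed(3)
ys = [sample_min_lifetime([1.0, 2.0, 3.0]) for _ in range(10000)]
# Exp(6) has mean 1/6
print(round(sum(ys) / len(ys), 2))
```

The sample mean should be close to 1/(λ<sub>1</sub>+λ<sub>2</sub>+λ<sub>3</sub>) = 1/6.<br />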
<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
<br>where a>0 and a is a real number<br />
What is the distribution of X?<br><br />
<br />
'''Solution:'''<br><br />
<br />
We can find a form for the cumulative distribution function of X by isolating U, since U~Unif[0,1) takes values in the range of F(X) uniformly. It then remains to differentiate the resulting form with respect to x to obtain the probability density function.<br />
<br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
[[File:Example_2_diagram.jpg]]<br />
<br />
'''Example 3'''<br><br />
<br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. Generate values from X.<br><br />
<br />
'''Solution:'''<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Observe from above that the values of X for n = 20 are close to 1. This is because a variable with cdf x<sup>n</sup> can be viewed as the maximum of n independent Unif(0,1) random variables, which is more and more likely to be close to 1 as n increases. This observation is the motivation for method 2 below.<br><br />
<br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x, and the product of n such cdfs is x<sup>n</sup>).<br />
Method 2: Generate X by taking a sample of n independent U~Unif(0, 1) and letting x be the max of the n samples. However, the solution given above using the inverse-transform method requires generating only one uniform random number instead of n of them, so it is more efficient.<br />
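Both ways of sampling from F<sub>X</sub>(x) = x<sup>n</sup> can be compared directly (a Python sketch; the inverse transform uses one uniform per sample, the max construction uses n):<br />

```python
import random

def sample_inverse(n, rng=random.random):
    # F(x) = x^n  =>  x = u^(1/n)
    return rng() ** (1.0 / n)

def sample_max(n, rng=random.random):
    # x^n is also the cdf of the max of n independent Unif(0,1) draws
    return max(rng() for _ in range(n))

random.seed(4)
n = 20
a = [sample_inverse(n) for _ in range(5000)]
b = [sample_max(n) for _ in range(5000)]
# Both samples should have mean n/(n+1) = 20/21, roughly 0.952
print(round(sum(a) / len(a), 2), round(sum(b) / len(b), 2))
```
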
<br><br />
<br />
In general, for independent X<sub>1</sub>, ..., X<sub>n</sub>, we can derive the cdf and pdf of both Y = max (X<sub>1</sub>, ..., X<sub>n</sub>) and Y = min (X<sub>1</sub>, ..., X<sub>n</sub>) in this way.<br />
<br />
'''Example 4 (New)'''<br><br />
This is a similar setup to Example 1, but using the maximum instead of the minimum.<br />
<br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br><br><br />
<br />
'''Discrete Case:'''<br />
<font size="3"><br />
u~unif(0,1)<br><br />
x <- 0, S <- P<sub>0</sub><br><br />
while u > S<br><br />
x <- x + 1<br><br />
S <- S + P<sub>x</sub><br><br />
Return x<br></font><br />
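The discrete inverse-transform step can be sketched as follows (a Python illustration; `discrete_inverse_transform` is an illustrative name):<br />

```python
import random

def discrete_inverse_transform(probs, rng=random.random):
    # probs[x] = P(X = x); walk the cumulative sums until u falls below them
    u = rng()
    cum = 0.0
    for x, p in enumerate(probs):
        cum += p
        if u < cum:
            return x
    return len(probs) - 1  # guard against floating-point round-off

random.seed(5)
probs = [0.2, 0.5, 0.3]
xs = [discrete_inverse_transform(probs) for _ in range(10000)]
freq = [xs.count(k) / len(xs) for k in range(3)]
print([round(f, 1) for f in freq])
```

The empirical frequencies should be close to the probabilities 0.2, 0.5, 0.3.<br />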
<br />
===Decomposition Method===<br />
The CDF, F, is a composition if <math>F_{X}(x)</math> can be written as:<br />
<br />
<math>F_{X}(x) = \sum_{i=1}^{n}p_{i}F_{X_{i}}(x)</math> where<br />
<br />
1) p<sub>i</sub> > 0<br />
<br />
2) <math>\sum_{i=1}^{n}</math>p<sub>i</sub> = 1.<br />
<br />
3) <math>F_{X_{i}}(x)</math> is a CDF<br />
<br />
The general algorithm to generate random variables from a composition CDF is:<br />
<br />
1) Generate U, V ~ <math>U(0,1)</math><br />
<br />
2) If u < p<sub>1</sub>, x = <math>F_{X_{1}}^{-1}(v)</math><br />
<br />
3) Else if u < p<sub>1</sub>+p<sub>2</sub>, x = <math>F_{X_{2}}^{-1}(v)</math><br />
<br />
4) ....<br />
<br />
<b>Explanation</b><br><br />
Each random variable that is a part of X contributes <math>p_{i}*F_{X_{i}}(x)</math> to <math>F_{X}(x)</math> every time.<br />
From a sampling point of view, that is equivalent to contributing <math>F_{X_{i}}(x)</math> <math>p_{i}</math> of the time. The logic of this is similar to that of the Accept-Reject Method, but instead of rejecting a value depending on the value u takes, we instead decide which distribution to sample it from.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = (5/12)(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12 + (5/12)(x-1)<sup>4</sup> = (5/6)*(1/2) + (1/6)*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = (5/2)(x-1)<sup>4</sup> <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
Here f(x) is decomposed into two densities: f<sub>x1</sub>, which is uniform on (0,2), and f<sub>x2</sub>, a polynomial density on the same interval, chosen with probabilities 5/6 and 1/6.<br />
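Example 1 can be sketched in code. The inverse cdf of f<sub>x2</sub> works out to x = 1 + (2v − 1)<sup>1/5</sup> (a derivation made here, not given in the notes, from F<sub>x2</sub>(x) = ((x−1)<sup>5</sup>+1)/2), using a sign-aware fifth root since 2v − 1 can be negative:<br />

```python
import random

def fifth_root(t):
    # real fifth root that handles negative arguments
    return -((-t) ** 0.2) if t < 0 else t ** 0.2

def sample_example1(rng=random.random):
    # f(x) = (5/6)*(1/2) + (1/6)*(5/2)(x-1)^4 on [0, 2]
    u, v = rng(), rng()
    if u < 5.0 / 6.0:
        return 2.0 * v                      # f_x1: uniform on (0, 2)
    # f_x2: F(x) = ((x-1)^5 + 1)/2, so x = 1 + (2v - 1)^(1/5)
    return 1.0 + fifth_root(2.0 * v - 1.0)

random.seed(6)
xs = [sample_example1() for _ in range(20000)]
assert all(0.0 <= x <= 2.0 for x in xs)
print(round(sum(xs) / len(xs), 2))  # f is symmetric about 1, so mean is near 1
```
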
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i > p_j </math> for <math> i < j </math> <br> <br><br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u< \sum_{j=1}^{i} p_j </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
That is, we split the pdf into components, use one uniform draw to choose a component according to the weights p<sub>i</sub>, and a second uniform draw with that component's inverse cdf to produce x.<br />
<br />
=== Example of Decomposition Method ===<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
let U =F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, solve for x.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
'''Algorithm:'''<br />
<br />
Generate U ~ Unif [0,1)<br />
<br />
Generate V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x = v<br />
<br />
else if u<2/3, x = v<sup>1/2</sup><br />
<br />
else x = v<sup>1/3</sup><br><br />
<br />
<br />
'''Matlab Code:''' <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
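The Matlab snippet above draws a single sample per run. The same decomposition algorithm can also be sketched in Python (an illustration only; the language choice, seed and sample size are ours, not the course's):<br />

```python
import random

def sample_decomposition(rng):
    """Sample from F(x) = (x + x^2 + x^3)/3 on [0,1] by decomposition."""
    u = rng.random()  # selects which component CDF to invert
    v = rng.random()  # is pushed through that component's inverse CDF
    if u < 1/3:
        return v            # F1(x) = x    =>  x = v
    elif u < 2/3:
        return v ** 0.5     # F2(x) = x^2  =>  x = v^(1/2)
    else:
        return v ** (1/3)   # F3(x) = x^3  =>  x = v^(1/3)

rng = random.Random(42)
samples = [sample_decomposition(rng) for _ in range(100000)]
sample_mean = sum(samples) / len(samples)
```

The theoretical mean of the mixture is (1/2 + 2/3 + 3/4)/3 = 23/36 &asymp; 0.639, which the sample mean should approach.<br />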
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from a difficult (or unknown) distribution by sampling from an easy one. The disadvantage is that it may reject many points, which is inefficient.<br />
<br />
Restated for the decomposition method: we invert each partial CDF separately; the weight of each part is its share of the total CDF, and within each part's range the draw is uniform.<br />
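A standard illustration of this theorem (our example, not from the lecture) is sampling uniformly from the unit disk B by rejecting uniform samples from the enclosing square A = [-1,1] x [-1,1]; about pi/4 &asymp; 79% of proposals are accepted. A Python sketch:<br />

```python
import random

def sample_in_disk(rng):
    """Uniform point in the unit disk B, by rejection from the square A."""
    while True:
        x = rng.uniform(-1.0, 1.0)  # uniform proposal in A
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:    # keep only points that fall inside B
            return x, y

rng = random.Random(0)
points = [sample_in_disk(rng) for _ in range(20000)]
mean_x = sum(p[0] for p in points) / len(points)
mean_y = sum(p[1] for p in points) / len(points)
```

By symmetry both coordinate means should be near 0 for a uniform sample on the disk.<br />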
<br />
=== Practice Example from Lecture 7 ===<br />
<br />
Let X1, X2 denote the lifetime of 2 independent particles, X1~exp(<math>\lambda_{1}</math>), X2~exp(<math>\lambda_{2}</math>)<br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{Then } 1-F(y)=P(\min(x_{1},x_{2}) \geq y)=e^{-(\lambda_{1}+\lambda_{2})y} \Rightarrow F(y)=1-e^{-(\lambda_{1}+\lambda_{2}) y}</math><br /><br />
<math>u \sim Unif[0,1),\; u = F(y) \Rightarrow y = -\tfrac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
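A possible implementation of the algorithm (a Python sketch; the rates lam1 = 1 and lam2 = 2 are example values chosen for illustration, not from the notes):<br />

```python
import math
import random

def sample_min_exp(lam1, lam2, rng):
    """Inverse transform for Y = min(X1, X2):
    F_Y(y) = 1 - exp(-(lam1+lam2)*y), so y = -log(1-u)/(lam1+lam2)."""
    u = rng.random()
    return -math.log(1.0 - u) / (lam1 + lam2)

rng = random.Random(1)
lam1, lam2 = 1.0, 2.0  # example rates (assumed for illustration)
ys = [sample_min_exp(lam1, lam2, rng) for _ in range(100000)]
mean_y = sum(ys) / len(ys)  # should approach 1/(lam1 + lam2) = 1/3
```

This works because the minimum of independent exponentials is again exponential, with rate equal to the sum of the rates.<br />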
<br />
===Question 2===<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a Beta-type density; the constant <math>b</math> is the normalizing constant chosen so that <math>\int _{0}^{1}b\,x^{n}(1-x)^{n}\,dx=1</math><br />
<br />
U<sub>1</sub> ~ Unif[0,1)<br />
<br />
U<sub>2</sub> ~ Unif[0,1)<br />
<br />
(For <math>n=1/2</math>, note <math> b\,x^{1/2}(1-x)^{1/2} \leq \sqrt{2}\,b\,x^{-1/2},\; 0\leq x\leq 1/2 </math>, so dominating functions other than the uniform are also possible.)<br />
<br />
<br />
<br />
The density is maximized at <math>x=0.5</math>, where <math>f(0.5)=b(1/4)^n</math>.<br />
So, taking <math>g(x)=1</math> on <math>(0,1)</math> as the proposal, we can set <math>c=b(1/4)^n</math>.<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math><br />
2. If <math>U_2 \leq \frac{b\,U_1^n(1-U_1)^n}{b(1/4)^n}=(4U_1(1-U_1))^n</math><br />
then <math>X=U_1</math><br />
Else return to step 1.<br />
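The algorithm can be sketched in Python as follows (the exponent n = 2 is an arbitrary example; note that the normalizing constant b cancels in the acceptance ratio, so it never has to be computed):<br />

```python
import random

def sample_beta_symmetric(n, rng):
    """Acceptance-rejection for f(x) = b x^n (1-x)^n on (0,1)
    with a Unif(0,1) proposal and c = b(1/4)^n."""
    while True:
        u1 = rng.random()  # proposal X ~ Unif(0,1)
        u2 = rng.random()  # acceptance test
        if u2 <= (4.0 * u1 * (1.0 - u1)) ** n:  # f(u1) / (c * g(u1))
            return u1

rng = random.Random(7)
n = 2  # example exponent (assumed)
xs = [sample_beta_symmetric(n, rng) for _ in range(50000)]
mean_x = sum(xs) / len(xs)  # f is Beta(n+1, n+1), so the mean is 1/2
```

For n = 2 the acceptance probability is 16/30 &asymp; 0.53, so on average just under two proposals are needed per accepted sample.<br />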
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
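Steps 1-4 can be sketched in Python; the three-point pmf below is an invented example for illustration (its probabilities are exact binary fractions, so the accumulated CDF sums to exactly 1):<br />

```python
import random

def sample_discrete(pmf, a, rng):
    """Inverse transform for a discrete X supported on a, a+1, ...:
    walk the CDF upward until it exceeds u."""
    u = rng.random()
    x = a
    s = pmf[x]          # running CDF, s = P(X <= x)
    while u > s:
        x += 1
        s += pmf[x]
    return x

pmf = {0: 0.25, 1: 0.5, 2: 0.25}  # example pmf (assumed)
rng = random.Random(3)
xs = [sample_discrete(pmf, 0, rng) for _ in range(100000)]
freq1 = xs.count(1) / len(xs)     # empirical P(X = 1)
```

The empirical frequency of each support point should approach its pmf value.<br />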
<br />
==Class 8 - Thursday, May 30, 2013==<br />
<br />
In this lecture, we will discuss algorithms to generate 3 well-known distributions: Binomial, Geometric and Poisson. For each of these distributions, we will first state the intuition behind it, its probability mass function, expectation and variance. Then, we will derive one or more algorithms to sample from each of these distributions, and implement the algorithms in Matlab. <br /><br />
<br />
'''Bernoulli distribution'''<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution, where n = 1. X ~ B(1, p) has the same meaning as X ~ Bern(p). B(n, p), is the distribution of the sum of n independent Bernoulli trials, Bern(p), each with the same probability p. <br />
<br />
Algorithm: <br />
<br />
1. Generate u~Unif(0,1) <br><br />
2. If u <= p, then x = 1 <br><br />
Else x = 0 <br />
<br />
===The Binomial Distribution===<br />
<br />
If X~Bin(n,p), then its pmf is of form:<br />
f(x)=(nCx) p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
Or f(x) = <math>(n!/x!(n-x)!)</math> p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
<br />
mean (x) = E(x) = np; variance = np(1-p)<br />
<br />
Generate n uniform random numbers <math>U_1,...,U_n</math> and let X be the number of <math>U_i</math> that are less than or equal to p.<br />
The logic behind this algorithm is that a Binomial random variable is the sum of n independent Bernoulli(p) trials, each with success probability p. Thus, we can sample from Bin(n,p) by sampling n Bernoulli variables and summing them. In the example below, we generate a 20-by-1000 matrix of uniforms; each column holds the outcomes of 20 Bernoulli trials. Summing down the columns adds up the 20 Bernoulli outcomes per column, producing a 1-by-1000 vector, i.e. realizations of 1000 binomial random variables.<br /><br />
MATLAB tips: binornd(n,p) draws a random number directly from Bin(n,p) (use binopdf(x,n,p) for the pmf f(x)). Comparisons are elementwise: if a=[2 3 4], then a<3 produces [1 0 0], and "a == 3" produces [0 1 0]; this is how we count the number of uniforms less than or equal to p.<br /><br />
<br />
Procedure for Bernoulli <br />
U~Unif(0,1)<br />
if U <= p<br />
x = 1<br />
else <br />
x = 0<br />
<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>a=[3 5 8];<br />
>>a<5<br />
ans= 1 0 0<br />
<br />
>>rand(20,1000)<br />
>>rand(20,1000)<0.4<br />
>>A = sum(rand(20,1000)<0.4)<br />
>>hist(A)<br />
>>mean(A)<br />
Note: sum() operates down each column by default, giving a 1-by-1000 vector of binomial samples<br />
<br />
>>sum(sum(rand(20,1000)<0.4)>8)/1000<br />
This is an estimate of Pr[X>8].<br />
<br />
</pre><br />
<br />
[[File:Binomial_example.jpg|300px]]<br />
<br />
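The same experiment as the Matlab code above, sketched in Python: each binomial draw is the sum of n = 20 Bernoulli(0.4) trials, repeated 1000 times.<br />

```python
import random

def sample_binomial(n, p, rng):
    """X ~ Bin(n, p) as the number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() <= p)

rng = random.Random(5)
n, p = 20, 0.4  # same parameters as the Matlab example
xs = [sample_binomial(n, p, rng) for _ in range(1000)]
mean_x = sum(xs) / len(xs)                      # E[X] = n*p = 8
p_gt_8 = sum(1 for x in xs if x > 8) / len(xs)  # estimate of Pr[X > 8]
```

The true Pr[X > 8] for Bin(20, 0.4) is about 0.40, so the estimate should land nearby.<br />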
<br />
===The Geometric Distribution===<br />
<br />
x=1: f(x)=p <br />
x=2: f(x)=p(1-p)<br />
x=3: f(x)=p(1-p)<sup>2</sup><br />
<br />
Generally speaking, if X~Geo(p) then its pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>p, x=1,2,...<br />
The random variable X is the number of trials required until the first success in a series of independent''' Bernoulli trials'''.<br /><br />
<br />
<br />
<br />
Other properties<br />
<br />
<br />
Probability mass function: P(X=k) = p(1-p)<sup>k-1</sup><br />
<br />
Tail probability: P(X>n) = (1-p)<sup>n</sup><br />
<br />
<br />
<span style="background:#F5F5DC"><br />
<br />
Mean of x = 1/p<br />
Var(x) = (1-p)/p^2<br />
<br />
There are two ways to look at a geometric distribution.<br />
<br />
<b>1st Method</b><br />
<br />
We look at the number of trials needed until the first success, including the trial on which the success occurs. This is the convention used in this course. <br />
<br />
pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>p, x = 1, 2, 3, ...<br />
<br />
<b>2nd Method</b><br />
<br />
This models the number of failures before the first success, not counting the trial on which the success occurs. <br />
<br />
pmf is of the form f(x) = (1-p)<sup>x</sup>p , x = 0, 1, 2, ....<br />
<br />
</span><br />
<br />
<br />
If Y~Exp(<math>\lambda</math>) then X=floor(Y)+1 is geometric.<br />
Choose <math>e^{-\lambda}=1-p</math>. Then X ~ Geo(p) <br />
<br />
P (X > x) = (1-p)<sup>x</sup>(because first x trials are not successful) <br/><br />
<br />
'''Proof''' <br/><br />
<br />
P(X>x) = P(floor(Y) + 1 > x) = P(floor(Y) > x-1) = P(Y >= x) = e<sup>-<math>\lambda</math>x</sup> <br><br />
<br />
Since p = 1 - e<sup>-<math>\lambda</math></sup>, i.e. <math>\lambda</math> = <math>-\log(1-p)</math>, then <br><br />
<br />
P(X>x) = e<sup>-<math>\lambda</math>x</sup> = e<sup>log(1-p)x</sup> = (1-p)<sup>x</sup> <br/><br />
<br />
Note that for integer x, floor(Y) > x-1 is equivalent to Y >= x <br/><br />
<br />
This shows how the exponential distribution is used to obtain P(X>x)=(1-p)<sup>x</sup><br />
<br />
<br><br />
Suppose X has the exponential distribution with rate parameter <math> \lambda > 0 </math> <br><br />
then <math>\left \lfloor X \right \rfloor </math> and <math>\left \lceil X \right \rceil </math> have geometric distributions on <math> \mathcal{N} </math> and <math> \mathcal{N}_{+} </math> respectively, each with success probability <math> 1-e^ {- \lambda} </math> <br><br />
<br />
Proof: <br><br />
<math>\text{For } n \in \mathcal{N} </math><br /><br />
<br />
<math>\begin{align}<br />
P(\left \lfloor X \right \rfloor = n)&{}= P( n \leq X < n+1) \\<br />
&{}= F( n+1) - F(n) \\<br />
\text{By algebra and simplification:} \\<br />
P(\left \lfloor X \right \rfloor = n)&{}= (e^ {-\lambda})^n \cdot (1 - e^ {-\lambda}) \\<br />
&{}= Geo (1 - e^ {-\lambda}) \\<br />
<br />
\text{Proof of ceiling part follows immediately.} \\<br />
\end{align}</math> <br /><br />
<br />
<br />
<br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p) </math><br /><br />
2) Generate a <math>Y \sim Exp(\lambda )</math> <br /><br />
3) We can then let <math>X = \left \lfloor Y \right \rfloor + 1, where X\sim Geo(p)</math> <br /><br />
note: <math>\left \lfloor Y \right \rfloor >2 \Rightarrow Y\geq 3</math><br /><br />
<math> \left \lfloor Y \right \rfloor >5 \Rightarrow Y\geq 6</math><br /><br />
<br /><br />
<br />
<math>\left \lfloor Y \right \rfloor > x \Rightarrow Y \geq x+1 </math> <br /><br />
<br />
Recap, with Y ~ Exp(<math>\lambda</math>):<br /><br />
pdf of Y : <math>\lambda e^{-\lambda y}</math><br /><br />
cdf of Y : <math>P(Y<y)=1-e^{-\lambda y}</math><br /><br />
<math>P(Y\geq x)=1-(1-e^{-\lambda x})=e^{-\lambda x}</math><br /><br />
<math> e^{-\lambda}=1-p \Rightarrow \lambda=-\log(1-p)</math><br /><br />
<math>P(Y\geq x)=e^{-\lambda x}=e^{\log(1-p)\,x}=(1-p)^x</math><br /><br />
<math>E[X]=1/p </math><br /><br />
<math>Var(X)= (1-p)/p^2</math><br /><br />
<math>P(X>x)=P(\lfloor Y\rfloor+1>x)=P(\lfloor Y\rfloor>x-1)=P(Y\geq x)</math><br />
<br />
The relation <math>e^{-\lambda}=1-p</math> links the exponential rate to the geometric success probability, from which the mean and variance above follow.<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>p=0.4;<br />
>>l=-log(1-p);<br />
>>u=rand(1,1000);<br />
>>y=(-1/l)*log(u);<br />
>>x=floor(y)+1;<br />
>>hist(x)<br />
<br />
'''Note:'''<br />
mean(x)~E[X]=> 1/p<br />
Var(x)~V[X]=> (1-p)/p^2<br />
<br />
</pre><br />
<br />
[[File:Geometric_example.jpg|300px]]<br />
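A Python sketch of the floor-of-an-exponential construction, using the same p = 0.4 as the Matlab code above (illustration only):<br />

```python
import math
import random

def sample_geometric(p, rng):
    """X ~ Geo(p) via X = floor(Y) + 1, Y ~ Exp(lam), exp(-lam) = 1 - p."""
    lam = -math.log(1.0 - p)
    y = -math.log(1.0 - rng.random()) / lam  # inverse transform for Exp(lam)
    return math.floor(y) + 1

rng = random.Random(11)
p = 0.4
xs = [sample_geometric(p, rng) for _ in range(100000)]
mean_x = sum(xs) / len(xs)  # E[X] = 1/p = 2.5
```

Using 1.0 - rng.random() keeps the argument of log in (0, 1], avoiding log(0).<br />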
<br />
===Poisson Distribution===<br />
If <math>\displaystyle X \sim \text{Poi}(\lambda)</math>, its pmf is of the form <math>\displaystyle \, f(x) = \frac{e^{-\lambda}\lambda^x}{x!}</math> , where <math>\displaystyle \lambda </math> is the rate parameter.<br /><br />
<br />
Understanding of Poisson distribution:<br />
<br />
If customers arrive at a bank at rate <math>\lambda</math> per unit of time, then <br />
X(t) = # of customers arriving in [0,t] ~ Poi<math>(\lambda t)</math><br />
<br />
Its mean and variance are<br /><br />
<math>\displaystyle E[X]=\lambda</math><br /><br />
<math>\displaystyle Var[X]=\lambda</math><br /><br />
<br />
A Poisson random variable X can be interpreted as the maximal number of i.i.d. exponential random variables (with parameter <math>\lambda</math>) whose sum does not exceed 1.<br /><br />
The traditional understanding of the Poisson distribution as the total number of events in a specific interval can be understood here since the above definition simply describes the Poisson as the sum of waiting times for n events in an interval of length 1.<br />
<br /><br />
<br /><br />
<math>\displaystyle\text{Let } Y_j \sim \text{Exp}(\lambda), U_j \sim \text{Unif}(0,1)</math><br><br />
<math>Y_j = -\frac{1}{\lambda}log(U_j) \text{ from Inverse Transform Method}</math><br><br><br />
<br />
<math>\begin{align} <br />
X &= max \{ n: \sum_{j=1}^{n} Y_j \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} - \frac{1}{\lambda}log(U_j) \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} log(U_j) > -\lambda \} \\<br />
&= max \{ n: log(\prod_{j=1}^{n} U_j) > -\lambda \} \\<br />
&= max \{ n: \prod_{j=1}^{n} U_j > e^{-\lambda} \} \\<br />
\end{align}</math><br><br /><br />
<br />
Note: From above, we can use Logarithm Rules <math>log(a)+log(b)=log(ab)</math> to generate the result.<br><br /><br />
'''Algorithm:''' <br /><br />
1) Set n=1, a=1 <br /><br />
2) Generate <math>U_n \sim U(0,1)</math> and set <math>a=aU_n </math> <br /><br />
3) If <math>a \geq e^{-\lambda}</math> , then n=n+1, and go to Step 2. Else, x=n-1 <br /><br />
<br />
Note that the inverse transform method (applied to the exponential variables <math>Y_j</math>) underlies this construction; the sampled values can be used to check the mean and variance of the Poisson distribution.<br />
<br />
===MATLAB Code for generating Poisson Distribution===<br />
<pre><br />
>>l=2; <br />
>>for ii=1:1000<br />
n=1;<br />
a=1;<br />
u=rand;<br />
a=a*u;<br />
while a>exp(-l)<br />
n=n+1;<br />
u=rand;<br />
a=a*u;<br />
end<br />
x(ii)=n-1;<br />
end<br />
>>hist(x)<br />
>>sum(x==1)/1000 % Probability of x=1<br />
>>sum(x>3)/1000 % Probability of x>3<br />
</pre><br />
<br />
[[File:Poisson_example.jpg|300px]]<br />
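A Python sketch of the product-of-uniforms algorithm above, with lambda = 2 as in the Matlab code (both the sample mean and the sample variance should approach lambda):<br />

```python
import math
import random

def sample_poisson(lam, rng):
    """X = max{n : U_1 * ... * U_n > exp(-lam)}, per the derivation above."""
    n = 0
    a = 1.0
    threshold = math.exp(-lam)
    while True:
        a *= rng.random()   # multiply in one more uniform
        if a <= threshold:  # product fell below e^{-lam}: stop
            return n
        n += 1

rng = random.Random(2)
lam = 2.0
xs = [sample_poisson(lam, rng) for _ in range(100000)]
mean_x = sum(xs) / len(xs)
var_x = sum((x - mean_x) ** 2 for x in xs) / len(xs)
```
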
<br />
<br />
<span style="background:#F5F5DC"><br />
EXAMPLE for geometric distribution: Consider the case of rolling a die: </span><br />
<br />
X = the number of rolls needed for the number 5 to appear for the first time. <br />
<br />
We have X ~ Geo(1/6), <math>f(x)=(1/6)(5/6)^{x-1}</math>, x=1,2,3.... <br />
<br />
Now, let <math>Y \sim Exp(\lambda)</math> and set x = floor(Y) + 1 <br />
<br />
Let <math>e^{-\lambda}=5/6</math> <br />
<br />
<math>P(X>x) = P(Y \geq x)</math> (from the class notes) <br />
<br />
We have <math>e^{-\lambda x} = (5/6)^x</math> <br />
<br />
Algorithm: let <math>\lambda = -\log(5/6)</math> <br />
<br />
1) Generate Y ~ Exp(<math>\lambda</math>) <br />
<br />
2) Set X = floor(Y) + 1 <br />
<br />
<math> E[X]=6, \quad Var[X]=\frac{5/6}{(1/6)^2} = 30 </math><br />
<br />
<br />
<span style="background:#F5F5DC">GENERATING NEGATIVE BINOMIAL RV USING GEOMETRIC RV'S</span><br />
<br />
Property of negative binomial Random Variable: <br/><br />
<br />
The negative binomial random variable is a sum of r independent geometric random variables.<br/><br />
<br />
Using this property we can formulate the following algorithm:<br/><br />
<br />
Step 1: Generate r geometric rv's each with probability p using the procedure presented above.<br/><br />
Step 2: Take the sum of these r geometric rv's. This RV follows NB(r,p)<br/><br />
<br />
Remark: in steps 1 and 2, each geometric random variable can itself be generated by the floor-of-an-exponential method above (e.g. with <math>e^{-\lambda}=1-p</math>), and the r values are then summed to give x.<br />
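A Python sketch of the two steps (r = 3 and p = 0.4 are example parameters we chose, not from the notes), reusing the exponential construction for the geometric pieces:<br />

```python
import math
import random

def sample_geometric(p, rng):
    """Geo(p) via the floor-of-an-exponential construction."""
    lam = -math.log(1.0 - p)
    return math.floor(-math.log(1.0 - rng.random()) / lam) + 1

def sample_negative_binomial(r, p, rng):
    """NB(r, p) as the sum of r independent Geo(p) random variables."""
    return sum(sample_geometric(p, rng) for _ in range(r))

rng = random.Random(13)
r, p = 3, 0.4  # example parameters (assumed)
xs = [sample_negative_binomial(r, p, rng) for _ in range(50000)]
mean_x = sum(xs) / len(xs)  # E[X] = r/p = 7.5
```

Each sample is at least r, since every geometric summand is at least 1.<br />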
<br />
=== Another way to generate random variable from poisson distribution ===<br />
<br/><br />
Note: <math>P(X=x)=e^{-\lambda}\lambda^x/x!</math><br/><br />
<math>P(X=x+1)= e^{-\lambda}\lambda^{x+1}/(x+1)!</math> <br/><br />
The ratio is: <math>p(x+1)/p(x)=\lambda/(x+1)</math> <br/><br />
Therefore, <math>p(x+1)=\lambda/(x+1)*p(x)</math> <br/><br />
Algorithm: <br/><br />
1. Set x=0, p=<math>e^{-\lambda}</math><br/><br />
2. <math>F=P(X=0)=e^{-\lambda}</math> <br/><br />
3. Generate U~Unif(0,1) <br/><br />
If U<F, output x<br/><br />
Else<br/><br />
<math>p=\frac{\lambda}{x+1}\,p</math><br/><br />
F=F+p<br/><br />
x= x+1<br/><br />
Go to 3<br />
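A Python sketch of this inversion-by-recursion algorithm (lambda = 2 as in the earlier Matlab example; the pmf is built on the fly from the ratio p(x+1) = lambda/(x+1) * p(x)):<br />

```python
import math
import random

def sample_poisson_inversion(lam, rng):
    """Inverse transform for Poisson(lam), accumulating the CDF recursively."""
    u = rng.random()
    x = 0
    p = math.exp(-lam)  # p = P(X = 0)
    F = p               # running CDF
    while u >= F:       # "if U < F, output x" from the algorithm above
        p *= lam / (x + 1)
        F += p
        x += 1
    return x

rng = random.Random(17)
lam = 2.0
xs = [sample_poisson_inversion(lam, rng) for _ in range(100000)]
mean_x = sum(xs) / len(xs)  # E[X] = lam
```

Unlike the product-of-uniforms method, this uses only one uniform per sample, at the cost of a loop over the CDF.<br />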
<br />
<br />
<br />
In summary: find the ratio of P(X=k+1) to P(X=k), start from F(0)=P(X=0)=<math>e^{-\lambda}</math>, and compare the running CDF against a uniform random number.</div>
<hr />
<div>
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly to their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important as the Quest ID will be used to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions: you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions in multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that appear random but are actually deterministic. Although pseudo random numbers are deterministic, the sequence of values they produce has the appearance of independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to their ease of generation and manipulation.<br />
<br />
When an experiment is repeated many times, the aggregated results approach the expected values, which makes the process appear deterministic; each individual trial, however, is still random. Pseudo random numbers imitate this behaviour.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
If y = ax + b, then <math>b:=y \mod a</math> (this also extends to real numbers). <br /><br />
For example, 4.2 = 3 * 1.1 + 0.9, so<br /><br />
0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2, so<br /><br />
2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1, so<br /><br />
1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can be used to check whether one integer divides another with no remainder. In general, n = mq + r, where m, q, r, n are all integers and 0 <= r < m.<br />
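The examples above can be checked with a short Python sketch (Python's % operator implements this remainder, including for real numbers):<br />

```python
def remainder(n, m):
    """r = n mod m via the division algorithm: n = m*q + r with 0 <= r < m."""
    q = n // m        # quotient
    return n - m * q  # remainder

r1 = remainder(30, 7)  # 2
r2 = remainder(25, 3)  # 1
r3 = 4.2 % 1.1         # the real-valued example: 0.9 (up to float rounding)
```
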
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math> (<math>\mod m</math> means taking the remainder after division by m). Given a '''seed''', i.e. an initial value <math>x_0 \in \N</math>, we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method refers to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{k}=10,\,m=3</math><br //><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math> (a useful identity: <math>(a\cdot b) \bmod c = ((a \bmod c)\cdot(b \bmod c)) \bmod c</math>)<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
If we choose the parameters properly, we get a sequence of "random"-looking numbers. But how do we find good values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a '''large''', preferably prime number; the larger <math>m</math> is, the longer the period of the generator can be. (In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1).) Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math>, as recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> is '''large and prime''').<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a histogram of the sample. Use it after running the code to check the actual sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
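The claim that the first 30 values with a=13, b=0, m=31 are a permutation of the integers 1 to 30 can be checked with a short Python sketch of the generator (illustration only):<br />

```python
def lcg(a, b, m, x0, count):
    """First `count` values of x_{k+1} = (a*x_k + b) mod m."""
    xs = []
    x = x0
    for _ in range(count):
        x = (a * x + b) % m
        xs.append(x)
    return xs

seq = lcg(13, 0, 31, 1, 30)  # starts 13, 14, 27, ... and has period 30
```

The full period arises because 13 is a primitive root modulo the prime 31, so its powers run through all of 1, ..., 30.<br />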
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2. Will the number 31 ever appear? Is it possible that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', the results are approximately numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to sampling from a uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been considerable research on how to choose parameters that produce uniform-looking sequences. Many programs give you the option to choose the seed; sometimes the seed is chosen automatically by the system.<br /><br />
<br />
<br />
<br />
<br />
In this part we learned how to use code to explore the relationship between two integers under division and their remainder, and saw that when the congruential recursion is computed over a range such as (1:1000), the histogram of the output looks approximately uniform.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>Find integers <i>a</i>, <i>b</i>, <i>m</i> (a large prime), and a seed <i>x<sub>0</sub></i>.</li><br />
<li><math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
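As a cross-check, the congruential recursion summarized above can be run programmatically. The following is an illustrative Python sketch (not course code — the course uses MATLAB) reproducing the in-class example a=13, b=0, m=31:

```python
# Illustrative sketch (not course code): the linear congruential generator
# x_{k+1} = (a*x_k + b) mod m from the summary above.

def lcg(a, b, m, seed, n):
    """Return the first n values of the sequence after the seed."""
    xs = []
    x = seed
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs

# The in-class example: a=13, b=0, m=31, seed x0=1.
seq = lcg(13, 0, 31, 1, 30)
print(seq[:3])                            # [13, 14, 27], as derived above
print(sorted(seq) == list(range(1, 31)))  # True: a permutation of 1..30
```

Normalizing the values by m-1 gives numbers in [0,1] that look approximately uniform, as noted above.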
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than the uniform distribution, such as the exponential and normal distributions. However, to easily use this method to generate pseudorandom numbers, the target probability distribution must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then apply the transformation x=<math> F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the inverse directly on both sides in the proof of the inverse transform only if the cdf of X is strictly increasing (one-to-one); otherwise we must use the generalized inverse defined above.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3<br><br />
Note: These steps can be found in ''Simulation'', 5th Ed., by Sheldon Ross.<br />
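The steps above can be sketched in Python (an illustrative translation, not course code; it assumes 0 < p < 1):

```python
import random

def binomial_inverse_transform(n, p, rng=random):
    """Steps 1-5 above: walk the binomial cdf using the pmf recursion."""
    u = rng.random()                      # Step 1
    c = p / (1 - p)                       # Step 2
    i, pr = 0, (1 - p) ** n
    F = pr
    while u >= F:                         # Step 3: stop once U < F
        pr = c * (n - i) / (i + 1) * pr   # Step 4: pmf recursion
        F += pr
        i += 1
    return i

random.seed(1)
samples = [binomial_inverse_transform(10, 0.3) for _ in range(5000)]
print(sum(samples) / len(samples))  # sample mean, close to n*p = 3
```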
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = \left. -e^{-\lambda t}\, \right|_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
Setting <math> y=1-e^{- \lambda x} </math> and solving for <math>x</math>:<br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-y)}{\lambda}</math><br /><br />
<math>F^{-1}(u)=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
<br />
'''Example''': <br />
If <math>U \sim U(0,1)</math>, then <math> X= a + (b-a)U</math> is uniform on [a, b] <br /><br />
<math> x=\frac{-\ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math> (since <math>1-U</math> and <math>U</math> have the same distribution) <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x with cdf F(x), we first draw u from a uniform distribution on [0,1], then apply the inverse of F(x) and set<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
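Example 3 can be checked with a quick Python sketch (illustrative, not course code), using the fact that Beta(1, β) has mean 1/(1+β):

```python
import random

def beta_1_b(beta, rng=random):
    """Inverse transform for BETA(1, beta): x = 1 - (1-u)^(1/beta)."""
    u = rng.random()
    return 1 - (1 - u) ** (1 / beta)

random.seed(0)
xs = [beta_1_b(3.0) for _ in range(20000)]
print(sum(xs) / len(xs))  # close to 1/(1+3) = 0.25
```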
<br />
'''Example 4 - Estimating <math>\pi</math>''':<br />
Let's use rand() and the Monte Carlo method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2, its area is 4 and the inscribed circle (radius 1) has area <math>\pi</math>.<br /><br />
Thus <math>\pi \approx 4 \times (Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use the functions:<br />
* "who" to see what variables you have defined<br />
* "clear all" to clear all variables you have defined<br />
* "close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. This method can be problematic since not all cdfs are invertible or strictly monotonic, and the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDF's and/or their inverses are not easy to compute, as in the case of the Gaussian distribution.<br /><br />
<br />
We learned how to prove that transforming a uniform random variable through the inverse cdf yields a sample from F, and how to use the uniform distribution to obtain a value of x from F(x).<br />
We can also use the uniform distribution in the inverse method to generate other distributions.<br />
In the Monte Carlo example, the points are uniform over the square, so the probability that a point lands in the circle is proportional to the circle's area.<br />
We can also look at a histogram to judge what kind of distribution a sample follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool %shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma changes the location and spread of the plotted distribution.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for the uniform distribution <math> U \sim Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f. function <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases a closed-form inverse does not exist<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse cdf, making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used for the discrete case. We want to generate a discrete random variable X that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case (Procedure):<br><br />
1. Define a probability mass function for <math>x_{i}</math>, i = 1,....,k. Note: k could grow infinitely. <br><br />
2. Generate a uniform random number U, <math> U \sim Unif [0,1] </math><br><br />
3. If <math>U\leq p_{0}</math>, deliver <math>X = x_{0}</math><br><br />
4. Else, if <math>U\leq p_{0} + p_{1} </math>, deliver <math>X = x_{1}</math><br><br />
5. Repeat the process until we reach <math>U\leq p_{0} + p_{1} + ......+ p_{k}</math>; then deliver <math>X = x_{k}</math><br><br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U \sim Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of the semicolon in Matlab: Matlab will not print the result of a line that ends in a semicolon; if the semicolon is omitted, the result is printed.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<=0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \quad X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
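A one-line Python check of this example (illustrative, not course code): since F(x)=x² on [0,1], X = U^(1/2), and E[X] = ∫ x·2x dx = 2/3.

```python
import random

random.seed(42)
# X = F^{-1}(U) = sqrt(U) samples the pdf f(x) = 2x on [0,1]
xs = [random.random() ** 0.5 for _ in range(20000)]
print(sum(xs) / len(xs))  # close to E[X] = 2/3
```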
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U \sim Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} P_x = P(X=x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = \frac {u}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) Set <math>\begin{align} x = 0 \end{align}</math> and <math>\begin{align} p = F = P(X = 0) = e^{-u} \end{align}</math><br />
3) If U<F, output x <br><br />
Else, <math>\begin{align} p = {\frac {u}{x+1}}\, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
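The Poisson algorithm above can be sketched in Python (an illustrative translation; the course code is MATLAB):

```python
import math
import random

def poisson_inverse_transform(lam, rng=random):
    """Walk the Poisson cdf using the recursion p_{x+1} = lam/(x+1) * p_x."""
    U = rng.random()
    x = 0
    p = math.exp(-lam)    # P(X = 0)
    F = p
    while U >= F:         # step 3: output x once U < F
        p = lam / (x + 1) * p
        F += p
        x += 1
    return x

random.seed(7)
samples = [poisson_inverse_transform(4.0) for _ in range(10000)]
print(sum(samples) / len(samples))  # close to the mean, lambda = 4
```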
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X as the number of trials until the first success, so x=1,2,3..... We have pmf:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) means the first x trials are all failures.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
\vdots \\<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k \\<br />
\vdots<br />
\end{cases}</math><br />
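The cases above collapse to a closed form: 1-(1-p)^(k-1) < U ≤ 1-(1-p)^k exactly when k = ⌈ln(1-U)/ln(1-p)⌉. An illustrative Python sketch (not course code):

```python
import math
import random

def geometric_inverse_transform(p, rng=random):
    """Invert the Geo(p) cdf F(k) = 1 - (1-p)^k directly."""
    u = rng.random()
    # dividing by the negative log(1-p) flips the inequality, giving a ceiling;
    # max(1, ...) guards the measure-zero case u == 0
    return max(1, math.ceil(math.log(1 - u) / math.log(1 - p)))

random.seed(3)
xs = [geometric_inverse_transform(0.25) for _ in range(20000)]
print(sum(xs) / len(xs))  # close to the mean 1/p = 4
```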
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
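The general procedure above can be sketched in Python (illustrative; the function name and interface are assumptions, not from the course):

```python
import random

def discrete_inverse_transform(values, probs, rng=random):
    """Walk the cumulative sums P_0, P_0+P_1, ... until U falls in an interval."""
    u = rng.random()
    F = 0.0
    for x, p in zip(values, probs):
        F += p
        if u <= F:
            return x
    return values[-1]   # guard against floating-point round-off

random.seed(5)
xs = [discrete_inverse_transform([0, 1, 2], [0.3, 0.2, 0.5]) for _ in range(10000)]
print(xs.count(2) / len(xs))  # close to P(X=2) = 0.5
```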
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
Flipping a coin is a discrete case of the uniform distribution: in the code the coin is flipped 1000 times, and the observed proportion is close to the expected value (0.5).<br />
The second example is another discrete distribution, split into three values (0, 1, 2), with the cut points for the uniform draw given by the cumulative probabilities.<br />
Example 3 uses the inverse method to work out the range of U corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b>generate types of distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed number {x}</li><br />
<li>{F<sup>-1</sup>(x)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>set d<sub>i</sub>=x<sub>i</sub> if <math> F(x_{i-1})<U\leq F(x_i) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transformation method does allow us to change our uniform distribution, it has two limits;<br />
# Not all functions have closed-form inverses (the inverse may not exist or may be intractable)<br />
# For some distributions, such as Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''. This method is more efficient than the inverse transform method.<br />
<br />
Suppose we want to draw a random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (in practice, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for the Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) as opposed to <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x could hold only if g and f were the same function. This is because both pdfs integrate to 1, so g cannot dominate f everywhere unless they coincide. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always lies above f(x), so candidate points are generated with higher probability than we need. Thus we need an approach for accepting only the proper proportion of them.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This would render our algorithm inefficient. <br />
<br />
<br><br />
<br />
Note: 1. Values around x<sub>1</sub> will be sampled more often under cg(x) than under f(x), so there will be more samples than we actually need. Where <math>\frac{f(y)}{\, c g(y)}</math> is small, the acceptance-rejection step discards the excess: in the region around x<sub>1</sub>, we should accept less and reject more. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. As a result, g(x) and f(x) are comparable there.<br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math>, because then the sample points are guaranteed to fall in the part of the area under c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
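The procedure above can be sketched in Python (illustrative, not course code), applied to the Beta(2,4) density f(x) = 20x(1-x)³ on (0,1) — the same density used in the example at the end of this section — with proposal g = U(0,1) and c = max f = 135/64, attained at x = 1/4:

```python
import random

def f(x):
    """Target density: Beta(2,4), f(x) = 20x(1-x)^3 on (0,1)."""
    return 20 * x * (1 - x) ** 3

C = 135 / 64   # max of f at x = 1/4, so f(x) <= C * g(x) with g = U(0,1)

def accept_reject(rng=random):
    while True:
        y = rng.random()        # step 1: draw Y ~ g = U(0,1)
        u = rng.random()        # step 2: draw U ~ U(0,1), independent of Y
        if u <= f(y) / C:       # step 3: accept with probability f(y)/(c g(y))
            return y

random.seed(11)
xs = [accept_reject() for _ in range(20000)]
print(sum(xs) / len(xs))  # close to E[X] = 2/(2+4) = 1/3
```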
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
i.e. at X<sub>1</sub>, there is a low probability of accepting the point, since f(x) is much smaller than cg(x).<br><br />
At X<sub>2</sub>, there is a high probability of accepting the point. Recall that <math>P(U\leq a)=a</math> for the uniform distribution.<br />
<br />
Note: since the acceptance probability <math>\frac{f(y)}{cg(y)}</math> must be at most 1 for every y, the constant c must satisfy <math>c\geq \frac{f(y)}{g(y)}</math> for all y.<br />
<br />
<br />
This section introduced the relationship between cg(x) and f(x), proved why it holds, and showed how this rule is used to reject candidate points.<br />
We also learned how to read the graph to decide where points are likely to be rejected or accepted.<br />
In the example, x<sub>1</sub> is a point where most candidates are rejected and x<sub>2</sub> is a point where most are accepted.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
(to be updated later)<br><br />
<br />
<br />
We want to show that <math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int P(accepted|y)P(y)\,dy\\<br />
&=\int \frac{f(y)}{cg(y)}g(y)\,dy\\<br />
&=\frac{1}{c} \int f(y)\, dy\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different from g(y), it is less likely that the point will be accepted, as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)={2 \choose x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c \geq f(x)/g(x)</math> for all x,<br/><br />
we take <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v~U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>(y=0)</math> and <math>(v<1/2), output=0</math> <br/><br />
If <math>(y=2) </math> and <math>(v<1/2), output=2 </math><br/><br />
Else if <math>y=1, output=1</math><br/><br />
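The algorithm above can be sketched in Python (an illustrative translation only; the course examples use MATLAB):<br />

```python
import random

def sample_bin_2_half():
    # Target pmf f: Bi(2, 0.5); proposal g: discrete uniform on {0, 1, 2}; c = 3/2
    f = {0: 0.25, 1: 0.5, 2: 0.25}
    g = 1.0 / 3.0
    c = 1.5
    while True:
        u, v = random.random(), random.random()
        y = int(3 * u)           # floor(3u) gives DU[0,2]
        if v <= f[y] / (c * g):  # acceptance ratios: 1/2, 1, 1/2
            return y

random.seed(0)
draws = [sample_bin_2_half() for _ in range(20000)]
print([round(draws.count(k) / len(draws), 3) for k in (0, 1, 2)])
```

The printed frequencies should be close to (1/4, 1/2, 1/4), matching the target pmf.<br />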
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let <math>g(x)</math> be the helper function <br/><br />
Let <math>cg(x)\geq f(x)</math><br/><br />
Since we need to generate y from <math>g(x)</math>,<br/><br />
<math>Pr(\text{select } y)=g(y)</math><br/><br />
<math>Pr(\text{output } y|\text{selected } y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(\text{output } y)=Pr(\text{output } y_1|\text{selected } y_1)Pr(\text{select } y_1)+ \dots + Pr(\text{output } y_n|\text{selected } y_n)Pr(\text{select } y_n)=1/c</math> <br/><br />
The number of trials until the first acceptance follows a geometric distribution with probability of success <math>1/c</math><br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
The conditional probability argument above proves that, conditional on acceptance, the output has exactly the pdf of the original target distribution.<br />
The example shows how to choose the constant c for a given pair of functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
In the acceptance-rejection method, we use the derivative to find the local maximum of f(x)/g(x),<br />
which gives the best (smallest) constant c.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math><br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
This example shows why the acceptance-rejection method rejects some of the generated points.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area between <math>c \cdot g(x)</math> and <math>f(x)</math> as small as possible.<br />
Because g(.) is uniform on (0,1), g(x) = 1 for all x in (0,1).<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
This example shows how to find c and <math>f(x)/(c \cdot g(x))</math>.<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first need a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The graph of f(x) must lie under the graph of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is low when the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice-versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it does not make sense to simply choose <math>c</math> arbitrarily large. We need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be negative; in fact <math>c \geq 1</math>, since both f and g integrate to 1.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
And it means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math> for all x. So the smallest possible c that satisfies the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
*Recall that the acceptance rate is 1/c (not the rejection rate). <br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1 (this is the general procedure, not the way to find c)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
where &Gamma;(n)=(n-1)! if n is a positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distribution with pdf f(x), we need to draw approximately <math>n\cdot c</math> points from the proposal distribution (g(x)) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{2y}{2\cdot 1}</math>, set x=y</li><br />
<li>else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: number of accepted samples<br />
>>jj=1; % jj: number of generated candidates<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g=U(-1,1) and g(x)=1/2<br />
<br />
Let y be a candidate generated from g. We need:<br />
<math> cg(x)\geq f(x) \Rightarrow \frac{c}{2} \geq \frac{3}{4} (1-x^2) \Rightarrow c \geq \frac{3}{2}(1-x^2) </math><br />
<math> c=\max_{x} \frac{3}{2}(1-x^2) = \frac{3}{2} </math> (the maximum occurs at x=0)<br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}\cdot \frac{1}{2}} = 1-y^2</math>, then x=y ('''note that''' the ratio is f(y)/(cg(y)) with g(y)=1/2)<br />
:5: else: return to '''step 1''' <br />
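The process above can be sketched in Python (an illustrative translation, not course code; the course uses MATLAB). Note that <math>f(y)/(cg(y)) = \frac{3}{4}(1-y^2) / (\frac{3}{2}\cdot\frac{1}{2}) = 1-y^2</math>:<br />

```python
import random

def sample_semiparabola():
    # Target f(x) = (3/4)(1 - x^2) on [-1, 1]; proposal U(-1, 1) with g(x) = 1/2; c = 3/2
    while True:
        u1, u2 = random.random(), random.random()
        y = 2 * u1 - 1            # transform U(0,1) to U(-1,1)
        if u2 <= 1 - y * y:       # f(y) / (c * g(y)) = 1 - y^2
            return y
```

A quick sanity check: the target has mean 0 and <math>E[X^2]=1/5</math>, which the empirical moments of a large sample should match.<br />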
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period, ".", means "element-wise": the operation is applied to each element of a vector or matrix. In the example above, u.^0.5 takes the square root of every element of u. The same applies to products: if a=[1 2 3] and b=[2 3 4], then a.*b=[2 6 12], but a*b gives an error since the matrix dimensions do not agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
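The two steps above can be sketched in Python (illustrative only; the course uses MATLAB):<br />

```python
import random

def sample_3x2():
    # Target f(x) = 3x^2 on (0, 1); proposal g = U(0, 1); c = 3
    while True:
        u1, u2 = random.random(), random.random()
        if u2 <= u1 * u1:   # f(u1) / (c * g(u1)) = u1^2
            return u1
```

As a check, the target distribution has mean <math>\int_0^1 3x^3 dx = 3/4</math>, which the sample mean should approximate.<br />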
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max_{0<x<1} \frac {3x^2}{2x} = \max_{0<x<1} \frac {3x}{2} = \frac{3}{2} </math>.<br />
Use the inverse method to sample from <math>g(x)</math>:<br />
<math>G(x)=x^2</math>, so generate <math>U_1</math> from <math>U(0,1)</math> and set <math>y=\sqrt{U_1}</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math>, and set <math>y=\sqrt{U_1}</math><br><br />
2. If <math>U_2 \leq \frac{f(y)}{cg(y)} = \frac{3y^2}{\frac{3}{2}\cdot 2y} = y</math>, accept <math>y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class relating the <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim~ N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
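This gives a sampler for the standard normal, sketched here in Python (illustrative only; the course uses MATLAB). We draw |Z| by acceptance-rejection against an Exp(1) proposal and then attach a random sign:<br />

```python
import math
import random

def sample_std_normal():
    # |Z| has density f(x) = (2/sqrt(2*pi)) * exp(-x^2/2) for x > 0.
    # Proposal g(x) = exp(-x) (Exp(1)); c = sqrt(2e/pi); f/(cg) = exp(-(x-1)^2/2).
    while True:
        y = -math.log(random.random())         # Exp(1) via inverse transform
        u = random.random()
        if u <= math.exp(-(y - 1) ** 2 / 2):   # accept y as |Z|
            break
    return y if random.random() < 0.5 else -y  # random sign gives Z ~ N(0,1)
```

A sanity check: a large sample should have mean near 0 and variance near 1.<br />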
<br />
<br />
This example shows how to calculate the constant c relating f(x) and g(x).<br />
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
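The three steps above amount to a one-line transformation, sketched in Python (illustrative only):<br />

```python
import random

def unif(a, b):
    # Draw U ~ U(0,1), then return Y = (b - a) * U + a, so that Y ~ U(a, b)
    return (b - a) * random.random() + a
```

For example, unif(-3, 7) produces draws uniformly spread over [-3, 7], with sample mean close to 2.<br />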
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; then Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2/(\pi R)}{1/(2R)}=\frac{4}{\pi}</math>. Note: this also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a candidate y with probability f(y)/[cg(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = y </math><br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ U_{1}^2 - 1 \leq -(2U - 1)^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a*e^{-a*x}</math>to generate random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x_0)}{g(x_0)} = \frac {e^{-1}}{a(1-a)} </math> at the critical point <math>x_0=\frac{1}{1-a}</math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\lim_{x \to \infty} \frac {f(x)}{g(x)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u v ~unif(0,1) <br/><br />
2. Generate y from g; since g is exponential with rate a=1/2, let y=-2ln(u) <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
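The procedure above can be sketched in Python (illustrative only; the course uses MATLAB):<br />

```python
import math
import random

def sample_gamma_2_1():
    # Target f(x) = x * e^{-x} (a Gamma(2,1) density); proposal g(x) = (1/2) e^{-x/2}
    # using the optimal a = 1/2, which gives c = 4/e
    c = 4 / math.e
    while True:
        u, v = random.random(), random.random()
        y = -2 * math.log(u)      # Exp(rate 1/2) via inverse transform
        ratio = (y * math.exp(-y)) / (c * 0.5 * math.exp(-y / 2))
        if v <= ratio:
            return y
```

Since the target is a Gamma(2,1) density, the sample mean should be close to 2.<br />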
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take derivative of h(x) with respect to x, get x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (if c<sub>1</sub> is a value, then we can ignore this step) Since we want the smallest value of c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (i.e. k=unknown parameter) to get the value of k. <br />Then we substitute k to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For the two examples above, we generate candidates from the proposal distribution (uniform or exponential),<br />
and figure out <math>c=\max\frac {f(y)}{g(y)} </math>.<br />
If <math>v<\frac {f(y)}{c\cdot g(y)}</math>, output y.<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse cdf cannot be computed or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated to at least one of the normalizing constant. <br/><br />
3) A constant c where <math>f(x)\leq c\cdot g(x)</math><br/><br />
4) A uniform draw<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example in the last lecture. The following code will generate the random variable required in that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % R is a constant which we can change;<br />
% e.g. if we changed R=4 we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % note for beginner programmers: this step increases<br />
% the ii value for the next time through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tips: hist(x,y) where y is the number of bars in the graph.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
This plots a histogram of the variable x, where y is the number of bars.<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose we can already easily generate a discrete random variable Y with pmf g(x)=P(Y=x)such that sup<sub>x</sub> {f(x)/g(x)}<= c < ∞.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: U is independent of Y in Steps 2 and 3 above.<br />
The constant c is an indicator of the rejection rate: the acceptance rate is 1/c.<br />
<br />
In this acceptance-rejection example for a pmf, the proposal g is discrete uniform over the 5 values (1,2,3,4,5), so g(x)=0.2 for every x.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution on the integers <math>1,2,3,...,k</math>. If this function is not built into your MATLAB, a simple transformation of rand works the same way (e.g. ceil(k*rand)). <br />
<br />
The acceptance rate is <math>\frac {1}{c}</math>, so the lower the c, the more efficient the algorithm. Theoretically, c=1 is the best case because all samples would be accepted; however, this would only happen when the proposal and target distributions are exactly the same, which never happens in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>\frac {1}{1.5}=\frac {2}{3}</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 sampled values; as more values are drawn, the empirical frequencies approach the specified pmf.<br />
<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
g(x)= 1/3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1,y~g<br /><br />
2,u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x>=0</math><br>(poisson distribution)<br />
Try the first few p_{x}'s: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j, else go to step 1.<br />
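The three steps above can be sketched in Python (illustrative only; the course uses MATLAB). Here c is computed numerically as the maximum of the ratio over the first terms, which is about 2.12 as stated above:<br />

```python
import math
import random

def pois_pmf(x, lam=3.0):
    # Target: Poisson(3) pmf
    return math.exp(-lam) * lam ** x / math.factorial(x)

def geo_pmf(x, p=0.25):
    # Proposal: Geometric(0.25) pmf on the support {0, 1, 2, ...}
    return p * (1 - p) ** x

# c = max ratio; the ratio peaks near x = 3, 4 and decays to 0 afterwards
c = max(pois_pmf(x) / geo_pmf(x) for x in range(50))

def sample_poisson_3():
    while True:
        u1, u2 = random.random(), random.random()
        j = int(math.log(u1) / math.log(0.75))    # geometric draw (step 2)
        if u2 <= pois_pmf(j) / (c * geo_pmf(j)):  # accept/reject (step 3)
            return j
```

A sanity check: the sample mean should be close to 3, and the empirical frequency of the value 3 close to 0.224.<br />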
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) such that it is hypergeometically distributed, given 10 white balls, 5 red balls, and select 3 balls, let X be the number of red ball selected, without replacement. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which is '''c=1.1127'''<br />
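The table of ratios above can be reproduced with a short Python computation (illustrative only; the course uses MATLAB):<br />

```python
from math import comb

# Hypergeometric target: 10 white, 5 red, draw 3 without replacement; X = number of red
f = [comb(5, k) * comb(10, 3 - k) / comb(15, 3) for k in range(4)]
# Binomial(3, 1/3) proposal
g = [comb(3, k) * (1 / 3) ** k * (2 / 3) ** (3 - k) for k in range(4)]
ratios = [fk / gk for fk, gk in zip(f, g)]
c = max(ratios)  # the maximum ratio occurs at k = 1
```

The ratios come out to approximately 0.89, 1.11, 0.99, 0.59, so c is about 1.113.<br />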
<br />
Here the maximum of f(x) (0.4945) and the maximum of g(x) (0.4444) both occur at X=1, so c = 0.4945/0.4444 = 1.1127.<br />
In general, however, c must be the pointwise maximum of the ratio f(x)/g(x), not the ratio of the maxima; otherwise <math>c \cdot g(x)</math> may fail to cover f(x) everywhere, and c would have to be increased, raising the rejection rate.<br />
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, we need to move c*g(x) to the peak of f to cover the whole f. Thus c will be very large and 1/c will be small.<br />
The higher the rejection rate, more points will be rejected.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="margin-bottom:10px;border:10px solid red;background: yellow">A good example for understanding the pros and cons of the AR method: the AR method performs poorly when the target distribution has a high peak, because c will then be huge,<br><br />
which makes the acceptance rate low and the sampling very time-consuming. </div><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall that,<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that <br />
p(j)/q(j)<=c for all j such that p(j)>0<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique, unlike acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda)</math>, and note that <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
<br />
Side note: if <math> X\sim~ Gamma(a,\lambda)</math> and <math> Y\sim~ Gamma(b,\lambda)</math> are independent gamma random variables, then <math>\frac{X}{X+Y}</math> has a <math> Beta(a,b)</math> distribution.<br />
<br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1, x_2, \dots, x_t \sim~ Exp(\lambda)</math> independently, and<br />
<math>x_1+x_2+\dots+x_t \sim~ Gamma(t,\lambda)</math><br />
<br />
<pre style="font-size:16px"><br />
>>lambda=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/lambda)*log(u); <br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follow <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000);         % generates a 20x1000 matrix: 1000 values for each X_i with t=20;<br />
                              % all the elements are generated by rand<br />
>>x = (-1/lambda)*log(1-u);   % log(1-u) behaves the same as log(u) since u~U(0,1)<br />
>>xx = sum(x);                % sum(x) sums the elements in each column;<br />
                              % xx is 1x1000, one Gamma(20,1) value per column<br />
>>size(x)                     % check the size of x if we forget it (the answer is 20 1000)<br />
>>hist(x(1,:))                % the graph of the first exponential distribution<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
<br />
<br />
size(x) and size(u) are both 20*1000 matrices.<br />
Since u~unif(0, 1) implies that u and 1 - u have the same distribution, we can substitute u for 1-u to simplify the expression.<br />
Alternatively, the following commands do the same thing as the previous ones.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000)));  % a simple way to put the code in one line;<br />
                                              % either log(u) or log(1-u) works since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
In rand(20,1000), the matrix has 20 rows with 1000 numbers in each row.<br />
The same approach generalizes to multidimensional settings: we can sum independent (not necessarily identically distributed) random variables arranged in a matrix, and the histogram confirms the resulting distribution.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/\sin(\theta)= x_{1}/\cos(\theta)</math> <br /><br />
<math> \tan(\theta)=x_{2}/x_{1} \rightarrow \theta=\tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
Equivalently, a point at distance R from the origin along a ray at angle <math>\theta</math> can be written as <math>x=R\cos(\theta)</math>, <math>y=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,i)'' returns the (i,i)th entry <br/ ><br />
:*: ''sum(X,1)'' or ''sum(X)'' sums down each column of X; the output is a row vector containing the sum of each column. <br /><br />
:*: ''sum(X,2)'' sums across each row of X, returning a column vector of the row sums. <br/ ><br />
:*: ''rand(r,c)'' will generate random numbers in r row and c columns <br /><br />
:*: The dot operator (.), when placed before an arithmetic operator such as ^, *, or /, applies that operation element-wise to a vector or matrix. For example, to square every element of a matrix A, write ''A.^2'' as opposed to ''A^2'' (which is matrix multiplication). The dot is not required for + and -, which are already element-wise, nor for operations between a matrix and a scalar, nor for functions that can only take a number as their input (such as log).<br><br />
:*: Matlab processes loops very slowly, while it is fast with matrices and vectors, so it is preferable to use element-wise operations on matrices of random numbers rather than loops whenever possible.<br><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1. On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1) i.e. Standard Normal Distribution - then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math>f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }</math> (which is almost useless in this course)<br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup> and <math> \tan(\theta) = \frac{y}{x} </math><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\frac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into an Exponential density and a Uniform density, showing that <math>d</math> and <math>\theta</math> are independent<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'')<br />
:<math>=\;\ - \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br /><br />
More intuitively, the integrand <math>x\phi(x)</math> is an odd function (g(-x) = -g(x)), and the integral of an odd function over a support symmetric about 0, here <math>(-\infty,\infty)</math>, is 0.<br /><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited to inefficient methods such as numerically inverting the Gaussian CDF, summing uniform random variables, and acceptance-rejection. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This new technique had an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup>=Z<sub>1</sub><sup>2</sup>+Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> (exponential with rate 1/2, i.e. mean 2) <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Alternatively,<br> <math> x =\cos(2\pi U_2)\sqrt{-2\ln U_1}\, </math> and<br> <math> y =\sin(2\pi U_2)\sqrt{-2\ln U_1}\, </math><br /><br />
<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
Note:<br>the first graph is hist(tet) and shows a uniform distribution.<br>The second one is hist(d) and shows an exponential distribution.<br>The third one is hist(x) and shows a normal distribution.<br>The last one is hist(y) and also shows a normal distribution.<br />
<br />
Attention:There is a "dot" between sqrt(d) and "*". It is because d and tet are vectors. <br><br />
<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient, because of the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is an indirect computation of the sine and cosine of a random angle (as opposed to the direct computation, which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use inverse transform method, we can approximate this inverse using different functions.One method would be '''rational approximation'''.<br /><br />
2.'''Central limit theorem''' : If we sum 12 independent U(0,1) distribution and subtract 6 (which is E(ui)*12)we will approximately get a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
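The central limit theorem approach (method 2 above) can be sketched in a few lines. This Python version (not course code) checks that the sample mean is near 0 and the sample variance near 1:<br />

```python
import random

def clt_normal(rng=random):
    """Approximate one N(0,1) draw via the central limit theorem:
    the sum of 12 Unif(0,1) draws has mean 12*(1/2) = 6 and variance
    12*(1/12) = 1, so subtracting 6 centers it at 0."""
    return sum(rng.random() for _ in range(12)) - 6

random.seed(0)
samples = [clt_normal() for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples) - mean ** 2
```

Note this approximation has bounded support [-6, 6], so it underrepresents extreme tails compared to a true normal.<br />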
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histograms above, the additive constant is the parameter that shifts the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent uniform (0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by <br />
<math>f(x_1,x_2) = f_{U_1,U_2}\left(g_1^{-1}(x_1,x_2),\; g_2^{-1}(x_1,x_2)\right) |J|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
<math>J = \begin{vmatrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2 \\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{vmatrix}</math><br />
where <br />
<math>u_1 = g_1^{-1}(x_1,x_2), \quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
<math>u_1 = e^{-(x_1^2+x_2^2)/2}, \quad u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
<math>f(x_1,x_2) = \frac{1}{2\pi} e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution extends the standard normal: its density is shifted by the mean <math>\mu</math> and scaled by the standard deviation <math>\sigma</math>. The pdf of the general normal distribution is <br />
<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}</math><br />
<br />
The standard normal distribution is the special case with variance 1 and mean zero. If X is a general normal deviate, then Z = (X − μ)/σ will have a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma \cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2 \cdot 1 = \sigma^2</math><br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000);   % generate 1000 draws from a standard normal<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2];          % stack into a 2x1000 matrix<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,Id) and X= <math>\underline{\mu} + \Sigma^{\frac{1}{2}} \,Z </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
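This last step can be sketched in Python without any matrix library (not course code; the mean vector and covariance matrix below are hypothetical, a hand-rolled 2x2 Cholesky factor stands in for <math>\Sigma^{\frac{1}{2}}</math>, and Box-Muller supplies Z):<br />

```python
import math, random

def chol2x2(S):
    """Lower-triangular Cholesky factor L of a 2x2 covariance matrix, S = L L^T."""
    a = math.sqrt(S[0][0])
    b = S[1][0] / a
    c = math.sqrt(S[1][1] - b * b)
    return [[a, 0.0], [b, c]]

def mvn_sample(mu, L, rng=random):
    """Draw X = mu + L Z with Z ~ N(0, I_2), using Box-Muller for Z."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(-2 * math.log(1 - u1))
    z = (r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2))
    return [mu[i] + L[i][0] * z[0] + L[i][1] * z[1] for i in range(2)]

# Hypothetical example: mean (1, -1), covariance [[4, 1], [1, 2]].
random.seed(42)
mu, Sigma = [1.0, -1.0], [[4.0, 1.0], [1.0, 2.0]]
L = chol2x2(Sigma)
xs = [mvn_sample(mu, L) for _ in range(20000)]
m0 = sum(x[0] for x in xs) / len(xs)
m1 = sum(x[1] for x in xs) / len(xs)
cov01 = sum((x[0] - m0) * (x[1] - m1) for x in xs) / len(xs)
```

Any square root of <math>\Sigma</math> works here; the Cholesky factor is simply the cheapest one to compute.<br />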
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution describing an event with only two possible outcomes, i.e. success or failure. On success the variable takes value 1, with success probability p; on failure it takes value 0, with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its probability mass function is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes, so the Bernoulli pmf is the binomial probability function with x taking only the values 0 and 1.<br />
<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example we discussed in a previous lecture: setting p = 1/2 gives each point a 50% chance of being heads and a 50% chance of being tails.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: a Binomial random variable can be viewed as the sum of n independent Bernoulli trials.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: The Bernoulli pmf can be written compactly as <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
When doing element-wise operations on vectors or matrices, put a dot before the operator (e.g. .*, .^). <br />
example: to multiply each element of a 2*4 matrix V by 3, write 3.*V <br />
(here plain 3*V also works, because one operand is a scalar).<br />
<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
Note that the material in this lecture will not be on the exam; it was only to supplement what we have learned.<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
<br />
The inverse method is universal in the sense that we can potentially sample from any distribution where we can find the inverse of the cumulative distribution function.<br />
<br />
Procedure:<br />
<br />
1. Generate U~Unif[0, 1)<br><br />
2. Set <math>x=F^{-1}(u)</math><br><br />
3. Then X~f(x)<br><br />
<br />
'''Example 1'''<br><br />
<br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br><br />
<br />
x~exp(<math>\lambda</math>)<br><br />
<br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P((X_1)>y) P((X_2)>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate U~ U(0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math> (using <math>ln(u)</math> in place of <math>ln(1-u)</math>, which is valid since <math>U</math> and <math>1-U</math> have the same distribution)<br><br />
<br />
If we generalize this example from two independent particles to n independent particles we will have:<br><br />
<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> ...<br> <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br>.<br />
<br />
And the algorithm using the inverse-transform method as follows:<br />
<br />
step1: Generate U~U(0,1)<br />
<br />
Step2: <math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
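The n-particle algorithm above can be sketched in Python (not course code; the rates below are hypothetical). Since <math>Y \sim Exp(\Sigma\lambda_i)</math>, the sample mean should approach <math>1/\Sigma\lambda_i</math>:<br />

```python
import math, random

def sample_min_exp(lams, rng=random):
    """Inverse-transform sample of Y = min(X_1,...,X_n), X_i ~ Exp(lam_i).
    Y ~ Exp(sum(lams)), so Y = -ln(1-U) / sum(lams)."""
    u = rng.random()
    return -math.log(1 - u) / sum(lams)

random.seed(7)
lams = [1.0, 2.0]                 # hypothetical rates lambda_1, lambda_2
ys = [sample_min_exp(lams) for _ in range(20000)]
mean_y = sum(ys) / len(ys)        # should approach 1/(lambda_1 + lambda_2) = 1/3
```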
<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
<br>where a>0 and a is a real number<br />
What is the distribution of X?<br><br />
<br />
'''Solution:'''<br><br />
<br />
We can find a form for the cumulative distribution function of X by isolating U, since U~Unif[0,1) takes values in the range of F(X) uniformly. It then remains to differentiate the resulting form with respect to x to obtain the probability density function.<br />
<br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
[[File:Example_2_diagram.jpg]]<br />
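A quick simulation check of this derivation in Python (not course code; the choice a = 2 is arbitrary): samples of <math>X=\, a (1-\sqrt{1-u})</math> should stay in [0, a], and since the derived density is triangular with mode 0, the sample mean should approach E[X] = a/3:<br />

```python
import math, random

def sample_x(a, rng=random):
    """X = a(1 - sqrt(1-U)); derived density f(x) = (2/a)(1 - x/a) on [0, a]."""
    return a * (1 - math.sqrt(1 - rng.random()))

random.seed(3)
a = 2.0                                 # hypothetical choice of a > 0
xs = [sample_x(a) for _ in range(20000)]
mean_x = sum(xs) / len(xs)              # triangular density => E[X] = a/3
```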
<br />
'''Example 3'''<br><br />
<br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. Generate values from X.<br><br />
<br />
'''Solution:'''<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Observe from above that the values of X for n = 20 are close to 1. This is because we can view F(x) = x<sup>n</sup> as the CDF of the maximum of n independent random variables X~Unif(0,1), and that maximum is much more likely to be close to 1 as n increases. This observation is the motivation for method 2 below.<br><br />
<br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result we can see that in this example, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x, so the product of n such CDFs is x<sup>n</sup>). <br />
Method 2: generate X by taking a sample of n independent U~Unif(0, 1) and letting x be the max of the n samples. However, the inverse-transform solution given above requires generating only one uniform random number instead of n of them, so it is the more efficient method.<br />
<br><br />
<br />
In general, for independent X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>, we can derive the pdf and cdf of both Y = max(X<sub>1</sub>, ... , X<sub>n</sub>) and Y = min(X<sub>1</sub>, ... , X<sub>n</sub>) using the product formulas above.<br />
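Both approaches for sampling from F(x) = x<sup>n</sup> can be sketched and compared in Python (not course code; n = 20 as in Example 3). Both sample means should approach E[X] = n/(n+1):<br />

```python
import random

random.seed(5)
n, N = 20, 20000

# Method 1 (inverse transform): one uniform per sample, x = u^(1/n).
xs_inv = [random.random() ** (1 / n) for _ in range(N)]

# Method 2 (max of n uniforms): n uniforms per sample.
xs_max = [max(random.random() for _ in range(n)) for _ in range(N)]

mean_inv = sum(xs_inv) / N
mean_max = sum(xs_max) / N      # both should approximate E[X] = n/(n+1)
```

Method 1 consumes one uniform draw per sample versus n for method 2, which is exactly the efficiency argument made above.<br />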
<br />
'''Example 4 (New)'''<br><br />
This example is similar to Example 1, but uses the maximum instead of the minimum.<br />
<br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br><br><br />
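A Python sketch of this sampler (not course code; λ = 1 is an arbitrary choice). By the memoryless property, E[Z] = 1/(2λ) + 1/λ, so for λ = 1 the sample mean should approach 1.5:<br />

```python
import math, random

def sample_max_exp(lam, rng=random):
    """Z = max(X1, X2) with X_i ~ Exp(lam) independent:
    F_Z(z) = (1 - e^{-lam z})^2, so Z = -(1/lam) ln(1 - sqrt(U))."""
    return -math.log(1 - math.sqrt(rng.random())) / lam

random.seed(11)
lam = 1.0
zs = [sample_max_exp(lam) for _ in range(20000)]
mean_z = sum(zs) / len(zs)     # E[Z] = 1/(2*lam) + 1/lam = 1.5 for lam = 1
```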
<br />
'''Discrete Case:'''<br />
<font size="3"><br />
u~unif(0,1)<br><br />
x <- 0, S <- P<sub>0</sub><br><br />
while u > S<br><br />
x <- x + 1<br><br />
S <- S + P<sub>x</sub><br><br />
Return x<br></font><br />
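The discrete-case inverse transform can be sketched in Python (not course code; the pmf below is a hypothetical example). We walk the cumulative sum S and return the first x with U <= S:<br />

```python
import random

def discrete_inverse_transform(p, rng=random):
    """Inverse-transform sampling for a pmf given as a list p[0], p[1], ..."""
    u = rng.random()
    x, s = 0, p[0]
    while u > s and x < len(p) - 1:   # guard against float round-off at the tail
        x += 1
        s += p[x]
    return x

random.seed(2)
p = [0.2, 0.5, 0.3]                   # hypothetical pmf on {0, 1, 2}
draws = [discrete_inverse_transform(p) for _ in range(10000)]
```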
<br />
===Decomposition Method===<br />
The CDF, F, is a composition if <math>F_{X}(x)</math> can be written as:<br />
<br />
<math>F_{X}(x) = \sum_{i=1}^{n}p_{i}F_{X_{i}}(x)</math> where<br />
<br />
1) p<sub>i</sub> > 0<br />
<br />
2) <math>\sum_{i=1}^{n}</math>p<sub>i</sub> = 1.<br />
<br />
3) <math>F_{X_{i}}(x)</math> is a CDF<br />
<br />
The general algorithm to generate random variables from a composition CDF is:<br />
<br />
1) Generate U, V ~ <math>U(0,1)</math><br />
<br />
2) If u < p<sub>1</sub>, x = <math>F_{X_{1}}^{-1}(v)</math><br />
<br />
3) Else if u < p<sub>1</sub>+p<sub>2</sub>, x = <math>F_{X_{2}}^{-1}(v)</math><br />
<br />
4) In general, if u < p<sub>1</sub>+ ... +p<sub>i</sub>, x = <math>F_{X_{i}}^{-1}(v)</math><br />
<br />
<b>Explanation</b><br><br />
Each random variable that is a part of X contributes <math>p_{i}*F_{X_{i}}(x)</math> to <math>F_{X}(x)</math> every time.<br />
From a sampling point of view, that is equivalent to contributing <math>F_{X_{i}}(x)</math> <math>p_{i}</math> of the time. The logic of this is similar to that of the Accept-Reject Method, but instead of rejecting a value depending on the value u takes, we instead decide which distribution to sample it from.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup>, both on 0<=x<=2 <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
In summary, f(x) is divided into two component pdfs, f<sub>x1</sub> and f<sub>x2</sub>, and a uniform draw decides which component to sample from.<br />
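A runnable sketch of this example, in Python rather than the course's Matlab. The inverse CDF of f<sub>x2</sub>, namely F<sub>x2</sub>(x) = ((x-1)<sup>5</sup>+1)/2, is worked out here and is not spelled out in the notes:<br />

```python
import math
import random

random.seed(11)

def sample_example1():
    """f(x) = 5/12*(1 + (x-1)^4) on [0, 2], decomposed as in the text."""
    if random.random() < 5.0 / 6.0:
        return 2.0 * random.random()  # component f_x1 = 1/2: Unif(0, 2)
    # Component f_x2 = (5/2)(x-1)^4: F_x2(x) = ((x-1)^5 + 1)/2,
    # so inverting gives x = 1 + sign(2v-1) * |2v-1|^(1/5).
    v = random.random()
    t = 2.0 * v - 1.0
    return 1.0 + math.copysign(abs(t) ** 0.2, t)

samples = [sample_example1() for _ in range(200000)]
mean = sum(samples) / len(samples)  # f is symmetric about 1, so E[X] = 1
```

Since f is symmetric about x = 1, the sample mean should land near 1, which gives a quick check of the decomposition.<br />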
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i \geq p_j </math> for <math> i < j </math> (largest weights first) <br> <br><br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u< p_1 + p_2 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
In summary, we split f(x) into the components f<sub>x1</sub>, f<sub>x2</sub> and f<sub>x3</sub>, use U~U(0,1) to pick a component with probability p<sub>i</sub>, and then sample from that component (inverting its CDF where necessary).<br />
<br />
=== Example of Decomposition Method ===<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
Rather than setting U = F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup> and solving the cubic for x, we decompose:<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
'''Algorithm:'''<br />
<br />
Generate U ~ Unif [0,1)<br />
<br />
Generate V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x = v<br />
<br />
else if u<2/3, x = v<sup>1/2</sup><br />
<br />
else x = v<sup>1/3</sup><br><br />
<br />
<br />
'''Matlab Code:''' <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
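The same sampler can be sketched and checked in Python (a translation of the Matlab code above, repeated over many draws; the component means give E[X] = (1/2 + 2/3 + 3/4)/3 = 23/36):<br />

```python
import random

random.seed(1)

def sample_composite():
    """F(x) = (x + x^2 + x^3)/3 on [0,1]: pick a component, then invert it."""
    u, v = random.random(), random.random()
    if u < 1.0 / 3.0:
        return v                  # F_x1(x) = x
    elif u < 2.0 / 3.0:
        return v ** 0.5           # F_x2(x) = x^2
    else:
        return v ** (1.0 / 3.0)   # F_x3(x) = x^3

xs = [sample_composite() for _ in range(100000)]
mean = sum(xs) / len(xs)  # E[X] = (1/2 + 2/3 + 3/4)/3 = 23/36
```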
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from an unknown (or hard) distribution using an easy one. The disadvantage is that it may reject many points, which is inefficient.<br />
<br />
In other words, the samples drawn from A that happen to fall in B are themselves uniformly distributed on B.<br />
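A minimal Python illustration of this theorem, taking A to be the square [-1,1]<sup>2</sup> and B the unit disk (shapes chosen here for illustration; the notes do not fix particular shapes):<br />

```python
import random

random.seed(2)

def uniform_in_disk():
    """Sample uniformly on A = [-1,1]^2 and keep only points inside B,
    the unit disk; the retained points are uniform on B."""
    while True:
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return x, y

pts = [uniform_in_disk() for _ in range(10000)]
```

Every returned point lies in the disk, and by symmetry the coordinate means should be near 0.<br />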
<br />
=== Practice Example from Lecture 7 ===<br />
<br />
Let X1, X2 denote the lifetime of 2 independent particles, X1~exp(<math>\lambda_{1}</math>), X2~exp(<math>\lambda_{2}</math>)<br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then, } 1-F(y)=P(\min(x_{1},x_{2}) \geq y)=e^{-(\lambda_{1}+\lambda_{2})y}, \quad F(y)=1-e^{-(\lambda_{1}+\lambda_{2}) y}</math><br /><br />
<math>u \sim Unif[0,1), \; u = F(y) \Rightarrow y = -\frac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
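A Python sketch of this inverse-method algorithm (the rates l1 = 1 and l2 = 2 are illustrative values, not from the notes):<br />

```python
import math
import random

random.seed(3)
l1, l2 = 1.0, 2.0  # illustrative rates lambda_1, lambda_2

def sample_min_exp(l1, l2):
    """Y = min(X1, X2) ~ Exp(l1 + l2); invert F(y) = 1 - e^{-(l1+l2)y}."""
    u = random.random()
    return -math.log(1.0 - u) / (l1 + l2)

ys = [sample_min_exp(l1, l2) for _ in range(100000)]
mean = sum(ys) / len(ys)  # E[Y] = 1/(l1 + l2) = 1/3
```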
<br />
===Question 2===<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a beta distribution; the normalizing constant is <math>b=1/\int _{0}^{1}x^{n}(1-x)^{n}dx</math><br />
<br />
U<sub>1</sub>~Unif[0,1)<br />
<br />
U<sub>2</sub>~Unif[0,1)<br />
<br />
The density is maximized at x=0.5, where it takes the value <math>b(1/4)^n</math>.<br />
So, with a Unif(0,1) proposal, <math>c=b(1/4)^n</math><br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math><br />
2. If <math>U_2 \leq \frac{b U_1^n (1-U_1)^n}{b(1/4)^n}=(4U_1(1-U_1))^n</math><br />
then <math>X=U_1</math><br />
Else return to step 1.<br />
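A Python sketch of this accept-reject sampler (n = 2 is an illustrative choice; note that b cancels in the acceptance ratio, so it never needs to be computed):<br />

```python
import random

random.seed(4)
n = 2  # illustrative exponent; f(x) = b * x^n * (1-x)^n

def sample_beta_ar(n):
    """Accept-reject with a Unif(0,1) proposal; since f peaks at x = 1/2,
    the ratio f(x)/(b*(1/4)^n) simplifies to (4x(1-x))^n and b cancels."""
    while True:
        u1, u2 = random.random(), random.random()
        if u2 <= (4.0 * u1 * (1.0 - u1)) ** n:
            return u1

xs = [sample_beta_ar(n) for _ in range(50000)]
mean = sum(xs) / len(xs)  # f is symmetric about 1/2, so E[X] = 1/2
```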
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed-form inverse CDF. Moreover, the CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto: not every point in the interval <math> [0,1] </math> has a preimage in the support set of X under the CDF.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
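Steps 1-4 can be sketched in Python (the fair-die pmf below is an illustrative example, not from the notes):<br />

```python
import random

random.seed(5)
pmf = {k: 1.0 / 6.0 for k in range(1, 7)}  # fair die, an illustrative pmf

def sample_discrete(pmf, a, b):
    """Steps 1-4 above: walk the CDF upward until it first exceeds u."""
    u = random.random()
    x, s = a, pmf[a]
    while u > s and x < b:  # the x < b guard protects against rounding
        x += 1
        s += pmf[x]
    return x

rolls = [sample_discrete(pmf, 1, 6) for _ in range(60000)]
mean = sum(rolls) / len(rolls)  # E[X] = 3.5 for a fair die
```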
<br />
==Class 8 - Thursday, May 30, 2013==<br />
<br />
In this lecture, we will discuss algorithms to generate 3 well-known distributions: Binomial, Geometric and Poisson. For each of these distributions, we will first state its interpretation, probability mass function, expectation and variance. Then, we will derive one or more algorithms to sample from each distribution and implement them in Matlab. <br /><br />
<br />
'''Bernoulli distribution'''<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution, where n = 1. X ~ B(1, p) has the same meaning as X ~ Bern(p). B(n, p), is the distribution of the sum of n independent Bernoulli trials, Bern(p), each with the same probability p. <br />
<br />
Algorithm: <br />
<br />
1. Generate u~Unif(0,1) <br><br />
2. If u <= p, then x = 1 <br><br />
Else x = 0 <br />
<br />
===The Binomial Distribution===<br />
<br />
If X~Bin(n,p), then its pmf is of form:<br />
f(x)=(nCx) p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
Or f(x) = <math>(n!/x!(n-x)!)</math> p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /><br />
<br />
mean (x) = E(x) = np; variance = np(1-p)<br />
<br />
Generate n uniform random numbers <math>U_1,...,U_n</math> and let X be the number of <math>U_i</math> that are less than or equal to p.<br />
The logic behind this algorithm is that the Binomial Distribution is simply a Bernoulli trial, with probability of success p, repeated n times. Thus, we can sample from the distribution by sampling n Bernoulli trials; the sum of these n Bernoulli trials represents one binomial sample. In the example below, we sample 1000 realizations from 20 Bernoulli random variables. By summing each column of the 20-by-1000 matrix that is produced, we add up the 20 Bernoulli outcomes to produce one binomial sample. There are 1000 columns, so the sum yields realizations of 1000 binomial random variables (the output of the sum is a 1-by-1000 vector).<br /><br />
MATLAB tips: binornd(N,P) generates binomial random numbers directly, where N is the number of trials and P is the probability of success. Also, for a=[2 3 4], the comparison a<3 produces [1 0 0], and "a == 3" produces [0 1 0]; such comparisons let us count the values that are less than or equal to p.<br /><br />
<br />
Procedure for Bernoulli <br />
U~Unif(0,1)<br />
if U <= p<br />
x = 1<br />
else <br />
x = 0<br />
<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>a=[3 5 8];<br />
>>a<5<br />
ans= 1 0 0<br />
<br />
>>rand(20,1000)<br />
>>rand(20,1000)<0.4<br />
>>A = sum(rand(20,1000)<0.4)<br />
>>hist(A)<br />
>>mean(A)<br />
Note: sum applied to a matrix sums each column by default (equivalent to sum(...,1))<br />
<br />
>>sum(sum(rand(20,1000)<0.4)>8)/1000<br />
This is an estimate of Pr[X>8].<br />
<br />
</pre><br />
<br />
[[File:Binomial_example.jpg|300px]]<br />
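The same simulation can be sketched in Python (mirroring sum(rand(20,1000)<0.4), with more repetitions for a stabler estimate):<br />

```python
import random

random.seed(6)
n, p = 20, 0.4  # as in the Matlab session above

def sample_binomial(n, p):
    """One binomial draw as the sum of n Bernoulli(p) indicator trials."""
    return sum(1 for _ in range(n) if random.random() < p)

xs = [sample_binomial(n, p) for _ in range(20000)]
mean = sum(xs) / len(xs)                      # E[X] = n*p = 8
tail = sum(1 for x in xs if x > 8) / len(xs)  # estimate of P(X > 8)
```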
<br />
<br />
===The Geometric Distribution===<br />
<br />
x=1, f(x)=p <br />
x=2, f(x)=p(1-p)<br />
x=3, f(x)=p(1-p)<sup>2</sup><br />
<br />
Generally speaking, if X~G(p) then its pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>*p, x=1,2,...<br /><br />
The random variable X is the number of trials required until the first success in a series of independent '''Bernoulli trials'''.<br /><br />
<br />
<br />
<br />
Other properties<br />
<br />
<br />
Probability mass function : P(X=k) = p(1-p)<sup>(k-1)</sup><br />
<br />
Tail probability : P(X>n) = (1-p)^n<br />
<br />
<br />
<span style="background:#F5F5DC"><br />
<br />
Mean of x = 1/p<br />
Var(x) = (1-p)/p^2<br />
<br />
There are two ways to look at a geometric distribution.<br />
<br />
<b>1st Method</b><br />
<br />
We look at the number of trials before the first success. This includes the last trial in which you succeeded. This will be used in our course. <br />
<br />
pmf is of the form f(x) = (1-p)<sup>(x-1)</sup>*(p), x = 1, 2, 3, ...<br />
<br />
<b>2nd Method</b><br />
<br />
This involves modeling the failure before the first success. This does not include the last trial in which we succeeded. <br />
<br />
pmf is of the form f(x) = ((1-p)<sup>x</sup>)*p , x = 0, 1, 2, ....<br />
<br />
</span><br />
<br />
<br />
If Y~Exp(<math>\lambda</math>) then X=floor(Y)+1 is geometric.<br /><br />
Choose <math>\lambda</math> so that e<sup>-<math>\lambda</math></sup>=1-p. Then X ~ Geo(p) <br /><br />
<br />
P (X > x) = (1-p)<sup>x</sup>(because first x trials are not successful) <br/><br />
<br />
'''Proof''' <br/><br />
<br />
P(X>x) = P( floor(Y) + 1 > x) = P(floor (Y) > x - 1) = P(Y >= x) = e<sup>(-<math>\lambda</math> * x)</sup> <br><br />
<br />
Since p = 1 - e<sup>-<math>\lambda</math></sup>, i.e. <math>\lambda</math>= <math>-log(1-p)</math>, then <br><br />
<br />
P(X>x) = e<sup>(-<math>\lambda</math> * x)</sup> = e<sup>log(1-p)*x</sup> = (1-p)<sup>x</sup> <br/><br />
<br />
Note that floor(Y) > x implies Y >= x+1 for integer x <br/><br />
<br />
This is the proof of how the exponential distribution is used to obtain P(X>x)=(1-p)^x.<br />
<br />
<br><br />
Suppose X has the exponential distribution with rate parameter <math> \lambda > 0 </math> <br><br />
then <math>\left \lfloor X \right \rfloor </math> and <math>\left \lceil X \right \rceil </math> have geometric distributions on <math> \mathcal{N} </math> and <math> \mathcal{N}_{+} </math> respectively, each with success probability <math> 1-e^ {- \lambda} </math> <br><br />
<br />
Proof: <br><br />
<math>\text{For } n \in \mathcal{N} </math><br /><br />
<br />
<math>\begin{align}<br />
P(\left \lfloor X \right \rfloor = n)&{}= P( n \leq X < n+1) \\<br />
&{}= F( n+1) - F(n) \\<br />
\text{By algebra and simplification:} \\<br />
P(\left \lfloor X \right \rfloor = n)&{}= (e^ {-\lambda})^n \cdot (1 - e^ {-\lambda}), \\<br />
&{}\text{which is the pmf of } Geo (1 - e^ {-\lambda}) \\<br />
<br />
\text{Proof of ceiling part follows immediately.} \\<br />
\end{align}</math> <br /><br />
<br />
<br />
<br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p) </math><br /><br />
2) Generate a <math>Y \sim Exp(\lambda )</math> <br /><br />
3) We can then let <math>X = \left \lfloor Y \right \rfloor + 1, where X\sim Geo(p)</math> <br /><br />
note: <math>\left \lfloor Y \right \rfloor >2 -> Y>=3</math><br /><br />
<math> \left \lfloor Y \right \rfloor >5 -> Y>=6</math><br /><br />
<br /><br />
<br />
<math>\left \lfloor Y \right \rfloor>x \Rightarrow Y \geq x+1 </math> <br /><br />
<br />
<math>P(Y \geq x):</math><br /><br />
Y ~ Exp (<math>\lambda</math>)<br /><br />
pdf of Y : <math>\lambda e^{-\lambda y}</math><br /><br />
cdf of Y : <math>F(y)=1-e^{-\lambda y}</math><br /><br />
cdf <math>P(Y<x)=1-e^{-\lambda x}</math><br /><br />
<math>P(Y \geq x)=1-(1-e^{-\lambda x})=e^{-\lambda x}</math><br /><br />
<math> e^{-\lambda}=1-p \Rightarrow \lambda = -\log(1-p)</math><br /><br />
<math>P(Y \geq x)=e^{-\lambda x}=e^{\log(1-p)x}=(1-p)^x</math><br /><br />
<math>E[X]=1/p </math><br /><br />
<math>Var(X)= (1-p)/p^2</math><br /><br />
P(X>x)<br /><br />
=P(floor(Y)+1>x)<br /><br />
=P(floor(Y)>x-1)<br /><br />
=P(Y>=x)<br />
<br />
Use <math>e^{-\lambda}=1-p</math> to recover the mean and variance above.<br />
'''Code'''<br><br />
<pre style="font-size:16px"><br />
>>p=0.4;<br />
>>l=-log(1-p);<br />
>>u=rand(1,1000);<br />
>>y=(-1/l)*log(u);<br />
>>x=floor(y)+1;<br />
>>hist(x)<br />
<br />
'''Note:'''<br />
mean(x)~E[X]=> 1/p<br />
Var(x)~V[X]=> (1-p)/p^2<br />
<br />
</pre><br />
<br />
[[File:Geometric_example.jpg|300px]]<br />
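A Python version of the same floor(Y)+1 generator (p = 0.4 as in the Matlab code above):<br />

```python
import math
import random

random.seed(7)
p = 0.4
lam = -math.log(1.0 - p)  # choose lambda so that e^{-lambda} = 1 - p

def sample_geometric(lam):
    """X = floor(Y) + 1 with Y ~ Exp(lambda), as in the code above."""
    y = -math.log(1.0 - random.random()) / lam  # inverse transform for Exp
    return math.floor(y) + 1

xs = [sample_geometric(lam) for _ in range(100000)]
mean = sum(xs) / len(xs)  # E[X] = 1/p = 2.5
```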
<br />
===Poisson Distribution===<br />
If <math>\displaystyle X \sim \text{Poi}(\lambda)</math>, its pmf is of the form <math>\displaystyle \, f(x) = \frac{e^{-\lambda}\lambda^x}{x!}</math> , where <math>\displaystyle \lambda </math> is the rate parameter.<br /><br />
<br />
Understanding of Poisson distribution:<br />
<br />
If customers arrive at a bank at rate <math>\lambda</math> per unit of time, then <br />
X(t) = # of customers arriving in [0,t] ~ Pois<math>(\lambda t)</math><br />
<br />
Its mean and variance are<br /><br />
<math>\displaystyle E[X]=\lambda</math><br /><br />
<math>\displaystyle Var[X]=\lambda</math><br /><br />
<br />
A Poisson random variable X can be interpreted as the maximal number of i.i.d. exponential variables (with rate <math>\lambda</math>) whose sum does not exceed 1.<br /><br />
The traditional understanding of the Poisson distribution as the total number of events in a specific interval can be understood here since the above definition simply describes the Poisson as the sum of waiting times for n events in an interval of length 1.<br />
<br /><br />
<br /><br />
<math>\displaystyle\text{Let } Y_j \sim \text{Exp}(\lambda), U_j \sim \text{Unif}(0,1)</math><br><br />
<math>Y_j = -\frac{1}{\lambda}log(U_j) \text{ from Inverse Transform Method}</math><br><br><br />
<br />
<math>\begin{align} <br />
X &= max \{ n: \sum_{j=1}^{n} Y_j \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} - \frac{1}{\lambda}log(U_j) \leq 1 \} \\<br />
&= max \{ n: \sum_{j=1}^{n} log(U_j) > -\lambda \} \\<br />
&= max \{ n: log(\prod_{j=1}^{n} U_j) > -\lambda \} \\<br />
&= max \{ n: \prod_{j=1}^{n} U_j > e^{-\lambda} \} \\<br />
\end{align}</math><br><br /><br />
<br />
Note: From above, we can use Logarithm Rules <math>log(a)+log(b)=log(ab)</math> to generate the result.<br><br /><br />
'''Algorithm:''' <br /><br />
1) Set n=1, a=1 <br /><br />
2) Generate <math>U_n ~ U(0,1), a=aU_n </math> <br /><br />
3) If <math>a >= e^{-\lambda}</math> , then n=n+1, and go to Step 2. Else, x=n-1 <br /><br />
<br />
This derivation uses the inverse-transform representation of exponential variables to build a Poisson generator.<br />
<br />
===MATLAB Code for generating Poisson Distribution===<br />
<pre><br />
>>l=2; <br />
>>for ii=1:1000<br />
n=1;<br />
a=1;<br />
u=rand;<br />
a=a*u;<br />
while a>exp(-l)<br />
n=n+1;<br />
u=rand;<br />
a=a*u;<br />
end<br />
x(ii)=n-1;<br />
end<br />
>>hist(x)<br />
>>sum(x==1)/1000 % Probability of x=1<br />
>>sum(x>3)/1000 % Probability of x > 3<br />
</pre><br />
<br />
[[File:Poisson_example.jpg|300px]]<br />
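The product-of-uniforms algorithm can be sketched in Python (λ = 2 matches the Matlab code above):<br />

```python
import math
import random

random.seed(8)
lam = 2.0

def sample_poisson(lam):
    """Keep multiplying uniforms; X is the number of factors taken before
    the product drops below e^{-lambda}, minus one (Steps 1-3 above)."""
    threshold = math.exp(-lam)
    n, a = 1, 1.0
    while True:
        a *= random.random()
        if a >= threshold:
            n += 1
        else:
            return n - 1

xs = [sample_poisson(lam) for _ in range(100000)]
mean = sum(xs) / len(xs)  # E[X] = lambda = 2
```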
<br />
<br />
<span style="background:#F5F5DC"><br />
EXAMPLE for geometric distribution: Consider the case of rolling a die: </span><br />
<br />
X=the number of rolls that it takes for the number 5 to appear. <br />
<br />
We have X ~Geo(1/6), <math>f(x)=(1/6)*(5/6)^{x-1}</math>, x=1,2,3.... <br />
<br />
Now, let <math>Y \sim Exp(\lambda)</math> => x=floor(Y) +1 <br />
<br />
Let <math>e^{-\lambda}=5/6</math> <br />
<br />
<math>P(X>x) = P(Y>=x)</math> (from the class notes) <br />
<br />
We have <math>e^{-\lambda *x} = (5/6)^x</math> <br />
<br />
Algorithm: let <math>\lambda = -\log(5/6)</math> <br />
<br />
1) Generate <math>Y \sim Exp(\lambda)</math>, exponentially distributed <br />
<br />
2) Set X= floor(Y)+1, to generate X <br />
<br />
<math> E[X]=6, Var[X]=(5/6)/(1/6)^2 = 30 </math><br />
<br />
<br />
<span style="background:#F5F5DC">GENERATING NEGATIVE BINOMIAL RV USING GEOMETRIC RV'S</span><br />
<br />
Property of negative binomial Random Variable: <br/><br />
<br />
The negative binomial random variable is a sum of r independent geometric random variables.<br/><br />
<br />
Using this property we can formulate the following algorithm:<br/><br />
<br />
Step 1: Generate r geometric rv's each with probability p using the procedure presented above.<br/><br />
Step 2: Take the sum of these r geometric rv's. This RV follows NB(r,p)<br/><br />
<br />
Remark on steps 1 and 2: each geometric draw can itself be generated via X = floor(Y)+1 with Y exponential and e<sup>-<math>\lambda</math></sup> = 1-p (e.g. 5/6 in the die example above).<br />
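A Python sketch of the two steps (r = 3 and p = 0.4 are illustrative values; each geometric draw reuses the exponential method presented above):<br />

```python
import math
import random

random.seed(9)
r, p = 3, 0.4  # illustrative parameters
lam = -math.log(1.0 - p)  # e^{-lambda} = 1 - p

def sample_geometric(lam):
    # Geo(p) via Y ~ Exp(lambda), X = floor(Y) + 1 (method above).
    return math.floor(-math.log(1.0 - random.random()) / lam) + 1

def sample_negbin(r, lam):
    """Steps 1-2: sum r independent Geo(p) draws to get NB(r, p)."""
    return sum(sample_geometric(lam) for _ in range(r))

xs = [sample_negbin(r, lam) for _ in range(50000)]
mean = sum(xs) / len(xs)  # E[X] = r/p = 7.5
```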
<br />
=== Another way to generate random variable from poisson distribution ===<br />
<br/><br />
Note: <math>P(X=x)=e^{-\lambda}\lambda^x/x!</math><br/><br />
<math>P(X=x+1)= e^{-\lambda}\lambda^{x+1}/(x+1)!</math> <br/><br />
The ratio is: <math>p(x+1)/p(x)=\lambda/(x+1)</math> <br/><br />
Therefore, <math>p(x+1)=\lambda/(x+1)*p(x)</math> <br/><br />
Algorithm: <br/><br />
1. Set x=0, <math>p=F=P(X=0)=e^{-\lambda}</math> <br/><br />
2. Generate U~Unif(0,1) <br/><br />
3. If U<F, output x<br/><br />
Else<br/><br />
<math>p=\lambda/(x+1)*p</math><br/><br />
F=F+p<br/><br />
x=x+1<br/><br />
Go to 3<br />
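A Python sketch of this cumulative algorithm (λ = 2 is an illustrative value):<br />

```python
import math
import random

random.seed(10)
lam = 2.0  # illustrative rate

def sample_poisson_cdf(lam):
    """One uniform draw; build the CDF on the fly via the recursion
    p(x+1) = lambda/(x+1) * p(x), starting from p(0) = e^{-lambda}."""
    u = random.random()
    x = 0
    p = math.exp(-lam)  # P(X = 0)
    F = p
    while u >= F:
        p *= lam / (x + 1)
        F += p
        x += 1
    return x

xs = [sample_poisson_cdf(lam) for _ in range(100000)]
mean = sum(xs) / len(xs)  # E[X] = lambda = 2
```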
<br />
<br />
1. set n =1, a = 1<br />
<br />
2. Generate U<sub>n</sub>~U(0,1), a = a*U<sub>n</sub><br />
<br />
3. if <math>a > e^{-\lambda}</math>, then n = n+1, go to step 2,<br />
<br />
else x = n-1<br />
<br />
In summary: compute the ratio p(x+1)/p(x), start from F = P(X=0), and compare F against a uniform draw.</div>Ysyap
http://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=17559
stat340s13 2013-05-30T13:42:25Z
<p>Ysyap: /* Thursday, May 30, 2013 */</p>
<hr />
<div>== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== Midterm ===<br />
Monday June 17 2013 from 2:30-3:30<br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
{| class="wikitable"<br />
|-<br />
! TA<br />
! Day<br />
! Time<br />
! Location<br />
|-<br />
| Lu Cheng<br />
| Monday<br />
| 3:30-5:30 pm<br />
| M3 3108, space 2<br />
|-<br />
| Han ShengSun<br />
| Tuesday<br />
| 4:00-6:00 pm<br />
| M3 3108, space 2<br />
|-<br />
| Yizhou Fang<br />
| Wednesday<br />
| 1:00-3:00 pm<br />
| M3 3108, space 1<br />
|-<br />
| Huan Cheng<br />
| Thursday<br />
| 3:00-5:00 pm<br />
| M3 3111, space 1<br />
|-<br />
| Wu Lin<br />
| Friday<br />
| 11:00-1:00 pm<br />
| M3 3108, space 1<br />
|}<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case, i.e. y is continuous <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning): Used when we have variables in a high-dimensional space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly at their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your quest account as the login name and your uwaterloo email as the registered email. This is important as the quest id will be used to identify the students who make contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After requesting an account, wait several hours before logging in using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions in multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that appear random but are actually deterministic. Although a pseudo random sequence is deterministic, its values have the appearance of being independent uniform random variables. This determinism is valuable, since such numbers are easy to generate and the sequences are reproducible.<br />
<br />
When an experiment is repeated many times, the results will be close to the expected values, which makes the process look deterministic; for each individual trial, however, the result is random.<br />
In this sense, pseudo random numbers imitate genuinely random behaviour.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
if y = ax + b, then <math>b:=y \mod a</math>. <br /><br />
4.2 = 3 * 1.1 + 0.9, so<br /><br />
0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2, so 2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1, so 1 = 25 mod 3<br /><br />
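These identities can be checked with Python's % operator (a quick sanity check added here, not part of the original notes):<br />

```python
# n mod m is the remainder r in n = m*q + r with 0 <= r < m.
r1 = 30 % 7     # 30 = 4*7 + 2, so r1 is 2
r2 = 25 % 3     # 25 = 8*3 + 1, so r2 is 1
r3 = 4.2 % 1.1  # 4.2 = 3*1.1 + 0.9, so r3 is 0.9 up to rounding
```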
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can be used to check whether one integer divides another with no remainder. In the decomposition n = mq + r, the numbers n, m, q, r are all integers with 0 ≤ r < m.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math>. (<math>\mod m</math> means taking the remainder after division by m.) Given an initial integer value <math>x_0 \in \N</math>, called the '''seed''', we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method refers to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about the '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required, such as Monte Carlo simulation and cryptographic applications. (A Monte Carlo simulation relies on large numbers of random samples to explore the range of possible outcomes, so the quality of the random numbers matters; this method is not precise enough for that.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{k}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we can get a sequence of "random" numbers. However, how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a very '''large''', preferably prime number: the larger <math>m</math> is, the longer the period of the sequence can be. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> should be '''large and prime''')<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a histogram of the sample. Use it after running the code to check the actual sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
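The recurrence above is easy to verify in code. Below is a minimal sketch in Python (used here only because readers may not have MATLAB; the parameters are the ones from this example):

```python
def lcg(a, b, m, seed, n):
    """First n values of the sequence x_{k+1} = (a*x_k + b) mod m."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + b) % m
        out.append(x)
    return out

# Reproduce the worked example: a=13, b=0, m=31, seed x0=1
seq = lcg(13, 0, 31, 1, 30)
print(seq[:3])      # [13, 14, 27] -- matches x1, x2, x3 above
print(sorted(seq))  # a permutation of 1..30, after which the sequence repeats
```

Dividing each value by m-1 rescales the output to approximately uniform values on [0,1], as described above.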
<br />
For example, a poor choice of parameters such as <math>a = 3, b = 2, m = 4, x_0 = 1</math> gives a degenerate sequence:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1. Why in the example above is the range 1 to 30, not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2. Will the number 31 ever appear? Is it possible that some number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', the results are approximately uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to sampling from a uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been considerable research on how to choose good parameters for such generators. Many programs give you the option to choose the seed; otherwise the seed may be set automatically by the system.<br /><br />
<br />
<br />
<br />
<br />
In this part we learned how code computes the relationship between integer division and remainders,<br />
and that when we run the recurrence over a range such as (1:1000), the histogram of the generated values looks approximately uniform.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>Choose integers <i>a</i>, <i>b</i>, <i>m</i> (with <i>m</i> a large prime) and a seed <i>x<sub>0</sub></i>.</li><br />
<li><math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than the uniform distribution, such as the exponential and normal distributions. However, to use this method easily to generate pseudorandom numbers, the probability distribution used must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf <math>F</math>, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then apply the transformation <math> x=F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the inverse directly on both sides in the proof of the inverse transform only if the cdf of X is strictly increasing, and hence invertible; the generalized inverse defined above covers the general case.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3.<br><br />
Note: These steps can be found in Simulation, 5th Ed., by Sheldon Ross.<br />
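The five steps above can be sketched in Python as follows (a direct transcription of the algorithm; the function name is ours):

```python
import random

def binomial_inverse_transform(n, p):
    """One Binomial(n, p) variate via the inverse transform recursion."""
    u = random.random()                    # Step 1: U ~ Unif(0,1)
    c = p / (1 - p)                        # Step 2
    i = 0
    pr = (1 - p) ** n                      # P(X = 0)
    F = pr                                 # running cdf
    while u >= F and i < n:                # Step 3: stop when U < F (i < n guards round-off)
        pr = c * (n - i) / (i + 1) * pr    # Step 4: P(X = i+1) from P(X = i)
        F += pr
        i += 1
    return i                               # Step 5 is the loop itself

random.seed(1)
sample = [binomial_inverse_transform(10, 0.3) for _ in range(10000)]
print(sum(sample) / len(sample))  # close to n*p = 3
```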
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\, dt</math><br /><br />
<math> = \Big[ -e^{-\lambda t} \Big]_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
Setting <math> u=1-e^{- \lambda x} </math> and solving for x:<br /><br />
<math> 1-u=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
Therefore <math>F^{-1}(u)=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
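These two steps, sketched in Python (λ = 2 here to match the MATLAB example later in this section; any λ > 0 works):

```python
import math
import random

def exp_inverse_transform(lam):
    """One Exponential(lam) variate via X = -ln(1 - U) / lam."""
    u = random.random()               # Step 1: U ~ Unif(0,1)
    return -math.log(1 - u) / lam     # Step 2: apply the inverse cdf

random.seed(0)
sample = [exp_inverse_transform(2) for _ in range(10000)]
print(sum(sample) / len(sample))  # close to the mean 1/lam = 0.5
```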
<br />
'''Example''': <br />
<math> X= a + (b-a)U</math> is uniform on [a, b] when U is uniform on [0, 1] <br /><br />
<math> x=\frac{-\ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math>, since <math>U</math> and <math>1-U</math> have the same distribution <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution on [0,1], then apply the inverse function of F(x) and set<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
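A quick Python sketch of this example; since a Beta(1, β) random variable has mean 1/(1+β), the sample mean offers an easy sanity check (the function name is ours):

```python
import random

def beta_1_beta(beta):
    """One Beta(1, beta) variate via the inverse cdf x = 1 - (1 - u)^(1/beta)."""
    u = random.random()
    return 1 - (1 - u) ** (1 / beta)

random.seed(0)
sample = [beta_1_beta(3) for _ in range(10000)]
print(sum(sample) / len(sample))  # close to E[X] = 1/(1+3) = 0.25
```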
<br />
'''Example 4 - Estimating <math>\pi</math>''':<br />
Let's use rand() and the Monte Carlo method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2, the inscribed circle of radius 1 has area <math>\pi</math>, while the square has area 4.<br /><br />
Thus <math>\pi \approx 4\,(Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
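For comparison, the same estimator in Python (a translation of the MATLAB code above: uniform points on the square [0,2]×[0,2], circle of radius 1 centred at (1,1)):

```python
import random

def estimate_pi(n):
    """4 times the fraction of uniform points on [0,2]x[0,2] that land in the inscribed unit circle."""
    nc = 0
    for _ in range(n):
        x = 2 * random.random()   # uniform on [0, 2]
        y = 2 * random.random()
        if (x - 1) ** 2 + (y - 1) ** 2 <= 1:
            nc += 1
    return 4 * nc / n

random.seed(0)
print(estimate_pi(100000))  # close to 3.14
```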
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. This method is flawed since not all functions are invertible or monotonic: the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /><br />
<br />
We learned how to prove that applying the inverse cdf to a uniform random variable yields a draw from the cdf F, and how to use the uniform distribution to obtain a value of x from F(x).<br />
We can also use the uniform distribution in the inverse method to generate other distributions.<br />
In the <math>\pi</math>-estimation example the points are uniform over the square, so the chance of a point landing in the circle is the ratio of the two areas,<br />
and we can look at the histogram of a sample to judge what kind of distribution it follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool %shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma shifts and rescales the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for the uniform distribution <math> U \sim Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
LIMITATIONS OF THE INVERSE TRANSFORM METHOD<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f function <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing, in some cases this function does not exist<br />
<br />
2. For many distributions such as Gaussian, it is too difficult to find the inverse cdf function , making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable x that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1. Generate <math> U \sim Unif [0,1] </math><br><br />
2. Set <math> X=x_i </math> if <math> F(x_{i-1})<U\leq F(x_i) </math>; in the discrete case, F(x) is a step function rather than continuous.<br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U \sim Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of the semi-colon in Matlab: Matlab will not print the result of a line that ends in a semi-colon; if the semi-colon is omitted, the result is printed.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{otherwise}<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2} \Rightarrow X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U \sim Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) <math>\begin{align} X = 0 \end{align}</math><br />
<math>\begin{align} F = P(X = 0) = e^{-u}*u^0/{0!} = e^{-u} = p \end{align}</math><br />
3) If U<F, output x <br><br />
Else, <math>\begin{align} p = \frac{u}{x+1}\; p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
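The algorithm above can be sketched in Python, with the Step 3 recursion P(X = x+1) = (u/(x+1)) P(X = x) written out explicitly (the function name is ours):

```python
import math
import random

def poisson_inverse_transform(mu):
    """One Poisson(mu) variate by the inverse transform method."""
    u = random.random()         # Step 1
    x = 0                       # Step 2
    p = math.exp(-mu)           # P(X = 0)
    F = p                       # running cdf
    while u >= F and x < 1000:  # Step 3: stop when U < F (cap guards round-off)
        p = p * mu / (x + 1)    # P(X = x+1) = (mu / (x+1)) * P(X = x)
        F += p
        x += 1
    return x

random.seed(0)
sample = [poisson_inverse_transform(4) for _ in range(10000)]
print(sum(sample) / len(sample))  # close to the mean mu = 4
```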
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p) where p is the probability of success, and define the random variable X as the number of the trial on which the first success occurs, x=1,2,3..... We have pmf:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) is the probability of observing at least x failures before the first success.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
\vdots \\<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k \\<br />
\vdots<br />
\end{cases}</math><br />
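Because this cdf has a closed form, the interval search above collapses to a single formula: the smallest k with 1-(1-p)^k ≥ U is k = ⌈ln(1-U)/ln(1-p)⌉. A Python sketch:

```python
import math
import random

def geometric_inverse_transform(p):
    """One Geometric(p) variate (support 1, 2, ...) via the inverse of F(x) = 1 - (1-p)^x."""
    u = random.random()
    # smallest integer k with 1 - (1-p)^k >= u; max(1, ...) guards the u = 0 edge case
    return max(1, math.ceil(math.log(1 - u) / math.log(1 - p)))

random.seed(0)
sample = [geometric_inverse_transform(0.25) for _ in range(10000)]
print(sum(sample) / len(sample))  # close to the mean 1/p = 4
```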
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
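The general procedure can be written as one small Python function taking any probability vector (the helper name is ours):

```python
import random

def discrete_inverse_transform(probs, values):
    """Return values[i] for the first i with U <= P_0 + ... + P_i."""
    u = random.random()            # Step 1: U ~ Unif(0,1)
    F = 0.0
    for p, v in zip(probs, values):
        F += p                     # running cumulative probability
        if u <= F:                 # first interval containing U
            return v
    return values[-1]              # guard against floating-point round-off

random.seed(0)
sample = [discrete_inverse_transform([0.3, 0.2, 0.5], [0, 1, 2]) for _ in range(10000)]
print(sample.count(2) / len(sample))  # close to 0.5
```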
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as Gaussian, it is too difficult to find the inverse of <math> F(x)</math>.<br />
The coin-flip example is the discrete uniform case: in the code the coin is flipped 1000 times, and the observed proportion of heads is close to the expected value 0.5.<br />
The second example is another discrete distribution; it partitions the unit interval into pieces of probability 0.3, 0.2 and 0.5 for the outcomes 0, 1 and 2.<br />
Example 3 uses the inverse method to work out the interval of U corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b>generate types of distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed number {x}</li><br />
<li>{F<sup>-1</sup>(x)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>d<sub>i</sub>=x<sub>i</sub> if <math> F(x_{i-1})<u_i \leq F(x_i) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transform method does allow us to transform the uniform distribution into others, it has two limitations:<br />
# Not all cdfs have inverse functions that can be written in closed form<br />
# For some distributions, such as Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples from such distributions, we use other methods, such as the '''Acceptance-Rejection Method'''. When the inverse cdf is unavailable, this method can be much more practical than the inverse transform method.<br />
<br />
Suppose we want to draw a random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (in practice, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the Acceptance-Rejection Method. Typically we choose for ''g(x)'' a density function that we already know how to sample from.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
{{Cleanup|reason= Do not write <math>c*g(x)</math>. Instead write <math>c \times g(x)</math> or <math>\,c g(x)</math><br />
}}<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) rather than <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x would hold if and only if g and f are the same function. This is because both are pdfs integrating to 1, so g cannot exceed f on one region without falling below it on another. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{c g(x)}</math> will be close to zero). This will render our algorithm inefficient.<br />
<br />
<br><br />
<br />
Note: 1. Values around x<sub>1</sub> are sampled more often under cg(x) than under f(x), so there will be more proposals than we actually need; since <math>\frac{f(y)}{\, c g(y)}</math> is small there, the acceptance-rejection step keeps only the right proportion. In the region around x<sub>1</sub>, we should accept less and reject more. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. There, g(x) and f(x) are comparable.<br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, the target function lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those points are guaranteed to fall in the part of the area under c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~U(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we obtain the probability of accepting y at those points. For instance, at points where cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: Since U lies between 0 and 1, the ratio <math>\frac{f(y)}{cg(y)}</math> must also lie in [0,1] for the comparison to be meaningful; this is guaranteed by the condition <math>f(y)\leq c g(y)</math>, which depends on the constant c.<br />
<br />
<br />
This introduced the relationship between cg(x) and f(x), proved why the acceptance probability takes this form, and showed where the rule rejects proposed points.<br />
We also learned how to read the graph to decide in which regions to reject or accept more of the proposals for a given x;<br />
in the example, x1 is a region where most proposals are rejected and x2 is a region where most are accepted.<br />
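A Python sketch of the three-step procedure. As a concrete target we reuse the density f(x) = 2x on [0,1] from the earlier inverse-transform example, with proposal g = Unif(0,1) and c = 2, so that f(x) ≤ c·g(x) everywhere:

```python
import random

def accept_reject_sample():
    """Sample from f(x) = 2x on [0,1] using proposal g = Unif(0,1) and c = 2."""
    while True:
        y = random.random()   # Step 1: Y ~ g
        u = random.random()   # Step 2: U ~ Unif(0,1), independent of Y
        if u <= y:            # Step 3: accept if u <= f(y)/(c*g(y)) = 2y/(2*1) = y
            return y

random.seed(0)
sample = [accept_reject_sample() for _ in range(10000)]
print(sum(sample) / len(sample))  # close to E[X] = 2/3
```

Here the acceptance rate is 1/c = 1/2, illustrating why c should be kept as close to 1 as possible.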
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
(to be updated later)<br><br />
<br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int P(accepted|s)\,P(s)\,ds\\<br />
&=\int \frac{f(s)}{cg(s)}g(s)\,ds\\<br />
&=\frac{1}{c} \int f(s)\, ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c. Usually c should be small, otherwise the amount of work needed to apply the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different from g(y), a candidate point is unlikely to be accepted, since the ratio f(y)/(cg(y)) is small and u will rarely fall below it. <br/>An example is when the target density f has a spike (or several) in its domain: the proposal cg must rise above those spikes everywhere, making c larger than desired. As a result, the algorithm becomes highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)=\binom{2}{x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c>=f(x)/g(x)</math><br/><br />
We need <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate y from DU[0,2])<br/><br />
3. If <math>y=0</math> and <math>v<1/2</math>, output 0; <br/><br />
If <math>y=2</math> and <math>v<1/2</math>, output 2; <br/><br />
If <math>y=1</math>, output 1; otherwise (i.e. y=0 or y=2 with <math>v\geq 1/2</math>) return to step 1<br/><br />
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
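The course examples use Matlab; as an illustrative sketch, the discrete algorithm above can also be written in Python (the function name and sample size here are our own choices):<br />

```python
import random

def sample_bi(rng):
    # Target pmf f of Bi(2, 0.5); proposal g = DU[0,2] with g(y) = 1/3; c = 3/2
    f = [0.25, 0.5, 0.25]
    c = 1.5
    while True:
        y = int(3 * rng.random())      # y ~ DU[0,2] from a uniform draw
        v = rng.random()               # acceptance draw
        if v <= f[y] / (c * (1 / 3)):  # accept with probability f(y)/(c*g(y))
            return y

rng = random.Random(42)
samples = [sample_bi(rng) for _ in range(20000)]
freq = [samples.count(k) / len(samples) for k in range(3)]
print(freq)   # empirical frequencies should approach 1/4, 1/2, 1/4
```

On average each accepted sample costs c = 1.5 candidate draws, matching the elaboration of c above.<br />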
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let <math>g(x)</math> be the helper function <br/><br />
Let <math>cg(x)\geq f(x)</math><br/><br />
Since we need to generate y from <math>g(x)</math>,<br/><br />
<math>Pr(select y)=g(y)</math><br/><br />
<math>Pr(\text{output } y|\text{selected } y)=Pr\left(u<\frac{f(y)}{cg(y)}\right)= \frac{f(y)}{cg(y)}</math> (since u~Unif(0,1))<br/><br />
<math>Pr(\text{output } y)=\sum_i Pr(\text{output } y_i|\text{selected } y_i)Pr(\text{select } y_i)=\sum_i \frac{f(y_i)}{cg(y_i)}g(y_i)=\frac{1}{c}</math> <br/><br />
The number of iterations until the first acceptance is geometric with success probability <math>1/c</math>.<br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
We used conditional probability to prove that, conditional on acceptance, the output has exactly the pdf of the original target.<br />
The example shows how to choose c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop;<br />
Otherwise return to step 1). <br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
We used the derivative to find the local maximum of f(x)/g(x),<br />
which gives the best (smallest) constant c for the acceptance-rejection method.<br />
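As a Python sketch of this simulation procedure (with the accepted value being U<sub>1</sub>, and a sample size chosen for illustration):<br />

```python
import random

rng = random.Random(1)

def sample_beta24():
    # Target f(x) = 20x(1-x)^3 on (0,1); proposal g = U(0,1); c = 135/64
    while True:
        u1, u2 = rng.random(), rng.random()
        # Accept u1 when u2 < f(u1)/(c*g(u1)) = (256/27)*u1*(1-u1)^3
        if u2 < (256 / 27) * u1 * (1 - u1) ** 3:
            return u1

xs = [sample_beta24() for _ in range(20000)]
mean = sum(xs) / len(xs)
print(mean)   # Beta(2,4) has mean 2/(2+4) = 1/3
```

The observed acceptance rate should be close to 1/c = 64/135 &asymp; 0.47.<br />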
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X \sim U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (the density of U[0,0.5])<br />
<br />
Let <math>g(\cdot)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example showing why points are rejected in the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>c=\max(f(x)/g(x))</math>; this choice makes the area between f(x) and cg(x) as small as possible.<br />
Because g(&middot;) is uniform on (0,1), g(x)=1, so <math>c=\max f(x) = 3</math>.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
An example showing how to work out c and f(x)/(c&middot;g(x)).<br />
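A minimal Python sketch of this example (accept y when u &le; y<sup>2</sup>; the names and sample size are illustrative):<br />

```python
import random

rng = random.Random(7)

def sample_cubic():
    # Target f(x) = 3x^2 on (0,1); proposal g = U(0,1); c = 3
    while True:
        y, u = rng.random(), rng.random()
        if u <= y * y:        # f(y)/(c*g(y)) = 3y^2/3 = y^2
            return y

xs = [sample_cubic() for _ in range(20000)]
mean = sum(xs) / len(xs)
print(mean)   # E[X] = integral of 3x^3 over (0,1) = 3/4
```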
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first need a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The graph of f(x) must lie entirely under the graph of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower where the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice-versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> at or below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it does not make sense for <math>c</math> to be arbitrarily large: we need to choose <math>c</math> so that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be negative; indeed, since f and c&middot;g are both densities scaled to satisfy <math>f(x) \leq c \cdot g(x)</math>, integrating both sides shows <math>c \geq 1</math>.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
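When the maximization is awkward to do by hand, c can also be checked numerically. A sketch (using f(x)=20x(1-x)<sup>3</sup> with a U(0,1) proposal as an illustration, where calculus gives x<sub>0</sub>=1/4 and c=135/64):<br />

```python
# Grid search for c = max f(x)/g(x); here g(x) = 1, so we maximize f itself.
f = lambda x: 20 * x * (1 - x) ** 3

n = 100000
grid = [i / n for i in range(1, n)]
c = max(f(x) for x in grid)    # approximate maximum of the ratio f/g
x0 = max(grid, key=f)          # where that maximum is attained

print(c, x0)   # should agree with the calculus answer 135/64 at x0 = 1/4
```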
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
And it means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math> for all x. So the smallest possible c that satisfies the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that <math> c \cdot g(x) \geq f(x) </math> for all x; if no reasonable c exists, return to step 1 and choose a different g.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1 (This is not the way to find c. This is the general procedure.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma;(n)=(n-1)! if n is a positive integer, and in general<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g \sim U(0,1)</math><br><br />
<math>y \sim g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
The ratio 2x is maximized at x = 1, giving c = 2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distribution with pdf f(x), we need to draw approximately <math>n\cdot c</math> points from the proposal distribution g(x) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)}</math>, set x=y;<br><br />
else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: number of accepted samples<br />
>>jj=1; % jj: number of generated candidates<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' A for loop is not used because we must continue looping until we obtain 1000 successful samples. Some samples are rejected along the way, so the number of y's to generate is not known in advance.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g=U(-1,1) and g(x)=1/2<br />
<br />
let y ~ g, <br />
<math> cg(x)\geq f(x) \;\Rightarrow\;<br />
c\cdot\frac{1}{2} \geq \frac{3}{4} (1-x^2) \;\Rightarrow\;<br />
c=\max_x 2\cdot\frac{3}{4} (1-x^2) = \frac{3}{2} </math><br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}} = \frac{1-y^2}{2}</math>, then x=y ('''note that''' <math>\frac{3}{4}(1-y^2)/\frac{3}{2}</math> comes from <math>f(y)/(cg(y))</math>)<br />
:5: else: return to '''step 1''' <br />
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period "." makes an operator element-wise, i.e. applied to each element of a vector or matrix. In the example above, U.^0.5 takes the square root of every element of U; without the period, ^ denotes a matrix power, which is a different operation and is only defined for square matrices. Similarly, for the row vectors a=[1 2 3] and b=[2 3 4], the element-wise product a.*b=[2 6 12] works, but a*b produces an error since the matrix dimensions must agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math> (the maximum is attained at x = 1).<br />
Use the inverse method to sample from <math>g(x)</math>:<br />
<math>G(x)=x^2</math>, so generate <math>U</math> from <math>U(0,1)</math> and set <math>y=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math>, and set <math>y=\sqrt{U_1}</math><br><br />
2. If <math>U_2 \leq \frac{f(y)}{c \cdot g(y)} = \frac{3y^2}{3y} = y = \sqrt{U_1}</math>, accept <math>y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class relating the <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
<br />
<br />
This example shows how to calculate c for a given pair f(x) and g(x).<br />
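A Python sketch of this construction (generate |Z| by acceptance-rejection from an Exp(1) proposal, then attach a random sign; the sample size is illustrative):<br />

```python
import math
import random

rng = random.Random(3)

def sample_normal():
    # f(x) = (2/sqrt(2*pi)) e^{-x^2/2} for x >= 0; g(x) = e^{-x}; c = sqrt(2e/pi)
    while True:
        y = -math.log(rng.random())           # y ~ Exp(1) by inverse transform
        u = rng.random()
        if u <= math.exp(-(y - 1) ** 2 / 2):  # f(y)/(c*g(y)) = e^{-(y-1)^2/2}
            # attach a random sign so the output is Z ~ N(0,1), not |Z|
            return y if rng.random() < 0.5 else -y

zs = [sample_normal() for _ in range(20000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs) - mean ** 2
print(mean, var)   # should be near 0 and 1
```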
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
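These three steps amount to one line of code; a Python sketch (the endpoints are chosen arbitrarily for illustration):<br />

```python
import random

rng = random.Random(0)

def unif(a, b):
    # Linear transform: Y = (b - a)*U + a maps U(0,1) onto U(a, b)
    return (b - a) * rng.random() + a

ys = [unif(-2.0, 5.0) for _ in range(10000)]
print(min(ys), max(ys), sum(ys) / len(ys))   # all in [-2, 5], mean near 1.5
```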
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; then Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2/(\pi R)}{1/(2R)}=\frac{4}{\pi}</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a candidate point y with probability f(y)/[cg(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math> and set <math>\ y = R(2U-1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}</math>, set <math> x = y </math>;<br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a e^{-ax}</math> to generate the random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)}\bigg|_{x=\frac{1}{1-a}} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\frac {f(x)}{g(x)} \to 0 \text{ as } x \to \infty</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate <math>u, v \sim \text{unif}(0,1)</math> <br/><br />
2. Generate y from g; since g is exponential with rate <math>a=\frac{1}{2}</math>, let <math>y=-2\ln(u)</math> <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
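A Python sketch of this procedure with a = 1/2 (so g is exponential with rate 1/2, y = -2 ln(u), and c = 4/e; the sample size is illustrative):<br />

```python
import math
import random

rng = random.Random(5)

def sample_xexp():
    # Target f(x) = x e^{-x}; proposal g(x) = (1/2) e^{-x/2}; c = 4/e
    while True:
        u, v = rng.random(), rng.random()
        y = -2 * math.log(u)                      # y ~ Exp(rate 1/2)
        # f(y)/(c*g(y)) = (e/2) * y * e^{-y/2}, which equals 1 at y = 2
        if v <= (math.e / 2) * y * math.exp(-y / 2):
            return y

xs = [sample_xexp() for _ in range(20000)]
mean = sum(xs) / len(xs)
print(mean)   # f is the Gamma(2,1) density, so E[X] = 2
```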
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take derivative of h(x) with respect to x, get x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (if c<sub>1</sub> is a value, then we can ignore this step) Since we want the smallest value of c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (i.e. k=unknown parameter) to get the value of k, <br />then substitute k to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>.)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For the two examples above, we chose a simple proposal distribution (uniform or exponential)<br />
and figured out <math>c=\max\frac {f(y)}{g(y)} </math>.<br />
If <math>v<\frac {f(y)}{c\cdot g(y)}</math>, we output y.<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse cdf cannot be computed or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated, at least up to a normalizing constant. <br/><br />
3) A constant c where <math>f(x)\leq c\cdot g(x)</math><br/><br />
4) A uniform draw<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
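This interpretation is easy to check empirically; a Python sketch reusing the f(x)=2x, g=U(0,1), c=2 example, whose acceptance rate should be 1/c = 1/2:<br />

```python
import random

rng = random.Random(11)

accepted, trials = 0, 0
while accepted < 10000:
    y, u = rng.random(), rng.random()
    trials += 1
    if u <= y:            # f(y)/(c*g(y)) = 2y/2 = y
        accepted += 1

rate = accepted / trials
print(rate)   # should be close to 1/c = 0.5
```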
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example in the last lecture (the semicircular density). The following code generates the random variable required by that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % R is a constant we can change; e.g. R=4 gives a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % increment ii for the next pass through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tip: in hist(x,y), the second argument y is the number of bars (bins) in the histogram.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
The histogram above displays the generated values of x.<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} &le; c &lt; &infin;.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: The U in steps 2 and 3 above is independent of Y.<br />
~The constant c is an indicator of the rejection rate.<br />
<br />
In this acceptance-rejection example for a pmf, the proposal is the discrete uniform distribution on the 5 values 1, 2, 3, 4, 5, so g(x) = 0.2 for each value.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution of integers <math>1,2,3,...,k</math>. If this function is not built in to your MATLAB, the same draw can be obtained by a simple transformation of rand, namely ceil(k*rand).<br />
<br />
The acceptance rate is <math>1/c</math>, so the lower c is, the more efficient the algorithm. Theoretically, c = 1 is the best case, since all samples would be accepted; however this happens only when the proposal and target distributions are exactly the same, which never occurs in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>1/1.5=2/3</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 random values of f(x); with more random values, the empirical frequencies approach the stated probabilities.<br />
Recall that 1/c is the acceptance rate: the larger it is (i.e. the smaller c is), the better.<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1,y~g<br /><br />
2,u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x>=0</math><br><br />
The first few values of <math>p_{x}</math> are: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j, else go to step 1.<br />
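A Python sketch of this Poisson(3)-from-geometric procedure (we round c up slightly, to 2.13, so that c&middot;g dominates p everywhere):<br />

```python
import math
import random

rng = random.Random(9)

def sample_poisson3():
    # Target p_j = e^{-3} 3^j / j!; proposal g(j) = p(1-p)^j with p = 0.25
    p, c = 0.25, 2.13   # c is slightly above max_j p_j/g(j), which is about 2.12
    while True:
        u1, u2 = rng.random(), rng.random()
        j = int(math.log(u1) / math.log(0.75))    # j ~ Geometric(0.25) on {0,1,...}
        fj = math.exp(-3) * 3 ** j / math.factorial(j)
        gj = p * (1 - p) ** j
        if u2 < fj / (c * gj):
            return j

xs = [sample_poisson3() for _ in range(20000)]
mean = sum(xs) / len(xs)
print(mean)   # Poisson(3) has mean 3
```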
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) such that it is hypergeometically distributed, given 10 white balls, 5 red balls, and select 3 balls, let X be the number of red ball selected, without replacement. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which is '''c=1.1127'''<br />
<br />
Note that c is the maximum of the pointwise ratio f(x)/g(x); here the maximum 0.4945/0.4444 = 1.1127 is attained at X=1.<br />
With this c, c&middot;g(x) &ge; f(x) at every point, as the method requires; a smaller c would leave part of f uncovered, while a larger c would needlessly increase the rejection rate.<br />
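These numbers can be reproduced directly; a Python sketch that computes f, g, the pointwise ratios, and c:<br />

```python
from math import comb

# Target: hypergeometric (10 white, 5 red, draw 3 without replacement; X = # red)
f = [comb(5, k) * comb(10, 3 - k) / comb(15, 3) for k in range(4)]
# Proposal: Bin(3, 1/3)
g = [comb(3, k) * (1 / 3) ** k * (2 / 3) ** (3 - k) for k in range(4)]

ratios = [f[k] / g[k] for k in range(4)]
c = max(ratios)           # about 1.1127, attained at k = 1
print(c)

# With this c, c*g dominates f at every point (up to rounding), as required.
print(all(c * g[k] >= f[k] - 1e-12 for k in range(4)))
```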
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, we need to move c*g(x) to the peak of f to cover the whole f. Thus c will be very large and 1/c will be small.<br />
The higher the value of c, the higher the rejection rate, and the more points will be rejected.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall that,<br />
suppose we have an efficient method for simulating a random variable with probability mass function {q(j), j>=0}. We can use it as the basis for simulating from the distribution with mass function {p(j), j>=0}: first simulate a random variable Y with mass function q, then accept this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that <br />
p(j)/q(j)<=c for all j such that p(j)>0.<br />
This gives the acceptance-rejection method for simulating a random variable X with mass function p(j)=P{X=j}.<br />
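The discrete recipe above can be checked on the hypergeometric/binomial example worked earlier. Below is a quick sketch in Python (the course code is Matlab); the arrays f and g hold the rounded probabilities computed above, so this is an illustration rather than an exact sampler:

```python
import random

random.seed(1)

# Target f: the hypergeometric probabilities computed above (X = number of
# "special" items among 3 drawn); proposal g: Binomial(3, 1/3).
f = [0.2637, 0.4945, 0.2198, 0.02198]
g = [0.2963, 0.4444, 0.2222, 0.03704]
c = max(fi / gi for fi, gi in zip(f, g))  # the constant found above, about 1.1127

def draw_proposal():
    # Binomial(3, 1/3) as the sum of three Bernoulli trials
    return sum(random.random() < 1 / 3 for _ in range(3))

def draw_target():
    # Acceptance-rejection: accept y with probability f(y) / (c * g(y))
    while True:
        y = draw_proposal()
        if random.random() <= f[y] / (c * g[y]):
            return y

samples = [draw_target() for _ in range(20000)]
```

The sample mean should be close to the target mean 3*(5/15) = 1.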
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not as general a technique as acceptance-rejection sampling. Later, we will generalize these distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i \sim~ Exp(\lambda) = Gamma(1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma(t, \lambda)</math><br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1 </math>~Exp(<math>\lambda </math>)<br />
<math>x_2 </math>~Exp(<math> \lambda </math>)<br />
...<br />
<math>x_t </math>~Exp(<math> \lambda </math>)<br />
<math>x_1+x_2+...+x_t</math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); <br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Some notes on matlab coding: '''<br /><br />
If X is a matrix: <br /><br />
:*: ''X(1,:)'' returns the first row <br /><br />
:*: ''X(:,1)'' returns the first column <br /><br />
:*: ''X(i,j)'' returns the (i,j)th entry <br /><br />
:*: ''sum(X,1)'' or ''sum(X)'' sums down each column, returning a row vector of column sums <br /><br />
:*: ''sum(X,2)'' sums across each row, returning a column vector of row sums <br /><br />
:*: ''rand(r,c)'' generates an r-by-c matrix of uniform random numbers <br /><br />
:*: Matlab is very efficient with vectors and inefficient with loops. It is far better to use vector operations (with the . operator as necessary) than "for" loops when computing many values.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); Note: this command generates a 20x1000 matrix <br />
(which means we generate 1000 number for each X_i with t=20); <br />
all the elements are generated by rand<br />
>>x = (-1/lambda)*log(1-u); Note: log(1-u) is essentially the same as log(u) only if u~U(0,1) <br />
>>xx = sum(x) Note: sum(x) will sum all elements in the same column. <br />
size(xx) can help you to verify<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
x and u are both 20-by-1000 matrices.<br />
Since u~Unif(0, 1) implies that u and 1 - u have the same distribution, we can substitute u for 1 - u to simplify the expression.<br />
Alternatively, the following commands do the same thing as the previous ones.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); Note: this is a simple way to put the code in one line. <br />
Here we can use either log(u) or log(1-u), since U~U(0,1);<br />
>>hist(xx)<br />
</pre><br />
<br />
Here rand(20,1000) generates a 20-row by 1000-column matrix. Each column gives one sum of 20 independent <math>Exp(\lambda)</math> values, so the 1000 column sums are independent samples from <math>Gamma(20,\lambda)</math>; the histogram confirms the shape of the Gamma distribution.<br />
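The sum-of-exponentials construction above can also be sketched in Python (the course code is Matlab); this draws 1000 samples from Gamma(20, 10), as in the notes, by summing 20 inverse-transform exponential draws each:

```python
import math
import random

random.seed(2)

def exp_sample(lam):
    # Inverse transform for Exp(lam); log(u) would work equally well
    # since 1 - U ~ U(0,1)
    return -math.log(1 - random.random()) / lam

def gamma_sample(t, lam):
    # Additive property: the sum of t i.i.d. Exp(lam) is Gamma(t, lam)
    return sum(exp_sample(lam) for _ in range(t))

xx = [gamma_sample(20, 10) for _ in range(1000)]
```

The sample mean should be close to t/&lambda; = 20/10 = 2.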
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/sin(\theta)= x_{1}/cos(\theta)</math> <br /><br />
<math> tan(\theta)=x_{2}/x_{1} \rightarrow \theta=tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
For a point on a straight line through the origin at distance R, the coordinates are x = R cos(&theta;) and y = R sin(&theta;).<br />
<br />
=== '''Matlab''' ===<br />
<br />
----<br />
<pre style="color:red; font-size:30px"><br />
THIS SECTION MAY BE REDUNDANT.<br />
PLEASE COMBINE WITH "Some notes on matlab coding"<br />
IN SECTION 6.2<br />
</pre><br />
<br />
'''X=rand(2,3)''' generates a 2 rows*3 columns matrix<br /><br />
Example:<br /><br />
0.1 0.2 0.3<br /><br />
0.4 0.5 0.6<br /><br />
'''sum(X)''' adds the columns up<br /><br />
Example:<br /><br />
0.5 0.7 0.9<br /><br />
'''sum(X,2)''' adds up the rows<br /><br />
Example:<br /><br />
0.6<br /><br />
1.5<br /><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1.On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1)- Standard Normal Distribution - then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
The joint distribution of R and θ can be derived using a 1-1 transformation:<br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\dfrac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into the product of two density functions, Exponential and Uniform, so d and <math>\theta</math> are independent<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx</math><br />
:Write <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math>, so that<br />
:<math>\operatorname{E}[X]=\;\int_{-\infty}^{\infty} x \phi(x)\, dx</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x''),<br />
:<math>=\;\ - \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited to inefficient methods such as inverting the Gaussian CDF, summing uniform random variables, and acceptance-rejection. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This technique has an ease of use and accuracy that became only more valuable as computers grew more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup>=Z<sub>1</sub><sup>2</sup>+Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> (an exponential with mean 2, consistent with the density above) <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
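The same procedure can be sketched in Python as a quick numerical check (the course code above is Matlab):

```python
import math
import random

random.seed(3)

def box_muller():
    # d = R^2 ~ Exp(1/2) (mean 2), theta ~ Unif[0, 2*pi)
    u1, u2 = random.random(), random.random()
    d = -2 * math.log(u1)
    theta = 2 * math.pi * u2
    r = math.sqrt(d)
    # Transform polar coordinates back to Cartesian: a pair of N(0,1) draws
    return r * math.cos(theta), r * math.sin(theta)

pairs = [box_muller() for _ in range(5000)]
xs = [p[0] for p in pairs]
```

The first coordinates should have sample mean near 0 and sample variance near 1.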
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn generates samples from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2<sup>2</sup>) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient, because of the need to compute the sine and cosine functions. A way around this time-consuming step is an indirect computation of the sine and cosine of a random angle (as opposed to the direct computation, which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method exactly, we can approximate the inverse normal CDF numerically. One such method is '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) variables and subtract 6 (since E(U<sub>i</sub>)×12 = 6, and the variance of the sum is 12×(1/12) = 1), we get approximately a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
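The central-limit approximation in method 2 above is easy to check. Here is a minimal sketch in Python (the course code is Matlab):

```python
import random

random.seed(4)

def clt_normal():
    # The sum of 12 U(0,1) draws has mean 6 and variance 12 * (1/12) = 1,
    # so subtracting 6 gives an approximate N(0,1) draw.
    return sum(random.random() for _ in range(12)) - 6

zs = [clt_normal() for _ in range(5000)]
```

Note the approximation has bounded support: values outside [-6, 6] can never occur, unlike a true normal.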
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histogram, adding a constant shifts the center of the graph, while multiplying by a constant changes its spread.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent uniform (0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2lnU_{1}}\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2lnU_{1}}\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by <br />
<math>f_{X_1,X_2}(x_1,x_2) = f_{U_1,U_2}\big(g_1^{-1}(x_1,x_2),\, g_2^{-1}(x_1,x_2)\big)\, |J|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
<math>J=\begin{vmatrix} \frac{\partial u_1}{\partial x_1} & \frac{\partial u_1}{\partial x_2} \\ \frac{\partial u_2}{\partial x_1} & \frac{\partial u_2}{\partial x_2} \end{vmatrix}</math><br />
where <br />
<math>u_1 = g_1^{-1}(x_1,x_2),\quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
<math>u_1 = e^{-(x_1^2+x_2^2)/2}</math><br />
<math>u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
<math>f(x_1,x_2) = \frac{1}{2\pi}\, e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution is a shifted and scaled version of the standard normal: the density is translated by the mean and scaled by the standard deviation. The pdf of the general normal distribution is <br />
<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}</math><br />
<br />
The standard normal distribution is the special case in which the variance is 1 and the mean is zero. If X is a general normal deviate, then Z = (X − μ)/σ has a standard normal distribution.<br />
<br />
If Z ~ N(0,1) and we want <math>X \sim N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu + \sigma \cdot 0 = \mu </math> and <math>Var(X) = 0 + \sigma^2 \cdot 1 = \sigma^2</math><br />
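This location-scale relation can be sketched in Python (μ = 2 and σ = 3 are illustrative values not taken from the notes, and random.gauss stands in for Matlab's randn as the standard normal source):

```python
import random

random.seed(5)

mu, sigma = 2.0, 3.0  # illustrative parameters

# random.gauss(0, 1) draws from N(0,1), playing the role of randn
z = [random.gauss(0, 1) for _ in range(5000)]

# X = mu + sigma * Z should follow N(mu, sigma^2)
x = [mu + sigma * zi for zi in z]
```

The sample mean should be near 2 and the sample variance near 9.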
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); Note: generates variables from the standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2];<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,I<sub>d</sub>) and <math>\underline{X}= \underline{\mu} + \,\Sigma^{\frac{1}{2}} \underline{Z}</math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution describing an event with only two possible outcomes, i.e. success or failure. The variable takes value 1 with success probability p and value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pmf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes; the Bernoulli pmf is the binomial pmf with n = 1 and x taking only the values 0 and 1.<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example discussed in a previous lecture: with p = 1/2, each draw is heads or tails with probability 50%.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from the binomial distribution using this property: a Binomial(n,p) random variable is the sum of n independent Bernoulli(p) random variables.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
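The same Bernoulli-sum construction can be sketched in Python (the course code above is Matlab), with the same n = 10 and p = 0.3:

```python
import random

random.seed(6)

def bernoulli(p):
    # Draw U ~ Unif(0,1); return 1 if U <= p, else 0
    return 1 if random.random() <= p else 0

def binomial(n, p):
    # Binomial(n, p) as the sum of n independent Bernoulli(p) draws
    return sum(bernoulli(p) for _ in range(n))

x = [binomial(10, 0.3) for _ in range(5000)]
```

The sample mean should be near np = 3.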
Note: the Bernoulli pmf can be written compactly as <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
For element-wise operations between arrays, put a dot before the operator (e.g. .*, ./, .^). <br />
For multiplication by a scalar the dot is optional: if V is a 2-by-4 matrix, both 3*V and 3.*V multiply every element by 3.<br />
<br />
The code above illustrates how to generate samples from these distributions.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
Procedure:<br />
<br />
1.Generate U~Unif [0, 1)<br><br />
2.set <math>x=F^{-1}(u)</math><br><br />
3.X~f(x)<br><br />
<br />
Example:<br />
<br />
Let X<sub>1</sub>, X<sub>2</sub> denote the lifetimes of 2 independent particles, <math>X</math><sub>1</sub>~<math>Exp(\lambda_1)</math>, <math>X</math><sub>2</sub>~<math>Exp(\lambda_2)</math>.<br><br />
We are interested in Y=min(<math>X_1, X_2</math>).<br><br />
Design an algorithm based on the inverse method to generate samples according to f<sub>Y</sub>.<br><br />
<br />
Inversion Method<br />
<br />
<math>P(X\leq x) = P(F^{-1}(U)\leq x) = P(U\leq F_X(x)) = F_X(x)</math><br />
Setting <math>U = F_X(X)</math> gives <math>x=F^{-1}(u)</math><br><br />
<br />
<br />
<br />
'''Example 1'''<br><br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br />
x~exp(<math>\lambda</math>)<br><br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P(X_1>y)\, P(X_2>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate u~unif [0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math><br><br />
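The two steps above can be sketched in Python (λ<sub>1</sub> = 1 and λ<sub>2</sub> = 2 are illustrative rates, not values from the notes):

```python
import math
import random

random.seed(7)

lam1, lam2 = 1.0, 2.0  # illustrative rates

def min_lifetime():
    # Y = min(X1, X2) ~ Exp(lam1 + lam2), so invert F_Y directly:
    # Step 1: generate u ~ Unif[0, 1)
    u = random.random()
    # Step 2: x = -ln(1 - u) / (lam1 + lam2); ln(u) would work equally well
    return -math.log(1 - u) / (lam1 + lam2)

y = [min_lifetime() for _ in range(5000)]
```

The sample mean should be near 1/(λ<sub>1</sub>+λ<sub>2</sub>) = 1/3.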
<br />
If we extend from the lifetimes of two independent particles to n independent particles,<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>), <math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>), ..., <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>),<br />
<br />
then the '''minimum''' Y = min(X<sub>1</sub>, ..., X<sub>n</sub>) ~ exp(<math>\sum\lambda_i</math>), and the same inversion applies:<br />
<br />
<math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
<br />
(The '''maximum''' does not have such a simple closed form in general, since <math>F_{max}(y)=\prod_i (1-e^{-\lambda_i y})</math> is not easily inverted.)<br />
<br />
In general, the inverse-transform method requires working out the cdf of the quantity of interest and then inverting it.<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
where a>0<br><br />
What is the distribution of X?<br><br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
<br />
Knowing U~Unif[0,1), this determines the distribution of X: it has the triangular density f(x) above on 0 ≤ x ≤ a.<br />
<br />
'''Example 3'''<br><br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. We want to generate X.<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: By the above result, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n independent uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x on [0, 1], so the product of n such cdfs is x<sup>n</sup>).<br />
Method 2: generate X by having a sample of n independent U~Unif(0, 1) and take the max of the n samples to be x. However, the solution given above using the inverse-transform method requires generating only one uniform random number instead of n of them, so it is a more efficient method.<br />
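The inverse-transform step x = u<sup>1/n</sup> and the worked values above can be checked directly in Python:

```python
# Inverse transform for F(x) = x^n on [0, 1]: x = u^(1/n)
n = 20

def sample_max_uniform(u):
    # Maps a uniform draw u to a sample from F(x) = x^n
    return u ** (1 / n)

# The worked values from the notes, for u = 0.6, 0.5, 0.2
values = [sample_max_uniform(u) for u in (0.6, 0.5, 0.2)]
```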
<br><br />
<br />
In general, for Y = max(X<sub>1</sub>, ..., X<sub>n</sub>) or Y = min(X<sub>1</sub>, ..., X<sub>n</sub>) with the X<sub>i</sub> independent, the cdf of Y factors as above, and the pdf follows by differentiation.<br />
<br />
'''Example 4 (New)'''<br><br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br />
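The two-step procedure of Example 4 can be sketched in Python (λ = 1 is an illustrative rate, not a value from the notes):

```python
import math
import random

random.seed(9)

lam = 1.0  # illustrative rate

def max_lifetime():
    # F_Z(z) = (1 - e^(-lam*z))^2, so inverting gives
    # Z = -(1/lam) * log(1 - sqrt(U))
    u = random.random()
    return -math.log(1 - math.sqrt(u)) / lam

z = [max_lifetime() for _ in range(5000)]
```

For two i.i.d. Exp(1) lifetimes, E[max] = 1 + 1/2 = 1.5, so the sample mean should be near 1.5.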
<br />
===Decomposition Method===<br />
<br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math><br />
<br />
<math>f_{X} = \sum_{i=1}^{n}p_{i}f_{X_{i}}(x)</math><br />
<br />
where p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>n</sub> > 0 and sum of p<sub>i</sub> = 1.<br />
<br />
The same decomposition applies to discrete distributions: the cdf and pmf can each be written as a mixture of component cdfs/pmfs with weights p<sub>i</sub>.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup> <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
Here f(x) is decomposed into two component densities: f<sub>x1</sub>, which is uniform on [0,2], and f<sub>x2</sub>, which is sampled by inverting its cdf.<br />
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient</b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i \geq p_j </math> for <math> i < j </math> (checking the largest weights first minimizes the expected number of comparisons) <br> <br><br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u < \sum_{j=1}^{i} p_j </math> sample from <math> f_i </math> for <math> 1<i<n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
That is, we split f(x) into component densities over their ranges, select a component using U~Unif(0,1) and the weights, then invert the chosen component's cdf.<br />
<br />
== Example of Decomposition Method ==<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
Directly solving U = F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup> for x is awkward, so we decompose instead:<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
Generate U ~ Unif [0,1), V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x=v<br />
else if u<2/3, x=v<sup>1/2</sup><br />
else x= v<sup>1/3</sup><br><br />
<br />
<br />
Matlab Code: <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from an unknown (or hard) distribution using an easy one. The disadvantage is that many points may be rejected, which is inefficient.<br />
<br />
Decomposition recap: we invert each component CDF separately; the components come from splitting the original CDF, and a uniform draw decides which component to invert.<br />
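The Fundamental Theorem of Simulation can be demonstrated with a small sketch (an assumed concrete instance: A is the square [-1,1]<sup>2</sup> and B is the unit disk). Points drawn uniformly in the square and kept only when they land in the disk are uniform over the disk:<br />

```python
import random

def uniform_in_disk():
    """Accept-reject: draw uniformly from the square [-1,1]^2 (shape A)
    and keep the point only if it lies inside the unit disk (shape B)."""
    while True:
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:    # point is inside B: accept
            return (x, y)

# Acceptance probability is area(B)/area(A) = pi/4, so on average
# about 1.27 square draws are needed per accepted disk sample.
pts = [uniform_in_disk() for _ in range(1000)]
```

Every accepted point satisfies the disk constraint by construction; the rejected draws are exactly the wasted work mentioned above.<br />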
<br />
== Practice Example from Lecture 7 ==<br />
<br />
Let X1, X2 denote the lifetimes of 2 independent particles, where X1 ~ Exp(lambda1) and X2 ~ Exp(lambda2) <br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{Then } 1-F_Y(y)=P(\min(x_{1},x_{2}) \geq y)=P(x_{1} \geq y)P(x_{2} \geq y)=e^{-\lambda_{1}y}e^{-\lambda_{2}y}=e^{-(\lambda_{1}+\lambda_{2})y}, \text{ so } F_Y(y)=1-e^{-(\lambda_{1}+\lambda_{2})y}</math><br /><br />
<math>\text{Generate } u \sim unif[0,1), \text{ set } u = F_Y(y) \Rightarrow y = -\tfrac{1}{\lambda_{1}+\lambda_{2}}\ln(1-u)</math><br />
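Since Y = min(X1, X2) ~ Exp(lambda1 + lambda2), the inverse method needs only one uniform draw. A quick Python sketch (the lambda values in the example call are arbitrary):<br />

```python
import math
import random

def sample_min_exp(lam1, lam2, u=None):
    """Generate Y = min(X1, X2) with X1 ~ Exp(lam1), X2 ~ Exp(lam2).
    Uses F_Y(y) = 1 - exp(-(lam1+lam2)*y), so Y = -ln(1-U)/(lam1+lam2)."""
    if u is None:
        u = random.random()
    return -math.log(1.0 - u) / (lam1 + lam2)

y = sample_min_exp(2.0, 3.0)   # one draw from Exp(5)
```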
<br />
==Question 2==<br />
<br />
Use the Acceptance-Rejection Method to sample from <math>f_X(x)=b\,x^n(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math>, where b is the normalizing constant<br />
<br />
Solution:<br />
This is a Beta(n+1, n+1) distribution; b is the normalizing constant chosen so that <math>\int _{0}^{1}b\,x^{n}(1-x)^{n}\,dx=1</math><br />
<br />
Take the proposal g(x) = 1 on (0,1), i.e. Unif(0,1). Since <math>x^{n}(1-x)^{n}</math> is maximized at x = 0.5 with value <math>(1/4)^n</math>,<br />
the density is bounded by <math>c=b\,(1/4)^n</math>.<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math>.<br />
2. If <math>U_2 \leq \frac{b\,U_1^n(1-U_1)^n}{b\,(1/4)^n}=\big(4U_1(1-U_1)\big)^n</math>,<br />
then set <math>X=U_1</math>.<br />
Else return to step 1.<br />
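The algorithm above in Python (n = 2 in the demonstration is arbitrary; note that the constant b cancels in the acceptance ratio, so it never needs to be computed):<br />

```python
import random

def sample_beta_ar(n):
    """Acceptance-rejection for f(x) = b x^n (1-x)^n on (0,1).
    Proposal g = Unif(0,1); since x^n(1-x)^n <= (1/4)^n, the
    acceptance test simplifies to U2 <= (4 U1 (1-U1))^n."""
    while True:
        u1 = random.random()                 # candidate from g
        u2 = random.random()                 # acceptance test
        if u2 <= (4.0 * u1 * (1.0 - u1)) ** n:
            return u1

xs = [sample_beta_ar(2) for _ in range(500)]
```

Near x = 0.5 the acceptance ratio is close to 1, so candidates there are almost always kept; candidates near 0 or 1 are almost always rejected, which is how the uniform proposal is reshaped into the beta density.<br />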
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
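The partition method above, sketched in Python for an arbitrary pmf on {a, a+1, ...} (the fair-die pmf used in the example is just an illustration, and the fixed u is exposed for reproducibility):<br />

```python
import random

def discrete_inverse_transform(pmf, a, u=None):
    """Steps 1-4: walk the support upward, accumulating probability
    s = P(X <= x), until u <= s; return that x."""
    if u is None:
        u = random.random()      # Step 1: generate u
    x = a
    s = pmf(x)                   # Step 2: s = P(X = a)
    while u > s:                 # Step 3: advance while u > s
        x += 1
        s += pmf(x)
    return x                     # Step 4: return x

die = lambda k: 1.0 / 6.0        # fair die on {1, ..., 6}
roll = discrete_inverse_transform(die, 1)
```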
<br />
<br />
==Thursday, May 30, 2013==<br />
<br />
<b>The Geometric Distribution</b> <br><br />
<br />
If X~G(p) then its pdf is of the form f(x)=(1-p)^(x-1)*p, x=1,2,...<br /><br />
The random variable X is the number of trials required until the first success in a series of independent Bernoulli trials.<br /><br />
If Y~Exp(λ) then X=floor(Y)+1 is geometric.<br /><br />
Choose λ so that e^(-λ)=1-p.<br /><br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p) </math><br /><br />
2) Generate a <math>Y \sim Exp(\lambda )</math> <br /><br />
3) We can then let <math>X = \left \lfloor Y \right \rfloor + 1, where X\sim Geo(p)</math> <br /><br />
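The three steps above in Python (the fixed u is exposed only for reproducibility):<br />

```python
import math
import random

def sample_geometric(p, u=None):
    """Geometric via exponential: choose lam so that e^{-lam} = 1 - p,
    draw Y ~ Exp(lam) by inversion, and return X = floor(Y) + 1."""
    if u is None:
        u = random.random()
    lam = -math.log(1.0 - p)          # Step 1
    y = -math.log(1.0 - u) / lam      # Step 2: Y ~ Exp(lam)
    return math.floor(y) + 1          # Step 3: X ~ Geo(p)

x = sample_geometric(0.3)
```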
<br />
P(X>x)<br /><br />
=P(floor(Y)+1>x)<br /><br />
=P(floor(Y)>x-1)<br /><br />
=P(Y>=x)=e^(-λx)=(1-p)^x, which is exactly P(X>x) for X~Geo(p)</div>
<hr />
<div>== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
<br />
<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification, but y is continuous rather than discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly at their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important as the Quest id will be used to identify the students who make contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions in multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers'''; numbers that seem random but are actually deterministic. Although the pseudo random numbers are deterministic, these numbers have a sequence of value and all of them have the appearances of being independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to the ease to generate and manipulate.<br />
<br />
When an experiment is repeated many times, the aggregate results approach their expected values, which makes the process look deterministic; each individual trial, however, is still random.<br />
Pseudo random numbers behave in the same way.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
if y = ax + b, then <math>b:=y \mod a</math>. <br /><br />
4.2 = 3 * 1.1 + 0.9 mod 1.1<br /><br />
0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2 mod 7<br /><br />
2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1 mod 3<br /><br />
1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation tells us whether one integer divides another exactly (remainder 0) or not. In the relation n = mq + r, all of n, m, q, r are integers, with 0 ≤ r < m.<br />
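The division-algorithm relationship n = mq + r can be checked directly in code (a small Python sketch; divmod returns the quotient and remainder in one call):<br />

```python
def division_algorithm(n, m):
    """Return (q, r) with n = m*q + r and 0 <= r < m."""
    q, r = divmod(n, m)
    assert n == m * q + r and 0 <= r < m
    return q, r

# The examples from the text:
print(division_algorithm(30, 7))   # 30 = 7*4 + 2, so 30 mod 7 = 2
print(division_algorithm(25, 3))   # 25 = 3*8 + 1, so 25 mod 3 = 1
```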
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math>. (<math>\mod m</math> means taking the remainder after division by m.) Given a '''seed''', i.e. an initial value <math>x_0 \in \N</math>, we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about the '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required, such as Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation explores the range of possible outcomes, including extreme ones, so it needs higher-quality randomness than this method provides.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{0}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we could get a sequence of "random" numbers. However, how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least <math>m</math> should be a very '''large''', preferably prime number. The larger <math>m</math> is, the longer the period of the sequence can be. This is easier to handle in Matlab: the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> should be '''large and prime''').<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a histogram of the sample distribution. Use it after running the code to check how the generated sample is actually distributed.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
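The recursion from the textbook example can be reproduced in a few lines of Python (the function name is illustrative); the generated values match the table above:<br />

```python
def lcg(a, b, m, x0, count):
    """Generate `count` values of x_{k+1} = (a*x_k + b) mod m."""
    xs, x = [], x0
    for _ in range(count):
        x = (a * x + b) % m
        xs.append(x)
    return xs

seq = lcg(5, 7, 200, 3, 10)
# [22, 117, 192, 167, 42, 17, 92, 67, 142, 117]
```

Note that x<sub>10</sub> = 117 = x<sub>2</sub>, a hint that the period here is short, which is the motivation for choosing a large, prime m.<br />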
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been a research about how to choose uniform sequence. Many programs give you the options to choose the seed. Sometimes the seed is chosen by CPU.<br /><br />
<br />
<br />
<br />
<br />
This part showed how code can be used to explore the relationship between integer division and remainders, and that generating a long sequence of values (e.g., indices 1:1000) with a well-chosen generator produces a histogram that looks approximately uniform.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>Find integers <i>a</i>, <i>b</i>, <i>m</i> (large prime), and <i>x<sub>0</sub></i> (the seed).</li><br />
<li><math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than the uniform distribution, such as the exponential and normal distributions. However, to use this method easily for generating pseudorandom numbers, the probability distribution used must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
If we want to generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation <math>x=F^{-1}(U)</math> <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf of X is strictly monotonic, and hence invertible; otherwise the generalized inverse must be used. A monotonic function is one that is either increasing for all x, or decreasing for all x.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3<br><br />
*Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.<br />
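These steps in Python (the fixed u is exposed only for reproducibility; the recursion pr<sub>i+1</sub> = pr<sub>i</sub> · c(n-i)/(i+1) updates the binomial pmf without computing factorials):<br />

```python
import random

def binomial_inverse_transform(n, p, u=None):
    """Inverse transform for Binomial(n, p), following steps 1-5."""
    if u is None:
        u = random.random()               # Step 1
    c = p / (1.0 - p)                     # Step 2
    i = 0
    pr = (1.0 - p) ** n                   # pr = P(X = 0)
    F = pr
    while u >= F:                         # Step 3: stop when u < F
        pr = pr * c * (n - i) / (i + 1)   # Step 4: pr = P(X = i+1)
        F += pr
        i += 1
    return i

x = binomial_inverse_transform(10, 0.4)
```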
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, \Big|_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {ln(1-y)}{\lambda}</math><br /><br />
<math>y=-\frac {ln(1-x)}{\lambda}</math><br /><br />
<math>F^{-1}(x)=-\frac {ln(1-x)}{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
<br />
'''Example''': <br />
<math> X= a + (b-a)U</math> is uniform on [a, b], where U is uniform on [0, 1] <br /><br />
<math> x=\frac{-ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math> <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x) to get<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
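The inverses from Examples 2 and 3 as Python one-liners (β = 2 in the checks below is an arbitrary choice):<br />

```python
import random

# Example 2: F(x) = x^5 on [0,1], so F^{-1}(u) = u^{1/5}
inv_x5 = lambda u: u ** 0.2

# Example 3: Beta(1, beta), F(x) = 1 - (1-x)^beta,
# so F^{-1}(u) = 1 - (1-u)^{1/beta}
inv_beta1b = lambda u, beta: 1.0 - (1.0 - u) ** (1.0 / beta)

u = random.random()
x1 = inv_x5(u)           # sample with cdf x^5
x2 = inv_beta1b(u, 2.0)  # sample from Beta(1, 2)
```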
<br />
'''Example 4 - Estimating <math>\pi</math>''':<br />
Let's use rand() and the Monte Carlo Method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2, the inscribed circle has radius 1 and area <math>\pi</math>, while the square has area 4.<br /><br />
Thus <math>\pi \approx 4\,\frac{N_c}{N}</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) #will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
#let λ=2 in this example; however, you can make another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) #1000 in size <br />
>>figure<br />
>>hist(x) #exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. This method is limited because not all cdfs are invertible or strictly monotonic, and the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /><br />
<br />
We learned how to prove that applying the inverse cdf to a uniform random variable produces a sample from F, and used this to obtain a value of x from F(x).<br />
The inverse method thus lets the uniform distribution generate samples from many other distributions.<br />
In the Monte Carlo example, points are uniform over the square, so the probability of landing in a region is proportional to its area.<br />
We can also inspect the histogram of generated values to judge what distribution they follow.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool #shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma changes the location and spread of the plotted distribution.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> (since <math>F(F^{-1}(U)) = U</math>) <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> (since <math>P(U\leq a)=a</math>) <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for the uniform distribution <math> U \sim Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse CDF <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases a closed-form inverse does not exist<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse CDF, making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. Suppose we want to generate a discrete random variable X that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1. Generate <math> U \sim Unif [0,1] </math><br><br />
2. Deliver <math> X=x_i </math> if <math> F(x_{i-1})<U\leq F(x_i) </math> (in the discrete case, F(x) is a step function).<br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U \sim Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: the role of the semicolon in Matlab: Matlab will not print out the result if the line ends in a semicolon; without the semicolon, the result is printed.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } 0 \le x < 1 \\<br />
0.5, & \text{if } 1 \le x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<=0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \; X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
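This example can be checked numerically with a small Python sketch (our own illustration, not course code):

```python
import math
import random

def sample_from_2x(n, seed=2):
    """f(x) = 2x on [0,1] has CDF F(x) = x^2, so the inverse
    transform gives X = F^{-1}(U) = sqrt(U)."""
    rng = random.Random(seed)
    return [math.sqrt(rng.random()) for _ in range(n)]

xs = sample_from_2x(100000)
print(sum(xs) / len(xs))  # should approach E[X] = 2/3
```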
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
1-p, & \text{if } 0 \le x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U \sim Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
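A Python sketch of this Bernoulli generator (names and seed are ours, for illustration):

```python
import random

def bernoulli_samples(p, n, seed=3):
    """Inverse-transform Bernoulli(p): X = 1 if U <= p, else X = 0."""
    rng = random.Random(seed)
    return [1 if rng.random() <= p else 0 for _ in range(n)]

xs = bernoulli_samples(0.3, 100000)
print(sum(xs) / len(xs))  # proportion of 1s, close to p = 0.3
```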
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson random variable is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = \frac {u}{x+1} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) <math>\begin{align} x = 0 \end{align}</math><br />
<math>\begin{align} F = P(X = 0) = \frac{e^{-u} u^0}{0!} = e^{-u} = p \end{align}</math><br />
3) If U<F, output x and stop <br><br />
Else, <math>\begin{align} p = \frac{u}{x+1} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
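The sequential-search recursion above can be written out in Python; this is a sketch under the same recursion p_{x+1} = p_x * u/(x+1) (the function name is ours):

```python
import math
import random

def poisson_sample(mean, n, seed=4):
    """Inverse-transform Poisson via sequential search: start at
    x = 0 with p = F = exp(-mean), then repeatedly update
    p <- p * mean/(x+1), F <- F + p, x <- x + 1 until U < F."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        x = 0
        p = math.exp(-mean)
        F = p
        while u >= F:
            p *= mean / (x + 1)
            F += p
            x += 1
        out.append(x)
    return out

xs = poisson_sample(2.0, 100000)
print(sum(xs) / len(xs))  # sample mean, close to 2.0
```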
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p) where p is the probability of success, and define the random variable X as the number of trials needed to obtain the first success, x = 1, 2, 3, .... We have pmf:<br />
<math>P(X=x) = \, p (1-p)^{x-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) means the first x trials are all failures.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
....<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k <br />
....<br />
\end{cases}</math><br />
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
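For the geometric case, the chain of inequalities above collapses to a closed form: <math>1-(1-p)^{k-1} < U \leq 1-(1-p)^k</math> solves to k = ceil(ln(1-U)/ln(1-p)). A Python sketch of this (names ours):

```python
import math
import random

def geometric_sample(p, n, seed=5):
    """Inverse-transform Geometric(p) (number of trials up to and
    including the first success): k = ceil(ln(1-U)/ln(1-p))."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(max(1, math.ceil(math.log(1 - u) / math.log(1 - p))))
    return out

xs = geometric_sample(0.3, 100000)
print(sum(xs) / len(xs))  # close to 1/p = 10/3
```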
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
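The general procedure can be sketched as a small Python routine that walks the cumulative sums (a generic helper of our own, not course code):

```python
import random

def discrete_inverse_transform(values, probs, n, seed=6):
    """General discrete inverse transform: deliver x_k for the first
    cumulative sum P_0 + ... + P_k that U does not exceed."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        cum = 0.0
        for x, p in zip(values, probs):
            cum += p
            if u <= cum:
                out.append(x)
                break
    return out

xs = discrete_inverse_transform([0, 1, 2], [0.3, 0.2, 0.5], 100000)
print(xs.count(2) / len(xs))  # close to 0.5
```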
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as Gaussian, it is too difficult to find the inverse of <math> F(x) ,</math><br />
Flipping a coin is a discrete case of the uniform distribution; in the code the coin is flipped 1000 times, and the observed proportion is close to the expected value (0.5).<br />
Example 2 is another discrete distribution: the unit interval is split into three parts corresponding to the values 0, 1, 2, in proportion to their probabilities.<br />
Example 3 uses the inverse method to work out the range of U corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b> generate samples from a given distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed numbers {u}</li><br />
<li>{F<sup>-1</sup>(u)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>d<sub>i</sub>=x<sub>i</sub> if<math> X=x_i, </math> if <math> F(x_{i-1})<U\leq F(x_i) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transform method allows us to turn uniform draws into other distributions, it has two limitations:<br />
# Not all CDFs have closed-form inverse functions<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples from such distributions, we will use other methods, such as the '''Acceptance-Rejection Method''', which applies even when the inverse CDF is unavailable.<br />
<br />
Suppose we want to draw a random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (in practice, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for the Acceptance-Rejection Method. Typically we choose for ''g(x)'' a density function that we already know how to sample from.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) as opposed to <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x would hold if and only if g and f are the same function. This is because both pdfs integrate to 1, so g cannot exceed f everywhere without also falling below it somewhere; hence for distinct f and g, <math>g(x) \geq f(x)</math> cannot hold &forall;x. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This will render our algorithm inefficient. <br />
<br />
<br><br />
<br />
Note: 1. Values around x<sub>1</sub> will be sampled more often under cg(x) than under f(x), so there will be more samples than we actually need. Where <math>\frac{f(y)}{\, c g(y)}</math> is small, acceptance-rejection must discard the surplus: in the region around x<sub>1</sub> we should accept less and reject more. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. There, g(x) and f(x) are comparable.<br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function lies under the proposed function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those sample points are guaranteed to fall in the part of the area under c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: since u is uniform on [0,1], it never exceeds 1; the constant c is chosen precisely so that <math>\frac{f(y)}{c\,g(y)} \leq 1</math> for all y, i.e. <math>c \geq \frac{f(y)}{g(y)}</math>.<br />
<br />
<br />
This introduced the relationship between cg(x) and f(x), proved why it must hold, and showed how the rule is used to reject candidate points.<br />
The graph also shows where points are likely to be rejected or accepted: in the example, points near x<sub>1</sub> are mostly rejected while points near x<sub>2</sub> are mostly accepted.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
(to be updated later)<br><br />
<br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int_y\ P(accepted|y)P(y)\\<br />
&=\int_y\ \frac{f(s)}{cg(s)}g(s)ds\\<br />
&=\frac{1}{c} \int_y\ f(s) ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-Acceptance-Rejection Method is not good for all cases. One obvious cons is that it could be very hard to pick the g(y) and the constant c in some cases. And usually, c should be a small number otherwise the amount of work when applying the method could be HUGE.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)={2 \choose x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c>=f(x)/g(x)</math><br/><br />
We need <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>y=0</math> and <math>v<1/2</math>, output 0 <br/><br />
If <math>y=2</math> and <math>v<1/2</math>, output 2<br/><br />
Else if <math>y=1</math>, output 1; otherwise return to step 1<br/><br />
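A Python sketch of this discrete A-R algorithm, using the acceptance probabilities f/(cg) = 1/2, 1, 1/2 from the table (the function name is ours):

```python
import random

def binomial_2_ar(n, seed=7):
    """Acceptance-rejection for Bi(2, 0.5) with proposal DU{0,1,2}
    and c = 3/2: accept y with probability f(y)/(c*g(y))."""
    accept = {0: 0.5, 1: 1.0, 2: 0.5}   # f(y)/(c*g(y)) per value
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = int(3 * rng.random())       # y ~ DU{0,1,2}
        v = rng.random()
        if v <= accept[y]:
            out.append(y)
    return out

xs = binomial_2_ar(100000)
print(xs.count(1) / len(xs))  # close to f(1) = 1/2
```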
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output one random variable. Remember that when <math>u \leq f(x)/(cg(x))</math> is not satisfied, we must run the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let <math>g(x)</math> be the helper function <br/><br />
Let <math>cg(x)\geq f(x)</math><br/><br />
Since we need to generate y from <math>g(x)</math>,<br/><br />
<math>Pr(select y)=g(y)</math><br/><br />
<math>Pr(output \ y|selected \ y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (Since u~Unif(0,1))<br/><br />
<math>Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c</math> <br/><br />
Consider that we are asking for the expected time until the first success; it follows a geometric distribution with probability of success = 1/c<br/><br />
Therefore, <math>E(X)=1/(1/c))=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
Conditional probability is used to prove that, given acceptance, the output has exactly the original pdf.<br />
The example shows how to choose the constant c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
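A Python sketch of this simulation procedure, accepting X = U1 when U2 <= (256/27) U1 (1-U1)^3 (names and seed are ours):

```python
import random

def beta_2_4_ar(n, seed=8):
    """Acceptance-rejection for f(x) = 20x(1-x)^3 (Beta(2,4)) with
    g = Unif(0,1) and c = 135/64; f(y)/(c*g(y)) = (256/27)*y*(1-y)^3."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1, u2 = rng.random(), rng.random()
        if u2 <= (256 / 27) * u1 * (1 - u1) ** 3:
            out.append(u1)
    return out

xs = beta_2_4_ar(100000)
print(sum(xs) / len(xs))  # close to the Beta(2,4) mean, 2/(2+4) = 1/3
```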
<br />
We use the derivative to find the maximum of f(x)/g(x), which gives the best (smallest) constant c for the acceptance-rejection method.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X \sim U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (and <math>f(x)=0</math> elsewhere)<br />
<br />
Let <math>g(\cdot)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(\cdot)</math>, which is <math>U[0,1/2]</math><br />
<br />
An example showing why the acceptance-rejection method rejects certain cases.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g to be uniform over the interval (0,1), so g(x)=1 for 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area above f(x) and below cg(x) as small as possible.<br />
Because g(.) is uniform, g(x) = 1, so here c = max f(x) = 3.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
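A Python sketch of this example, accepting y when u <= f(y)/(c g(y)) = y^2 (the function name is ours):

```python
import random

def cubic_density_ar(n, seed=9):
    """Acceptance-rejection for f(x) = 3x^2 on (0,1) with
    g = Unif(0,1) and c = 3: accept y when u <= y^2."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y, u = rng.random(), rng.random()
        if u <= y * y:
            out.append(y)
    return out

xs = cubic_density_ar(100000)
print(sum(xs) / len(xs))  # close to E[X] = 3/4
```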
<br />
<br />
An example showing how to work out c and <math>f(x)/(c \, g(x))</math>.<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted as <math>f(x)</math>, we need to first find a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The curve of f(x) must lie under the curve of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower where the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice-versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> at or below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it will not make sense if <math>c</math> is simply chosen to be arbitrarily large. We need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be a negative number.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
And it means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math>. So the smallest possible c that satisfies the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
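The claim that the expected number of proposals per accepted sample equals c can be checked empirically; a small Python sketch (names ours, with acceptance probability 1/c):

```python
import random

def average_proposals(accept_prob, n, seed=11):
    """Simulate n acceptances, each preceded by Geometric(accept_prob)
    proposals; the average proposal count should approach 1/accept_prob."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        total += 1                          # the accepted proposal itself
        while rng.random() > accept_prob:   # rejected proposals
            total += 1
    return total / n

print(average_proposals(0.5, 100000))  # close to c = 2
```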
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(x)}{c \cdot g(x)}</math> then X=Y; else return to step 1 (This is not the way to find C. This is the general procedure.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma;(n)=(n-1)! if n is a positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}\,dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distritubion with pdf f(x), we need to scan approximately <math>n\cdot c</math> points from the proposal distribution (g(x)) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)}, x=y</math><br><br />
4.else go to 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: number of accepted samples<br />
>>jj=1; % jj: number of generated samples<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g = U(-1,1), so g(x) = 1/2 for <math>-1 \leq x \leq 1</math><br />
<br />
Let y ~ g. We need c such that<br />
<math> cg(x)\geq f(x) \Leftrightarrow
c\cdot\frac{1}{2} \geq \frac{3}{4} (1-x^2) \Leftrightarrow
c \geq \frac{3}{2}(1-x^2)</math>, so <math>c=\max_x \frac{3}{2} (1-x^2) = \frac{3}{2} </math> (the maximum is at x = 0)<br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac{f(y)}{cg(y)} = \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}\cdot\frac{1}{2}} = 1-y^2</math>, then x=y<br />
:5: else: return to '''step 1''' <br />
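The process above can be sketched in Python (the notes' own code is MATLAB; this translation and the helper name `sample_parabolic` are purely illustrative):

```python
import random

def sample_parabolic(n, seed=0):
    # Acceptance-rejection for f(x) = (3/4)(1 - x^2) on [-1, 1]
    # with proposal g = U(-1, 1) and c = 3/2, so f/(c*g) = 1 - y^2.
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = 2 * rng.random() - 1      # y = 2*U1 - 1 ~ U(-1, 1)
        u = rng.random()              # U2 ~ U(0, 1)
        if u <= 1 - y * y:            # accept when U2 <= f(y)/(c g(y))
            out.append(y)
    return out

xs = sample_parabolic(5000)
```

About 2/3 of proposals (= 1/c) should be accepted, and the accepted sample is symmetric about 0.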
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
Periods, ".", meaning "element-wise", are used to describe an operation performed on each element of a vector. In the above example, U.^0.5 takes the square root of every element in U. Without the period, U^0.5 would instead attempt a matrix square root of U, which is a different operation (and only defined for square matrices). Similarly, for the vectors a = [1 2 3] and b = [2 3 4], a.*b = [2 6 12] is the element-wise product, but a*b gives an error since the matrix dimensions do not agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math> (the maximum is at x = 1).<br />
Use the inverse method to sample from <math>g(x)</math><br />
<math>G(x)=x^2</math>.<br />
Generate <math>U</math> from <math>U(0,1)</math> and set <math>x=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. Set <math>Y=\sqrt{U_1}</math> (a draw from g by the inverse method). If <math>U_2 \leq \frac{f(Y)}{c\,g(Y)} = \frac{3Y^2}{\frac{3}{2}\cdot 2Y} = Y = \sqrt{U_1}</math>, accept <math>Y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class relating the <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim~ N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
<math>f(x) = \frac{2}{\sqrt{2\pi}} e^{-x^2/2}, \quad x > 0</math><br />
<br />
Take the proposal <math>g(x) = e^{-x}, \quad x > 0</math><br />
<br />
Take h(x) = f(x)/g(x) and solve h'(x) = 0 to find the x at which h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) <math>\Rightarrow c = \sqrt{2e/\pi}</math><br />
<br />
Thus <math>\frac{f(y)}{cg(y)} = e^{-(y-1)^2/2}</math><br />
<br />
<br />
In code, c can also be found numerically by maximizing the ratio f(x)/g(x) over a grid of x values.<br />
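As a sketch of that idea (in Python rather than the MATLAB used elsewhere in these notes), c for the half-normal example can be approximated on a grid and compared with the closed form:

```python
import math

# Numerically approximate c = max f(x)/g(x) for the half-normal example:
# target f(x) = sqrt(2/pi) * exp(-x^2/2), proposal g(x) = exp(-x), x > 0.
f = lambda x: math.sqrt(2 / math.pi) * math.exp(-x * x / 2)
g = lambda x: math.exp(-x)

# Scan a grid; the ratio f/g is maximized at x = 1.
grid = [i / 1000 for i in range(1, 5001)]
c_numeric = max(f(x) / g(x) for x in grid)
c_exact = math.sqrt(2 * math.e / math.pi)   # closed form derived above
```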
<br />
<p style="font-weight:bold;font-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
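A minimal Python sketch of this transformation (`unif_ab` is a hypothetical helper name; the notes' own examples use MATLAB):

```python
import random

def unif_ab(a, b, n, seed=1):
    # Steps 1-2: draw U ~ U(0,1) and return Y = (b - a)*U + a, so Y ~ U(a, b).
    rng = random.Random(seed)
    return [(b - a) * rng.random() + a for _ in range(n)]

ys = unif_ab(-2.0, 3.0, 10000)
```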
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; then Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>\sqrt{R^2-x^2}</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2/(\pi R)}{1/(2R)}=\frac{4}{\pi}</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a proposed point y with probability f(y)/[cg(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = y </math><br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{1-(2U-1)^2}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br />
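The full semicircular sampler can be sketched in Python (an illustrative translation; the notes' own code is MATLAB, and `sample_semicircle` is a hypothetical name):

```python
import random

def sample_semicircle(R, n, seed=2):
    # Acceptance-rejection for f(x) = 2/(pi R^2) sqrt(R^2 - x^2) on [-R, R]
    # with proposal U(-R, R) and c = 4/pi, so f/(c g) = sqrt(1 - (2U-1)^2).
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        u1 = rng.random()
        y = R * (2 * u - 1)                      # y ~ U(-R, R)
        if u1 * u1 <= 1 - (2 * u - 1) ** 2:      # squared form of the condition
            out.append(y)
    return out

zs = sample_semicircle(1.0, 4000)
```

About pi/4 of proposals should be accepted, and the sample is symmetric about 0.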
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=xe^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=ae^{-ax}</math> to generate the random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\max_x \frac {f(x)}{g(x)} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\lim_{x\to\infty} \frac {f(x)}{g(x)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u, v ~ Unif(0,1) <br/><br />
2. Generate y from g; since g is exponential with rate a = 1/2, the inverse method gives y = -2ln(u) <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
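The procedure can be sketched in Python (an illustrative translation of the MATLAB-style notes; `sample_xexp` is a hypothetical name):

```python
import math
import random

def sample_xexp(n, seed=3):
    # Acceptance-rejection for f(x) = x e^{-x}, x > 0,
    # with proposal g(x) = (1/2) e^{-x/2} (i.e. a = 1/2) and c = 4/e.
    rng = random.Random(seed)
    c = 4 / math.e
    out = []
    while len(out) < n:
        u, v = rng.random(), rng.random()
        y = -2 * math.log(u)          # y ~ Exp(rate 1/2) by the inverse method
        ratio = (y * math.exp(-y)) / (c * 0.5 * math.exp(-y / 2))
        if v < ratio:                 # accept with probability f(y)/(c g(y))
            out.append(y)
    return out

ws = sample_xexp(5000)
```

Since f is the Gamma(2,1) density, the sample mean should be close to 2, and roughly e/4 of proposals are accepted.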
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take the derivative of h(x) with respect to x, set it to zero, and solve to get x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (if c<sub>1</sub> is a value, then we can ignore this step) Since we want the smallest value of c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (i.e. k = unknown parameter) to get the value of k. <br />Then we substitute k to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For the two examples above, we used a uniform or exponential proposal distribution<br />
and figured out <math>c=\max\frac {f(y)}{g(y)} </math>.<br />
If <math>v<\frac {f(y)}{c\cdot g(y)}</math>, output y.<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse CDF cannot be computed or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated up to a normalizing constant. <br/><br />
The method requires:<br/><br />
3) A constant c such that <math>f(x)\leq c\cdot g(x)</math><br/><br />
4) A uniform draw<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
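This can be checked empirically. Below is a Python sketch (illustrative only) using the earlier example f(x) = 2x on (0,1) with g = U(0,1) and c = 2, so the acceptance rate should be near 1/c = 0.5:

```python
import random

# Empirical check that the A-R acceptance rate is 1/c.
rng = random.Random(4)
trials = 20000
accepted = 0
for _ in range(trials):
    y = rng.random()        # proposal y ~ U(0, 1)
    u = rng.random()        # u ~ U(0, 1)
    if u <= y:              # accept when u <= f(y)/(c g(y)) = y
        accepted += 1
rate = accepted / trials    # should be close to 1/c = 0.5
```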
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example in the last lecture. The following code will generate the random variable required in that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % Note: R is a constant which we can change;<br />
% e.g. if we changed R=4 we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % Note for beginner programmers: this step increases<br />
% the ii value for the next time through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tip: in hist(x,y), y is the number of bars in the histogram of x.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} <= c < ∞.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: The U in Steps 2 and 3 above is independent of Y.<br />
The constant c is an indicator of the rejection rate: the larger c is, the more proposals are rejected.<br />
<br />
In this acceptance-rejection example for a pmf, the proposal g is discrete uniform over the 5 values (1,2,3,4,5), so g(x) = 0.2 for each value.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % a vector holding the probabilities<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution on the integers <math>1,2,3,...,k</math>. If this function is not built in to your MATLAB, a simple transformation of rand works the same way, e.g. y = ceil(k*rand). <br />
<br />
The acceptance rate is <math>1/c</math>, so the lower the c, the more efficient the algorithm. Theoretically, c equals 1 is the best case because all samples would be accepted; however it would only be true when the proposal and target distributions are exactly the same, which would never happen in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>1/1.5=2/3</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 random values drawn from f(x); the more values we draw, the closer the empirical frequencies get to the stated probabilities.<br />
Recall that 1/c is the acceptance rate, so the smaller c is, the better.<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1,y~g<br /><br />
2,u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x>=0</math><br><br />
The first few <math>p_x</math> values: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j, else go to step 1.<br />
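These three steps can be sketched in Python (illustrative only; here c is taken as 2.13, slightly above the maximum ratio of about 2.124, so every acceptance probability stays at most 1):

```python
import math
import random

def sample_poisson3(n, seed=5):
    # Acceptance-rejection for the Poisson(3) pmf p_j = e^{-3} 3^j / j!
    # with the geometric proposal g(j) = p(1-p)^j, p = 0.25.
    rng = random.Random(seed)
    p, c = 0.25, 2.13
    out = []
    while len(out) < n:
        u1, u2 = rng.random(), rng.random()
        j = int(math.log(u1) / math.log(1 - p))   # geometric by inverse method
        pj = math.exp(-3) * 3.0 ** j / math.factorial(j)
        gj = p * (1 - p) ** j
        if u2 < pj / (c * gj):
            out.append(j)
    return out

ks = sample_poisson3(4000)
```

The sample mean should be close to the Poisson mean of 3.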
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose f(x) is hypergeometric: given 10 white balls and 5 red balls, select 3 balls without replacement, and let X be the number of red balls selected. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which is '''c = 1.1127'''.<br />
<br />
Note that c must be the maximum of the pointwise ratio f(x)/g(x), not the ratio of the individual maxima; here the max of f(x) (0.4945) and the max of g(x) (0.4444) both happen to occur at X=1, so the two calculations agree and give c = 1.1127.<br />
With this c, <math>c\cdot g(x)</math> covers f(x) at every point, and the acceptance rate is 1/c ≈ 0.90.<br />
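The ratio table can be reproduced in Python (a sketch; `math.comb` supplies the binomial coefficients):

```python
from math import comb

# c = max f(x)/g(x) for the hypergeometric target (10 white, 5 red,
# draw 3 without replacement; X = number of red) against Bin(3, 1/3).
f = [comb(5, x) * comb(10, 3 - x) / comb(15, 3) for x in range(4)]
g = [comb(3, x) * (1 / 3) ** x * (2 / 3) ** (3 - x) for x in range(4)]
ratios = [fx / gx for fx, gx in zip(f, g)]
c = max(ratios)              # the maximum ratio, attained at x = 1
```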
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, we would need to raise c*g(x) to the peak of f to cover the whole of f. Thus c will be very large and 1/c will be small.<br />
The higher c is, the higher the rejection rate, and the more points will be rejected.<br> <br />
More on the rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%, so around 67% of points generated will be accepted.<br><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall that,<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j≥0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j≥0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that <br />
p(j)/q(j)<=c for all j such that p(j)>0<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda)</math>; note that <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
:<math> x_1 \sim~ Exp(\lambda) </math><br />
:<math> x_2 \sim~ Exp(\lambda) </math><br />
:...<br />
:<math> x_t \sim~ Exp(\lambda) </math><br />
:<math> x_1+x_2+\cdots+x_t \sim~ Gamma(t,\lambda) </math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); <br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Some notes on matlab coding: '''<br/ ><br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,i)'' returns the (i,i)th entry <br/ ><br />
:*: ''sum(X,1)'' or ''sum(X)'' is a summation of the rows of X <br /><br />
:*: ''sum(X,2)'' is a summation of the columns of X <br/ ><br />
:*: ''rand(r,c)'' will generate random numbers in r row and c columns <br /><br />
:*: Matlab coding language is very efficient with vectors and inefficient with loops. It is far better to use vector operations (use the . operator as necessary) than it is to use "for" loops when computing many values.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generates a 20x1000 matrix<br />
% (i.e. 1000 draws for each X_i with t=20);<br />
% all the elements are generated by rand<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) works the same as log(u) since u~U(0,1)<br />
>>xx = sum(x); % sum(x) sums all elements in the same column;<br />
% size(xx) can help you verify<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
size(x) and size(u) are both 20*1000 matrices.<br />
Since u~Unif(0,1) implies that u and 1-u have the same distribution, we can substitute u for 1-u to simplify the expression.<br />
Alternatively, the following command does the same thing as the previous commands.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code in one line;<br />
% we can use either log(u) or log(1-u) since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
In the matrix rand(20,1000), there are 20 rows with 1000 numbers each. Each row holds draws of one independent <math>X_i</math>, and summing across the rows (column by column, via sum) produces the Gamma samples; the histogram confirms the resulting distribution.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/\sin(\theta)= x_{1}/\cos(\theta)</math> <br /><br />
<math> \tan(\theta)=x_{2}/x_{1} \rightarrow \theta=\tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
A point at distance R from the origin at angle <math>\theta</math> has Cartesian coordinates <math>x=R\cos(\theta)</math> and <math>y=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
----<br />
<pre style="color:red; font-size:30px"><br />
THIS SECTION MAY BE REDUNDANT.<br />
PLEASE COMBINE WITH "Some notes on matlab coding"<br />
IN SECTION 6.2<br />
</pre><br />
<br />
'''X=rand(2,3)''' generates a 2 rows*3 columns matrix<br /><br />
Example:<br /><br />
0.1 0.2 0.3<br /><br />
0.4 0.5 0.6<br /><br />
'''sum(X)''' adds the columns up<br /><br />
Example:<br /><br />
0.5 0.7 0.9<br /><br />
'''sum(X,2)''' adds up the rows<br /><br />
Example:<br /><br />
0.6<br /><br />
1.5<br /><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1.On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1)- Standard Normal Distribution - then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\dfrac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(r,\theta)\end{matrix}</math> consists of two density functions, Exponential and Uniform, so assuming that r and <math>\theta</math> are independent<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'')<br />
:<math>=\;- \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited to inefficient methods such as inverting the Gaussian CDF, summing uniform random variables, and acceptance-rejection. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This technique has an ease of use and accuracy that has only grown more valuable as computers have become more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup> = Z<sub>1</sub><sup>2</sup> + Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1 - e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>, Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad R = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 \sim~ Exp(2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient. The reason for this is the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is by an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method exactly, we can approximate the inverse cdf using different functions. One such method is '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': If we sum 12 independent U(0,1) random variables and subtract 6 (which is E(u<sub>i</sub>)*12), we will get approximately a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''', which is known to be faster than the Box-Muller transformation; a version of this algorithm is used for the randn function in Matlab.<br /><br />
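The central limit theorem approach (method 2 above) can be sketched in Matlab as follows; the sample size of 1000 is an arbitrary illustration:<br />
<pre style="font-size:16px"><br />
>>u = rand(12,1000);   % each column holds 12 independent Unif(0,1) draws<br />
>>z = sum(u) - 6;      % column sums have mean 6 and variance 1<br />
>>hist(z)              % approximately standard normal<br />
</pre><br />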
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histograms above, the added constant is the parameter that shifts the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent Uniform(0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\,\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\,\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by <br />
<math>f_{X_1,X_2}(x_1,x_2) = f_{U_1,U_2}\big(g_1^{-1}(x_1,x_2),\,g_2^{-1}(x_1,x_2)\big)\,|J|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
<math>J=\begin{vmatrix}\frac{\partial u_1}{\partial x_1} & \frac{\partial u_1}{\partial x_2} \\ \frac{\partial u_2}{\partial x_1} & \frac{\partial u_2}{\partial x_2}\end{vmatrix}</math><br />
where <br />
<math>u_1 = g_1^{-1}(x_1,x_2),\quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
<math>u_1 = e^{-(x_1^2+x_2^2)/2},\quad u_2 = \frac{1}{2\pi}\tan^{-1}\!\left(\frac{x_2}{x_1}\right)</math><br />
<br />
Finally we get<br />
<math>f(x_1,x_2) = \frac{1}{2\pi}\,e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution is the family obtained from the standard normal by scaling with the standard deviation and translating by the mean. The pdf of the general normal distribution is <br />
<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}</math><br />
<br />
The special case of the normal distribution is standard normal distribution, which the variance is 1 and the mean is zero. If X is a general normal deviate, then Z = (X − μ)/σ will have a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma\cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2\cdot 1 = \sigma^2</math><br />
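For example, to generate approximately N(2, 9) samples with this transformation (the values mu = 2, sigma = 3 are chosen only for illustration):<br />
<pre style="font-size:16px"><br />
>>z = randn(1,1000);   % Z ~ N(0,1)<br />
>>x = 2 + 3*z;         % X = mu + sigma*Z ~ N(2, 9)<br />
>>hist(x)<br />
</pre><br />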
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); % generate samples from the standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2];<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,I<sub>d</sub>) and <math>\underline{X}= \underline{\mu} + \Sigma^{\frac{1}{2}}\,Z </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
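A possible Matlab sketch of this multivariate transformation, using sqrtm for the matrix square root; the mean vector and covariance matrix below are illustrative assumptions:<br />
<pre style="font-size:16px"><br />
>>mu = [1; 2];                              % illustrative mean vector<br />
>>Sigma = [2 0.5; 0.5 1];                   % illustrative positive definite covariance<br />
>>z = randn(2,1000);                        % Z ~ N(0, I_2)<br />
>>x = repmat(mu,1,1000) + sqrtm(Sigma)*z;   % X ~ N(mu, Sigma)<br />
>>plot(x(1,:),x(2,:),'.')<br />
</pre><br />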
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution which describes an event that has only two possible results, i.e. success or failure. If the event succeeds, the variable takes value 1 with success probability p; otherwise it takes value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pdf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes; so the Bernoulli can also use the probability density function of the binomial distribution with the variate x taking only the values 0 and 1.<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example discussed in a previous lecture. In that example we set p = 1/2, which gives heads and tails each a 50% chance.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: A Binomial random variable can be regarded as the sum of n independent Bernoulli random variables.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: The Bernoulli pmf can also be written compactly as <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
When doing element-wise operations on vectors or matrices, put a dot before the operator (e.g. .* or .^) so that the operation is applied to every element. <br />
Example: let V be a 2-by-4 matrix; V.^2 squares each element, while scalar multiplication such as 3*V (or 3.*V) multiplies every element by 3.<br />
<br />
The code blocks above give some examples of generating samples from these distributions in Matlab.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
Procedure:<br />
<br />
1. Generate U~Unif [0, 1)<br><br />
2. Set <math>x=F^{-1}(u)</math><br><br />
3. X~f(x)<br><br />
<br />
Example:<br />
<br />
Let X<sub>1</sub>, X<sub>2</sub> denote the lifetimes of 2 independent particles, <math>X</math><sub>1</sub>~<math>Exp(\lambda_1)</math>, <math>X</math><sub>2</sub>~<math>Exp(\lambda_2)</math>.<br><br />
We are interested in Y=min(X<sub>1</sub>, X<sub>2</sub>).<br><br />
Design an algorithm based on the inverse method to generate samples according to f<sub>Y</sub>(y).<br><br />
<br />
Inversion Method<br />
<br />
<math>P(X\leq x) = P(F^{-1}(u)\leq x) = P(u\leq F_X(x)) = F_X(x)</math><br />
So, setting <math>U = F_X(X)</math> gives <math>x=F^{-1}(u)</math><br><br />
<br />
<br />
<br />
'''Example 1'''<br><br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br />
x~exp(<math>\lambda</math>)<br><br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P(X_1>y)\, P(X_2>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate u~unif [0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math><br><br />
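The two steps above might be coded as follows; the rate values are arbitrary illustrations:<br />
<pre style="font-size:16px"><br />
>>lambda1 = 1; lambda2 = 2;          % illustrative rates<br />
>>u = rand(1,1000);<br />
>>y = -log(u)/(lambda1 + lambda2);   % Y = min(X1,X2) ~ Exp(lambda1+lambda2)<br />
>>hist(y)<br />
</pre><br />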
<br />
If we change from the lifetimes of two independent particles to n independent particles,<br />
<br />
we change <br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> to<br />
<math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br><br />
<br />
Then Y=min(X<sub>1</sub>,...,X<sub>n</sub>) is generated the same way, since the minimum of independent exponentials is exponential with rate <math>\sum\lambda_i</math>:<br />
<br />
<math>y=\, {-\frac {1}{{ \sum\lambda_i}}} \ln(1-u)</math><br><br />
<br />
In general, the inverse-transform method requires working out the cdf of the quantity of interest and then inverting it.<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
where a>0<br><br />
What is the distribution of X?<br><br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
<br />
We can define the distribution of X, when we know U~Unif[0,1).<br />
<br />
'''Example 3'''<br><br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. We want to generate X.<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result, we see that in this example F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the maximum of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x).<br />
Method 2: generate X by having a sample of n independent U~Unif(0, 1) and take the max of the n samples to be x. However, the solution given above using inverse-transform method only requires generating one uniform random number instead of n of them, so it is a more efficient method.<br />
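The two methods can be compared in Matlab, e.g. with n = 20:<br />
<pre style="font-size:16px"><br />
>>n = 20;<br />
>>x1 = rand(1,1000).^(1/n);          % Method 1: inverse transform, one uniform per sample<br />
>>x2 = max(rand(n,1000));            % Method 2: max of n uniforms per sample<br />
>>hist(x1)<br />
>>hist(x2)                           % the two histograms should look similar<br />
</pre><br />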
<br><br />
<br />
That is, given independent X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>, we can derive the cdf and pdf of both Y = max(X<sub>1</sub>, ... , X<sub>n</sub>) and Y = min(X<sub>1</sub>, ... , X<sub>n</sub>) in this way.<br />
<br />
'''Example 4 (New)'''<br><br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br />
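A sketch of these two steps with an arbitrary choice of λ = 1:<br />
<pre style="font-size:16px"><br />
>>lambda = 1;<br />
>>u = rand(1,1000);<br />
>>z = -log(1 - sqrt(u))/lambda;      % Z = max(X1,X2) via the inverse derived above<br />
>>hist(z)<br />
</pre><br />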
<br />
===Decomposition Method===<br />
<br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math><br />
<br />
<math>f_{X} = \sum_{i=1}^{n}p_{i}f_{X_{i}}(x)</math><br />
<br />
where p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>n</sub> > 0 and sum of p<sub>i</sub> = 1.<br />
<br />
Here each component cdf <math>F_{X_i}</math> corresponds to a distribution that can be sampled independently of the others.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup> <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
Here f(x) was divided into two component pdfs, f<sub>x1</sub> and f<sub>x2</sub>, and a single uniform draw decides which component to sample from.<br />
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
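The decomposition in Example 2 might be sketched as follows, using the component inverses derived above:<br />
<pre style="font-size:16px"><br />
for k = 1:1000<br />
    u = rand();                      % decides which component to sample from<br />
    v = rand();                      % uniform used within the chosen component<br />
    if u < 1/4<br />
        x(k) = -log(1 - v);          % inverse of F_x1 = 1 - e^(-x)<br />
    elseif u < 3/4<br />
        x(k) = sqrt(v/2);            % inverse of F_x2 = 2x^2<br />
    else<br />
        x(k) = 3*v;                  % f_x3 is uniform over (0,3)<br />
    end<br />
end<br />
hist(x)<br />
</pre><br />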
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i \geq p_j </math> for <math> i < j </math> (largest probabilities first) <br> <br><br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u < p_1 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
That is, we decompose the pdf into components f<sub>x1</sub>, f<sub>x2</sub>, f<sub>x3</sub> over their respective ranges, use one U~U(0,1) to pick a component, and invert that component's cdf.<br />
<br />
== Example of Decomposition Method ==<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
let U =F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, solve for x.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
Generate U ~ Unif [0,1), V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x=v<br />
else if u<2/3, x=v<sup>1/2</sup><br />
else x= v<sup>1/3</sup><br><br />
<br />
<br />
Matlab Code: <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample an unknown distribution from an easy distribution. The disadvantage is that it may need to reject many points, which is inefficient.<br />
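As an illustration, we can sample uniformly inside the unit circle (B) by sampling from the enclosing square (A) and discarding points outside the circle:<br />
<pre style="font-size:16px"><br />
x = []; y = [];<br />
while length(x) < 1000<br />
    u1 = 2*rand() - 1;               % uniform over the square [-1,1] x [-1,1]<br />
    u2 = 2*rand() - 1;<br />
    if u1^2 + u2^2 <= 1              % keep only points inside the circle<br />
        x = [x u1];<br />
        y = [y u2];<br />
    end<br />
end<br />
plot(x,y,'.')<br />
</pre><br />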
<br />
Points accepted from the larger region are uniform over the sub-region, since restricting a uniform sample to a sub-range again gives a uniform distribution on that range.<br />
<br />
== Practice Example from Lecture 7 ==<br />
<br />
Let X<sub>1</sub>, X<sub>2</sub> denote the lifetimes of 2 independent particles, X<sub>1</sub> ~ Exp(λ<sub>1</sub>), X<sub>2</sub> ~ Exp(λ<sub>2</sub>) <br />
<br />
We are interested in Y = min(X<sub>1</sub>, X<sub>2</sub>)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{-\lambda_{1}x},x\geq0 \Rightarrow F_{x_1}(x)=1-e^{-\lambda_{1}x}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{-\lambda_{2}x},x\geq0 \Rightarrow F_{x_2}(x)=1-e^{-\lambda_{2}x}</math><br /><br />
<math>\text{then } 1-F_Y(y)=P(\min(x_{1},x_{2}) > y)=e^{-(\lambda_{1}+\lambda_{2})y}, \quad F_Y(y)=1-e^{-(\lambda_{1}+\lambda_{2}) y}</math><br /><br />
<math>u \sim Unif[0,1),\; u = F_Y(y) \Rightarrow y = -\frac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
<br />
==Question 2==<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a Beta distribution; the constant b is the normalizing constant satisfying <math>b\int _{0}^{1}x^{n}(1-x)^{n}\,dx=1</math><br />
<br />
Take the proposal g(x) to be Unif[0,1). The density is maximized at x = 1/2 with value <math>b\,(1/4)^n</math>, so we can take <math>c=b\,(1/4)^n</math>.<br />
<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math><br />
2. If <math>U_2\leq \frac{b\,U_1^n(1-U_1)^n}{b\,(1/4)^n}=(4U_1(1-U_1))^n</math>,<br />
then set X = U<sub>1</sub>;<br />
Else return to step 1.<br />
<br />
Note that b cancels in the acceptance ratio, so it never needs to be computed.<br />
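A Matlab sketch of this acceptance-rejection algorithm (n = 2 is an arbitrary illustration; the constant b cancels in the acceptance ratio):<br />
<pre style="font-size:16px"><br />
n = 2;                               % illustrative choice of n<br />
i = 1;<br />
while i <= 1000<br />
    u1 = rand();<br />
    u2 = rand();<br />
    if u2 <= (4*u1*(1 - u1))^n       % acceptance condition; b cancels out<br />
        x(i) = u1;<br />
        i = i + 1;<br />
    end<br />
end<br />
hist(x)<br />
</pre><br />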
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
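The four steps above might be sketched as follows, for an illustrative pmf on {0, 1, 2} with probabilities 0.2, 0.5, 0.3:<br />
<pre style="font-size:16px"><br />
p = [0.2 0.5 0.3];                   % P(X=0), P(X=1), P(X=2); illustrative values<br />
for k = 1:1000<br />
    u = rand();<br />
    x = 0;<br />
    s = p(1);                        % s = P(X = a) with a = 0<br />
    while u > s<br />
        x = x + 1;<br />
        s = s + p(x+1);              % accumulate P(X = x)<br />
    end<br />
    samples(k) = x;<br />
end<br />
hist(samples)<br />
</pre><br />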
<br />
<br />
==Thursday, May 30, 2013==<br />
<br />
<b>The Geometric Distribution</b> <br><br />
<br />
If X~Geo(p) then its pmf is of the form f(x)=(1-p)<sup>x-1</sup>p, x=1,2,...<br /><br />
The random variable X is the number of trials required until the first success in a series of independent Bernoulli trials.<br /><br />
If Y~Exp(λ) then X=floor(Y)+1 is geometric.<br /><br />
Choose λ so that e<sup>-λ</sup>=1-p.<br /><br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p)</math> <br /><br />
2) Generate <math>Y \sim Exp(\lambda )</math> <br /><br />
3) Set <math>X = \left \lfloor Y \right \rfloor + 1</math>; then <math>X\sim Geo(p)</math> <br /><br />
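A sketch of this algorithm with p = 0.3 (chosen arbitrarily):<br />
<pre style="font-size:16px"><br />
p = 0.3;                             % illustrative success probability<br />
lambda = -log(1 - p);<br />
y = -log(rand(1,1000))/lambda;       % Y ~ Exp(lambda) via inverse transform<br />
x = floor(y) + 1;                    % X ~ Geo(p)<br />
hist(x)<br />
</pre><br />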
<br />
P(X>x)<br /><br />
=P(floor(y)+1>x)<br /><br />
=P(floor(y)>x-1)<br /><br />
=P(y>=x)<br /><br />
=e<sup>-λx</sup>=(1-p)<sup>x</sup>, which is the tail probability of the Geo(p) distribution.</div>
<hr />
<div>== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== Midterm ===<br />
Monday June 17 2013 from 2:30-3:30<br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
{| class="wikitable"<br />
|-<br />
! TA<br />
! Day<br />
! Time<br />
! Location<br />
|-<br />
| Lu Cheng<br />
| Monday<br />
| 3:30-5:30 pm<br />
| M3 3108, space 2<br />
|-<br />
| Han ShengSun<br />
| Tuesday<br />
| 4:00-6:00 pm<br />
| M3 3108, space 2<br />
|-<br />
| Yizhou Fang<br />
| Wednesday<br />
| 1:00-3:00 pm<br />
| M3 3108, space 1<br />
|-<br />
| Huan Cheng<br />
| Thursday<br />
| 3:00-5:00 pm<br />
| M3 3111, space 1<br />
|-<br />
| Wu Lin<br />
| Friday<br />
| 11:00-1:00 pm<br />
| M3 3108, space 1<br />
|}<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case except y is discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email instructor or TAs about the class directly to theri personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying an account in the wikicourse note, please use the quest account as your login name while the uwaterloo email as the registered email. This is important as the quest id will use to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, do wait for several hours before students can login into the account using the passwords stated in the email. During the first login, students will be ask to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions in multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers'''; numbers that seem random but are actually deterministic. Although the pseudo random numbers are deterministic, these numbers have a sequence of value and all of them have the appearances of being independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to the ease to generate and manipulate.<br />
<br />
When people do the test for many times, the results will be closed the express values,that makes the trial looks like deterministic, however for each trial, the result is random.<br />
So, it looks like pseudo random numbers.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where m is an integer. <br /><br />
if y = ax + b, then <math>b:=y \mod a</math>. <br /><br />
4.2 = 3 * 1.1 + 0.9 mod 2<br /><br />
0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2 mod 7<br /><br />
2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1 mod 3<br /><br />
1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
mod can figure out one integer can be divided by another integer with no remainder or not. But both two integer should follow function: n = mq + r. m, r,q n are all integer. and q smaller than q.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math> (<math>\mod m</math> means taking the remainder after division by m). Given a '''seed''', i.e. an initial value <math>x_0 \in \N</math>, we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about the '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required, and in particular it should not be used for Monte Carlo simulation or cryptographic applications. (Monte Carlo simulation explores possibilities for every choice under consideration, including the extreme cases, and this generator is not random enough for that purpose.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{k}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
If we choose the numbers properly, we can get a sequence of "random" numbers. But how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a very '''large''', preferably prime, number: the larger <math>m</math> is, the longer the generator can run before its output repeats. In Matlab, the command rand() generates random numbers which are uniformly distributed on the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math>, the values recommended in the 1988 paper "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> is '''large and prime''').<br />
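The recurrence is easy to experiment with outside MATLAB as well. Here is a minimal Python sketch (the function name, seed, and normalization are our own choices) using the Park–Miller constants quoted above:

```python
def lcg(a, b, m, seed, n):
    """Iterate x_{k+1} = (a*x_k + b) mod m, returning n values."""
    xs, x = [], seed
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs

# Park-Miller constants: a = 7^5, b = 0, m = 2^31 - 1
raw = lcg(7**5, 0, 2**31 - 1, seed=1, n=5)

# normalizing by m - 1 gives values in [0, 1]
uniform = [x / (2**31 - 2) for x in raw]
```

With seed 1, the first value is simply <math>7^5 = 16807</math>, and subsequent values wrap around once the product exceeds <math>m</math>.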
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 or more numbers, we could use a '''for''' loop.)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math>, and an initial value <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>, where <math>\mod m</math> means taking the remainder after division by <math>m</math>.<br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) generates a histogram of the sample. Use it after running the code to check the empirical distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1. Why, in the example above, do the values range over 1 to 30 rather than 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2. Will the number 31 ever appear? Is it possible that some number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose the parameters so that the output behaves like a uniform sequence. Many programs give you the option to choose the seed; sometimes the seed is chosen automatically by the system.<br /><br />
<br />
<br />
<br />
<br />
In this part we saw how integer division and remainders underlie the congruential generator, and that when the recursion is run over a range such as 1:1000, the histogram of the output looks approximately uniform.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>find integers <i>a</i>, <i>b</i>, <i>m</i> (large prime), and <i>x<sub>0</sub></i> (the seed).</li><br />
<li><math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating distributions other than the uniform, such as the exponential and normal distributions. However, to use this method easily to generate pseudorandom numbers, the probability distribution used must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X ~ F, we can generate U according to U(0,1) and then apply the transformation x = <math> F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf of X is strictly monotonic (and hence invertible); otherwise the generalized inverse is needed. A monotonic function is one that is either increasing for all x, or decreasing for all x.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to Step 3<br><br />
Note: These steps can be found in ''Simulation'', 5th ed., by Sheldon Ross.<br />
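The five steps above can be sketched in Python (the function name is ours); the recurrence in Step 4 simply updates the binomial pmf from one value of i to the next:

```python
import random

def binomial_inverse_transform(n, p, u=None):
    """Generate a Binomial(n, p) variate by stepping through the cdf."""
    if u is None:
        u = random.random()               # Step 1: U ~ U(0,1)
    c = p / (1 - p)                       # Step 2
    i = 0
    pr = (1 - p) ** n                     # P(X = 0)
    F = pr                                # running cdf
    while u >= F:                         # Step 3: stop once U < F
        pr = c * (n - i) / (i + 1) * pr   # Step 4: pmf recurrence
        F += pr
        i += 1                            # Step 5: repeat
    return i
```

For instance, with n = 2 and p = 0.5 the cdf values are 0.25 and 0.75, so U = 0.5 lands in the second interval and delivers X = 1.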
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = -e^{-\lambda t}\,\Big|_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {ln(1-y)}{\lambda}</math><br /><br />
<math>y=-\frac {ln(1-x)}{\lambda}</math><br /><br />
<math>F^{-1}(x)=-\frac {ln(1-x)}{\lambda}</math><br /><br />
<br />
To sample from this exponential distribution:<br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
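These two steps can be checked with a quick Python sketch (λ = 2 here is an arbitrary choice; the sample mean should be close to 1/λ):

```python
import math
import random

random.seed(0)
lam = 2.0

def exp_inverse_transform(lam):
    u = random.random()                # Step 1: draw U ~ U(0,1)
    return -math.log(1 - u) / lam      # Step 2: x = -ln(1-U)/lambda

samples = [exp_inverse_transform(lam) for _ in range(100000)]
mean = sum(samples) / len(samples)     # should be near 1/lam = 0.5
```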
<br />
'''Example''': <br />
<math> X= a + (b-a)U</math> is uniform on [a, b], where U ~ U[0,1] <br /><br />
<math> x=\frac{-ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math>, since 1-U is also uniform on [0,1] <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x), setting<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
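This inverse is easy to verify numerically; a small Python sketch (function name ours) round-trips a value through F:

```python
def inv_F(u):
    """Inverse of F(x) = x^5 on [0, 1]: x = u^(1/5)."""
    return u ** (1 / 5)

x = inv_F(0.5)
# applying F to x recovers u up to rounding: x**5 is 0.5
```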
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
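The solved inverse can likewise be checked in Python (β = 3 is an arbitrary choice; the function name is ours):

```python
def beta_1_b_inverse(u, beta):
    """Inverse transform for BETA(1, beta), with cdf F(x) = 1 - (1-x)^beta."""
    return 1 - (1 - u) ** (1 / beta)

x = beta_1_b_inverse(0.5, beta=3)
# plugging x back into F(x) = 1 - (1-x)^3 should give 0.5
```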
<br />
'''Example 4 - Estimating <math>\pi</math>''':<br />
Let's use rand() and the Monte Carlo method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2, the inscribed circle has radius 1 and area <math>\pi</math>, so this probability is <math>\pi/4</math>.<br /><br />
Thus <math>\pi \approx 4 \cdot (Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) %will generate a fairly uniform histogram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method is limited since not all functions are invertible or monotonic, and the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /><br />
<br />
We learned how to prove that applying the inverse cdf to a uniform random variable produces a value of x distributed according to F(x), so the uniform distribution can be used in the inverse method to generate other distributions. In the area-based examples, every point of the region is equally likely, so the generated points are uniform over the region, and we can look at the histogram to judge what kind of distribution a sample follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool #shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma changes the location and shape of the graph.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by definition of <math>X</math>)<br /><br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for uniform distribution <math> U~ \sim~ Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f function <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing, in some cases this function does not exist<br />
<br />
2. For many distributions such as Gaussian, it is too difficult to find the inverse cdf function , making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable x that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1: Generate <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math> X=x_i </math> if <math> F(x_{i-1})<U\leq F(x_i) </math> (in the discrete case, F(x) is a step function).<br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define X in terms of U so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } 0 \le x < 1 \\<br />
0.5, & \text{if } 1 \le x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure":<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If U <= 0.3, deliver x = 0<br /><br />
3. Else if 0.3 < U <= 0.5, deliver x = 1<br /><br />
4. Else (0.5 < U <= 1), deliver x = 2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \quad X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
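These two steps in Python (p = 0.3 is an arbitrary choice; the function name is ours):

```python
import random

def bernoulli(p, u=None):
    """Inverse transform for Bernoulli(p)."""
    if u is None:
        u = random.random()        # Step 1: draw U ~ U(0,1)
    return 1 if u <= p else 0      # Step 2

random.seed(2)
xs = [bernoulli(0.3) for _ in range(10000)]
# the sample proportion of 1s should be close to p = 0.3
```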
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) <math>\begin{align} x = 0 \end{align}</math><br />
<math>\begin{align} F = P(X = 0) = e^{-u} u^0/{0!} = e^{-u} = p \end{align}</math><br />
3) If U < F, output X = x and stop <br><br />
Else, <math>\begin{align} p = \frac{u}{x+1} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to Step 3 <br><br />
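A Python sketch of this algorithm (the function name is ours), using the pmf recurrence <math>P_{x+1} = \frac{u}{x+1} P_x</math> derived above:

```python
import math
import random

def poisson_inverse_transform(lam, u=None):
    """Generate a Poisson(lam) variate via p_{x+1} = lam/(x+1) * p_x."""
    if u is None:
        u = random.random()        # Step 1: U ~ U(0,1)
    x = 0
    p = math.exp(-lam)             # Step 2: P(X = 0)
    F = p                          # running cdf
    while u >= F:                  # Step 3: stop once U < F
        p = lam / (x + 1) * p      # pmf recurrence
        F += p
        x += 1
    return x
```

With mean 2, for example, <math>P(X=0) = e^{-2} \approx 0.135</math>, so any U below 0.135 delivers X = 0.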
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p) where p is the probability of success, and define the random variable X to be the number of trials needed until the first success, x = 1, 2, 3, ... We have pmf:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i - 1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) is the probability that the first x trials are all failures.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
....<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k <br />
....<br />
\end{cases}</math><br />
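The case pattern above says X is the smallest k with <math>U \leq 1-(1-p)^k</math>, which can be solved in closed form as <math>X = \lceil \ln(1-U) / \ln(1-p) \rceil</math>. A Python sketch (function name ours):

```python
import math

def geometric_inverse_transform(p, u):
    """Smallest k with 1 - (1-p)^k >= u, via k = ceil(ln(1-u)/ln(1-p))."""
    return math.ceil(math.log(1 - u) / math.log(1 - p))
```

For p = 0.5 the cdf values are 0.5, 0.75, 0.875, ..., so U = 0.6 falls in the second interval and delivers X = 2.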
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
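The general procedure amounts to walking the cumulative sums until U is exceeded; a Python sketch (function name ours):

```python
import random

def discrete_inverse_transform(values, probs, u=None):
    """Deliver x_k when P_0 + ... + P_{k-1} < U <= P_0 + ... + P_k."""
    if u is None:
        u = random.random()        # Step 1: draw U ~ U(0,1)
    F = 0.0
    for x, p in zip(values, probs):
        F += p                     # running cumulative probability
        if u <= F:
            return x
    return values[-1]              # guard against floating-point round-off
```

Applied to the three-point distribution above (probabilities 0.3, 0.2, 0.5), a draw of u = 0.4 falls in the second interval and delivers x = 1.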
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
<br />
Flipping a coin is a discrete case of the uniform distribution: in the code the coin is flipped 1000 times, and the observed proportion of heads is close to the expected value (0.5). The second example is another discrete distribution, splitting the uniform draw into three parts corresponding to the outcomes 0, 1, 2. The later examples use the inverse method to work out the range of U corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b>generate types of distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed number {x}</li><br />
<li>{F<sup>-1</sup>(x)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>set X=x<sub>i</sub> if <math> F(x_{i-1})<U\leq F(x_i) </math></li><br />
<li>the resulting {x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transform method does allow us to turn uniform samples into samples from other distributions, it has two limitations:<br />
# Not all functions have inverse functions (i.e., <math>F^{-1}</math> may not exist in a usable closed form)<br />
# For some distributions, such as Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''. This method can be applied, and is often more convenient, when the inverse transform method is unavailable or inefficient.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (In practise, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) rather than <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x if and only if g and f are the same function. This is because both densities integrate to 1, so one pdf cannot lie above another everywhere without being equal to it. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all values of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test whether <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This would render our algorithm inefficient. <br />
<br />
<br><br />
<br />
Note: 1. Values around x<sub>1</sub> are drawn more often under cg(x) than needed under f(x): there are more samples than we actually need. Since <math>\frac{f(y)}{\, c g(y)}</math> is small there, the acceptance-rejection step rejects most of these points to keep the correct amount. In the region above x<sub>1</sub>, we accept less and reject more. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region above x<sub>2</sub> we accept more. As a result, g(x) and f(x) are comparable there.<br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math>, because those sample points are guaranteed to fall in the part of the area under c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
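The three steps above can be sketched in code. The following is a minimal Python sketch (the function and parameter names are our own), demonstrated on the hypothetical target f(x)=2x on (0,1) with a uniform proposal and c=2:

```python
import random

def accept_reject(f, g_sample, g_pdf, c, n, rng):
    """Acceptance-rejection sampling: draw n points from density f using
    a proposal with sampler g_sample and density g_pdf, where
    f(x) <= c * g_pdf(x) on the support."""
    out = []
    while len(out) < n:
        y = g_sample(rng)               # Step 1: Y ~ g
        u = rng.random()                # Step 2: U ~ U(0,1), independent of Y
        if u <= f(y) / (c * g_pdf(y)):  # Step 3: accept with prob f(y)/(c g(y))
            out.append(y)
    return out

# Demo: target f(x) = 2x on (0,1), proposal g = U(0,1), c = 2.
rng = random.Random(1)
xs = accept_reject(lambda x: 2 * x,
                   lambda r: r.random(),
                   lambda x: 1.0,
                   2.0, 5000, rng)
mean = sum(xs) / len(xs)  # the true mean of f is 2/3
```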
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u with <math>\frac{f(y)}{\, c g(y)}</math>, we accept y with exactly that probability. For instance, at points where cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
i.e. At X<sub>1</sub>, there is a low probability of accepting the point since f(x) is much smaller than cg(x).<br><br />
At X<sub>2</sub>, there is a high probability of accepting the point. (<math>P(U\leq a)=a</math> holds for the Uniform(0,1) distribution.)<br />
<br />
Note: Since <math>U \sim U(0,1)</math>, the comparison in the procedure only makes sense when <math>\frac{f(y)}{cg(y)} \leq 1</math>; this is exactly what the choice of the constant c guarantees.<br />
<br />
<br />
This introduces the relationship between cg(x) and f(x), shows why it must hold, and how it is used to reject some candidate points.<br />
The graph also shows where acceptance or rejection is likely for a given region of the random variable x:<br />
in the example, points near x<sub>1</sub> are mostly rejected, while points near x<sub>2</sub> are mostly accepted.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
We want to show that a value y, conditional on being accepted, has density f(y):<br><br />
<br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int P(accepted|s)\,P(s)\,ds\\<br />
&=\int \frac{f(s)}{cg(s)}\,g(s)\,ds\\<br />
&=\frac{1}{c} \int f(s)\, ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
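The intermediate result P(accepted)=1/c can also be checked by simulation. A Python sketch (assuming the simple hypothetical pair f(x)=2x on (0,1), g=U(0,1), c=2, for which f(y)/(cg(y))=y and the acceptance rate should be 1/c=1/2):

```python
import random

# Monte Carlo check that P(accepted) = 1/c for f(x) = 2x, g = U(0,1), c = 2.
rng = random.Random(0)
trials = 200000
accepts = 0
for _ in range(trials):
    y = rng.random()   # Y ~ g
    u = rng.random()   # U ~ U(0,1), independent of Y
    if u <= y:         # accept with probability f(y)/(c g(y)) = y
        accepts += 1
rate = accepts / trials  # should be close to 1/c = 0.5
```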
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)=\binom{2}{x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c \geq f(x)/g(x)</math> for all x,<br/><br />
we take <math>c=3/2</math>, the maximum of the ratios above<br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim~ U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>y=0</math> and <math>v<1/2</math>, output 0 <br/><br />
If <math>y=2</math> and <math>v<1/2</math>, output 2 <br/><br />
Else if <math>y=1</math>, output 1; otherwise (y=0 or y=2 with <math>v\geq 1/2</math>) return to step 1<br/><br />
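The discrete algorithm above can be sketched in Python (the function name is our own):

```python
import math
import random

# Target Bi(2, 0.5): f = {0: 1/4, 1: 1/2, 2: 1/4}.
# Proposal DU[0,2]: g(y) = 1/3 for y in {0, 1, 2}; c = 3/2.
f = {0: 0.25, 1: 0.5, 2: 0.25}
c, g = 1.5, 1.0 / 3.0

def draw_bi(rng):
    while True:
        y = math.floor(3 * rng.random())  # y ~ DU[0,2] from a uniform draw
        v = rng.random()
        if v <= f[y] / (c * g):           # acceptance probs: 1/2, 1, 1/2
            return y

rng = random.Random(2)
n = 20000
samples = [draw_bi(rng) for _ in range(n)]
p1 = samples.count(1) / n  # should approach f(1) = 0.5
p0 = samples.count(0) / n  # should approach f(0) = 0.25
```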
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the pmf we wish to generate from, but cannot sample directly.<br/><br />
Let <math>g(x)</math> be the helper (proposal) pmf,<br/><br />
with <math>cg(x)\geq f(x)</math>.<br/><br />
Since we generate y from <math>g(x)</math>,<br/><br />
<math>Pr(\text{select } y)=g(y)</math><br/><br />
<math>Pr(\text{output } y|\text{selected } y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(\text{output } y)=\sum_{i} Pr(\text{output } y_i|\text{selected } y_i)\,Pr(\text{select } y_i)=\sum_{i} \frac{f(y_i)}{cg(y_i)}\,g(y_i)=\frac{1}{c}</math> <br/><br />
The number of iterations until the first output is geometric with probability of success <math>1/c</math>.<br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
The conditional-probability argument shows that if a point is accepted, its distribution is exactly the original pdf.<br />
The example shows how to choose the constant c for the pair of functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
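A Python sketch of this procedure (with the accepted value being U<sub>1</sub>, the candidate, rather than U<sub>2</sub>); it also records the average number of trials per accepted point, which should be near c = 135/64:

```python
import random

# Target f(x) = 20x(1-x)^3 on (0,1) (Beta(2,4)), proposal g = U(0,1),
# c = 135/64, so f(y)/(c g(y)) = (256/27) y (1-y)^3.
def draw_beta24(rng):
    tries = 0
    while True:
        tries += 1
        u1, u2 = rng.random(), rng.random()
        if u2 <= (256.0 / 27.0) * u1 * (1 - u1) ** 3:
            return u1, tries  # U1 (not U2) is the accepted sample

rng = random.Random(3)
n = 20000
xs, total_tries = [], 0
for _ in range(n):
    x, t = draw_beta24(rng)
    xs.append(x)
    total_tries += t
mean = sum(xs) / n           # Beta(2,4) has mean 2/(2+4) = 1/3
avg_tries = total_tries / n  # should be near c = 135/64 ~ 2.11
```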
<br />
Here we use the derivative to find the maximum of f(x)/g(x),<br />
which gives the best (smallest) constant c for the acceptance-rejection method.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (and 0 elsewhere)<br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
This example shows how the acceptance-rejection method can reject candidate points outright: here, every point generated in (1/2, 1] is rejected.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area between <math>c\,g(x)</math> and <math>f(x)</math> (the rejection region) as small as possible.<br />
Because g(.) is uniform, g(x)=1, so <math>c=\max(3x^2)=3</math> and<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
This example shows how to work out c and f(x)/(c·g(x)).<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first need a proposal distribution <math>g(x)</math> that is easy to sample from. <br> The curve of f(x) must lie under the curve of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*Chance of acceptance is lower when the distance between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it makes no sense to pick <math>c</math> arbitrarily large; we need <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be negative.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
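As a sanity check, the maximum of f(x)/g(x) can also be found numerically on a fine grid. A Python sketch using the earlier Beta(2,4) example f(x)=20x(1-x)<sup>3</sup> with g(x)=1 (the helper name is our own):

```python
# Numeric check of c = max f(x)/g(x) on a fine grid; analytically the
# maximum of 20x(1-x)^3 over (0,1) is at x = 1/4, giving c = 135/64.
def ratio_max(f, g, lo, hi, steps=100000):
    best = 0.0
    for i in range(1, steps):
        x = lo + (hi - lo) * i / steps
        best = max(best, f(x) / g(x))
    return best

c = ratio_max(lambda x: 20 * x * (1 - x) ** 3, lambda x: 1.0, 0.0, 1.0)
# c should be 135/64 = 2.109375 (the grid hits x = 1/4 exactly)
```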
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
And it means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math> for all x. So the smallest possible c satisfying the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1 (This is not the way to find c; this is the general procedure.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma;(n)=(n-1)! if n is a positive integer, and in general<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we can observe that the area under f(x)=2x is half of the area under <math>c\cdot g(x)=2</math>. This is why, in order to get 1000 sample points from f(x), we need to generate approximately 2000 points from UNIF(0,1).<br />
In general, if we want to sample n points from a distribution with pdf f(x), we need to generate approximately <math>n\cdot c</math> points from the proposal distribution (g(x)) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>If <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)} = y</math>, accept x=y;<br><br />
else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: number of accepted samples<br />
>>jj=1; % jj: number of generated candidates<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason a for loop is not used is that we need to continue looping until we get 1000 successful samples. We reject some samples during the process and therefore do not know in advance how many candidates y we will have to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let <math>g \sim U(-1,1)</math>, so <math>g(x)=\tfrac{1}{2}</math> on <math>[-1,1]</math>.<br />
<br />
Let <math>y \sim g</math>. We need <math> cg(x)\geq f(x)</math>, i.e. <math>\frac{c}{2} \geq \frac{3}{4} (1-x^2)</math>, so<br />
<math> c=\max_x \, 2\cdot\frac{3}{4} (1-x^2) = \frac{3}{2} </math> (the maximum is attained at x=0)<br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}} = \frac{1-y^2}{2}</math>, then x=y ('''note that''' <math>\frac{3}{4}(1-y^2)\big/\frac{3}{2}</math> is <math>f(y)/(cg(y))</math>)<br />
:5: else: return to '''step 1''' <br />
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period ".", meaning "element-wise", makes an operator act on each element of a vector or matrix. In the example above, U.^0.5 takes the square root of every element of U. Without the period, U^0.5 is a matrix power, which is only defined for square matrices. Likewise, for the vectors a=[1 2 3] and b=[2 3 4], a.*b=[2 6 12] multiplies element-wise, but a*b produces an error since the matrix dimensions must agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm.<br />
<br />
<math>c = \max_{0<x<1} \frac{f(x)}{g(x)} = \max_{0<x<1} \frac {3x^2}{2x} = \max_{0<x<1} \frac {3x}{2} = \frac{3}{2} </math>.<br />
Use the inverse method to sample from <math>g(x)</math>: since <math>G(x)=x^2</math>, generate <math>U</math> from <math>U(0,1)</math> and set <math>x=\sqrt{u}</math>.<br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br>
2. Set <math>Y=\sqrt{U_1}</math>. If <math>U_2 \leq \frac{f(Y)}{cg(Y)} = \frac{3Y^2}{3Y} = Y</math>, accept <math>Y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
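A Python sketch of this more efficient variant (function name is our own), taking c = max<sub>0&lt;x&lt;1</sub>(3x/2) = 3/2 so that the acceptance probability is f(y)/(cg(y)) = y:

```python
import math
import random

# f(x) = 3x^2 on (0,1); proposal g(x) = 2x sampled by y = sqrt(u)
# (inverse method, since G(x) = x^2); c = 3/2; f(y)/(c g(y)) = y.
def draw_cubic(rng):
    while True:
        u1, u2 = rng.random(), rng.random()
        y = math.sqrt(u1)  # Y ~ g via the inverse method
        if u2 <= y:        # accept with probability f(y)/(c g(y)) = y
            return y

rng = random.Random(8)
n = 20000
xs = [draw_cubic(rng) for _ in range(n)]
mean = sum(xs) / n  # f has mean 3/4
```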
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class relating the <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim~ N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
<math>f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2},\; x \geq 0</math><br />
<br />
<math>g(x) = e^{-x},\; x \geq 0</math><br />
<br />
Take h(x) = f(x)/g(x) and solve h'(x) = 0 to find the x at which h(x) is maximal. <br />
<br />
x=1 maximizes h(x), hence <math>c = \sqrt{2e/\pi}</math><br />
<br />
Thus <math>f(y)/(cg(y)) = e^{-(y-1)^2/2}</math><br />
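A Python sketch of this half-normal sampler (Exp(1) is generated by the inverse transform; the function name is our own):

```python
import math
import random

# Sample |Z| with f(x) = (2/sqrt(2*pi)) e^{-x^2/2} on x >= 0, using the
# proposal g(x) = e^{-x}, c = sqrt(2e/pi), so f(y)/(c g(y)) = e^{-(y-1)^2/2}.
def draw_abs_normal(rng):
    while True:
        y = -math.log(1.0 - rng.random())  # Y ~ Exp(1) via inverse transform
        u = rng.random()
        if u <= math.exp(-(y - 1) ** 2 / 2):
            return y

rng = random.Random(4)
n = 20000
zs = [draw_abs_normal(rng) for _ in range(n)]
mean = sum(zs) / n  # E|Z| = sqrt(2/pi) ~ 0.798
```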
<br />
<br />
This example shows how to calculate the constant c between f(x) and g(x) for the normal distribution.<br />
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
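The transform above can be sketched in Python (the function name is our own):

```python
import random

# Transform U(0,1) to U(a,b): y = (b - a) * u + a.
def unif_ab(a, b, rng):
    return (b - a) * rng.random() + a

rng = random.Random(5)
ys = [unif_ab(-2.0, 3.0, rng) for _ in range(50000)]
mean = sum(ys) / len(ys)  # should approach (a + b) / 2 = 0.5
```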
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; then Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math>,<br />
we have to maximize <math>R^2-x^2</math>, which is maximized when x=0.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2/(\pi R)}{1/(2R)}=\frac{4}{\pi}</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We will accept the points with limit f(x)/[cg(x)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = y </math><br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
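A Python sketch of the semicircular sampler (function name is our own; the density is symmetric, so the sample mean should be near 0):

```python
import math
import random

# Semicircular density f(x) = (2/(pi R^2)) sqrt(R^2 - x^2) on [-R, R],
# proposal U(-R, R), c = 4/pi, acceptance prob sqrt(1 - (2U-1)^2).
def draw_semicircle(R, rng):
    while True:
        u = rng.random()
        u1 = rng.random()
        if u1 <= math.sqrt(1 - (2 * u - 1) ** 2):
            return R * (2 * u - 1)  # y = R(2U - 1) ~ U(-R, R)

rng = random.Random(6)
n = 20000
xs = [draw_semicircle(2.0, rng) for _ in range(n)]
mean = sum(xs) / n  # symmetric density: mean should be near 0
```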
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a e^{-ax}</math> to generate the random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)}\bigg|_{x=\frac{1}{1-a}} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
<math>\lim_{x \to 0^+}\frac {f(x)}{g(x)} = 0</math><br/><br />
<math>\lim_{x \to \infty}\frac {f(x)}{g(x)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u v ~unif(0,1) <br/><br />
2. Generate y from g; since g is exponential with rate a=1/2, let <math>y=-2\ln(u)</math> <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
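A Python sketch of this procedure with a=1/2 (note that f(x)=x e<sup>-x</sup> is the Gamma(2,1) density, so the sample mean should be near 2; the function name is our own):

```python
import math
import random

# Target f(x) = x e^{-x} (Gamma(2,1)); proposal g(x) = a e^{-a x} with
# a = 1/2 and c = 4/e, so f(y)/(c g(y)) = (e/2) y e^{-y/2}.
def draw_gamma21(rng):
    while True:
        u, v = rng.random(), rng.random()
        y = -2.0 * math.log(1.0 - u)  # Y ~ Exp(rate 1/2), inverse transform
        if v <= (math.e / 2.0) * y * math.exp(-y / 2.0):
            return y

rng = random.Random(7)
n = 20000
xs = [draw_gamma21(rng) for _ in range(n)]
mean = sum(xs) / n  # Gamma(2,1) has mean 2
```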
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take derivative of h(x) with respect to x, get x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (if c<sub>1</sub> is a value, then we can ignore this step) Since we want the smallest value of c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (i.e. k = unknown parameter) to get the value of k. <br />Then we substitute k to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For the two examples above, the proposal g is a uniform density,<br />
and we compute <math>c=\max\frac {f(y)}{g(y)} </math>.<br />
Then, if <math>v<\frac {f(y)}{c\cdot g(y)}</math>, we output y.<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse cdf cannot be computed, or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated, at least up to a normalizing constant. <br/><br />
3) When a constant c with <math>f(x)\leq c\cdot g(x)</math> can be found. <br/><br />
4) When uniform draws are available.<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example in the last lecture. The following code will generate the random variable required by that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % Note: R is a constant which we can change;<br />
% e.g. if we changed to R=4 then we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % Note for beginner programmers: this step increases<br />
% the ii value for the next time through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tips: hist(x,y)- y means the number of bars in the graph.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
The resulting histogram shows the sampled variable x, drawn here with 20 bars.<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} ≤ c < ∞.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: U is independent of Y in Steps 2 and 3 above.<br />
The constant c is an indicator of the rejection rate (the acceptance rate is 1/c).<br />
<br />
In this discrete acceptance-rejection example, the proposal is uniform over the 5 values (1,2,3,4,5), so g(x)=0.2 for each x.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution of integers <math>1,2,3,...,k</math>. If this function is not built in to your MATLAB, we can apply a simple transformation to the output of rand to make it behave like unidrnd(k). <br />
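The transformation hinted at above can be sketched as follows (a minimal Python illustration, not lecture code; the function name mirrors MATLAB's unidrnd):

```python
import math
import random

random.seed(0)

def unidrnd(k):
    # rand gives u in [0,1); k*u is in [0,k), so flooring and
    # adding 1 yields each integer 1..k with probability 1/k
    u = random.random()
    return math.floor(k * u) + 1

# draw 10000 values and check they cover {1,...,5} roughly uniformly
samples = [unidrnd(5) for _ in range(10000)]
```

Each of the k integers corresponds to an interval of length 1/k in [0,1), so the draws are uniform over {1,...,k}.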
<br />
The acceptance rate is <math>1/c</math>, so the lower the c, the more efficient the algorithm. Theoretically, c equals 1 is the best case because all samples would be accepted; however it would only be true when the proposal and target distributions are exactly the same, which would never happen in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>1/1.5=2/3</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 accepted values of X; as the sample size grows, the empirical frequencies approach the target probabilities.<br />
Recall that 1/c is the acceptance rate, so the smaller c is, the more efficient the algorithm.<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1. y~g<br /><br />
2. u~U(0,1)<br /><br />
3, If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii <= 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x>=0</math><br><br />
Try the first few p_{x}'s: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = x<sub>j</sub>, else go to step 1.<br />
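The three steps above can be sketched in Python (a minimal illustration, not lecture code; rather than fixing c at 2.12, the sketch computes c as the maximum of the ratio over the first 50 values):

```python
import math
import random

random.seed(1)

lam = 3.0      # target: Poisson(3) pmf p_x = e^{-3} 3^x / x!
p = 0.25       # proposal: geometric g(x) = p (1-p)^x, x = 0, 1, 2, ...

def f(x):
    return math.exp(-lam) * lam**x / math.factorial(x)

def g(x):
    return p * (1 - p)**x

# c = max f(x)/g(x); the ratio peaks at x = 3, 4 (about 2.12)
c = max(f(x) / g(x) for x in range(50))

def sample():
    while True:
        u1, u2 = random.random(), random.random()
        j = math.floor(math.log(u1) / math.log(1 - p))  # geometric draw
        if u2 < f(j) / (c * g(j)):
            return j

samples = [sample() for _ in range(10000)]
mean = sum(samples) / len(samples)  # should be near the Poisson mean, 3
```

Note that step 2 generates the geometric proposal by the inverse-transform method, exactly as in the algorithm above.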
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) such that it is hypergeometrically distributed: from 10 white balls and 5 red balls, select 3 balls without replacement, and let X be the number of red balls selected. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which gives c = 1.1127.<br />
<br />
Here the maximum of f(x) is 0.4945 and the maximum of g(x) is 0.4444, both attained at x = 1, so the ratio f/g happens to be maximized at that same point, giving c = 1.1127. In general, however, c must be found by maximizing the ratio f(x)/g(x) over all x, not by dividing the separate maxima: c must be large enough that c·g(x) &ge; f(x) at every point, otherwise the rejection scheme no longer targets f.<br />
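The rejection constant above can be verified with a short Python check (a sketch, not lecture code):

```python
from math import comb

# target f: hypergeometric pmf, drawing 3 balls from 10 white + 5 red,
# X = number of red balls in the draw
f = [comb(5, k) * comb(10, 3 - k) / comb(15, 3) for k in range(4)]

# proposal g: Binomial(3, 1/3) pmf
g = [comb(3, k) * (1 / 3) ** k * (2 / 3) ** (3 - k) for k in range(4)]

ratios = [fk / gk for fk, gk in zip(f, g)]
c = max(ratios)  # maximized at X = 1, c ≈ 1.1127
```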
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, c·g(x) must be raised to the peak of f to cover all of f, so c will be very large and the acceptance rate 1/c will be small.<br />
The higher c is, the higher the rejection rate and the more generated points are wasted.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall that,<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that <br />
p(j)/q(j)<=c for all j such that p(j)>0<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with hazard rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda)</math>; note that <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math> \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1 \sim~ Exp(\lambda) </math><br />
<math> x_2 \sim~ Exp(\lambda) </math><br />
...<br />
<math> x_t \sim~ Exp(\lambda) </math><br />
<math> x_1+x_2+...+x_t \sim~ Gamma(t, \lambda) </math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); % generates 1000 samples from Exp(1)<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Some notes on matlab coding: '''<br /><br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br /><br />
:*: ''X(:,1)'' returns the first column <br /><br />
:*: ''X(i,j)'' returns the (i,j)th entry <br /><br />
:*: ''sum(X,1)'' or ''sum(X)'' sums down each column, returning a row vector of column sums <br /><br />
:*: ''sum(X,2)'' sums across each row, returning a column vector of row sums <br /><br />
:*: ''rand(r,c)'' generates an r-by-c matrix of uniform random numbers <br /><br />
:*: Matlab is very efficient with vectors and inefficient with loops. It is far better to use vector operations (with the . operator as necessary) than "for" loops when computing many values.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generates a 20x1000 matrix <br />
(which means we generate 1000 numbers for each X_i with t=20); <br />
all the elements are generated by rand<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) has the same distribution as log(u) when u~U(0,1) <br />
>>xx = sum(x) Note: sum(x) will sum all elements in the same column. <br />
size(xx) can help you to verify<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
size(x) and size(u) are both 20-by-1000 matrices.<br />
Since u~Unif(0, 1) implies that u and 1 - u have the same distribution, we can substitute u for 1 - u to simplify the expression.<br />
Alternatively, the following command will do the same thing with the previous commands.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code in one line. <br />
% Here we can use either log(u) or log(1-u) since U~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
rand(20,1000) generates a matrix with 20 rows of 1000 numbers each.<br />
This code illustrates how the one-dimensional method generalizes: each row holds independent Exp(&lambda;) samples, and summing the 20 rows column-by-column yields 1000 Gamma(20,&lambda;) samples, as the histogram confirms.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/sin(\theta)= x_{1}/cos(\theta)</math> <br /><br />
<math> tan(\theta)=x_{2}/x_{1} \rightarrow \theta=tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
A point at distance R from the origin at angle <math>\theta</math> has Cartesian coordinates <math>x=R\cos(\theta)</math> and <math>y=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
----<br />
<pre style="color:red; font-size:30px"><br />
THIS SECTION MAY BE REDUNDANT.<br />
PLEASE COMBINE WITH "Some notes on matlab coding"<br />
IN SECTION 6.2<br />
</pre><br />
<br />
'''X=rand(2,3)''' generates a 2 rows*3 columns matrix<br /><br />
Example:<br /><br />
0.1 0.2 0.3<br /><br />
0.4 0.5 0.6<br /><br />
'''sum(X)''' sums each column<br /><br />
Example:<br /><br />
0.5 0.7 0.9<br /><br />
'''sum(X,2)''' sums each row<br /><br />
Example:<br /><br />
0.6<br /><br />
1.5<br /><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1.On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1)- Standard Normal Distribution - then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\dfrac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into two density functions, Exponential and Uniform, so d and <math>\theta</math> are independent:<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx</math><br />
:Since the first derivative ''&phi;''&prime;(''x'') is &minus;''x&phi;''(''x''),<br />
:<math>=\;- \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited to inefficient methods such as numerically inverting the Gaussian CDF, summing uniform random variables, and acceptance-rejection. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This technique offered an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup>=Z<sub>1</sub><sup>2</sup>+Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> &le; x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> (here Exp(1/2) denotes rate 1/2, i.e. mean 2, matching the density of d above) <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient. The reason for this is the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is by an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation which generates U and then computes the sine and cosine of 2&pi;U). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method directly, we can approximate this inverse using different functions. One method would be '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) random variables and subtract 6 (which is 12&middot;E(U<sub>i</sub>)), we get approximately a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
In the histograms, an added constant shifts the center of the graph, while a multiplicative constant scales the spread.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent Uniform(0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2lnU_{1}}\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2lnU_{1}}\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint density is given by <br />
<math>f(x_1,x_2) = f_{u_1,u_2}\big(g_1^{-1}(x_1,x_2),\, g_2^{-1}(x_1,x_2)\big)\, |J|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
<math>J=\begin{vmatrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2 \\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{vmatrix}</math><br />
where <br />
<math>u_1 = g_1^{-1}(x_1,x_2),\quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
<math>u_1 = e^{-(x_1^2+x_2^2)/2}</math><br />
<math>u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
<math>f(x_1,x_2) = \frac{1}{2\pi}e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution extends the standard normal: its domain is scaled by the standard deviation and translated by the mean. The pdf of the general normal distribution is <br />
<math>f(x) = \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}</math><br />
<br />
The special case of the normal distribution is the standard normal distribution, for which the variance is 1 and the mean is 0. If X is a general normal deviate, then Z = (X &minus; &mu;)/&sigma; has a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma * Z</math> Since <math>E(x) = \mu +\sigma*0 = \mu </math> and <math>Var(x) = 0 +\sigma^2*1</math><br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); % generate from the standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2];<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,I<sub>d</sub>) and <math>\underline{X}= \underline{\mu} + \Sigma^{\frac{1}{2}} Z</math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution that describes an event with only two possible outcomes, i.e. success or failure. The random variable takes value 1 with success probability p and value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pdf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes, 0 and 1; thus the Bernoulli pmf is the binomial pmf restricted to a single trial.<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example from a previous lecture: setting p = 1/2 makes heads and tails equally likely.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: A Binomial random variable can be viewed as the sum of n independent Bernoulli random variables.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: We can also regard the Bernoulli Distribution as either a conditional distribution or <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
When doing element-wise operations on vectors, put a dot before the operator (e.g. .*, ./, .^). <br />
For example, if V is a 2*4 matrix, V.^2 squares each element; for a scalar, 3*V and 3.*V are equivalent.<br />
<br />
The examples above show how these distributions can be generated in code.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
Procedure:<br />
<br />
1.Generate U~Unif [0, 1)<br><br />
2.set <math>x=F^{-1}(u)</math><br><br />
3.X~f(x)<br><br />
<br />
Example:<br />
<br />
Let x<sub>1</sub>,x<sub>2</sub> denote the lifetime of 2 independent particles, <math>X</math><sub>1</sub>~<math>Exp(\lambda_1)</math>, <math>X</math><sub>2</sub>~<math>Exp(\lambda_2)</math>.<br><br />
We are interested in Y=min(<math>X_1, X_2</math>).<br><br />
Design an algorithm based on the inverse method to generate samples according to f<sub>Y</sub>.<br><br />
<br />
Inversion Method<br />
<br />
<math>P(X\leq x) = P(F^{-1}(U)\leq x) = P(U\leq F_X(x)) = F_X(x)</math><br />
Setting <math>U = F_X(X)</math> gives <math>x=F^{-1}(u)</math><br><br />
<br />
<br />
<br />
'''Example 1'''<br><br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br />
x~exp(<math>\lambda</math>)<br><br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P(X_1>y) P(X_2>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate u~unif [0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math> (using that <math>1-U</math> and <math>U</math> have the same distribution)<br><br />
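The two steps above can be sketched in Python (a minimal illustration, not lecture code; the rates &lambda;<sub>1</sub> = 1 and &lambda;<sub>2</sub> = 2 are chosen arbitrarily):

```python
import math
import random

random.seed(2)

lam1, lam2 = 1.0, 2.0

def sample_min_lifetime():
    # Y = min(X1, X2) ~ Exp(lam1 + lam2); invert F_Y(y) = 1 - e^{-(lam1+lam2)y}
    u = random.random()
    return -math.log(1 - u) / (lam1 + lam2)

samples = [sample_min_lifetime() for _ in range(10000)]
mean = sum(samples) / len(samples)
# theoretical mean of Exp(lam1 + lam2) is 1/(lam1 + lam2) = 1/3
```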
<br />
If we extend the lifetime problem from two independent particles to n independent particles,<br />
<br />
we change <br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> to<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>), ..., <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br><br />
<br />
and the same derivation gives, for the '''minimum''' of all n lifetimes,<br />
<br />
<math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
<br />
In general, the inverse-transform method requires finding the cdf of the quantity of interest and inverting it.<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
where a>0<br><br />
What is the distribution of X?<br><br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
<br />
Thus, given U~Unif[0,1), we have derived the distribution of X: a triangular density <math>f(x)=\frac{2}{a}(1-\frac{x}{a})</math> on [0, a].<br />
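The transformation in Example 2 can be checked with a short Python sketch (illustrative only, not lecture code; a = 2 is an arbitrary choice):

```python
import math
import random

random.seed(3)

a = 2.0

def sample_x():
    # X = a*(1 - sqrt(1-u)) inverts F(x) = (x/a)(2 - x/a)
    u = random.random()
    return a * (1 - math.sqrt(1 - u))

samples = [sample_x() for _ in range(10000)]
mean = sum(samples) / len(samples)
# the triangular density f(x) = (2/a)(1 - x/a) on [0, a] has mean a/3
```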
<br />
'''Example 3'''<br><br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. We want to generate X.<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result we can see that in this example, F<sub>X</sub>(x) = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x, so the cdf of the max is x<sup>n</sup>).<br />
Method 2: generate X by having a sample of n independent U~Unif(0, 1) and take the max of the n samples to be x. However, the solution given above using inverse-transform method only requires generating one uniform random number instead of n of them, so it is a more efficient method.<br />
<br><br />
<br />
In general, for independent X<sub>1</sub>, ..., X<sub>n</sub>, we can derive the cdf (and pdf) of Y = max(X<sub>1</sub>, ..., X<sub>n</sub>) or Y = min(X<sub>1</sub>, ..., X<sub>n</sub>) as above, and then generate Y with the inverse-transform method.<br />
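The two methods can be compared empirically. A Python sketch with n = 20 as in the example above (both sample means should be near E[max] = n/(n+1) = 20/21):

```python
import random

def max_uniform_inverse(n, u=None):
    """Method 1: invert F(x) = x^n using a single uniform draw."""
    if u is None:
        u = random.random()
    return u ** (1.0 / n)

def max_uniform_direct(n):
    """Method 2: take the max of n independent uniforms (n draws)."""
    return max(random.random() for _ in range(n))

random.seed(2)
n = 20
m1 = sum(max_uniform_inverse(n) for _ in range(100000)) / 100000
m2 = sum(max_uniform_direct(n) for _ in range(100000)) / 100000
# both estimates should be near 20/21 ≈ 0.952
```

Method 1 uses one uniform per sample; method 2 uses twenty, which is why the inverse transform is the more efficient choice here.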
<br />
'''Example 4 (New)'''<br><br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z \leq z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F_Z^{-1}(u) = -\frac{1}{\lambda}\log(1-\sqrt u)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>; this gives a sample of Z.<br />
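A Python sketch of this two-step algorithm (λ = 1 is an arbitrary choice; for iid Exp(1) lifetimes, E[max(X<sub>1</sub>, X<sub>2</sub>)] = 1 + 1/2 = 1.5):

```python
import math
import random

def sample_max_two_exp(lam, u=None):
    """Invert F_Z(z) = (1 - e^{-lam*z})^2: Z = -(1/lam) * ln(1 - sqrt(u))."""
    if u is None:
        u = random.random()
    return -math.log(1.0 - math.sqrt(u)) / lam

random.seed(3)
lam = 1.0
zs = [sample_max_two_exp(lam) for _ in range(200000)]
mean = sum(zs) / len(zs)
# E[Z] = 1/lam + 1/(2*lam) = 1.5 for lam = 1
```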
<br />
===Decomposition Method===<br />
<br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math><br />
<br />
<math>f_{X} = \sum_{i=1}^{n}p_{i}f_{X_{i}}(x)</math><br />
<br />
where p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>n</sub> > 0 and sum of p<sub>i</sub> = 1.<br />
<br />
The same decomposition applies to a discrete distribution: write its pmf (or cdf) as a mixture <math>p_1 f_{X_1} + \cdots + p_n f_{X_n}</math> of simpler pmfs and sample from the chosen component.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup>, both on 0<=x<=2 <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
Here f(x) is decomposed into two component densities: f<sub>x1</sub> (uniform on [0,2], chosen with probability 5/6) and f<sub>x2</sub> (chosen with probability 1/6).<br />
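A Python sketch of this decomposition sampler. Note F<sub>x2</sub>(x) = ((x-1)<sup>5</sup>+1)/2, so its inverse is x = 1 + (2u-1)<sup>1/5</sup>, taking the real fifth root:

```python
import math
import random

def sample_f2(u):
    """Invert F2(x) = ((x-1)^5 + 1)/2, the cdf of f2(x) = (5/2)(x-1)^4 on [0,2]."""
    t = 2.0 * u - 1.0
    # real fifth root (t may be negative)
    return 1.0 + math.copysign(abs(t) ** 0.2, t)

def sample_mixture():
    """f(x) = (5/6)*(1/2) + (1/6)*(5/2)(x-1)^4 on [0,2]:
    pick a component with the mixture weights, then sample from it."""
    if random.random() < 5.0 / 6.0:
        return 2.0 * random.random()        # f1: Unif(0, 2)
    return sample_f2(random.random())       # f2 via inverse transform

random.seed(4)
xs = [sample_mixture() for _ in range(200000)]
mean = sum(xs) / len(xs)
# f is symmetric about x = 1, so the sample mean should be close to 1
```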
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad \text{for} \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange the <math> {p_i} </math> so that <math> p_i \geq p_j </math> for <math> i < j </math> (checking the most likely components first) <br> <br><br />
Then generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u < p_1 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
That is, we split f(x) into component densities f<sub>x1</sub>, f<sub>x2</sub>, f<sub>x3</sub>, use U~U(0,1) to pick a component according to the weights, and then sample from that component (by inverse transform where needed).<br />
<br />
== Example of Decomposition Method ==<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
Setting U = F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup> and solving for x directly would require solving a cubic, so we use decomposition instead.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
Generate U ~ Unif [0,1), V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x=v<br />
else if u<2/3, x=v<sup>1/2</sup><br />
else x= v<sup>1/3</sup><br><br />
<br />
<br />
Matlab Code: <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from an unknown or awkward distribution using an easy one. The disadvantage is that it may reject many points, which is inefficient.<br />
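A classic illustration is sampling uniformly from the unit disk B by sampling from its bounding square A and rejecting points outside B. A Python sketch:

```python
import random

def sample_in_disk():
    """Sample uniformly in the unit disk B by sampling uniformly in the
    enclosing square A = [-1,1]^2 and rejecting points outside B."""
    while True:
        x = random.uniform(-1.0, 1.0)
        y = random.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return x, y

random.seed(5)
points = [sample_in_disk() for _ in range(50000)]
# every accepted point lies in the disk; by symmetry the mean is near (0, 0)
mean_x = sum(p[0] for p in points) / len(points)
```

The acceptance rate is area(B)/area(A) = π/4 ≈ 0.785, which quantifies the "wasted" proposals.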
<br />
In the decomposition method, dividing each partial cdf by its weight p<sub>i</sub> gives a proper cdf, which can then be inverted on its own range.<br />
<br />
== Practice Example from Lecture 7 ==<br />
<br />
Let X<sub>1</sub>, X<sub>2</sub> denote the lifetimes of 2 independent particles, X<sub>1</sub> ~ exp(<math>\lambda_1</math>), X<sub>2</sub> ~ exp(<math>\lambda_2</math>)<br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then } 1-F(y)=P(min(x_{1},x_{2}) \geq y)=e^{-(\lambda_{1}+\lambda_{2})y}, \quad F(y)=1-e^{-(\lambda_{1}+\lambda_{2}) y}</math><br /><br />
<math>\text{Generate } u \sim unif[0,1), \text{ set } u = F(y), \text{ so } y = -\frac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
<br />
==Question 2==<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a beta distribution; <math>b</math> is the normalizing constant, <math>b=1/\int _{0}^{1}x^{n}(1-x)^{n}\,dx</math>.<br />
<br />
Use the proposal g(x) = 1 on (0,1): U<sub>1</sub>~Unif[0,1) proposes the candidate and U<sub>2</sub>~Unif[0,1) makes the accept/reject decision.<br />
<br />
The density is maximized at x = 0.5 with value <math>b(1/4)^n</math>.<br />
So, <math>c=b(1/4)^n</math>.<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math>.<br />
2. If <math>U_2 \leq \frac{b(U_1)^n(1-U_1)^n}{b(1/4)^n}=(4U_1(1-U_1))^n</math>,<br />
then set <math>X=U_1</math>.<br />
Else return to step 1.<br />
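A Python sketch of this accept-reject sampler (n = 2 is an arbitrary choice). Since the ratio f/(cg) simplifies to (4U<sub>1</sub>(1-U<sub>1</sub>))<sup>n</sup>, the normalizing constant b is never needed:

```python
import random

def sample_beta_symmetric(n):
    """Accept-reject for f(x) = b * x^n * (1-x)^n on (0,1), proposal Unif(0,1).

    f / (c*g) simplifies to (4x(1-x))^n, so b cancels out.
    """
    while True:
        u1 = random.random()
        u2 = random.random()
        if u2 <= (4.0 * u1 * (1.0 - u1)) ** n:
            return u1

random.seed(6)
n = 2
xs = [sample_beta_symmetric(n) for _ in range(100000)]
mean = sum(xs) / len(xs)
# the density is symmetric about 1/2, so the sample mean should be near 0.5
```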
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
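A Python sketch of this partition method, illustrated with the pmf of a Binomial(3, 0.5) variable (an arbitrary example choice):

```python
import random
from math import comb

def sample_discrete(pmf, a):
    """Inverse transform for a discrete X supported on {a, a+1, ...}.

    pmf(x) returns P(X = x); walk the cdf until it exceeds u.
    """
    u = random.random()
    x = a
    s = pmf(x)
    while u > s:
        x += 1
        s += pmf(x)
    return x

def binom_pmf(x, n=3, p=0.5):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

random.seed(7)
xs = [sample_discrete(binom_pmf, 0) for _ in range(100000)]
mean = sum(xs) / len(xs)
# E[X] = n*p = 1.5
```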
<br />
<br />
==Thursday, May 30, 2013==<br />
<br />
<b>The Geometric Distribution</b> <br><br />
<br />
If X~Geo(p) then its pmf is <math>f(x)=(1-p)^{x-1}p</math>, x=1,2,...<br /><br />
The random variable X is the number of trials required until the first success in a series of independent Bernoulli trials.<br /><br />
If <math>Y \sim Exp(\lambda)</math> then <math>X=\lfloor Y \rfloor +1</math> is geometric.<br /><br />
Choose <math>\lambda</math> such that <math>e^{-\lambda}=1-p</math>.<br /><br />
<br />
<br />
'''Algorithm:''' <br /><br />
1) Let <math>\lambda = -\log (1-p)</math> <br /><br />
2) Generate <math>Y \sim Exp(\lambda )</math> <br /><br />
3) Set <math>X = \left \lfloor Y \right \rfloor + 1</math>; then <math>X\sim Geo(p)</math> <br /><br />
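A Python sketch of these three steps (p = 0.3 is an arbitrary choice; for X ~ Geo(p), E[X] = 1/p):

```python
import math
import random

def sample_geometric(p):
    """X = floor(Y) + 1, with Y ~ Exp(lambda) and lambda = -ln(1-p), is Geo(p)."""
    lam = -math.log(1.0 - p)
    y = -math.log(1.0 - random.random()) / lam   # inverse transform for Exp(lam)
    return math.floor(y) + 1

random.seed(8)
p = 0.3
xs = [sample_geometric(p) for _ in range(200000)]
mean = sum(xs) / len(xs)
# E[X] = 1/p ≈ 3.33
```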
<br />
P(X>x)<br /><br />
=P(floor(Y)+1>x)<br /><br />
=P(floor(Y)>x-1)<br /><br />
=P(Y>=x)=<math>e^{-\lambda x}=(1-p)^x</math>, the geometric tail probability.</div>
<hr />
<div><br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case, i.e. y is continuous rather than discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email the instructor or TAs about the class directly to their personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as your login name and your uwaterloo email as the registered email. This is important, as the Quest id will be used to identify the students who make contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions in multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a die and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that appear random but are actually deterministic. Although a pseudo random sequence is deterministic, its values have the appearance of independent uniform random variables. This determinism is also a virtue: pseudo random numbers are easy to generate, reproduce, and manipulate.<br />
<br />
When an experiment is repeated many times, the aggregate results converge to the expected values, which makes the process look deterministic; each individual trial, however, is still random. Pseudo random numbers behave in the same way.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
More generally, if y = qx + b with 0 <= b < x, then <math>b:=y \mod x</math>. <br /><br />
4.2 = 3 * 1.1 + 0.9, so 0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2, so 2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1, so 1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation tells us whether one integer divides another exactly (remainder 0) or not. In n = mq + r, all of n, m, q, r are integers with 0 <= r < m.<br />
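The quotient/remainder pairs above can be checked in any language; for instance, Python's built-in divmod returns both at once:

```python
# divmod(n, m) returns (q, r) with n = m*q + r and 0 <= r < m
q, r = divmod(30, 7)     # 30 = 7*4 + 2
q2, r2 = divmod(25, 3)   # 25 = 3*8 + 1
```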
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math>. (<math>\mod m</math> means taking the remainder after division by m.) Given integer parameters <math>a, b, m</math> and an initial value <math>x_0 \in \N</math> called the '''seed''', we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{0}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we get a sequence of "random" numbers. But how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a '''large''', preferably prime number: the larger <math>m</math> is, the longer the period of the sequence can be. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> should be '''large and prime''').)<br /> <br />
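A Python sketch of the multiplicative (b = 0) generator with these parameters; normalizing each value by m − 1 gives numbers that are approximately uniform on [0, 1]:

```python
def lehmer(seed, n, a=7 ** 5, m=2 ** 31 - 1):
    """Multiplicative congruential generator x_{k+1} = a * x_k mod m (b = 0).

    a = 7^5, m = 2^31 - 1 are the parameters recommended by Park and Miller.
    """
    x = seed
    out = []
    for _ in range(n):
        x = (a * x) % m
        out.append(x)
    return out

xs = lehmer(1, 3)
# x1 = 16807, x2 = 16807^2 mod m = 282475249, ...
us = [x / (2 ** 31 - 2) for x in xs]   # normalized to (0, 1]
```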
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a histogram of the sample. Use it after running the code to check the actual sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose parameters that give good uniform sequences. Many programs give you the option to choose the seed; sometimes the seed is chosen by the CPU.<br /><br />
<br />
<br />
<br />
<br />
In this part we saw how code relates integer division between two integers to its remainder, and that iterating the congruential recursion over a long range such as 1:1000 produces a histogram that looks approximately uniform.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>find integers <i>a, b, m</i> (with <i>m</i> a large prime) and a seed <i>x<sub>0</sub></i>.</li><br />
<li><math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than uniform distribution, such as exponential distribution and normal distribution. However, to easily use this method in generating pseudorandom numbers, the probability distribution consumed must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, it can generate U according to U(0,1) and then make the transformation x=<math> F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the inverse on both sides in the proof of the inverse transform only if the cdf of X is strictly increasing (and hence invertible) on its support; otherwise we use the generalized inverse defined above.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3<br><br />
*Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.<br />
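As a sketch, the steps above can be written as a short Python function (Python is used here for compactness; the function name is illustrative) that maps one uniform draw to one Binomial(n, p) value:

```python
def binomial_inverse_transform(u, n, p):
    """Map one uniform draw u to a Binomial(n, p) value using the
    pmf recursion pr_{i+1} = c*(n-i)/(i+1)*pr_i with c = p/(1-p)."""
    c = p / (1.0 - p)
    i = 0
    pr = (1.0 - p) ** n        # Step 2: P(X = 0)
    F = pr                     # running CDF
    while u >= F:              # Step 3: stop once U < F
        pr = c * (n - i) / (i + 1) * pr   # Step 4: pmf recursion
        F += pr
        i += 1
    return i
```

Feeding independent uniform draws through this function yields Binomial(n, p) samples.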
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = \Big[-e^{-\lambda t}\Big]_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-y)}{\lambda}</math><br /><br />
Interchanging the variables gives the inverse:<br />
<math>F^{-1}(u)=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
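Step 2 above is a one-line function; here is a Python sketch (illustrative name) together with a round-trip check against the CDF derived above:

```python
import math

def exp_inverse_transform(u, lam):
    """Invert the exponential CDF F(x) = 1 - exp(-lam*x)."""
    return -math.log(1.0 - u) / lam
```

Applying F to the output should recover the original uniform value.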
<br />
'''Example''': <br />
If <math>U</math> is uniform on [0, 1], then <math> X= a + (b-a)U</math> is uniform on [a, b] <br /><br />
<math> x=\frac{-\ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math>, since <math>U</math> and <math>1-U</math> have the same distribution <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x), and set<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
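A quick Python check of this transformation (illustrative name), verifying that raising the output back to the fifth power recovers u:

```python
def cdf_x5_inverse(u):
    """Invert F(x) = x^5 on [0, 1]: x = u^(1/5)."""
    return u ** (1.0 / 5.0)
```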
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
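A Python sketch of this inverse (illustrative name), checking the algebra above by applying the CDF to the output:

```python
def beta_1_b_inverse(u, b):
    """Invert F(x) = 1 - (1-x)^b for BETA(1, b): x = 1 - (1-u)^(1/b)."""
    return 1.0 - (1.0 - u) ** (1.0 / b)
```

With b = 1 this reduces to the identity, as expected for BETA(1,1) = U(0,1).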
<br />
'''Example 4-Estimating pi''':<br />
Let's use rand() and the Monte Carlo Method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side length 2, the inscribed circle has radius 1 and area <math>\pi</math>, while the square has area 4, so this probability is <math>\pi/4</math>.<br /><br />
Thus <math>\pi \approx 4\cdot(Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) % will generate a fairly uniform histogram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
% let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) % 1000 in size <br />
>>figure<br />
>>hist(x) % exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. Not every CDF is invertible or monotonic in a usable way: the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /><br />
<br />
We learned how to invert a CDF and use a uniform draw to obtain a value of x from F(x).<br />
The same uniform draws drive the inverse method for other distributions.<br />
In the Monte Carlo example, the points are uniformly distributed over the square, so the chance that a point lands inside the circle equals the ratio of the two areas.<br />
We can also look at a histogram of the generated samples to judge what kind of distribution they follow.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool % shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma changes the location and spread of the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for uniform distribution <math> U~ \sim~ Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f. function <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases this function does not exist in closed form<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse cdf function, making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable X that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1: Generate <math> U~ \sim~ Unif [0,1] </math><br><br />
2. Set <math> X=x_i </math> if <math> F(x_{i-1})<U\leq F(x_i) </math>. (In the discrete case, F(x) is a step function rather than a continuous one.)<br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define X as a function of U so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of the semicolon in Matlab: Matlab will not print the result of a line that ends in a semicolon; without one, the result is printed.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \quad X = F_{X}^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(<math>\mu</math>). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} P_x = P(X=x) = \frac {\, e^{-\mu} \mu^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-\mu} \mu^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = \frac {\mu}{x+1} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {\mu}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) Set <math>\begin{align} x = 0 \end{align}</math>, <math>\begin{align} p = P(X = 0) = e^{-\mu} \end{align}</math>, <math>\begin{align} F = p \end{align}</math><br />
3) If U<F, output x <br><br />
Else, <math>\begin{align} p = {\frac {\mu}{x+1}} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
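The algorithm above can be sketched as a Python function (illustrative name; the Poisson mean is written as mu) that maps one uniform draw to one Poisson value via the pmf recursion:

```python
import math

def poisson_inverse_transform(u, mu):
    """Map one uniform draw u to a Poisson(mu) value using the
    pmf recursion p_{x+1} = mu/(x+1) * p_x."""
    x = 0
    p = math.exp(-mu)      # P(X = 0)
    F = p                  # running CDF
    while u >= F:          # stop once U < F
        p = mu / (x + 1) * p
        F += p
        x += 1
    return x
```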
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X as the number of trials up to and including the first success, x = 1, 2, 3, .... We have pmf:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>; P(X>x) means the first x trials are all failures.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
\vdots \\<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k \\<br />
\vdots<br />
\end{cases}</math><br />
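The case analysis above has a closed form: X = k exactly when <math>1-(1-p)^{k-1} < U \leq 1-(1-p)^k</math>, which solves to <math>k = \lceil \ln(1-U)/\ln(1-p) \rceil</math>. A Python sketch (illustrative name):

```python
import math

def geometric_inverse_transform(u, p):
    """Closed form of the geometric case analysis: the smallest k
    with u <= 1-(1-p)^k is ceil(log(1-u)/log(1-p))."""
    return math.ceil(math.log(1.0 - u) / math.log(1.0 - p))
```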
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
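The general procedure amounts to finding the first cumulative sum that reaches U. A Python sketch (illustrative names), using a binary search over the cumulative sums:

```python
import bisect
from itertools import accumulate

def discrete_inverse_transform(u, values, probs):
    """Deliver values[k] for the smallest k with
    u <= P_0 + ... + P_k, matching the general procedure above."""
    cdf = list(accumulate(probs))          # cumulative sums P_0, P_0+P_1, ...
    k = bisect.bisect_left(cdf, u)         # first index with cdf[k] >= u
    return values[min(k, len(values) - 1)]
```

With the class example P(X=0)=0.3, P(X=1)=0.2, P(X=2)=0.5, a draw of u=0.4 delivers 1.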
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as Gaussian, it is too difficult to find the inverse of <math> F(x) ,</math><br />
Flipping a coin is a discrete uniform case: the code flips the coin 1000 times, and the observed proportion of heads is close to the expected value (0.5).<br />
The second example is another discrete distribution, where the unit interval is split into three parts corresponding to the outcomes 0, 1, and 2, each part's length equal to that outcome's probability.<br />
Example 3 uses the inverse method to work out the range of uniform values corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b> generate samples from a given distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed number {x}</li><br />
<li>{F<sup>-1</sup>(x)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>d<sub>i</sub>=x<sub>i</sub> if <math> F(x_{i-1})<U\leq F(x_i) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transformation method does allow us to turn uniform draws into draws from another distribution, it has two limitations:<br />
# Not all functions have inverse functions (i.e., a closed-form inverse of the CDF may not exist)<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''. This method is often more practical when the inverse CDF is unavailable or expensive to compute.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (In practise, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from a distribution, say f(x), that we cannot sample from directly.<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) as opposed to <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> could hold for all values of x only if g and f were the same function. This is because both pdfs integrate to 1, so g cannot lie strictly above f everywhere. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always proposes points more often than f(x) requires. Thus we need a way of thinning the proposals down to the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This will render our algorithm inefficient. <br />
<br />
<br><br />
<br />
Note: 1. Values around x<sub>1</sub> will be sampled more often under cg(x) than under f(x). There will be more samples than we actually need; since <math>\frac{f(y)}{\, c g(y)}</math> is small there, the acceptance-rejection step thins these points down to the accurate amount. In the region around x<sub>1</sub>, we accept less and reject more. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. There, cg(x) and f(x) are comparable.<br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function lies under the proposed function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that is also under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those sample points are guaranteed to fall in the part of the area under c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
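A minimal Python sketch of this procedure, with the target density, proposal sampler, proposal density, and constant c all passed in as parameters (names illustrative):

```python
import random

def acceptance_rejection(f, sample_g, g, c, rng=random):
    """Generic acceptance-rejection loop.

    f        : target density
    sample_g : draws one sample from the proposal
    g        : proposal density
    c        : constant with f(x) <= c*g(x) for all x
    """
    while True:
        y = sample_g()                 # step 1: Y ~ g(.)
        u = rng.random()               # step 2: U ~ U(0,1), independent of Y
        if u <= f(y) / (c * g(y)):     # step 3: accept with prob f(y)/(c g(y))
            return y
```

For example, with target f(x) = 2x on [0,1], proposal U(0,1) (g(x) = 1) and c = 2, the accepted samples have mean close to 2/3.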
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: The comparison in step 3 makes sense because U takes values in [0,1] and, by the choice of the constant c, the ratio <math>\frac{f(y)}{cg(y)}</math> also lies in [0,1].<br />
<br />
<br />
This introduced the relationship between cg(x) and f(x), showed why cg(x) must dominate f(x), and explained how this rule is used to reject some candidate points.<br />
The graph shows how to judge the acceptance rate in the region around a given x:<br />
in the example, points near x<sub>1</sub> are mostly rejected while points near x<sub>2</sub> are mostly accepted.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
We want to show that, conditional on acceptance, the output has the target density <math>f</math>:<br><br />
<br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ (Bayes' rule)}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int P(accepted|s)P(s)\,ds\\<br />
&=\int \frac{f(s)}{cg(s)}g(s)\,ds\\<br />
&=\frac{1}{c} \int f(s)\, ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)={2 \choose x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c>=f(x)/g(x)</math><br/><br />
We need <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>y=0</math> and <math>v<1/2</math>, output 0 <br/><br />
If <math>y=2</math> and <math>v<1/2</math>, output 2 <br/><br />
Else if <math>y=1</math>, output 1; otherwise return to step 1<br/><br />
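The discrete algorithm above can be sketched in Python (illustrative name). Note that a rejected candidate sends the loop back to step 1:

```python
import random

def binomial_2_half_ar(rng=random):
    """Acceptance-rejection draw from Bi(2, 0.5) using the discrete
    uniform DU[0,2] as the proposal, with c = 3/2 as derived above."""
    f = {0: 0.25, 1: 0.50, 2: 0.25}    # target pmf
    g = 1.0 / 3.0                      # proposal pmf (uniform on {0,1,2})
    c = 1.5
    while True:
        y = int(3 * rng.random())      # y = floor(3u): DU[0,2]
        v = rng.random()
        if v <= f[y] / (c * g):        # accept with prob f(y)/(c g(y))
            return y
```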
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let <math>g(x)</math> be the helper function <br/><br />
Let <math>cg(x) \geq f(x)</math><br/><br />
Since we need to generate y from <math>g(x)</math>,<br/><br />
<math>Pr(select y)=g(y)</math><br/><br />
<math>Pr(output y|selected y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (Since u~Unif(0,1))<br/><br />
<math>Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c</math> <br/><br />
Considering that we are asking for the expected number of trials until the first success, this is a geometric distribution with probability of success <math>1/c</math><br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
Conditional probability is used to prove that, given a point is accepted, its distribution is exactly the target pdf.<br />
The example shows how to choose c for a given pair of functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub> < (256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop;<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
Here the derivative is used to find the local maximum of f(x)/g(x),<br />
from which we can calculate the best constant c for the acceptance-rejection method.<br />
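The simulation procedure from this example can be sketched in Python (illustrative name):

```python
import random

def beta_2_4_ar(rng=random):
    """One acceptance-rejection draw from f(x) = 20 x (1-x)^3
    (Beta(2,4)) with a U(0,1) proposal and c = 135/64, so that
    f(x)/(c g(x)) = (256/27) x (1-x)^3."""
    while True:
        u1 = rng.random()              # step 1: candidate from g
        u2 = rng.random()              # step 1: acceptance uniform
        if u2 <= (256.0 / 27.0) * u1 * (1.0 - u1) ** 3:
            return u1                  # step 2: accept the candidate U1
```

The sample mean should be close to the Beta(2,4) mean of 1/3.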
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (the density of <math>U[0,0.5]</math>), and 0 elsewhere<br />
<br />
Let <math>g(.)</math> be the <math>U[0,1]</math> density. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = (2) / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = (0) / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example showing why draws are rejected in the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area above f(x) and below cg(x) as small as possible.<br />
Because g(.) is uniform on (0,1), g(x) = 1.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
An example showing how to work out c and <math>f(x)/(c\,g(x))</math>.<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first need to find a proposal distribution <math>g(x)</math> that is easy to sample from. <br> The graph of <math>f(x)</math> must lie under the graph of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower when the distance between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice-versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it will not make sense if <math>c</math> is simply chosen to be arbitrarily large. We need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be negative.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
This means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math> for all x, so the smallest possible c satisfying the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of accepting generated values will be small, and the algorithm will lose its efficiency.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low. (The efficiency of this method is <math>\left ( \frac{1}{c} \right )</math>.)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math> for all x; if no suitable c exists, return to step 1 and choose a different g.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1 (this is the general procedure; it is not how c is found)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where the Beta(a,b) pdf is <br />
<math>f(x)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
where <math>\Gamma(n)=(n-1)!</math> if n is a positive integer, and in general<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}\,dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>Beta(2,1)= \frac{\Gamma(3)}{\Gamma(2)\Gamma(1)}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows the uniform distribution and only covers half of the graph, running from 0 to 1 on the y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> covers the entire area under f(x). In this case c=2, so <math>c\cdot g</math> runs from 0 to 2 on the y-axis, which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we can observe that the area under f(x)=2x (which is 1) is half of the area under <math>c\cdot g(x)=2</math> (which is 2). This is why, in order to sample 1000 points from f(x), we need to sample approximately 2000 points from UNIF(0,1).<br />
In general, if we want to sample n points from a distribution with pdf f(x), we need to sample approximately <math>n\cdot c</math> points from the proposal distribution g(x) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>If <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)}=y</math>, set x=y</li><br />
<li>Else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: count of accepted samples<br />
>>jj=1; % jj: count of generated samples<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need to continue looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know in advance how many y's we will need to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation; this can only happen when the proposal g is identical to the target f.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g be the density of U(-1,1), i.e. <math>g(x)=\frac{1}{2},\ -1\leq x\leq 1</math><br />
<br />
Let <math>y \sim~ g</math>,<br />
<math> c\, g(x)\geq f(x) \Rightarrow \frac{c}{2} \geq \frac{3}{4} (1-x^2) \Rightarrow c= \max_x\, 2\cdot\frac{3}{4} (1-x^2) = \frac{3}{2} </math><br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}\cdot\frac{1}{2}} = 1-y^2</math>, then x=y, '''note that''' this ratio is <math>f(y)/(c\, g(y))</math>, with <math>g(y)=\tfrac{1}{2}</math><br />
:5: else: return to '''step 1''' <br />
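The five steps above can be sketched in code (a Python sketch; the course code is MATLAB and the function name is illustrative). Note that <math>f(y)/(c\,g(y)) = \tfrac{3}{4}(1-y^2)\big/\big(\tfrac{3}{2}\cdot\tfrac{1}{2}\big) = 1-y^2</math>, since <math>g(y)=1/2</math>.<br />

```python
import random

# Sketch of the A-R procedure above: target f(x) = (3/4)(1 - x^2) on [-1, 1],
# proposal g = Unif(-1, 1) with density g(x) = 1/2, and c = 3/2.
# Acceptance ratio: f(y)/(c*g(y)) = (3/4)(1-y^2) / ((3/2)(1/2)) = 1 - y^2.

def sample_f(n, seed=0):
    """Draw n samples from f by acceptance-rejection (illustrative name)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = 2 * rng.random() - 1     # steps 1 & 3: y = 2*U1 - 1 ~ Unif(-1, 1)
        u2 = rng.random()            # step 2
        if u2 <= 1 - y * y:          # step 4: accept with prob f(y)/(c g(y))
            out.append(y)
    return out
```

For this target, E[X]=0 and E[X^2]=1/5, which the sample moments should approximate.<br />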
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period ("."), meaning "element-wise", tells MATLAB to apply an operation to each element of a vector or matrix. In the example above, u.^0.5 takes the square root of every element of u. Without the period, ^ and * are matrix operations: they require compatible matrix dimensions and compute matrix powers and products instead. For example, if a=[1 2 3] and b=[2 3 4] are row vectors, then a.*b=[2 6 12] is the element-wise product, but a*b gives an error since the inner matrix dimensions do not agree (a*b' would give the scalar product instead).<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max_{0<x<1} \frac{f(x)}{g(x)} = \max_{0<x<1} \frac {3x^2}{2x} = \max_{0<x<1} \frac {3x}{2} = \frac{3}{2} </math>, attained at x=1.<br />
Use the inverse method to sample from <math>g(x)</math><br />
<math>G(x)=x^2</math>.<br />
Generate <math>U</math> from <math>U(0,1)</math> and set <math>x=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math>, and set <math>y=\sqrt{U_1}</math> (a draw from g by the inverse method)<br><br />
2. If <math>U_2 \leq \frac{f(y)}{c\, g(y)} = \frac{3y^2}{\frac{3}{2}\cdot 2y} = y</math>, accept <math>y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
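A quick empirical check of the two proposals (a Python sketch with illustrative function names): with g = Unif(0,1) we have c = 3 and expect to accept about 1/3 of draws, while with g(x) = 2x we have c = 3/2 and expect to accept about 2/3.<br />

```python
import random

# Compare acceptance rates for the target f(x) = 3x^2 on (0,1)
# under the two proposals discussed above.

def rate_uniform(trials, seed=1):
    """Proposal g = Unif(0,1), c = 3: accept when U2 <= f(y)/(c g(y)) = y^2."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(trials):
        y = rng.random()
        if rng.random() <= y * y:
            acc += 1
    return acc / trials

def rate_linear(trials, seed=1):
    """Proposal g(x) = 2x, c = 3/2: accept when U2 <= f(y)/(c g(y)) = y."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(trials):
        y = rng.random() ** 0.5    # sample from g by inversion: G(x) = x^2
        if rng.random() <= y:
            acc += 1
    return acc / trials
```

The second proposal roughly doubles the acceptance rate (1/c = 2/3 versus 1/3).<br />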
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get 1000 accepted points. In the example we did in class with <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim~ N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
<math>f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad x \geq 0</math><br />
<br />
<math>g(x) = e^{-x}, \quad x \geq 0</math><br />
<br />
Take <math>h(x) = \frac{f(x)}{g(x)}</math> and solve <math>h'(x) = 0</math> to find the x at which h(x) is maximized. <br />
<br />
Hence x=1 maximizes h(x), so <math>c = h(1) = \sqrt{2e/\pi}</math><br />
<br />
Thus <math>\frac{f(y)}{c\, g(y)} = e^{-(y-1)^2/2}</math><br />
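The derivation above can be put into code (a Python sketch; the final step of attaching a random sign, which turns |Z| into Z ~ N(0,1), is a standard completion that the derivation above implies but does not write out).<br />

```python
import math
import random

# Sketch of the A-R sampler above: target |Z| with f(x) = (2/sqrt(2*pi)) e^{-x^2/2},
# proposal g(x) = e^{-x} (an Exp(1) density), c = sqrt(2e/pi),
# acceptance ratio f/(c g) = exp(-(y-1)^2 / 2).
# A random sign then turns |Z| into Z ~ N(0,1).

def sample_normal(n, seed=2):
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = -math.log(1.0 - rng.random())            # y ~ Exp(1); 1-u avoids log(0)
        if rng.random() <= math.exp(-(y - 1.0) ** 2 / 2):
            out.append(y if rng.random() < 0.5 else -y)
    return out
```

The sample mean and variance should be close to 0 and 1 respectively.<br />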
<br />
<br />
The same constant c can also be found numerically, by coding the ratio f(x)/g(x) and maximizing it.<br />
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
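The three steps above in code (a Python sketch; <code>unif_ab</code> is an illustrative name):<br />

```python
import random

# Transform Unif(0,1) into Unif(a,b) by scaling by (b-a) and shifting by a.
def unif_ab(a, b, rng=random):
    u = rng.random()         # step 1: U ~ Unif(0,1)
    return (b - a) * u + a   # step 2: Y = (b-a)U + a ~ Unif(a,b)
```

For instance, unif_ab(-R, R, rng) gives the Unif(-R,R) draws used in the semicircular example below.<br />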
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; therefore Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since <math>c=\max \frac{f(x)}{g(x)}</math>, where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2/(\pi R)}{1/(2R)}=\frac{4}{\pi}</math>. Note: this also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a proposed point y with probability <math>\frac{f(y)}{c\, g(y)}</math>.<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math> and set <math>\ y = R(2U-1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}</math>, set <math>\ x = y </math>;<br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br />
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x\,e^{-x},\ x>0 </math> <br/><br />
Use <math>g(x)=a\,e^{-ax}</math> to generate the random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\left.\frac {f(x)}{g(x)}\right|_{x=\frac{1}{1-a}} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\lim_{x\to\infty}\frac {f(x)}{g(x)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u, v ~ unif(0,1) independently <br/><br />
2. Generate y from g; since g is exponential with rate a=1/2, let <math>y=-2\ln(u)</math> <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
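The procedure above can be sketched in Python (illustrative code, not from the notes). For <math>a=\tfrac{1}{2}</math> the proposal is exponential with rate <math>\tfrac{1}{2}</math>, so inversion gives <math>y=-2\ln(u)</math>, and the acceptance ratio simplifies to <math>\frac{f(y)}{c\,g(y)}=\frac{y e^{-y}}{\frac{4}{e}\cdot\frac{1}{2}e^{-y/2}}=\frac{e}{2}\, y\, e^{-y/2}</math>.<br />

```python
import math
import random

# Sketch of the example above: target f(x) = x e^{-x} (a Gamma(2,1) density),
# proposal g(x) = (1/2) e^{-x/2} with a = 1/2, and c = 4/e.

def sample_xexp(n, seed=4):
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = 1.0 - rng.random()                 # u in (0,1], avoids log(0)
        v = rng.random()
        y = -2.0 * math.log(u)                 # y ~ Exp(rate 1/2)
        if v < (math.e / 2.0) * y * math.exp(-y / 2.0):
            out.append(y)
    return out
```

The target is the Gamma(2,1) density, so the sample mean should be near 2.<br />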
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take the derivative of h(x) with respect to x, set it to zero, and solve for x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (if c<sub>1</sub> is a value, then we can ignore this step) Since we want the smallest value of c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (i.e. k = unknown parameter) to get the value of k. <br />Then we substitute k to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>.)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For the examples above, the recipe is: pick a proposal distribution g that we can sample from,<br />
figure out <math>c=\max\frac {f(y)}{g(y)} </math>,<br />
and output y whenever <math>v<\frac {f(y)}{c\cdot g(y)}</math>.<br />
<br />
<br />
'''Summary of when to use the Accept-Rejection Method''' <br/><br />
1) When the inverse CDF cannot be computed or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated, at least up to a normalizing constant. <br/><br />
3) When a constant c with <math>f(x)\leq c\cdot g(x)</math> can be found.<br/><br />
4) When uniform draws are available.<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the semicircular-density example from the last lecture. The following code generates the random variable required in that question.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % R is a constant which we can change; e.g. if we<br />
 % changed R=4 we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % note for beginner programmers: this step<br />
 % increases ii for the next pass through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tip: in hist(x,y), y is the number of bars (bins) in the histogram of x.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose we can already easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that <math>\sup_x \{f(x)/g(x)\} \leq c < \infty</math>.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: The U is independent from y in Step 2 and 3 above.<br />
~The constant c is an indicator of the rejection rate (the acceptance rate is 1/c)<br />
<br />
In the acceptance-rejection method for a pmf, the discrete uniform proposal gives the same probability to each of the 5 values (1,2,3,4,5), so g(x)=0.2 for every x<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution on the integers <math>1,2,3,...,k</math>. If this function is not built into your MATLAB, a simple transformation of rand works the same way: y = ceil(k*rand). <br />
<br />
The acceptance rate is <math>1/c</math>, so the lower c is, the more efficient the algorithm. Theoretically, c=1 is the best case because all samples would be accepted; however, this only happens when the proposal and target distributions are identical, which never happens in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>1/1.5=2/3</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 random values drawn from f(x); with more random values, the empirical frequencies approach the stated probabilities.<br />
Recall that 1/c is the acceptance rate: the smaller c is, the better.<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1. <math>y \sim~ g</math><br />
2. <math>u \sim~ U(0,1)</math><br />
3. If <math>u \leq \frac{f(y)}{c\, g(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x!,\ x\geq 0</math><br><br />
The first few <math>p_{x}</math>'s are: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{c\, g(j)}</math>, set X = j; else go to step 1.<br />
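Example 3 above can be sketched in Python (illustrative names; not course code). The constant c is computed directly as the maximum of <math>p_x/g(x)</math>:<br />

```python
import math
import random

# Sketch of Example 3: Poisson(3) target p_x = e^{-3} 3^x / x!,
# geometric proposal g(x) = p(1-p)^x with p = 0.25, x = 0, 1, 2, ...

p = 0.25

def f_pois(x):
    return math.exp(-3) * 3 ** x / math.factorial(x)

def g_geo(x):
    return p * (1 - p) ** x

# c = max p_x / g(x); the ratio peaks at x = 3 and 4, giving roughly 2.12
c = max(f_pois(x) / g_geo(x) for x in range(50))

def sample_poisson3(n, seed=5):
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()                  # u1 in (0,1], avoids log(0)
        u2 = rng.random()
        j = int(math.log(u1) / math.log(1 - p))  # step 2: geometric draw
        if u2 < f_pois(j) / (c * g_geo(j)):      # step 3
            out.append(j)
    return out
```

The accepted values follow Poisson(3), so the sample mean should be near 3.<br />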
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) such that it is hypergeometically distributed, given 10 white balls, 5 red balls, and select 3 balls, let X be the number of red ball selected, without replacement. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For the Binomial g(x): <math>P(X=0) = (2/3)^3=0.2963,\ P(X=1)= 3(1/3)(2/3)^2 = 0.4444,\ P(X=2)=3(1/3)^2(2/3)=0.2222,\ P(X=3)=(1/3)^3=0.03704</math><br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which is '''c=1.1127'''<br />
<br />
Here the maximum of f(x) is 0.4945 and the maximum of g(x) is 0.4444, and since both maxima occur at the same point (X=1), the maximum ratio equals 0.4945/0.4444 = 1.1127.<br />
In general, though, c must be computed as the maximum of the ratio f(x)/g(x), not as the ratio of the two maxima; with c equal to the maximum ratio, c*g(x) is guaranteed to cover f(x) at every point.<br />
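Example 4 can be checked exactly in a few lines (a Python sketch; <code>math.comb</code> computes binomial coefficients):<br />

```python
import math

# Recomputing Example 4 exactly: f is Hypergeometric (10 white, 5 red,
# draw 3 without replacement; X = number of red), g is Binomial(3, 1/3).

def f_hyp(x):
    return math.comb(5, x) * math.comb(10, 3 - x) / math.comb(15, 3)

def g_bin(x):
    return math.comb(3, x) * (1 / 3) ** x * (2 / 3) ** (3 - x)

ratios = [f_hyp(x) / g_bin(x) for x in range(4)]
c = max(ratios)   # the maximum ratio, attained at x = 1 (about 1.1127)
```

Both pmfs sum to 1 over x = 0,...,3, and c matches the value 1.1127 derived above.<br />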
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, c*g(x) has to be scaled up until it covers the peak of f. Thus c will be very large and 1/c will be small,<br />
meaning a high rejection rate: most generated points will be rejected.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall:<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j≥0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j≥0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that<br />
p(j)/q(j) ≤ c for all j such that p(j) > 0.<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that these are not general techniques in the way acceptance-rejection sampling is. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda)</math>; note that <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1 \sim~ Exp(\lambda),\; x_2 \sim~ Exp(\lambda),\; \dots,\; x_t \sim~ Exp(\lambda)</math><br />
<math>\Rightarrow x_1+x_2+\dots+x_t \sim~ Gamma(t,\lambda)</math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); % 1000 draws from Exp(l) by the inverse method<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Some notes on matlab coding: '''<br/ ><br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,j)'' returns the (i,j)th entry <br/ ><br />
:*: ''sum(X,1)'' or ''sum(X)'' adds the rows together, giving the sum of each column <br /><br />
:*: ''sum(X,2)'' adds the columns together, giving the sum of each row <br/ ><br />
:*: ''rand(r,c)'' will generate random numbers in r row and c columns <br /><br />
:*: Matlab coding language is very efficient with vectors and inefficient with loops. It is far better to use vector operations (use the . operator as necessary) than it is to use "for" loops when computing many values.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generates a 20x1000 matrix (1000 numbers for<br />
 % each X_i, with t=20); all entries are Unif(0,1)<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) has the same distribution as log(u) when u~U(0,1)<br />
>>xx = sum(x); % sums the entries of each column;<br />
 % size(xx) can help you verify<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
size(x) and size(u) are both 20x1000 matrices.<br />
Since u and 1-u have the same distribution when u ~ unif(0,1), we can substitute u for 1-u to simplify the code.<br />
Alternatively, the following commands do the same thing as the previous ones.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code in one line;<br />
 % we can use either log(u) or log(1-u) since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
In rand(20,1000), the matrix has 20 rows, each containing 1000 numbers.<br />
This code generalizes the approach: we generate a matrix of independent <math>x_i</math>'s and sum them, and the resulting histogram shows the Gamma shape.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/\sin(\theta)= x_{1}/\cos(\theta)</math> <br /><br />
<math> \tan(\theta)=x_{2}/x_{1} \rightarrow \theta=\tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
A point at distance R from the origin at angle <math>\theta</math> has coordinates <math>x=R\cos(\theta)</math> and <math>y=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
----<br />
<pre style="color:red; font-size:30px"><br />
THIS SECTION MAY BE REDUNDANT.<br />
PLEASE COMBINE WITH "Some notes on matlab coding"<br />
IN SECTION 6.2<br />
</pre><br />
<br />
'''X=rand(2,3)''' generates a 2 rows*3 columns matrix<br /><br />
Example:<br /><br />
0.1 0.2 0.3<br /><br />
0.4 0.5 0.6<br /><br />
'''sum(X)''' adds the columns up<br /><br />
Example:<br /><br />
0.5 0.7 0.9<br /><br />
'''sum(X,2)''' adds up the rows<br /><br />
Example:<br /><br />
0.6<br /><br />
1.5<br /><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1. On the day of each lecture, students from the morning section can only contribute to the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of the lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1) (the Standard Normal Distribution), then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the inverse transform method directly, since the standard normal cdf F(x) has no closed-form inverse. Instead, we will use the joint probability function of two independent standard normal random variables together with polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> since X and Y are independent.<br />
It can also be shown, using a 1-1 transformation, that the joint distribution of R and θ is as follows.<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {\partial x}{\partial d}\,\dfrac {\partial y}{\partial \theta} - \dfrac {\partial x}{\partial \theta}\,\dfrac {\partial y}{\partial d} \right| = \left| \dfrac {1} {2}d^{-\frac {1}{2}}\cos \theta \cdot d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \cdot \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into the product of two density functions, Exponential and Uniform, so <math>d</math> and <math>\theta</math> are independent:<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx.</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'')<br />
:<math>=\;\ - \int_{-\infty}^{\infty} \phi'(x)\, dx.</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited to inefficient methods such as numerically inverting the Gaussian cdf, summing uniform random variables, and acceptance-rejection. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This new technique had an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup>=Z<sub>1</sub><sup>2</sup>+Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ d = R^2 \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> (exponential with rate 1/2, i.e. mean 2) <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent standard normal random variables, X and Y, using the transformation from polar coordinates. <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2<sup>2</sup>) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient. The reason for this is the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation, which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method exactly, we can approximate the inverse cdf using different functions. One such method is '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) random variables and subtract 6 (which is 12&middot;E(u<sub>i</sub>) = 12&middot;0.5), we get approximately a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''', which is known to be faster than the Box-Muller transformation; a version of this algorithm is used for the randn function in Matlab.<br /><br />
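As a rough illustration of method 2 (the central limit theorem approach), here is a hedged Python sketch; the sample size and the tolerance are arbitrary choices, not part of the lecture:

```python
import random

random.seed(0)

def clt_normal():
    # Sum of 12 iid Unif(0,1) draws has mean 12*(1/2) = 6 and variance 12*(1/12) = 1,
    # so subtracting 6 yields an approximately N(0,1) sample.
    return sum(random.random() for _ in range(12)) - 6.0

samples = [clt_normal() for _ in range(100000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # both should be near 0 and 1 respectively
```

Note that this approximation has bounded support [-6, 6], so its tails differ from a true normal.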
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
In the histograms above, the added constant is the parameter that shifts the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent uniform (0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\,\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\,\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by <br />
:<math>f(x_1 ,x_2) = f_{u_1,u_2}\big(g_1^{-1}(x_1,x_2),\,g_2^{-1}(x_1,x_2)\big)\,\left| J \right|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
:<math>J=\begin{vmatrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2 \\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{vmatrix}</math><br />
where <br />
:<math>u_1 = g_1^{-1}(x_1,x_2),\quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
:<math>u_1 = e^{-(x_1^2+x_2^2)/2}</math><br />
:<math>u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
:<math>f(x_1,x_2) = \frac{1}{2\pi}\,e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution is obtained from the standard normal by rescaling with the standard deviation and translating by the mean. The pdf of the general normal distribution is <br />
:<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2} </math><br />
<br />
The standard normal distribution is the special case of the normal distribution in which the variance is 1 and the mean is zero. If X is a general normal deviate, then Z = (X − μ)/σ will have a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma \cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2 \cdot 1 = \sigma^2</math>.<br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
i.e.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000);   % generate a sample from the standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2];<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,I<sub>d</sub>) and <math>\underline{X} = \underline{\mu} + \Sigma^{\frac{1}{2}}\,\underline{Z} </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math>, where <math>\Sigma^{\frac{1}{2}}</math> is a matrix square root of the covariance matrix.<br />
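A hedged Python/NumPy sketch of this multivariate transformation; the mean vector and covariance matrix below are made-up illustrative values, and the Cholesky factor is used as one valid matrix square root:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])                  # illustrative mean vector
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])              # illustrative covariance matrix

A = np.linalg.cholesky(Sigma)               # A @ A.T = Sigma
Z = rng.standard_normal((2, 100000))        # columns are iid N(0, I_2) vectors
X = mu[:, None] + A @ Z                     # each column is approximately ~ N(mu, Sigma)

print(np.round(X.mean(axis=1), 1))          # should be near mu
print(np.round(np.cov(X), 1))               # should be near Sigma
```

The sample mean and sample covariance of the generated columns should be close to the chosen μ and Σ.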
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution, which describes an event that has only two possible outcomes, i.e. success or failure. If the event succeeds, X takes value 1 with success probability p; otherwise it takes value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pdf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution, in which the variate x has only two outcomes; so the Bernoulli can use the probability mass function of the binomial distribution, with the variate x taking only the values 0 and 1.<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example we discussed in a previous lecture. In that example we set p = 1/2, so heads and tails are equally likely.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: The Binomial distribution can be regarded as the sum of n independent Bernoulli trials.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: We can also regard the Bernoulli distribution as a Binomial with a single trial, with pmf <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
For element-wise operations between arrays, put a dot before the operator (e.g. .*, ./, .^).<br />
Example: to square each element of a 2*4 matrix V, use V.^2.<br />
For multiplication by a scalar such as 3, the dot is optional: 3*V and 3.*V give the same result.<br />
<br />
The code blocks above give some examples of using code to generate samples from distributions.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
Procedure:<br />
<br />
1. Generate U~Unif [0, 1)<br><br />
2. Set <math>x=F^{-1}(u)</math><br><br />
3. Then X~f(x)<br><br />
<br />
Example:<br />
<br />
Let x<sub>1</sub>,x<sub>2</sub> denote the lifetimes of 2 independent particles, <math>X</math><sub>1</sub>~<math>Exp(\lambda_1)</math>, <math>X</math><sub>2</sub>~<math>Exp(\lambda_2)</math>.<br><br />
We are interested in Y=min(<math>X_1, X_2</math>).<br><br />
Design an algorithm based on the inverse method to generate samples according to f<sub>Y</sub>.<br><br />
<br />
Inversion Method<br />
<br />
:<math>P(X\leq x) = P(F^{-1}(U)\leq x) = P(U\leq F(x)) = F(x)</math><br />
So setting U = F(X) gives <math>x=F^{-1}(u)</math><br><br />
<br />
<br />
<br />
'''Example 1'''<br><br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br />
x~exp(<math>\lambda</math>)<br><br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P(X_1>y)\, P(X_2>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate u~unif [0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math><br><br />
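The two-step procedure above can be sketched in Python; the rates lam1 = 1 and lam2 = 2 are made-up values for illustration:

```python
import math
import random

random.seed(42)
lam1, lam2 = 1.0, 2.0                  # illustrative rates
rate = lam1 + lam2

# Inverse-transform samples of Y = min(X1, X2) ~ Exp(lam1 + lam2)
ys = [-math.log(1 - random.random()) / rate for _ in range(100000)]

mean = sum(ys) / len(ys)
print(round(mean, 2))                  # theoretical mean is 1/(lam1 + lam2) = 1/3
```

The sample mean should be close to 1/(λ<sub>1</sub>+λ<sub>2</sub>), the mean of an Exp(λ<sub>1</sub>+λ<sub>2</sub>) random variable.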
<br />
If we extend the question from the lifetimes of two independent particles to n independent particles,<br />
<br />
we change <br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br> to include<br />
<math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br><br />
<br />
The '''minimum''' of the n lifetimes is then Exp(<math>\textstyle\sum\lambda_i</math>), so the same derivation gives<br />
<br />
<math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
<br />
In general, the inverse-transform method here amounts to working out the cdf of the quantity of interest and then inverting it.<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
where a>0<br><br />
What is the distribution of X?<br><br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
<br />
Knowing that U~Unif[0,1), we have thus identified the distribution of X.<br />
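The derivation can be checked numerically with a short Python sketch (a = 2 is a made-up value); the empirical cdf of the sampled X should match F(x) = (x/a)(2 - x/a):

```python
import math
import random

random.seed(0)
a = 2.0                                   # illustrative value of a > 0
xs = [a * (1 - math.sqrt(1 - random.random())) for _ in range(200000)]

# Empirical P(X <= x0) should match F(x0) = (x0/a)*(2 - x0/a) on [0, a]
x0 = 0.5
empirical = sum(x <= x0 for x in xs) / len(xs)
theoretical = (x0 / a) * (2 - x0 / a)
print(round(empirical, 2), round(theoretical, 2))
```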
<br />
'''Example 3'''<br><br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. We want to generate X.<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result we can see that in this example, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x on [0,1], so the cdf of the max of n of them is x<sup>n</sup>).<br />
Method 2: generate X by having a sample of n independent U~Unif(0, 1) and take the max of the n samples to be x. However, the solution given above using inverse-transform method only requires generating one uniform random number instead of n of them, so it is a more efficient method.<br />
<br><br />
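Both methods can be compared in a hedged Python sketch, with n = 20 as in the example above; the sample size is arbitrary:

```python
import random

random.seed(7)
n = 20

# Method 1 (inverse transform): one uniform per sample
x_inv = [random.random() ** (1 / n) for _ in range(50000)]

# Method 2 (max of n uniforms): n uniforms per sample
x_max = [max(random.random() for _ in range(n)) for _ in range(50000)]

m1 = sum(x_inv) / len(x_inv)
m2 = sum(x_max) / len(x_max)
print(round(m1, 2), round(m2, 2))   # both near the theoretical mean n/(n+1)
```

Both sample means should be near n/(n+1), but Method 1 consumes twenty times fewer uniforms per sample.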
<br />
In general, one can generate Y = max(X<sub>1</sub>, ..., X<sub>n</sub>) or Y = min(X<sub>1</sub>, ..., X<sub>n</sub>) this way from the pdf and cdf, provided the X<sub>i</sub> are independent.<br />
<br />
'''Example 4 (New)'''<br><br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(z) = -\frac{1}{\lambda}\log(1-\sqrt z)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br />
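A Python sketch of this sampler (λ = 1 is a made-up value); the sample mean should be near E[max(X<sub>1</sub>, X<sub>2</sub>)] = 1.5/λ:

```python
import math
import random

random.seed(3)
lam = 1.0                                  # illustrative rate

# Inverse transform for F_Z(z) = (1 - exp(-lam*z))^2:
# Z = -(1/lam) * log(1 - sqrt(U))
zs = [-math.log(1 - math.sqrt(random.random())) / lam for _ in range(100000)]

mean = sum(zs) / len(zs)
print(round(mean, 1))   # E[max of two iid Exp(lam)] = (1 + 1/2)/lam = 1.5/lam
```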
<br />
===Decomposition Method===<br />
<br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math><br />
<br />
<math>f_{X} = \sum_{i=1}^{n}p_{i}f_{X_{i}}(x)</math><br />
<br />
where p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>n</sub> > 0 and sum of p<sub>i</sub> = 1.<br />
<br />
The decomposition also applies to the cdf and pmf of a discrete distribution built as a mixture of independent components.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup>, both supported on [0,2] <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
In other words, we divided f(x) into two pdfs, f<sub>x1</sub> (uniform) and f<sub>x2</sub>, over the same range, and choose between them with probabilities 5/6 and 1/6.<br />
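A hedged Python sketch of the decomposition in Example 1; sampling from f<sub>x2</sub>(x) = (5/2)(x-1)<sup>4</sup> on [0,2] uses its inverse cdf F<sub>x2</sub>(x) = ((x-1)<sup>5</sup>+1)/2:

```python
import random

random.seed(1)

def sample_f():
    u = random.random()
    if u < 5 / 6:
        # Component f_x1: Unif(0, 2)
        return 2 * random.random()
    # Component f_x2: (5/2)(x-1)^4 on [0,2]; inverse cdf is x = 1 + (2v-1)^(1/5)
    v = random.random()
    t = 2 * v - 1
    root = abs(t) ** 0.2            # real fifth root of |t|
    return 1 + (root if t >= 0 else -root)

xs = [sample_f() for _ in range(200000)]
mean = sum(xs) / len(xs)
print(round(mean, 2))   # f is symmetric about x = 1, so the mean is 1
```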
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i > p_j </math> for <math> i < j </math> <br> <br><br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u< p_1 + p_2 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
<br />
That is, we divide the pdf into components f(x1), f(x2) and f(x3) over their ranges, choose one using U~U(0,1), and invert the cdf of the chosen component.<br />
<br />
== Example of Decomposition Method ==<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
let U =F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, solve for x.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
Generate U ~ Unif [0,1), V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x=v<br />
else if u<2/3, x=v<sup>1/2</sup><br />
else x= v<sup>1/3</sup><br><br />
<br />
<br />
Matlab Code: <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from an unknown distribution using an easy distribution. The disadvantage is that it may need to reject many points, which is inefficient.<br />
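A minimal Python sketch of this idea: sampling uniformly from the unit disk (shape B) by rejecting points drawn uniformly from the enclosing square (shape A):

```python
import random

random.seed(5)

def sample_disk():
    # Sample uniformly from the square [-1,1] x [-1,1] (shape A)
    # and keep the point only if it falls inside the unit disk (shape B).
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

pts = [sample_disk() for _ in range(100000)]

# Sanity check of uniformity: the fraction of points inside radius 1/2
# should be the area ratio pi*(1/2)^2 / (pi*1^2) = 1/4.
frac = sum(x * x + y * y <= 0.25 for x, y in pts) / len(pts)
print(round(frac, 2))
```

About π/4 ≈ 79% of the square's points are accepted here; for thinner shapes the rejection rate grows.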
<br />
In the decomposition view, each partial cdf (the original cdf restricted to part of the range and renormalized) is inverted separately, and the part is chosen with a uniform draw.<br />
<br />
== Practice Example from Lecture 7 ==<br />
<br />
Let X1, X2 denote the lifetimes of 2 independent particles, X1 ~ Exp(lambda1), X2 ~ Exp(lambda2) <br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then, } 1-F_Y(y)=P(\min(x_{1},x_{2}) > y)=e^{-(\lambda_{1}+\lambda_{2})y},\quad F_Y(y)=1-e^{-(\lambda_{1}+\lambda_{2}) y}</math><br /><br />
<math>u \sim unif[0,1),\; u = F_Y(y) \;\Rightarrow\; y = -\frac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
<br />
==Question 2==<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a beta distribution; b is the normalizing constant, chosen so that <math>\int _{0}^{1}b\,x^{n}(1-x)^{n}\,dx = 1</math><br />
<br />
U<sub>1</sub> ~ Unif[0,1)<br />
<br />
U<sub>2</sub> ~ Unif[0,1)<br />
<br />
<br />
<br />
The integrand x<sup>n</sup>(1-x)<sup>n</sup> is maximized at x = 0.5 with value <math>(1/4)^n</math>, so with a Unif(0,1) proposal the constant is <math>c=b(1/4)^n</math>.<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math>U_2</math> from <math>U(0, 1)</math>.<br />
2. If <math>U_2 \leq \frac{b\,U_1^n(1-U_1)^n}{b\,(1/4)^n}=(4U_1(1-U_1))^n</math>,<br />
then set X = U<sub>1</sub>;<br />
else return to step 1.<br />
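A hedged Python sketch of this acceptance-rejection scheme; n = 2 is a made-up value, and the constant b cancels in the acceptance ratio, so it never needs to be computed:

```python
import random

random.seed(9)
n = 2                                    # illustrative exponent

def sample_beta():
    # Accept-reject with a Unif(0,1) proposal; the ratio f/(c*g)
    # simplifies to (4*u1*(1-u1))**n, so the constant b cancels.
    while True:
        u1, u2 = random.random(), random.random()
        if u2 <= (4 * u1 * (1 - u1)) ** n:
            return u1

xs = [sample_beta() for _ in range(100000)]
mean = sum(xs) / len(xs)
print(round(mean, 2))   # the target density is symmetric about 0.5
```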
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
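The partition method above can be sketched in Python; Binomial(5, 0.4) is a made-up discrete example whose pmf P(X = x) is easy to evaluate:

```python
import math
import random

random.seed(2)
n, p = 5, 0.4                             # illustrative Binomial parameters

def pmf(x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def sample():
    u = random.random()
    x, s = 0, pmf(0)                      # start at the smallest support point
    while u > s:                          # step up until the cumulative sum passes u
        x += 1
        s += pmf(x)
    return x

xs = [sample() for _ in range(100000)]
mean = sum(xs) / len(xs)
print(round(mean, 1))                     # theoretical mean is n*p = 2.0
```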
<br />
<br />
==Example Thursday, May 30, 2013==<br />
<br />
If X~G(p) then its pmf is of the form f(x)=(1-p)<sup>x-1</sup>p, x=1,2,...<br />
The random variable X is the number of trials required until the first success in a series of independent Bernoulli trials.<br />
If Y~Exp(λ) then X=floor(Y)+1 is geometric: choose λ so that e<sup>-λ</sup>=1-p.<br />
Then P(X>x) = P(floor(Y)+1>x) = P(floor(Y)>x-1) = P(Y≥x) = e<sup>-λx</sup> = (1-p)<sup>x</sup>, the geometric tail probability.</div>
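A hedged Python sketch of this construction; p = 0.3 is a made-up value, and λ is chosen so that e<sup>-λ</sup> = 1 - p:

```python
import math
import random

random.seed(4)
p = 0.3                                        # illustrative success probability
lam = -math.log(1 - p)                         # so that exp(-lam) = 1 - p

def sample_geometric():
    y = -math.log(1 - random.random()) / lam   # Y ~ Exp(lam) by inverse transform
    return math.floor(y) + 1                   # X = floor(Y) + 1 ~ Geometric(p)

xs = [sample_geometric() for _ in range(100000)]
mean = sum(xs) / len(xs)
print(round(mean, 1))                          # theoretical mean is 1/p
```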
<hr />
<div>== Class 8 - Thursday, May 30==<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== Midterm ===<br />
Monday June 17 2013 from 2:30-3:30<br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
{| class="wikitable"<br />
|-<br />
! TA<br />
! Day<br />
! Time<br />
! Location<br />
|-<br />
| Lu Cheng<br />
| Monday<br />
| 3:30-5:30 pm<br />
| M3 3108, space 2<br />
|-<br />
| Han ShengSun<br />
| Tuesday<br />
| 4:00-6:00 pm<br />
| M3 3108, space 2<br />
|-<br />
| Yizhou Fang<br />
| Wednesday<br />
| 1:00-3:00 pm<br />
| M3 3108, space 1<br />
|-<br />
| Huan Cheng<br />
| Thursday<br />
| 3:00-5:00 pm<br />
| M3 3111, space 1<br />
|-<br />
| Wu Lin<br />
| Friday<br />
| 11:00-1:00 pm<br />
| M3 3108, space 1<br />
|}<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case except y is discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension <br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email instructor or TAs about the class directly to theri personal accounts!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying an account in the wikicourse note, please use the quest account as your login name while the uwaterloo email as the registered email. This is important as the quest id will use to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, do wait for several hours before students can login into the account using the passwords stated in the email. During the first login, students will be ask to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are produced by a deterministic rule, the sequence of values has the appearance of independent uniform random variables. Being deterministic, pseudo random numbers are valuable because they are easy to generate and to reproduce.<br />
<br />
When a test is repeated many times, the aggregate results are close to the expected values, which makes the process look deterministic; for any single trial, however, the result is random. Pseudo random numbers mimic this behaviour.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
if y = ax + b, then <math>b:=y \mod a</math>. <br /><br />
4.2 = 3 * 1.1 + 0.9<br /><br />
0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2<br /><br />
2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1<br /><br />
1 = 25 mod 3<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can also be used to test whether one integer divides another with no remainder: m divides n exactly when <math>n \mod m = 0</math>. Here n = mq + r, where m, q, r, n are all integers and <math>0 \leq r < m</math>.<br />
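The division-algorithm relationship n = mq + r can be checked directly. The course uses Matlab, but as a quick illustration here is a Python sketch (the helper `div_alg` is ours, not course code); Python's built-in `divmod` returns the quotient and remainder at once:

```python
# Division algorithm: for n >= 0 and m > 0, n = m*q + r with 0 <= r < m.
def div_alg(n, m):
    q, r = divmod(n, m)   # q = quotient, r = remainder (n mod m)
    assert n == m * q + r and 0 <= r < m
    return q, r

# Examples from the text: 30 = 4*7 + 2, so 30 mod 7 = 2; 25 = 8*3 + 1, so 25 mod 3 = 1.
print(div_alg(30, 7))   # (4, 2)
print(div_alg(25, 3))   # (8, 1)
```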
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math> (<math>\mod m</math> means taking the remainder after division by m). Given an initial value <math>x_0 \in \N</math> called the '''seed''', we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about the '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, it should not be used for applications where high-quality randomness is required, such as Monte Carlo simulation or cryptography, because its output is not random enough for those purposes.<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{k}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we can get a sequence of "random" numbers. However, how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a very '''large''', preferably prime number: the larger <math>m</math> is, the longer the sequence can run before repeating. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1).) Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> is '''large and prime''')<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 or more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
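As a cross-check of the Matlab loop above, the same multiplicative congruential generator can be sketched in Python (a sketch, not course code; parameters a=13, b=0, m=31 are taken from the Matlab example above):

```python
def lcg(a, b, m, seed, n):
    """Generate n pseudo random numbers via x_{k+1} = (a*x_k + b) mod m."""
    xs = []
    x = seed
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs

seq = lcg(13, 0, 31, 1, 30)
# With m = 31 prime and seed 1, the first 30 values are a permutation of 1..30.
print(sorted(seq) == list(range(1, 31)))  # True
```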
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will plot the distribution of the sample. Use it after running the code to check the actual sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1. Why, in the example above, is the range 1 to 30 rather than 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2. Will the number 31 ever appear? Is it possible that some number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
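The effect of a composite modulus on the cycle length can be demonstrated directly. Below is a Python sketch (the `period` helper is ours, for illustration only; the comparison of m = 31 against the composite m = 40 echoes the comment above):

```python
def period(a, b, m, seed):
    """Length of the cycle that x_{k+1} = (a*x_k + b) mod m falls into."""
    seen = {}
    x = seed
    k = 0
    while x not in seen:
        seen[x] = k
        x = (a * x + b) % m
        k += 1
    return k - seen[x]

print(period(13, 0, 31, 1))   # 30: full period with prime m = 31
print(period(13, 0, 40, 1))   # 4: much shorter with composite m = 40
```

With a prime modulus and a well-chosen multiplier the generator cycles through all nonzero residues; with the composite modulus it collapses to a short cycle.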
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose parameters that produce a uniform-looking sequence. Many programs give you the option to choose the seed; sometimes the seed is chosen automatically, e.g. from the system clock.<br /><br />
<br />
<br />
<br />
<br />
In this part we learned how code can express the relationship between integer division and the remainder, and saw that when the algorithm is run over a range such as 1:1000, the distribution of the generated values looks approximately uniform.
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>find integers <i>a</i>, <i>b</i>, <i>m</i> (large prime) and a seed <i>x<sub>0</sub></i>.</li><br />
<li><math>x_{k+1}=(ax_{k}+b)</math>mod m</li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating distributions other than the uniform distribution, such as the exponential and normal distributions. However, to use this method easily for generating pseudorandom numbers, the target probability distribution must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X ~ F, we can generate U according to U(0,1) and then make the transformation <math>x=F^{-1}(U)</math> <br /><br />
<br />
Note that we can apply the inverse directly on both sides in the proof of the inverse transform only if the cdf of X is strictly monotonic (and hence invertible); otherwise the generalized inverse must be used.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to Step 3<br><br />
*Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.<br />
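The five steps above can be sketched in Python (the course itself uses Matlab; the function name and the optional `u` parameter are ours). The recursion in Step 4 follows from the ratio of successive binomial probabilities:

```python
import random

def binomial_inverse(n, p, u=None):
    """Inverse-transform sampling for Binomial(n, p), following Steps 1-5 above."""
    if u is None:
        u = random.random()               # Step 1: U ~ U(0,1)
    c = p / (1 - p)                       # Step 2
    i = 0
    pr = (1 - p) ** n                     # P(X = 0)
    F = pr
    while u >= F:                         # Step 3: stop once U < F
        pr = c * (n - i) / (i + 1) * pr   # Step 4: next pmf value
        F += pr
        i += 1                            # Step 5: repeat
    return i

random.seed(0)
draws = [binomial_inverse(10, 0.3) for _ in range(20000)]
print(sum(draws) / len(draws))            # close to n*p = 3
```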
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = \left[ -e^{-\lambda t} \right]_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {ln(1-y)}{\lambda}</math><br /><br />
<math>y=-\frac {ln(1-x)}{\lambda}</math><br /><br />
<math>F^{-1}(x)=-\frac {ln(1-x)}{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
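These two steps can be sketched in Python (the course code is in Matlab); with λ = 2, the sample mean should approach 1/λ = 0.5:

```python
import math
import random

def exp_inverse(lam):
    """Draw from Exponential(lam) via X = -ln(1 - U)/lam."""
    u = random.random()               # Step 1: U ~ U(0,1)
    return -math.log(1 - u) / lam     # Step 2: apply F^{-1}

random.seed(1)
xs = [exp_inverse(2) for _ in range(100000)]
print(sum(xs) / len(xs))              # close to 1/lambda = 0.5
```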
<br />
'''Example''': <br />
<math> X= a + (b-a)U </math> is uniform on [a, b] when <math>U \sim U[0,1]</math> <br /><br />
<math> x=\frac{-ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math>, since <math>U</math> and <math>1-U</math> have the same distribution <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution on [0,1], then apply the inverse function of F(x), and set<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
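A quick Python check of this inverse (assuming β = 3 purely for illustration): the mean of Beta(1, β) is 1/(1+β), so the sample mean should be near 0.25.

```python
import random

def beta_1_b(beta):
    """Draw from Beta(1, beta) via x = 1 - (1 - u)^(1/beta)."""
    u = random.random()
    return 1 - (1 - u) ** (1 / beta)

random.seed(2)
xs = [beta_1_b(3) for _ in range(100000)]
print(sum(xs) / len(xs))   # near 1/(1+3) = 0.25
```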
<br />
'''Example 4 - Estimating <math>\pi</math>''':<br />
Let's use rand() and the Monte Carlo Method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2, the inscribed circle has radius 1 and area <math>\pi</math>, while the square has area 4, so the ratio is <math>\pi/4</math>.<br /><br />
Thus <math>\pi \approx 4 \cdot (Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) % will generate a fairly uniform histogram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
% let lambda=2 in this example; you can choose another value for lambda<br />
>>x=(-log(1-u))/2;<br />
>>size(x) % 1000 in size <br />
>>figure<br />
>>hist(x) % exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. This method requires the cdf to be invertible (or at least monotonic); when it is not, the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical, since some cdfs and/or integrals are not easy to compute, as with the Gaussian distribution.<br /><br />
<br />
We learned how to prove the inverse-transform theorem for a cdf, and how to use the uniform distribution to obtain a value of x from F(x). The uniform distribution can likewise be combined with the inverse method to generate other distributions, and we can inspect the histogram of a generated sample to judge what kind of distribution it follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool #shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma changes the location and shape of the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P(F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x), \text{ where } 0 \leq F(x) \leq 1 </math> (since <math>U</math> is uniform on (0,1)) <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for the uniform distribution <math> U \sim Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
LIMITATIONS OF THE INVERSE TRANSFORM METHOD<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f. <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases this function does not exist in closed form<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse cdf, making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable x that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1: Generate <math> U~ \sim~ Unif [0,1] </math><br><br />
2. Deliver <math> X=x_i </math> if <math> F(x_{i-1})<U\leq F(x_i) </math> (in the discrete case, F(x) is a step function rather than continuous).<br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \quad X = F_{X}^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
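A short Python sketch of this transformation (a sketch, not course Matlab): X = U<sup>1/2</sup> has density 2x on [0,1], whose mean is <math>\int_0^1 x \cdot 2x \, dx = 2/3</math>, so the sample mean should be close to 2/3.

```python
import random

random.seed(3)
# X = F^{-1}(U) = sqrt(U) has pdf f(x) = 2x on [0, 1]
xs = [random.random() ** 0.5 for _ in range(100000)]
print(sum(xs) / len(xs))   # close to 2/3
```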
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson random variable is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) <math>\begin{align} X = 0 \end{align}</math><br />
<math>\begin{align} F = P(X = 0) = e^{-u}*u^0/{0!} = e^{-u} = p \end{align}</math><br />
3) If U < F, output x <br><br />
Else, <math>\begin{align} p = \frac{u}{x+1} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
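This algorithm can be sketched in Python (the course code is in Matlab; the function name is ours). It uses the recursion <math>P_{x+1} = \frac{u}{x+1} P_x</math> derived above, and the sample mean should be close to the Poisson parameter:

```python
import math
import random

def poisson_inverse(mu):
    """Inverse-transform sampling for Poisson(mu) using p_{x+1} = (mu/(x+1)) p_x."""
    U = random.random()        # Step 1
    x = 0
    p = math.exp(-mu)          # P(X = 0)
    F = p
    while U >= F:              # Step 3: stop once U < F
        p = mu / (x + 1) * p   # next pmf value from the recursion
        F += p
        x += 1
    return x

random.seed(4)
xs = [poisson_inverse(2.5) for _ in range(50000)]
print(sum(xs) / len(xs))       # near mu = 2.5
```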
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X to be the number of trials up to and including the first success, so x = 1, 2, 3, ..... We have pmf:<br />
<math>P(X=x_i) = \, p (1-p)^{x_i-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) means we observe at least x failures before the first success.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
\vdots \\<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k \\<br />
\vdots<br />
\end{cases}</math><br />
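Rather than scanning the cases one by one, this inverse has a closed form: X = k exactly when <math>1-(1-p)^{k-1} < U \leq 1-(1-p)^k</math>, i.e. <math>k = \lceil \ln(1-U)/\ln(1-p) \rceil</math>. A Python sketch (assuming this closed form; the `max(1, ...)` guards the measure-zero case U = 0):

```python
import math
import random

def geometric_inverse(p):
    """Draw from Geo(p) (trials until first success) via the closed-form inverse."""
    u = random.random()
    return max(1, math.ceil(math.log(1 - u) / math.log(1 - p)))

random.seed(5)
xs = [geometric_inverse(0.2) for _ in range(50000)]
print(sum(xs) / len(xs))   # near 1/p = 5
```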
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
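The general procedure amounts to scanning the cumulative sums P<sub>0</sub> + ... + P<sub>k</sub>. A Python sketch (the function and its optional `u` parameter are ours, not course code), using the earlier three-point distribution with P(X=0)=0.3, P(X=1)=0.2, P(X=2)=0.5:

```python
import random

def discrete_inverse(values, probs, u=None):
    """Deliver values[k] for the smallest k with U <= P_0 + ... + P_k."""
    if u is None:
        u = random.random()
    F = 0.0
    for x, p in zip(values, probs):
        F += p
        if u <= F:
            return x
    return values[-1]   # guard against floating-point round-off

print(discrete_inverse([0, 1, 2], [0.3, 0.2, 0.5], u=0.25))  # 0
print(discrete_inverse([0, 1, 2], [0.3, 0.2, 0.5], u=0.4))   # 1
print(discrete_inverse([0, 1, 2], [0.3, 0.2, 0.5], u=0.9))   # 2
```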
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
Flipping a coin is a discrete uniform example: in the code the coin is flipped 1000 times, and the observed proportion of heads is close to the expected value (0.5).<br />
The second example is another discrete distribution, with three outcomes (0, 1, 2) and a specified probability for each.<br />
Example 3 uses the inverse method to work out the range of U corresponding to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b> generate random samples from a given distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed number {x}</li><br />
<li>{F<sup>-1</sup>(x)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>deliver d<sub>i</sub>=x<sub>i</sub> if <math> F(x_{i-1})<U\leq F(x_i) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transform method does allow us to turn uniform draws into draws from other distributions, it has two limitations:<br />
# Not every CDF has a usable inverse (the inverse may not exist in closed form)<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples from such distributions, we use other methods, such as the '''Acceptance-Rejection Method'''. With a well-chosen proposal, this method can be more efficient than the inverse transform method.<br />
<br />
Suppose we want to draw a random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (in practice, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for the Acceptance-Rejection Method. Typically we choose for ''g(x)'' a density function that we already know how to sample from.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
{{Cleanup|reason= Do not write <math>c*g(x)</math>. Instead write <math>c \times g(x)</math> or <math>\,c g(x)</math><br />
}}<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) rather than <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x could hold only if g and f are the same function. This is because both densities integrate to 1, so g cannot dominate f everywhere unless they are equal; hence <math>g(x) \ngeqq f(x)</math> &forall;x when g≠f. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all values of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability <math>\frac{f(x)}{\, c g(x)}</math> will be close to zero). This will render our algorithm inefficient. <br />
<br />
<br><br />
<br />
Note: 1. Values around x<sub>1</sub> will be sampled more often under cg(x) than under f(x), so there will be more samples than we actually need. Where <math>\frac{f(y)}{\, c g(y)}</math> is small, the acceptance-rejection step must discard the excess points: in the region around x<sub>1</sub>, we should accept less and reject more. <br><br />
2. Around x<sub>2</sub>, the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. There, g(x) and f(x) are comparable.<br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function lies under the proposed function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because then the sample points are guaranteed to fall in the part of the area under c g(y) that belongs to f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
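The three steps can be wrapped in a generic helper. Below is an illustrative Python sketch (the course code is MATLAB; the function name, argument layout, and the uniform test case are our own choices):<br />

```python
import random

def accept_reject(f, g_sample, g_pdf, c, rng=random):
    """Draw Y ~ g and U ~ Unif(0,1); accept Y when U <= f(Y)/(c g(Y))."""
    while True:
        y = g_sample(rng)                 # step 1: Y ~ g(.)
        u = rng.random()                  # step 2: U ~ Unif(0,1), independent of Y
        if u <= f(y) / (c * g_pdf(y)):    # step 3: accept with prob f(y)/(c g(y))
            return y
```

For example, with f(x)=2x on [0,1], a U(0,1) proposal, and c=2, about half of all candidates are accepted.<br />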
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: Since U is uniform on (0,1), the acceptance condition <math>u\leq \frac{f(y)}{c\, g(y)}</math> is only a valid probability if the ratio never exceeds 1; this is exactly what the choice of the constant c with <math>c\geq \frac{f(y)}{g(y)}</math> for all y guarantees.<br />
<br />
<br />
This section introduces the relationship between cg(x) and f(x), proves why it holds, and shows how it is used to reject candidate points.<br />
It also shows how to read the graph to decide where to accept or reject in the region above the random variable x:<br />
in the example, x<sub>1</sub> is a point where most candidates are rejected, and x<sub>2</sub> a point where most are accepted.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
(to be updated later)<br><br />
<br />
<br />
We want to show that <math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int_y\ P(accepted|y)P(y)\\<br />
&=\int_y\ \frac{f(s)}{cg(s)}g(s)ds\\<br />
&=\frac{1}{c} \int_y\ f(s) ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)={2 \choose x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c>=f(x)/g(x)</math><br/><br />
We need <math>c=3/2</math><br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If <math>(y=0)</math> and <math>(v<1/2)</math>, output 0 <br/><br />
If <math>(y=2)</math> and <math>(v<1/2)</math>, output 2 <br/><br />
Else if <math>y=1</math>, output 1; otherwise return to step 1<br/><br />
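The discrete algorithm above can be coded directly. An illustrative Python sketch (function name ours; the acceptance probabilities 1/2, 1, 1/2 are the f(x)/(cg(x)) row of the table):<br />

```python
import math
import random

def binomial_2_half_ar(rng=random):
    """A-R sample from Bi(2, 0.5) using the DU[0,2] proposal and c = 3/2."""
    accept = {0: 0.5, 1: 1.0, 2: 0.5}   # f(y)/(c g(y)) for y = 0, 1, 2
    while True:
        u, v = rng.random(), rng.random()
        y = math.floor(3 * u)           # y ~ DU[0, 2]
        if v < accept[y]:
            return y
```
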
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let <math>g(x)</math> be the helper function <br/><br />
Let <math>cg(x)\geq f(x)</math><br/><br />
Since we need to generate y from <math>g(x)</math>,<br/><br />
<math>Pr(select y)=g(y)</math><br/><br />
<math>Pr(output\ y|selected\ y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c</math> <br/><br />
Considering the expected number of trials until the first success, this is a geometric distribution with probability of success 1/c<br/><br />
Therefore, <math>E(X)=1/(1/c))=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
Conditional probability is used to prove that, conditional on acceptance, the output has exactly the pdf of the original target distribution.<br />
The example shows how to choose the constant c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
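This Beta(2,4) example translates into a few lines of code. An illustrative Python sketch (the function name is ours; note that the accepted output is the candidate U<sub>1</sub>, while U<sub>2</sub> only decides acceptance):<br />

```python
import random

def beta_2_4_ar(rng=random):
    """A-R sample from f(x) = 20 x (1-x)^3 on (0,1) with g = U(0,1), c = 135/64."""
    while True:
        u1 = rng.random()   # candidate Y ~ U(0,1)
        u2 = rng.random()
        # f(y)/(c g(y)) = (256/27) y (1-y)^3
        if u2 < (256.0 / 27.0) * u1 * (1.0 - u1) ** 3:
            return u1       # accept the candidate, not u2
```
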
<br />
Use the derivative to find the local maximum of f(x)/g(x);<br />
this maximum gives the best (smallest) constant c for the acceptance-rejection method.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (the density of <math>U[0,0.5]</math>)<br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = (2) / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = (0) / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example showing why points are rejected in the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area between f(x) and cg(x) as small as possible.<br />
Because g(.) is uniform on (0,1), g(x)=1 for all x in the interval.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
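For this example, the acceptance test simplifies to u ≤ y². An illustrative Python sketch (function name ours; the course code is MATLAB):<br />

```python
import random

def sample_3x2(rng=random):
    """A-R sample from f(x) = 3 x^2 on (0,1) with a uniform proposal, c = 3."""
    while True:
        y = rng.random()      # candidate Y ~ U(0,1)
        u = rng.random()
        if u <= y * y:        # f(y)/(c g(y)) = 3y^2 / (3*1) = y^2
            return y
```

On average the loop runs c = 3 times per accepted sample.<br />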
<br />
<br />
An example showing how to work out c and f(x)/(c g(x)).<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we need to first find a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The graph of f(x) must lie under the graph of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower when the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is large, and vice versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it does not make sense to pick <math>c</math> arbitrarily large; we need <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*The constant c cannot be negative.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
And it means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math>. So the smallest possible c that satisfies the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(y)}{c \cdot g(y)}</math> then X=Y; else return to step 1 (This is not the way to find c. This is the general procedure.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma; (n)=(n-1)! if n is positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distribution with pdf f(x), we need to draw approximately <math>n\cdot c</math> points from the proposal distribution (g(x)) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)}</math>, set x=y; else go to 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: number of accepted samples<br />
>>jj=1; % jj: number of generated samples<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need to continue looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know in advance how many y we are going to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g=U(-1,1), so g(x)=1/2 on [-1,1]<br />
<br />
Let y ~ g. We need <br />
<math> cg(x)\geq f(x)</math>, i.e. <math>\frac{c}{2} \geq \frac{3}{4} (1-x^2)</math>, so <math>c=\max_x \left(2\cdot\frac{3}{4} (1-x^2)\right) = \frac{3}{2} </math><br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}} = \frac{1-y^2}{2}</math>, then x=y; '''note that''' <math>\tfrac{3}{4}(1-y^2)/\tfrac{3}{2}</math> comes from f(y) / (cg(y))<br />
:5: else: return to '''step 1''' <br />
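The five steps above can be written out directly. An illustrative Python sketch (function name ours; the notes' code is MATLAB):<br />

```python
import random

def semiparabola_ar(rng=random):
    """A-R sample from f(x) = 3/4 (1 - x^2) on [-1,1]; g = U(-1,1), c = 3/2."""
    while True:
        u1 = rng.random()
        u2 = rng.random()
        y = 2.0 * u1 - 1.0              # step 3: Y ~ U(-1, 1)
        if u2 <= (1.0 - y * y) / 2.0:   # step 4: f(y)/(c g(y)) = (1 - y^2)/2
            return y
```
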
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
Periods, ".", meaning "element-wise", describe an operation performed on each element of a vector or matrix. In the example above, to take the square root of every element in U, the notation U.^0.5 is used. Without the period, the operator acts on the matrix as a whole: for a square matrix U, B=U^0.5 is a matrix square root, so that B*B=U. For example, if a=[1 2 3] and b=[2 3 4] are vectors, a.*b=[2 6 12] is their element-wise product, but a*b does not work since the matrix dimensions must agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math> (the maximum is attained at x=1).<br />
Use the inverse method to sample from <math>g(x)</math>:<br />
<math>G(x)=x^2</math>.<br />
Generate <math>U</math> from <math>U(0,1)</math> and set <math>x=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math>, and set <math>Y=\sqrt{U_1}</math><br><br />
2. If <math>U_2 \leq \frac{f(Y)}{c\,g(Y)} = \frac{3Y^2}{\frac{3}{2}\cdot 2Y} = Y = \sqrt{U_1}</math>, accept <math>Y</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class involving <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
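Putting the pieces together gives a sampler for N(0,1). An illustrative Python sketch (function name ours): generate |Z| by acceptance-rejection with an Exp(1) proposal and acceptance probability e<sup>-(y-1)²/2</sup>, then attach a random sign:<br />

```python
import math
import random

def standard_normal_ar(rng=random):
    """Sample |Z| via A-R (Exp(1) proposal, c = sqrt(2e/pi)), then attach a sign."""
    while True:
        y = -math.log(rng.random())                 # Y ~ Exp(1) by inverse transform
        u = rng.random()
        if u <= math.exp(-(y - 1.0) ** 2 / 2.0):    # f(y)/(c g(y)) = e^{-(y-1)^2/2}
            # |Z| is half-normal; a random sign makes it Z ~ N(0,1)
            return y if rng.random() < 0.5 else -y
```
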
<br />
<br />
This shows how to calculate the constant c relating f(x) and g(x).<br />
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
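The three steps above are a single linear transformation. A minimal Python sketch (function name ours):<br />

```python
import random

def uniform_ab(a, b, rng=random):
    """Transform U ~ U(0,1) into Y ~ U(a,b) via Y = (b-a)U + a."""
    return (b - a) * rng.random() + a
```
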
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math> Let <math> Y= 2RU-R=R(2U-1)</math>, therefore Y follows <math>U(a,b)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{4}{\pi}</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We will accept the points with limit f(x)/[cg(x)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math>, which is also the probability of accepting the point.<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = y </math><br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ U_{1}^2 - 1 \leq -(2U - 1)^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
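The semicircular sampler then looks as follows; an illustrative Python sketch (function name and the R parameter default are ours):<br />

```python
import math
import random

def semicircular_ar(R=1.0, rng=random):
    """A-R sample from f(x) = 2/(pi R^2) sqrt(R^2 - x^2); g = U(-R,R), c = 4/pi."""
    while True:
        u = rng.random()
        u1 = rng.random()
        y = R * (2.0 * u - 1.0)                          # Y ~ U(-R, R)
        if u1 <= math.sqrt(1.0 - (2.0 * u - 1.0) ** 2):  # f(y)/(c g(y))
            return y
```

The expected number of iterations per accepted sample is c = 4/π ≈ 1.27.<br />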
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a e^{-ax}</math> to generate a random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)}\Big|_{x=\frac{1}{1-a}} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\lim_{x\to\infty}\frac {f(x)}{g(x)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u v ~unif(0,1) <br/><br />
2. Generate y from g; since g is exponential with rate a=1/2 (mean 2), let <math>y=-2\ln(u)</math> <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
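With a=1/2 and c=4/e, the acceptance ratio works out to f(y)/(cg(y)) = (e/2) y e<sup>-y/2</sup>. An illustrative Python sketch of the procedure (function name ours):<br />

```python
import math
import random

def gamma_2_1_ar(rng=random):
    """A-R sample from f(x) = x e^{-x} (Gamma(2,1)) using g = Exp(1/2), c = 4/e."""
    while True:
        y = -2.0 * math.log(rng.random())   # Y ~ Exp(rate 1/2), i.e. mean 2
        v = rng.random()
        # f(y)/(c g(y)) = (e/2) * y * exp(-y/2); equals 1 at the touching point y = 2
        if v <= (math.e / 2.0) * y * math.exp(-y / 2.0):
            return y
```
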
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take the derivative of h(x) with respect to x and set it to zero; solve for the critical point x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) to get a value (or a function of any unknown parameter) for c, denoted c<sub>1</sub>;<br /><br />
3. Check the endpoints of the support by substituting them into h(x);<br /><br />
4. (If c<sub>1</sub> is already a number, skip this step.) Since we want the smallest c such that <math>f(x) \leq c\cdot g(x)</math> for all x, choose the unknown parameter that minimizes c. <br />So take the derivative of c<sub>1</sub> with respect to the unknown parameter (say k), set it to zero, and solve for k. <br />Then substitute k back to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>.)<br /><br />
5. Take c to be the maximum of h(x) over the critical point and the endpoints.<br /><br />
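A quick numerical sanity check of this recipe, applied to the example above (a Python sketch; the grid range (0, 20] is an arbitrary choice, relying on the endpoint check that h vanishes at 0 and at infinity):

```python
import math

# For f(x) = x e^{-x} and g(x) = a e^{-ax} with a = 1/2,
# h(x) = f(x)/g(x) should peak at x = 1/(1-a) = 2 with value c = 4/e.
a = 0.5
h = lambda x: (x / a) * math.exp(-(1 - a) * x)
grid = [i * 0.001 for i in range(1, 20001)]   # x in (0, 20]
c = max(h(x) for x in grid)
x_star = max(grid, key=h)
```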
<br />
For the two examples above, we compare the target density f against a proposal density g,<br />
and take <math>c=\max_y \frac {f(y)}{g(y)} </math>.<br />
If <math>v<\frac {f(y)}{c\cdot g(y)}</math>, output y.<br />
<br />
<br />
'''Summary of when to use the Acceptance-Rejection Method''' <br/><br />
1) When the inverse CDF cannot be computed, or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated, at least up to a normalizing constant. <br/><br />
The method requires: a constant c with <math>f(x)\leq c\cdot g(x)</math> for all x; <br/><br />
a proposal g that is easy to sample from; and a uniform draw for the accept/reject step.<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example in the last lecture. The following code generates the random variable required by that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % R is a constant we can change;<br />
       % e.g. R=4 would give a density supported on (-4,4)<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1; % increment ii for the next pass through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tip: in hist(x,y), the second argument y is the number of bars (bins) in the histogram.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
The histogram above shows the variable x using y bins.<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} <= c < ∞.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: U in Steps 2 and 3 above is drawn independently of Y.<br />
The constant c determines the rejection rate: the acceptance rate is 1/c.<br />
<br />
In this acceptance-rejection example for a pmf, the proposal g is uniform over the five values {1,2,3,4,5}, so g(x) = 0.2 for every x.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution on the integers <math>1,2,3,...,k</math>. If this function is not built into your MATLAB, a simple transformation of rand works the same way: y = ceil(k*rand). <br />
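That transformation of a Unif(0,1) draw can be sketched in Python (the helper name unidrnd here just mimics the MATLAB function; the max(1, ...) guard handles the measure-zero u = 0 case):

```python
import math
import random

random.seed(3)

def unidrnd(k):
    """Mimic MATLAB's unidrnd(k): a uniform draw from {1, ..., k}
    built from a single Unif(0,1) variate."""
    u = random.random()               # u in [0, 1)
    return max(1, math.ceil(k * u))   # ceil maps (0,1) onto {1,...,k} uniformly

draws = [unidrnd(5) for _ in range(1000)]
```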
<br />
The acceptance rate is <math>1/c</math>, so the lower the c, the more efficient the algorithm. Theoretically, c equals 1 is the best case because all samples would be accepted; however it would only be true when the proposal and target distributions are exactly the same, which would never happen in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>1/1.5=2/3</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram shows 1000 random values of f(x); the more values generated, the closer the empirical frequencies come to the specified probabilities.<br />
Recall that 1/c is the acceptance rate, so the smaller c is, the better.<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1. y~g<br /><br />
2. u~U(0,1)<br /><br />
3. If <math>u \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x>=0</math><br><br />
Try the first few p_{x}'s: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j; else go to step 1.<br />
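The three steps above can be sketched in Python (an illustrative translation, not course code; c is recomputed numerically over a wide range rather than read off the table):

```python
import math
import random

def poisson3_ar(n, seed=4):
    """Accept-reject for p_x = e^{-3} 3^x / x! using the Geometric
    proposal g(x) = p (1-p)^x with p = 0.25, as in Example 3."""
    random.seed(seed)
    p = 0.25
    ratio = lambda x: (math.exp(-3) * 3**x / math.factorial(x)) / (p * (1 - p)**x)
    c = max(ratio(x) for x in range(60))   # bound c = max_x p_x / g(x)
    out = []
    while len(out) < n:
        u1 = 1.0 - random.random()         # in (0, 1], avoids log(0)
        u2 = random.random()
        j = int(math.log(u1) / math.log(1 - p))   # floor: both logs are <= 0
        if u2 < ratio(j) / c:
            out.append(j)
    return out, c

xs, c = poisson3_ar(5000)
```

The accepted sample mean should be close to 3, the Poisson(3) mean.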
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose f(x) is hypergeometric: from 10 white balls and 5 red balls, select 3 balls without replacement, and let X be the number of red balls selected. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): <math>P(X=0) = (2/3)^3=0.2963,\; P(X=1)= 3(1/3)(2/3)^2 = 0.4444,\; P(X=2)=3(1/3)^2(2/3)=0.2222,\; P(X=3)=(1/3)^3=0.03704</math><br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which is '''c=1.1127'''.<br />
<br />
Here the maximum of f(x) (0.4945) and the maximum of g(x) (0.4444) both occur at X=1, so the ratio of the two maxima happens to equal c = 1.1127; in general, however, c must be found by maximizing the ratio f(x)/g(x) itself, not by taking the ratio of the separate maxima.<br />
If c*g(x) does not lie above f(x) everywhere, c must be increased until it does, which in turn lowers the acceptance rate 1/c.<br />
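These probabilities and ratios can be recomputed exactly (a Python verification sketch using math.comb; the index order matches X = number of red balls):

```python
from math import comb

# f = Hypergeometric: 5 red + 10 white, draw 3; g = Binomial(3, 1/3).
f = [comb(5, k) * comb(10, 3 - k) / comb(15, 3) for k in range(4)]
g = [comb(3, k) * (1 / 3)**k * (2 / 3)**(3 - k) for k in range(4)]
ratios = [fk / gk for fk, gk in zip(f, g)]
c = max(ratios)   # the rejection constant, attained at X = 1
```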
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, we need to move c*g(x) to the peak of f to cover the whole f. Thus c will be very large and 1/c will be small.<br />
The higher the rejection rate, the more iterations are wasted on rejected points.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall that,<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that <br />
p(j)/q(j)<=c for all j such that p(j)>0<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
Neither the Inverse Transform Method nor the Acceptance-Rejection Method can easily be applied to the Gamma distribution.<br />
However, we can use the additive property of the Gamma distribution to generate its random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda)</math>, and note <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
:<math> x_1 \sim~ Exp(\lambda)</math><br />
:<math> x_2 \sim~ Exp(\lambda)</math><br />
:...<br />
:<math> x_t \sim~ Exp(\lambda)</math><br />
:<math> x_1+x_2+\dots+x_t \sim~ Gamma(t,\lambda)</math><br />
<br />
<pre style="font-size:16px"><br />
>>l=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/l)*log(u); <br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
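The same three steps in a Python sketch (illustrative only; the course's MATLAB version appears in the notes below). The parameters t=20, λ=10 match the <math>Gamma(20,10)</math> example:

```python
import math
import random

def gamma_sum(t, lam, n, seed=5):
    """Sample n values from Gamma(t, lam) as sums of t i.i.d. Exp(lam)
    draws, each produced by the inverse transform X = -log(1-U)/lam."""
    random.seed(seed)
    return [sum(-math.log(1.0 - random.random()) / lam for _ in range(t))
            for _ in range(n)]

xs = gamma_sum(20, 10, 2000)
```

The sample mean should be close to t/λ = 2.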
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Some notes on matlab coding: '''<br/ ><br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br/ ><br />
:*: ''X(:,1)'' returns the first column <br/ ><br />
:*: ''X(i,i)'' returns the (i,i)th entry <br/ ><br />
:*: ''sum(X,1)'' or ''sum(X)'' sums down each column of X (summing over the rows), returning a row vector <br /><br />
:*: ''sum(X,2)'' sums across each row of X (summing over the columns), returning a column vector <br/ ><br />
:*: ''rand(r,c)'' will generate random numbers in r row and c columns <br /><br />
:*: Matlab coding language is very efficient with vectors and inefficient with loops. It is far better to use vector operations (use the . operator as necessary) than it is to use "for" loops when computing many values.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000); % generates a 20x1000 matrix of Unif(0,1) draws<br />
                      % (1000 realizations of each of the t=20 X_i's)<br />
>>x = (-1/lambda)*log(1-u); % log(1-u) has the same distribution as log(u) when u~U(0,1)<br />
>>xx = sum(x); % sum(x) sums each column, adding the 20 exponentials together;<br />
               % size(xx) can help you verify<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
size(x) and size(u) are both 20x1000 matrices.<br />
Since u~Unif(0,1) implies that u and 1-u have the same distribution, we can substitute u for 1-u to simplify the expression.<br />
Alternatively, the following command does the same thing as the previous commands.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % a simple way to put the code in one line;<br />
                                             % either log(u) or log(1-u) works since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
In rand(20,1000), the matrix has 20 rows and 1000 columns.<br />
This code illustrates how the additive property generalizes: each column holds t=20 independent <math>X_i</math>'s, and summing them yields one Gamma variate, as confirmed by the histogram.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/\sin(\theta)= x_{1}/\cos(\theta)</math> <br /><br />
<math> \tan(\theta)=x_{2}/x_{1} \rightarrow \theta=\tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
For a point at distance R from the origin along a ray at angle <math>\theta</math>, the Cartesian coordinates are <math>x=R\cos(\theta)</math> and <math>y=R\sin(\theta)</math>.<br />
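A tiny round-trip check of these formulas (a Python sketch with an arbitrary test point; atan2 is used instead of a bare arctangent so the quadrant is handled correctly):

```python
import math

# Cartesian -> polar -> Cartesian round trip for (x1, x2) = (3, 4).
x1, x2 = 3.0, 4.0
r = math.sqrt(x1**2 + x2**2)
theta = math.atan2(x2, x1)        # quadrant-aware tan^{-1}(x2/x1)
back_x1 = r * math.cos(theta)
back_x2 = r * math.sin(theta)
```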
<br />
=== '''Matlab''' ===<br />
<br />
----<br />
<pre style="color:red; font-size:30px"><br />
THIS SECTION MAY BE REDUNDANT.<br />
PLEASE COMBINE WITH "Some notes on matlab coding"<br />
IN SECTION 6.2<br />
</pre><br />
<br />
'''X=rand(2,3)''' generates a 2 rows*3 columns matrix<br /><br />
Example:<br /><br />
0.1 0.2 0.3<br /><br />
0.4 0.5 0.6<br /><br />
'''sum(X)''' sums down each column<br />
Example:<br /><br />
0.5 0.7 0.9<br /><br />
'''sum(X,2)''' sums across each row<br />
Example:<br /><br />
0.6<br /><br />
1.5<br /><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1.On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1)- Standard Normal Distribution - then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\dfrac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(r,\theta)\end{matrix}</math> consists of two density functions, Exponential and Uniform, so assuming that r and <math>\theta</math> are independent<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx.</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'')<br />
:<math>=\;\ - \int_{-\infty}^{\infty} \phi'(x)\, dx.</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited to inefficient methods such as inverting the Gaussian CDF numerically, summing uniform random variables, and acceptance-rejection. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This technique offered an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup> = Z<sub>1</sub><sup>2</sup>+Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
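As an optional cross-check of the algorithm (a Python sketch, separate from the MATLAB code above; 1-u1 is used in place of u1 to avoid log(0), which is valid since both are Unif(0,1)):

```python
import math
import random

def box_muller(n, seed=6):
    """Box-Muller: n pairs of independent N(0,1) draws from Unif(0,1) pairs."""
    random.seed(seed)
    xs, ys = [], []
    for _ in range(n):
        u1 = 1.0 - random.random()   # in (0, 1], safe for log
        u2 = random.random()
        d = -2 * math.log(u1)        # d = R^2 ~ Exp(rate 1/2)
        theta = 2 * math.pi * u2     # theta ~ Unif[0, 2*pi]
        r = math.sqrt(d)
        xs.append(r * math.cos(theta))
        ys.append(r * math.sin(theta))
    return xs, ys

xs, ys = box_muller(5000)
```

Both coordinates should have sample mean near 0 and sample variance near 1.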
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient. The reason for this is the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation, which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use the inverse transform method, we can approximate the inverse using different functions. One method is '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) variables and subtract 6 (which is E(u<sub>i</sub>)*12), we get approximately a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histogram, the constant is the parameter that affect the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent Uniform(0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by<br />
:<math>f_{X_1,X_2}(x_1,x_2) = f_{U_1,U_2}\big(g_1^{-1}(x_1,x_2),\, g_2^{-1}(x_1,x_2)\big)\,|J|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
:<math>J=\begin{vmatrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2 \\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{vmatrix}</math><br />
where<br />
:<math>u_1 = g_1^{-1}(x_1,x_2),\qquad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
:<math>u_1 = e^{-(x_1^2+x_2^2)/2}</math><br />
:<math>u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
:<math>f(x_1,x_2) = \frac{1}{2\pi}e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution is the normal distribution with arbitrary mean and standard deviation: its shape is scaled by the standard deviation and translated by the mean. The pdf of the general normal distribution is<br />
:<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}</math><br />
<br />
The special case of the normal distribution is standard normal distribution, which the variance is 1 and the mean is zero. If X is a general normal deviate, then Z = (X − μ)/σ will have a standard normal distribution.<br />
<br />
If Z ~ N(0,1) and we want <math>X \sim N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma\cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2\cdot 1 = \sigma^2</math>.<br />
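A Python sketch of this transformation (the values μ=2 and σ=3 are illustrative assumptions, not from the notes; random.gauss supplies the standard normal draws):

```python
import random

# Turn standard normal draws into N(mu, sigma^2) via X = mu + sigma*Z.
random.seed(7)
mu, sigma = 2.0, 3.0
zs = [random.gauss(0, 1) for _ in range(5000)]
xs = [mu + sigma * z for z in zs]

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
```

The sample mean should be near μ=2 and the sample variance near σ²=9.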
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000); % generate draws from the standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2];<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,I<sub>d</sub>) and <math>\underline{X}= \underline{\mu} + \Sigma^{\frac{1}{2}}\,\underline{Z}</math>, then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution describing an event with only two possible outcomes, i.e. success or failure. If the event succeeds, X takes value 1 with success probability p; otherwise X takes value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pmf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes; so the Bernoulli pmf is the binomial probability mass function restricted to x = 0 and 1.<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is the coin flip example discussed in a previous lecture, where we set p = 1/2 so that heads and tails each occur 50% of the time.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from binomial distribution using this property.<br />
Note: a Binomial random variable can be viewed as the sum of n independent Bernoulli random variables.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: We can also regard the Bernoulli Distribution as either a conditional distribution or <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
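The same Bernoulli-sum construction in a Python sketch (parameters n=10, p=0.3 match the MATLAB code above; the function name is made up for illustration):

```python
import random

def binomial_via_bernoulli(n, p, trials, seed=8):
    """Each Binomial(n, p) draw is the sum of n independent Bernoulli(p) draws."""
    random.seed(seed)
    out = []
    for _ in range(trials):
        out.append(sum(1 if random.random() <= p else 0 for _ in range(n)))
    return out

xs = binomial_via_bernoulli(10, 0.3, 5000)
```

The sample mean should be near np = 3 and the sample variance near np(1-p) = 2.1.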
<br />
Comments on Matlab:<br />
When doing element-wise operations on vectors, put a dot before the operator so the operation is applied to every element of the vector. <br />
Example: let V be a 2&times;4 matrix whose every element should be multiplied by 3. <br />
The Matlab code is 3.*V (for multiplication by a scalar, plain 3*V gives the same result).<br />
<br />
The following sections give further examples of using code to generate samples from common distributions.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
Procedure:<br />
<br />
1. Generate U ~ Unif[0, 1)<br><br />
2. Set <math>x=F^{-1}(u)</math><br><br />
3. Then X ~ f(x)<br><br />
<br />
'''Proof of the Inversion Method'''<br />
<br />
Let <math>U \sim Unif[0,1)</math> and set <math>X=F^{-1}(U)</math>. Then<br />
<br />
<math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math> (using <math>P(U \leq t)=t</math> for <math>0 \leq t \leq 1</math>),<br />
<br />
so <math>X</math> has cdf <math>F</math>. Equivalently, setting <math>U = F(X)</math> and solving gives <math>x=F^{-1}(u)</math>.<br><br />
<br />
<br />
<br />
'''Example 1'''<br><br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br />
x~exp(<math>\lambda</math>)<br><br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P(X_1>y) P(X_2>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br>
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate u~unif [0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math> (using <math>u</math> in place of <math>1-u</math>, which is valid since both are Unif[0,1))<br><br />
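A minimal sketch of this procedure in Python (the course itself uses Matlab; the rate values and sample size below are illustrative):

```python
import math
import random

def sample_min_exp(lam1, lam2, u=None):
    """Inverse transform for Y = min(X1, X2) with Xi ~ Exp(lam_i).

    Since min(X1, X2) ~ Exp(lam1 + lam2), a single uniform draw suffices.
    """
    if u is None:
        u = random.random()                      # Step 1: U ~ Unif[0, 1)
    return -math.log(1 - u) / (lam1 + lam2)      # Step 2: Y = F^{-1}(U)

random.seed(1)
samples = [sample_min_exp(2.0, 3.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)               # should be near 1/(2+3) = 0.2
```

Here `1 - u` is used to match <math>F^{-1}</math> exactly; replacing it with `u` gives the same distribution.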
<br />
If we change the lifetimes of two independent particles to n independent particles,<br />
<br />
we change<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>), <math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>), ..., <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>)<br><br />
<br />
and the same argument shows that for the '''minimum''' <math>Y=min(X_1,\ldots,X_n)</math>,<br />
<br />
<math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
<br />
(Note the '''maximum''' is not exponential, so its cdf must be derived and inverted separately, as in Example 4 below.) In general, the inverse-transform method amounts to deriving the cdf of the quantity of interest and then inverting it.<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
where a>0<br><br />
What is the distribution of X?<br><br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a}), \quad 0 \leq x \leq a</math><br><br />
<br />
Thus, knowing U~Unif[0,1), we can identify the distribution of X through its cdf F(x) = u.<br />
<br />
'''Example 3'''<br><br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. We want to generate X.<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result, we can see that in this example, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x, so the product of n such cdfs is x<sup>n</sup>).<br />
Method 2: generate X by having a sample of n independent U~Unif(0, 1) and take the max of the n samples to be x. However, the solution given above using inverse-transform method only requires generating one uniform random number instead of n of them, so it is a more efficient method.<br />
<br><br />
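The two methods can be compared numerically. Below is a Python sketch (the course itself uses Matlab; the sample sizes are arbitrary):

```python
import random

random.seed(8)
n, trials = 20, 50_000

# Method 1 (one uniform per sample): X = U^(1/n), via the inverse transform
xs1 = [random.random() ** (1 / n) for _ in range(trials)]

# Method 2 (n uniforms per sample): X = max of n independent Unif(0,1)
xs2 = [max(random.random() for _ in range(n)) for _ in range(trials)]

m1 = sum(xs1) / trials
m2 = sum(xs2) / trials
# Both estimate E[X] = n/(n+1) = 20/21, but Method 1 draws 20x fewer uniforms.
```
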
<br />
In other words, for independent X<sub>1</sub>, ..., X<sub>n</sub> we can obtain the cdf (and hence pdf) of both Y = max(X<sub>1</sub>, ..., X<sub>n</sub>) and Y = min(X<sub>1</sub>, ..., X<sub>n</sub>) from the product formulas above, and then sample Y by inverting that cdf.<br />
<br />
'''Example 4 (New)'''<br><br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z \leq z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(u) = -\frac{1}{\lambda}\log(1-\sqrt u)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br />
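The steps above can be sketched in Python (the course uses Matlab; &lambda; = 1 and the sample size are illustrative):

```python
import math
import random

def sample_max_two_exp(lam, u=None):
    """Inverse transform for Z = max(X1, X2), X1, X2 iid Exp(lam).

    F_Z(z) = (1 - e^{-lam*z})^2, so F^{-1}(u) = -log(1 - sqrt(u)) / lam.
    """
    if u is None:
        u = random.random()                       # Step 1: U ~ U[0, 1)
    return -math.log(1 - math.sqrt(u)) / lam      # Step 2: Z = F^{-1}(U)

random.seed(2)
zs = [sample_max_two_exp(1.0) for _ in range(200_000)]
mean = sum(zs) / len(zs)   # E[max of two Exp(1)] = 1 + 1/2 = 1.5
```

The mean check uses the fact that the max equals the min (Exp(2&lambda;)) plus an independent Exp(&lambda;) residual, by memorylessness.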
<br />
===Decomposition Method===<br />
<br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math><br />
<br />
<math>f_{X} = \sum_{i=1}^{n}p_{i}f_{X_{i}}(x)</math><br />
<br />
where p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>n</sub> > 0 and sum of p<sub>i</sub> = 1.<br />
<br />
The same decomposition applies to the cdf and pmf of a discrete distribution: write the pmf as a probability-weighted mixture of simpler pmfs and sample from the chosen component.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup>, both on 0<=x<=2 <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
That is, we divided f(x) into a mixture of two pdfs, f<sub>x1</sub> and f<sub>x2</sub>, and used a single uniform draw to choose between them with probabilities 5/6 and 1/6.<br />
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math> (on x &ge; 0), f<sub>x2</sub> = 4x (on 0 &le; x &le; <math>\frac{1}{\sqrt{2}}</math>, so that it integrates to 1), and f<sub>x3</sub> = <math>\frac{1}{3}</math> (on 0 &le; x &le; 3) <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange the <math> {p_i} </math> in descending order, so that <math> p_i \geq p_j </math> for <math> i < j </math> (the most probable component is checked first) <br> <br><br />
Then Generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u < p_1 + p_2 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math><br><br />
else sample from <math> f_n </math> <br><br />
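This selection logic can be sketched in Python (the course uses Matlab; `sample_mixture` is an illustrative name, and the mixture shown is the cdf F(x) = (1/3)x + (1/3)x&sup2; + (1/3)x&sup3; worked out later in these notes):

```python
import bisect
import random

def sample_mixture(cum_probs, samplers):
    """Composition method: choose component i with probability p_i, then
    sample from that component.  cum_probs holds the running sums
    p_1, p_1+p_2, ..., 1 (components sorted by descending p_i for speed)."""
    u = random.random()
    i = bisect.bisect_right(cum_probs, u)   # first index with u < cumsum
    return samplers[i]()

# Components of F(x) = (1/3)x + (1/3)x^2 + (1/3)x^3 on [0,1]:
# inverse cdfs are v, v^(1/2), v^(1/3) for V ~ Unif[0,1).
samplers = [lambda: random.random(),
            lambda: random.random() ** 0.5,
            lambda: random.random() ** (1 / 3)]
random.seed(3)
xs = [sample_mixture([1 / 3, 2 / 3, 1.0], samplers) for _ in range(50_000)]
mean = sum(xs) / len(xs)   # E[X] = (1/3)(1/2 + 2/3 + 3/4) = 23/36
```
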
<br />
In other words, we split f(x) into the component pdfs f(x1), f(x2) and f(x3) over their respective ranges, pick one component using a single U~U(0,1), and then sample the chosen component, e.g. by inverting its cdf.<br />
<br />
== Example of Decomposition Method ==<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
let U =F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, solve for x.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
Generate U ~ Unif [0,1), V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x=v<br />
else if u<2/3, x=v<sup>1/2</sup><br />
else x= v<sup>1/3</sup><br><br />
<br />
<br />
Matlab Code: <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
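A classic instance of this idea, sketched in Python (the course uses Matlab; the shapes here — a square A containing a disk B — are an illustrative choice):

```python
import random

def sample_in_disk():
    """Accept-reject: draw uniformly from the square A = [-1,1]^2 and keep
    only points inside the disk B = {x^2 + y^2 <= 1}; the accepted points
    are uniform on B."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

random.seed(4)
pts = [sample_in_disk() for _ in range(20_000)]
# Uniformity check: the sub-disk of radius 1/2 covers 1/4 of B's area,
# so about a quarter of the accepted points should land in it.
frac_inner = sum(1 for x, y in pts if x * x + y * y <= 0.25) / len(pts)
```

The acceptance rate is area(B)/area(A) = &pi;/4 &asymp; 0.785 here; for a much smaller B inside the same A, most draws would be wasted, which is the inefficiency noted above.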
<br />
The advantage of this method is that we can sample from an unknown (or hard) distribution using an easy one. The disadvantage is that it may need to reject many points, which is inefficient.<br />
<br />
(Compare with the decomposition method above: there, the cdf is split into parts, each part is inverted over its own range, and a uniform draw chooses the part.)<br />
<br />
== Practice Example from Lecture 7 ==<br />
<br />
Let X<sub>1</sub>, X<sub>2</sub> denote the lifetimes of 2 independent particles, X<sub>1</sub> ~ Exp(<math>\lambda_1</math>), X<sub>2</sub> ~ Exp(<math>\lambda_2</math>) <br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then, } 1-F_Y(y)=P(min(x_{1},x_{2}) > y)=e^{-(\lambda_{1}+\lambda_{2})y}, \quad F_Y(y)=1-e^{-(\lambda_{1}+\lambda_{2}) y}</math><br /><br />
<math>\text{Generate } u \sim Unif[0,1), \text{ set } u = F_Y(y), \text{ then } y = -\tfrac{1}{\lambda_{1}+\lambda_{2}}\ln(1-u)</math><br />
<br />
==Question 2==<br />
<br />
Use the Acceptance-Rejection Method to sample from <math>f_X(x)=b \cdot x^n(1-x)^n</math>, <math>n>0</math>, <math>0<x<1</math>, where b is the normalizing constant.<br />
<br />
Solution:<br />
This is a Beta(n+1, n+1) distribution; b is chosen so that <math>\int _{0}^{1}b \cdot x^{n}(1-x)^{n}\,dx=1</math>.<br />
<br />
Take the proposal g(x) = 1 on (0,1), and draw U<sub>1</sub>~Unif[0,1) (the candidate) and U<sub>2</sub>~Unif[0,1) (the acceptance test).<br />
<br />
(For non-integer n such as n = 1/2, a non-uniform envelope can also be used, e.g. <math> bx^{1/2}(1-x)^{1/2} \leq \sqrt2 \, b x^{-1/2}, \; 0 \leq x \leq 1/2 </math>.)<br />
<br />
The beta density is maximized at x = 0.5, where its value is <math>b \cdot (1/4)^n</math>.<br />
So, <math>c=b \cdot (1/4)^n</math>.<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math> U_2</math> from <math>U(0, 1)</math>.<br />
2. If <math>U_2 \leq \frac{b \cdot U_1^n(1-U_1)^n}{b \cdot (1/4)^n}=(4U_1(1-U_1))^n</math>,<br />
then set X = U<sub>1</sub>;<br />
Else return to step 1.<br />
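A Python sketch of this algorithm (the course uses Matlab; n = 2 and the sample size are illustrative, and b cancels in the acceptance ratio so it never needs to be computed):

```python
import random

def sample_beta_symmetric(n):
    """Accept-reject for f(x) = b * x^n * (1-x)^n on (0,1), i.e.
    Beta(n+1, n+1).  Proposal g = Unif(0,1); f peaks at x = 1/2 with
    value b*(1/4)^n, so f/(c*g) = (4x(1-x))^n and the constant b cancels."""
    while True:
        u1 = random.random()                   # candidate from g
        u2 = random.random()                   # acceptance test
        if u2 <= (4 * u1 * (1 - u1)) ** n:
            return u1

random.seed(5)
xs = [sample_beta_symmetric(2) for _ in range(50_000)]
mean = sum(xs) / len(xs)   # Beta(3,3) is symmetric, so the mean is 1/2
```
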
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /><br />
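The four steps above can be sketched in Python (the course uses Matlab; the fair-die pmf is an illustrative choice, and `sample_discrete` is a hypothetical name):

```python
import random

def sample_discrete(pmf, a, u=None):
    """Sequential-search inverse transform for a discrete X supported on
    a, a+1, ...: walk up the support until the running sum exceeds u."""
    if u is None:
        u = random.random()   # Step 1: generate u from U ~ Unif[0,1]
    x, s = a, pmf(a)          # Step 2: x = a, s = P(X = a)
    while u > s:              # Step 3: while u > s, advance and accumulate
        x += 1
        s += pmf(x)
    return x                  # Step 4: return x

# Illustrative pmf: a fair six-sided die, X uniform on {1, ..., 6}.
die_pmf = lambda k: 1 / 6
random.seed(6)
rolls = [sample_discrete(die_pmf, 1) for _ in range(60_000)]
mean = sum(rolls) / len(rolls)   # should be near 3.5
```
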
<br />
<br />
==Example Thursday, May 30, 2013==<br />
<br />
If X~Geometric(p) then its pmf is of the form f(x)=(1-p)<sup>x-1</sup>p, x=1,2,...<br />
The random variable X is the number of trials required until the first success in a series of independent Bernoulli trials.<br />
If Y~Exp(&lambda;) then X=floor(Y)+1 is geometric. Choose &lambda; so that e<sup>-&lambda;</sup>=1-p.<br />
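This construction can be checked numerically; here is a Python sketch (the course uses Matlab; p = 0.3 and the sample size are illustrative):

```python
import math
import random

def sample_geometric(p, u=None):
    """Geometric(p) from an exponential: pick lam with e^{-lam} = 1 - p,
    draw Y ~ Exp(lam) by inverse transform, and set X = floor(Y) + 1.
    Then P(X > x) = P(Y >= x) = e^{-lam*x} = (1-p)^x, as required."""
    lam = -math.log(1 - p)
    if u is None:
        u = random.random()
    y = -math.log(1 - u) / lam      # inverse transform for Exp(lam)
    return math.floor(y) + 1

random.seed(7)
xs = [sample_geometric(0.3) for _ in range(100_000)]
mean = sum(xs) / len(xs)            # E[X] = 1/p = 10/3 for p = 0.3
```
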
P(X>x) = P(floor(Y)+1>x) = P(floor(Y)>x-1) = P(Y>=x) = e<sup>-&lambda;x</sup> = (1-p)<sup>x</sup></div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=17550stat340s132013-05-30T13:24:29Z<p>Ysyap: /* Class 8 - Thursday, May 30 */</p>
<hr />
<div>==Class 8 - Thursday, May 30==<br />
<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers'''; numbers that seem random but are actually deterministic. Although the pseudo random numbers are deterministic, these numbers have a sequence of value and all of them have the appearances of being independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to the ease to generate and manipulate.<br />
<br />
When an experiment is repeated many times, the aggregated results are close to the expected values, which makes the trials look deterministic; however, each individual trial is still random.<br />
This is why pseudo random numbers can pass for truly random ones.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where m is an integer. <br /><br />
If y = ax + b with 0 &le; b < a, then <math>b:=y \mod a</math>. <br /><br />
4.2 = 3 * 1.1 + 0.9, so<br /><br />
0.9 = 4.2 mod 1.1<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2, so<br /><br />
2 = 30 mod 7<br /><br />
25 = 8 * 3 + 1, so<br /><br />
1 = 25 mod 3<br /><br />
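The examples above translate directly to code; a quick Python illustration (the course uses Matlab, where `mod(30,7)` plays the same role as `%` here):

```python
# Python's % operator is exactly "take the remainder after division";
# divmod returns the quotient and remainder together.  It also works for
# non-integer operands, as in the 4.2 mod 1.1 example above.
q, r = divmod(30, 7)                  # 30 = 4*7 + 2
print(q, r)                           # -> 4 2
print(25 % 3)                         # -> 1
print(abs(4.2 % 1.1 - 0.9) < 1e-9)    # floating point, so compare with tolerance
```
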
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation tells us whether one integer divides another with no remainder. Both integers must satisfy n = mq + r, where m, q, r, n are all integers and 0 &le; r < m.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math>. (<math>\mod m</math> means taking the remainder after division by m.) Given the integer parameters and an initial value <math>x_0 \in \N</math> called the '''seed''', we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{0}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we could get a sequence of "random" numbers. However, how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least <math>m</math> should be a very '''large''', preferably prime number. The larger <math>m</math> is, the longer the sequence can run before repeating. This is easy to explore in Matlab, where the command rand() generates random numbers uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> should be '''large and prime''')<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will plot the sample distribution. Use it after running the code to check the empirical distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
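This permutation-and-repeat behaviour is easy to verify programmatically; a Python sketch (the course uses Matlab, and `lcg` is an illustrative helper name):

```python
def lcg(a, b, m, seed, n):
    """Return n values of the linear congruential sequence
    x_{k+1} = (a*x_k + b) mod m, starting from the given seed."""
    xs, x = [], seed
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs

seq = lcg(13, 0, 31, 1, 60)
# The first three values match the worked example below: 13, 14, 27.
# The first 30 values are a permutation of 1..30; the cycle then repeats,
# so the period is 30 = m - 1 (13 is a primitive root mod the prime 31).
```
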
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose parameters that yield good uniform-looking sequences. Many programs give you the option of choosing the seed; sometimes the seed is chosen automatically, e.g. from the system clock.<br /><br />
<br />
<br />
<br />
<br />
In this part we learned how to use code to examine the relationship between two integers under division and their remainder. When the remainders are computed over a range of values such as 1:1000, the resulting histogram resembles a uniform distribution.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h2><br />
<p><b>Problem:</b> generate Pseudo Random Numbers.</p><br />
<b>Plan:</b> <br />
<ol><br />
<li>choose integers <i>a</i>, <i>b</i>, <i>m</i> (a large prime), and a seed <i>x<sub>0</sub></i>.</li><br />
<li><math>x_{k+1}=(ax_{k}+b) \mod m</math></li><br />
</ol><br />
<b>Matlab Instruction:</b><br />
<pre style="font-size:16px">&gt;&gt;clear all<br />
&gt;&gt;close all<br />
&gt;&gt;a=17<br />
&gt;&gt;b=3<br />
&gt;&gt;m=31<br />
&gt;&gt;x=5<br />
&gt;&gt;mod(a*x+b,m)<br />
ans=26<br />
&gt;&gt;x=mod(a*x+b,m)<br />
</pre><br />
</div><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating distributions other than the uniform, such as the exponential and normal distributions. However, to use this method easily for generating pseudorandom numbers, the distribution in question must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br />
<br />
'''Theorem''': <br /><br />
To generate a value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation x=<math> F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf of X is strictly increasing (hence invertible); otherwise the generalized inverse must be used. A monotonic function is one that is either non-decreasing for all x, or non-increasing for all x.<br />
<br />
'''Inverse Transform Algorithm for Generating Binomial(n,p) Random Variable'''<br><br />
Step 1: Generate a random number <math>U</math>.<br><br />
Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br><br />
Step 3: If U<F, set X = i and stop,<br><br />
Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br><br />
Step 5: Go to step 3<br>*<br />
*Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.<br />
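A sketch of these steps in Python (a stand-in for the course's MATLAB; `binomial_inv` is an illustrative name, and the uniform draw is passed in as an argument so the mapping can be inspected directly):<br />

```python
def binomial_inv(u, n, p):
    """Map a uniform draw u in (0,1) to a Binomial(n, p) value by inverting the cdf."""
    c = p / (1 - p)
    i = 0
    pr = (1 - p) ** n   # P(X = 0)
    F = pr              # running cdf value
    while u >= F:       # stop as soon as U < F, per step 3
        pr = c * (n - i) / (i + 1) * pr  # recursion: P(X = i+1) from P(X = i)
        F += pr
        i += 1
    return i

# With n = 2, p = 0.5 the cdf jumps are at 0.25, 0.75, 1.0:
print(binomial_inv(0.1, 2, 0.5), binomial_inv(0.5, 2, 0.5), binomial_inv(0.9, 2, 0.5))
```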
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(t) dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, \Big|_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> u=1-e^{- \lambda x} </math><br /><br />
<math> 1-u=e^{- \lambda x}</math><br /><br />
<math>x=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<math>F^{-1}(u)=-\frac {\ln(1-u)}{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
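These two steps amount to the following sketch (Python shown for concreteness; `random.random()` plays the role of MATLAB's rand):<br />

```python
import math
import random

def exp_inv(u, lam):
    # invert F(x) = 1 - exp(-lam*x):  x = -ln(1-u)/lam
    return -math.log(1.0 - u) / lam

# Step 1: draw U ~ U(0,1); Step 2: transform it
x = exp_inv(random.random(), 2.0)
```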
<br />
'''Example''': <br />
<math> X= a + (b-a)U</math> is uniform on <math>[a, b]</math>, for <math>U \sim U[0,1]</math> <br />
<math> x=\frac{-ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math> <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first set u as an uniform distribution, then obtain the inverse function of F(x), and set<br />
<math>x= u^\frac{1}{5}</math><br /><br /><br />
<br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br />
<math>F(x)= 1-(1-x)^\beta</math>, <br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x: <br />
<math>(1-x)^\beta = 1-u</math>, <br />
<math>1-x = (1-u)^\frac {1}{\beta}</math>, <br />
<math>x = 1-(1-u)^\frac {1}{\beta}</math><br /><br />
<br />
'''Example 4-Estimating pi''':<br />
Let's use rand() and the Monte Carlo method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2 (area 4) with an inscribed circle of radius 1, the circle has area <math>\pi</math>.<br /><br />
Thus <math>\pi \approx 4 \, (N_c/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) % will generate a fairly uniform histogram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
% let lambda=2 in this example; however, you can pick another value for lambda<br />
>>x=(-log(1-u))/2;<br />
>>size(x) % 1000 values<br />
>>figure<br />
>>hist(x) % histogram resembles an exponential density<br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. Not every cdf is invertible or strictly monotonic, and the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical when the cdf or its inverse has no closed form, as with the Gaussian distribution.<br /><br />
<br />
We learned how to prove that applying the inverse cdf to a uniform random variable yields a value of x distributed according to F(x).<br />
We can also use the uniform distribution with the inverse method to generate other distributions.<br />
In the Monte Carlo example, points drawn uniformly over the square fall inside the circle with probability equal to the ratio of the two areas.<br />
We can also look at the histogram of the generated values to judge what kind of distribution they follow.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
disttool #shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of the parameters (e.g. mu and sigma) shifts and rescales the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P(F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) , \text{ where } 0 \leq F(x) \leq 1 </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for <math> U </math> ~ <math>U(0,1) </math>, we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for the uniform distribution <math> U \sim Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f. <math> F^{-1}(\cdot) </math> and make sure it is monotonically increasing; in some cases a closed-form inverse does not exist<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse cdf, making this method inefficient<br />
<br />
=== Discrete Case ===<br />
The same technique can be used for discrete case. We want to generate a discrete random variable x, that has probability mass function: <br/><br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1. Generate <math> U \sim Unif [0,1] </math><br><br />
2. Set <math> X=x_i </math> if <math> F(x_{i-1})<U\leq F(x_i) </math>, since in the discrete case <math>F(x)</math> is a step function.<br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U \sim Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
The answer is:<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.5 \\<br />
1, & \text{if } 0.5 < U \leq 1<br />
\end{cases}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } 0 \le x < 1 \\<br />
0.5, & \text{if } 1 \le x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{if } otherwise<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \quad X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
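As a one-line sketch in Python: inverting <math>F(x)=x^2</math> means taking the square root of the uniform draw:<br />

```python
import math

def sample_from_f(u):
    # F(x) = x^2 on [0,1], so F^{-1}(u) = u^(1/2)
    return math.sqrt(u)

print(sample_from_f(0.25))  # -> 0.5
```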
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
1-p, & \text{if } 0 \le x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U \sim Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) <math>\begin{align} X = 0 \end{align}</math><br />
<math>\begin{align} F = P(X = 0) = e^{-u} \, u^0/{0!} = e^{-u} = p \end{align}</math><br />
3) If U<F, output x <br><br />
Else, <math>\begin{align} p = \frac{u}{x+1} \, p \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
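The update of p at each step is multiplicative, <math>p \leftarrow \frac{u}{x+1}p</math>. A Python sketch (`poisson_inv` is an illustrative name; the Poisson mean is written `mean` to avoid clashing with the uniform draw u):<br />

```python
import math

def poisson_inv(u, mean):
    """Inverse-transform a uniform draw u into a Poisson(mean) value."""
    x = 0
    p = math.exp(-mean)   # P(X = 0)
    F = p                 # running cdf value
    while u >= F:
        p = p * mean / (x + 1)  # P(X = x+1) = mean/(x+1) * P(X = x)
        F += p
        x += 1
    return x

# for mean = 1: F(0) ~ 0.368, F(1) ~ 0.736, F(2) ~ 0.920
print(poisson_inv(0.3, 1.0), poisson_inv(0.5, 1.0), poisson_inv(0.9, 1.0))  # -> 0 1 2
```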
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X as the number of the trial on which the first success occurs, x=1,2,3,.... We have pmf:<br />
<math>P(X=x) = \, p (1-p)^{x-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where <math>P(X>x)</math> means we get at least x failures before observing the first success.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
....<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k <br />
....<br />
\end{cases}</math><br />
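The piecewise rule has a closed form: x is the smallest integer with <math>1-(1-p)^x \geq u</math>, i.e. <math>x=\lceil \ln(1-u)/\ln(1-p) \rceil</math>. A Python sketch:<br />

```python
import math

def geometric_inv(u, p):
    # smallest integer x with 1 - (1-p)^x >= u
    return math.ceil(math.log(1.0 - u) / math.log(1.0 - p))

# p = 0.5: F(1) = 0.5 and F(2) = 0.75, so u = 0.6 maps to x = 2
print(geometric_inv(0.6, 0.5))  # -> 2
```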
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as Gaussian, it is too difficult to find the inverse of <math> F(x) ,</math><br />
Flipping a coin is a discrete case of the uniform distribution: in the code the coin is flipped 1000 times, and the observed proportion is close to the expected value (0.5).<br />
The second example is another discrete distribution: the unit interval is partitioned into three pieces corresponding to the outcomes 0, 1 and 2, each piece having the assigned probability.<br />
The third example uses the inverse method to determine which range of the uniform variable maps to each value of the random variable.<br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2><br />
<p><b>Problem:</b> generate random variables from a given distribution.</p><br />
<p><b>Plan:</b></p><br />
<b style='color:lightblue;'>Continuous case:</b><br />
<ol><br />
<li>find CDF F</li><br />
<li>find the inverse F<sup>-1</sup></li><br />
<li>Generate a list of uniformly distributed numbers {u}</li><br />
<li>{F<sup>-1</sup>(u)} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;u=rand(1,1000);<br />
&gt;&gt;hist(u)<br />
&gt;&gt;x=(-log(1-u))/2;<br />
&gt;&gt;size(x) <br />
&gt;&gt;figure<br />
&gt;&gt;hist(x)<br />
</pre><br />
<br><br />
<b style='color:lightblue'>Discrete case:</b><br />
<ol><br />
<li>generate a list of uniformly distributed number {u}</li><br />
<li>set d<sub>i</sub>=x<sub>i</sub> if <math> F(x_{i-1})<u\leq F(x_i) </math></li><br />
<li>{d<sub>i</sub>=x<sub>i</sub>} is what we want</li><br />
</ol><br />
<b>Matlab Instruction</b><br />
<pre style="font-size:16px">&gt;&gt;for ii=1:1000<br />
u=rand;<br />
if u&lt;0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
&gt;&gt;hist(x)<br />
</pre><br />
</div><br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transformation method does allow us to transform a uniform distribution into other distributions, it has two limitations:<br />
# Not all cdfs have closed-form inverse functions<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples for these functions, we will use other methods, such as the '''Acceptance-Rejection Method'''. This method can be applied when the inverse transform is unavailable, although its efficiency depends on how tightly the proposal cg(x) envelops f(x).<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (In practise, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from a distribution f(x) that is difficult to sample from directly.<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) as opposed to <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x could hold if and only if g and f were the same function. This is because both pdfs integrate to 1, so one density cannot lie above the other everywhere unless they coincide. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the acceptance probability <math>\frac{f(y)}{\, c g(y)}</math> will be close to zero, the method will reject most candidates, and our algorithm will be inefficient. <br />
<br />
<br><br />
<br />
Note: 1. Values around x<sub>1</sub> will be sampled more often under cg(x) than under f(x), so there will be more samples than we actually need. Where <math>\frac{f(y)}{\, c g(y)}</math> is small, the acceptance-rejection step discards the surplus: in the region around x<sub>1</sub>, we should accept less and reject more. <br><br />
2. Values around x<sub>2</sub>: the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. There, g(x) and f(x) are comparable.<br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that is also under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those points are guaranteed to fall under the part of c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
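A generic sketch of this loop in Python; the target f(x)=2x on [0,1] with a uniform proposal and c=2 is an illustrative choice, not from the lecture:<br />

```python
import random

def accept_reject(f, g_sample, g_pdf, c, n, seed=0):
    """Draw n samples from density f using proposals from g and envelope constant c."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = g_sample(rng)                # step 1: draw Y ~ g
        u = rng.random()                 # step 2: draw U ~ U(0,1), independent of Y
        if u <= f(y) / (c * g_pdf(y)):   # step 3: accept with probability f(y)/(c g(y))
            out.append(y)
    return out

# illustrative target: f(x) = 2x on [0,1], proposal g = U(0,1), c = 2
samples = accept_reject(lambda x: 2.0 * x,
                        lambda rng: rng.random(),
                        lambda x: 1.0,
                        c=2.0, n=1000)
print(sum(samples) / len(samples))  # close to E[X] = 2/3
```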
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: Since U takes values in [0,1], the acceptance threshold <math>\frac{f(y)}{c g(y)}</math> must be at most 1; this is exactly the requirement that <math>c\geq \frac{f(y)}{g(y)}</math> for all y.<br />
<br />
<br />
This section introduced the relationship between cg(x) and f(x), proved why it must hold, and showed how it is used to reject candidate points.<br />
We also learned to read the graph to see where points are likely to be rejected or accepted.<br />
In the example, x<sub>1</sub> lies in a region of low acceptance probability, while x<sub>2</sub> lies in a region of high acceptance probability.<br />
<br />
=== Theorem ===<br />
<br />
Let <math>f: \R \rightarrow [0,+\infty]</math> be a well-defined pdf, and <math>\displaystyle Y</math> be a random variable with pdf <math>g: \R \rightarrow [0,+\infty]</math> such that <math>\exists c \in \R^+</math> with <math>f \leq c \cdot g</math>. If <math>\displaystyle U \sim~ U(0,1)</math> is independent of <math>\displaystyle Y</math>, then the random variable defined as <math>X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}</math> has pdf <math>\displaystyle f</math>, and the condition <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> is denoted by "Accepted".<br />
<br />
=== Proof ===<br />
(to be updated later)<br><br />
<br />
<br />
We want to show that <math>P(y|accepted)=f(y)</math>:<br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>\begin{align}<br />
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}<br />
\end{align}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int P(accepted|s)\,g(s)\,ds\\<br />
&=\int \frac{f(s)}{cg(s)}\,g(s)\,ds\\<br />
&=\frac{1}{c} \int f(s)\, ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
'''Comments:'''<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br/><br />-'''Note:''' When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br/>An example would be when the target function (f) has a spike or several spikes in its domain - this would force the known distribution (g) to have density at least as large as the spikes, making the value of c larger than desired. As a result, the algorithm would be highly inefficient.<br />
<br />
'''Acceptance-Rejection Method'''<br/><br />
'''Example 1''' (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)={2 \choose x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(cg(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need <math>c \geq f(x)/g(x)</math> for all x,<br/><br />
we take <math>c=3/2</math>, the maximum of the ratio.<br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate <math>u,v \sim U(0,1)</math><br/><br />
2. Set <math>y= \lfloor 3u \rfloor</math> (this uses the uniform distribution to generate y from DU[0,2])<br/><br />
3. If <math>y=0</math> and <math>v<1/2</math>, output 0 <br/><br />
If <math>y=2</math> and <math>v<1/2</math>, output 2 <br/><br />
Else if <math>y=1</math>, output 1; otherwise return to step 1<br/><br />
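The accept/reject rule for a single candidate pair (y, v) can be written as a deterministic function, which makes it easy to check (Python; `accept_bi` is an illustrative name):<br />

```python
def accept_bi(y, v):
    """One step of the discrete A-R scheme for Bi(2,0.5) with DU[0,2] proposal.
    Returns the output value if (y, v) is accepted, or None to retry."""
    ratios = {0: 0.5, 1: 1.0, 2: 0.5}  # f(y) / (c g(y)) with c = 3/2
    if v < ratios[y]:
        return y
    return None

print(accept_bi(0, 0.4), accept_bi(0, 0.6), accept_bi(1, 0.99))  # -> 0 None 1
```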
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x)/(cg(x))</math> is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let <math>f(x)</math> be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let <math>g(x)</math> be the helper function <br/><br />
Let <math>cg(x) \geq f(x)</math><br/><br />
Since we need to generate y from <math>g(x)</math>,<br/><br />
<math>Pr(select y)=g(y)</math><br/><br />
<math>Pr(output\ y|selected\ y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (since u~Unif(0,1))<br/><br />
<math>Pr(output\ y)=Pr(output\ y_1|selected\ y_1)Pr(select\ y_1)+ \cdots + Pr(output\ y_n|selected\ y_n)Pr(select\ y_n)=1/c</math> <br/><br />
The number of passes through the loop until the first output is therefore geometric with success probability 1/c<br/><br />
Therefore, <math>E(X)=\frac{1}{1/c}=c</math> <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
We use conditional probability to prove that, given acceptance, the output follows exactly the target pdf.<br />
The example shows how to choose the constant c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub> and stop;<br />
otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
To apply the acceptance-rejection method here, we use the derivative to find the local maximum of f(x)/g(x),<br />
which gives the smallest valid constant c.<br />
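The calculation above fixes c = 135/64; the whole procedure can be checked empirically with a Python sketch (the notes use MATLAB, so this translation, the seed, and the sample size are our own choices):

```python
import random

def sample_beta_2_4(n, seed=1):
    """Acceptance-rejection for f(x) = 20x(1-x)^3 on (0,1), using g = U(0,1) and c = 135/64."""
    random.seed(seed)
    samples, trials = [], 0
    while len(samples) < n:
        u1, u2 = random.random(), random.random()  # u1 is the candidate drawn from g
        trials += 1
        # accept when u2 <= f(u1)/(c*g(u1)) = (256/27)*u1*(1-u1)^3
        if u2 <= (256 / 27) * u1 * (1 - u1) ** 3:
            samples.append(u1)
    return samples, trials

xs, trials = sample_beta_2_4(10000)
print(trials / len(xs))  # average repetitions of step 1, close to c = 135/64
```

The observed trials-per-acceptance ratio estimates c, and the sample mean should settle near the Beta(2,4) mean of 1/3.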
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So its density is <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (and 0 elsewhere)<br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
This example shows why the acceptance-rejection method rejects certain points.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over the interval (0,1), so g(x)=1 for 0< x <1<br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area between <math>c\cdot g(x)</math> and <math>f(x)</math> as small as possible.<br />
Because g(.) is uniform on (0,1), g(x)=1 everywhere on that interval.<br />
<math>f(x)/(cg(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
<br />
This example shows how to determine c and <math>f(x)/(c\cdot g(x))</math>.<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to sample from a target distribution, denoted <math>f(x)</math>, we first find a proposal distribution <math>g(x)</math> which is easy to sample from. <br> The graph of f(x) must lie under the graph of <math>c \cdot g(x)</math>.<br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower when the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is large, and vice versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it does not make sense to choose <math>c</math> arbitrarily large. We need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
*Since both f and g integrate to 1, the constant c is always at least 1; in particular, it cannot be negative.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/><br />
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/><br />
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/><br />
<br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>.<br />
This means c must be greater than or equal to <math>\frac{f(x)}{g(x)}</math>, so the smallest possible c satisfying the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>. <br /> If c is made too large, the chance of accepting generated values will be small, and the algorithm will lose its efficiency.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller the c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that <math> c \cdot g(x) \geq f(x) </math> for all x; if no reasonable c exists, return to step 1 and choose a different g.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1 (This is not the way to find c. This is the general procedure.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma; (n)=(n-1)! if n is positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g \sim U(0,1)</math>, so <math>g(x)=1</math><br><br />
Draw <math>y \sim g</math><br><br />
<math>f(x)\leq c\cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max \frac{2x}{1}, 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the largest value of 2x on [0,1], so c=2<br />
<br />Note that c is a scalar greater than or equal to 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows the uniform distribution and only covers the region from 0 to 1 on the y-axis, so we multiply by c to ensure that <math>c\cdot g</math> covers the entire area under f(x). In this case c=2, so <math>c\cdot g</math> reaches 2 on the y-axis and covers f(x).<br />
<br />
Comment:<br />
From the picture above, we can observe that the area under f(x)=2x is half of the area under the pdf of UNIF(0,1). This is why, in order to obtain 1000 accepted points from f(x), we need to sample approximately 2000 points from UNIF(0,1).<br />
In general, to obtain n points from a distribution with pdf f(x), we need to sample approximately <math>n\cdot c</math> points from the proposal distribution g(x) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>If <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)}</math>, set x=y</li><br />
<li>Else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: numbers that are accepted<br />
>>jj=1; % jj: numbers that are generated<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
:'''*Note4:''' If c=1, we will accept all points, which is the ideal situation.<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g=U(-1,1), so g(x)=1/2 on [-1,1]<br />
<br />
Let y ~ g, <br />
<math> cg(x)\geq f(x)</math>,<br />
<math> c\cdot\frac{1}{2} \geq \frac{3}{4} (1-x^2)</math>, <br />
<math> c=\max\ 2\cdot\frac{3}{4} (1-x^2) = \frac{3}{2} </math> (the maximum is attained at x=0)<br />
<br />
The process:<br />
<br />
:1: Draw U1 ~ U(0,1) <br><br />
:2: Draw U2~U(0,1) <br><br />
:3: let <math> y = U1*2 - 1 </math><br />
:4: if <math>U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2}} = \frac{1-y^2}{2}</math>, then x=y; '''note that''' <math>\frac{\frac{3}{4}(1-y^2)}{\frac{3}{2}}</math> comes from f(y) / (cg(y))<br />
:5: else: return to '''step 1''' <br />
<br />
----<br />
'''Use Inverse Method for this Example'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period, ".", makes an operator element-wise, i.e. it applies the operation to each element of a vector or matrix. In the example above, U.^0.5 takes the square root of every element of U. Without the period, the operation is a matrix operation: for a square matrix U, U^0.5 computes a matrix square root B with <math>B*B=U</math>. For example, for the vectors a=[1 2 3] and b=[2 3 4], a.*b=[2 6 12] is the element-wise product, but a*b gives an error because the matrix dimensions must agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br><br />
2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1<br />
<br />
We can also use <math>g(x)=2x</math> for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math> (attained at x=1).<br />
Use the inverse method to sample from <math>g(x)</math>:<br />
<math>G(x)=x^2</math>, so generate <math>U</math> from <math>U(0,1)</math> and set <math>y=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math>, and set <math>Y=\sqrt{U_1}</math> (a draw from g)<br><br />
2. If <math>U_2 \leq \frac{f(Y)}{c\,g(Y)} = \frac{3Y^2}{\frac{3}{2}\cdot 2Y} = Y = \sqrt{U_1}</math>, accept <math>X=\sqrt{U_1}</math> as the random variable with pdf <math>f</math>; if not, return to Step 1<br />
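The two proposals can be compared empirically; here is a hedged Python sketch (the course code is MATLAB, and the sample size and seed below are arbitrary):

```python
import random

def ar_uniform_proposal(n):
    """f(x) = 3x^2 with g = U(0,1): c = 3, accept when u2 <= f(u1)/(c*g(u1)) = u1^2."""
    out, trials = [], 0
    while len(out) < n:
        u1, u2 = random.random(), random.random()
        trials += 1
        if u2 <= u1 ** 2:
            out.append(u1)
    return out, trials

def ar_linear_proposal(n):
    """f(x) = 3x^2 with g(x) = 2x: c = 3/2; candidates y = sqrt(u) via the inverse method."""
    out, trials = [], 0
    while len(out) < n:
        y = random.random() ** 0.5   # G(x) = x^2, so G^{-1}(u) = sqrt(u)
        trials += 1
        if random.random() <= y:     # f(y)/(c*g(y)) = 3y^2/((3/2)*2y) = y
            out.append(y)
    return out, trials

random.seed(0)
n = 5000
_, t_unif = ar_uniform_proposal(n)
_, t_lin = ar_linear_proposal(n)
print(t_unif / n, t_lin / n)  # roughly 3 and 1.5 trials per accepted point
```

The average number of trials per accepted point estimates c for each proposal, confirming that the g(x)=2x proposal is about twice as efficient.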
<br />
<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. In the example we did in class with <math>f(x)=2x</math>, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
<math>X \sim N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, \; Z \sim N(0,1) </math><br><br />
<math>\vert Z \vert</math> has probability density function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup>, x > 0<br />
<br />
g(x) = e<sup>-x</sup>, x > 0<br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
<br />
<br />
This shows how to compute c from f(x) and g(x). Having generated <math>\vert Z \vert</math>, multiply it by a random sign (&plusmn;1 with probability 1/2 each) to recover Z, then set X = &sigma;Z + &mu;.<br />
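A Python sketch of this sampler follows (the sample size, seed, and the random-sign step for recovering Z from |Z| are implementation choices, not part of the derivation above):

```python
import math
import random

def sample_normal(n, seed=42):
    """N(0,1) via acceptance-rejection: f = density of |Z|, g(x) = e^{-x}, c = sqrt(2e/pi)."""
    random.seed(seed)
    out = []
    while len(out) < n:
        y = -math.log(1.0 - random.random())  # y ~ Exp(1), a draw from g
        u = random.random()
        if u <= math.exp(-(y - 1) ** 2 / 2):  # f(y)/(c*g(y)) = e^{-(y-1)^2/2}
            sign = 1 if random.random() < 0.5 else -1
            out.append(sign * y)              # random sign turns |Z| into Z
    return out

zs = sample_normal(20000)
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs)
print(mean, var)  # should be near 0 and 1
```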
<br />
<p style="font-weight:bold;text-size:20px;">How to transform <math>U(0,1)</math> to <math>U(a, b)</math></p><br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
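The three steps above amount to one line of code; a minimal Python sketch (endpoints, sample size, and seed are arbitrary):

```python
import random

def unif_ab(a, b, n, seed=7):
    """Transform U(0,1) draws into U(a,b) draws via Y = (b-a)*U + a."""
    random.seed(seed)
    return [(b - a) * random.random() + a for _ in range(n)]

ys = unif_ab(-3, 5, 10000)
print(min(ys), max(ys))  # all values fall inside (-3, 5)
```

The sample mean should settle near (a+b)/2 and the extremes should approach the endpoints.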
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math>. Let <math> Y= 2RU-R=R(2U-1)</math>; then Y follows <math>U(-R,R)</math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=\frac{f(0)}{g(0)}=\frac{2/(\pi R)}{1/(2R)}=\frac{4}{\pi}</math>. Note: this also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a candidate point y with probability f(y)/[cg(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math>; this is also the probability of accepting the candidate point.<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math> and set <math>\ y = R(2U-1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}</math>, set <math> x = y </math>;<br />
else return to step 1.<br />
<br />
<br />
<br />
The condition is <br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br />
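The semicircular sampler can be sketched in Python (R, the sample size, and the seed are our own choices; the corresponding class code uses MATLAB):

```python
import random

def sample_semicircle(R, n, seed=3):
    """Semicircular density f(x) = (2/(pi*R^2)) * sqrt(R^2 - x^2) on [-R, R],
    with proposal U(-R, R) and c = 4/pi."""
    random.seed(seed)
    out = []
    while len(out) < n:
        u, u1 = random.random(), random.random()
        y = R * (2 * u - 1)                    # candidate from U(-R, R)
        if u1 ** 2 <= 1 - (2 * u - 1) ** 2:    # same as u1 <= sqrt(1 - (2u-1)^2)
            out.append(y)
    return out

xs = sample_semicircle(1.0, 20000)
print(sum(xs) / len(xs))  # near 0; the distribution's variance is R^2/4
```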
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a*e^{-a*x}</math>to generate random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>cg(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)} = \frac {e^{-1}}{a*(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\lim_{x\to\infty}\frac {f(x)}{g(x)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get <math>a= \frac {1}{2}</math> <br/><br />
<b><math>c=\frac{4}{e}</math></b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u v ~unif(0,1) <br/><br />
2. Generate y from g: since g is exponential with rate a=1/2, let y=-2ln(u) <br/><br />
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/><br />
Else, go to 1<br/><br />
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
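A Python sketch of this procedure with a = 1/2, so c = 4/e (the seed and sample size are arbitrary choices):

```python
import math
import random

def sample_f(n, seed=11):
    """f(x) = x*e^{-x} (a Gamma density, shape 2, rate 1) via proposal
    g(x) = (1/2)e^{-x/2} with c = 4/e."""
    random.seed(seed)
    c = 4 / math.e
    out = []
    while len(out) < n:
        u, v = random.random(), random.random()
        y = -2.0 * math.log(1.0 - u)      # inverse method for the Exp(1/2) proposal
        fy = y * math.exp(-y)
        gy = 0.5 * math.exp(-y / 2)
        if v < fy / (c * gy):
            out.append(y)
    return out

xs = sample_f(20000)
print(sum(xs) / len(xs))  # the target distribution has mean 2
```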
<br />
'''Summary of how to find the value of c''' <br/><br />
Let <math>h(x) = \frac {f(x)}{g(x)}</math>, and then we have the following:<br /><br />
1. First, take derivative of h(x) with respect to x, get x<sub>1</sub>;<br /><br />
2. Plug x<sub>1</sub> into h(x) and get the value(or a function) of c, denote as c<sub>1</sub>;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (If c<sub>1</sub> is a value, then we can ignore this step.) Since we want the smallest value of c such that <math>f(x) \leq c\cdot g(x)</math> for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c<sub>1</sub> with respect to the unknown parameter (i.e. k=unknown parameter) to get the value of k, <br />then substitute k back to get the value of c<sub>1</sub>. (Double check that <math>c_1 \geq 1</math>.)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
In the two examples above, we reduce the problem to draws from the uniform distribution,<br />
compute <math>c=\max\frac {f(y)}{g(y)} </math>,<br />
and output y whenever <math>v<\frac {f(y)}{c\cdot g(y)}</math>.<br />
<br />
<br />
'''Summary of when to use the Acceptance-Rejection Method''' <br/><br />
1) When the inverse CDF cannot be computed, or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated, at least up to a normalizing constant. <br/><br />
3) When a constant c with <math>f(x)\leq c\cdot g(x)</math> is available.<br/><br />
4) When uniform draws are available.<br/><br />
<br />
----<br />
<br />
== Interpretation of 'C' ==<br />
We can use the value of c to calculate the acceptance rate by '1/c'.<br />
<br />
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (1/1.5=0.667).<br />
<br />
== Class 5 - Tuesday, May 21 ==<br />
Recall the example in the last lecture. The following code generates the random variable required in that example.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>ii=1;<br />
>>R=1; % R is a constant which we can change; <br />
% e.g. with R=4 we would have a density between -4 and 4<br />
>>while ii<1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)>=(2*u2-1)^2<br />
x(ii) = y;<br />
        ii = ii + 1; % this step increases ii <br />
        % for the next pass through the while loop<br />
end<br />
end<br />
>>hist(x,20)<br />
</pre><br />
<br />
<br />
<br />
MATLAB tips: in hist(x,y), y specifies the number of bins (bars) in the histogram.<br />
<br />
[[File:ARM_cont_example.jpg|300px]]<br />
<br />
=== Discrete Examples ===<br />
* '''Example 1''' <br><br />
Generate random variable <math>X</math> according to p.m.f<br/><br />
<math>\begin{align}<br />
P(x &=1) &&=0.15 \\<br />
P(x &=2) &&=0.25 \\<br />
P(x &=3) &&=0.3 \\<br />
P(x &=4) &&=0.1 \\<br />
P(x &=5) &&=0.2 \\<br />
\end{align}</math><br/><br />
<br />
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we can easily generate a discrete random variable Y with pmf g(x)=P(Y=x) such that sup<sub>x</sub> {f(x)/g(x)} &le; c &lt; &infin;.<br />
The following algorithm yields our X:<br />
<br />
Step 1. Draw discrete uniform distribution of 1, 2, 3, 4 and 5, <math>Y \sim~ g</math>.<br/><br />
Step 2. Draw <math>U \sim~ U(0,1)</math>.<br/><br />
Step 3. If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/><br />
Else return to Step 1.<br/><br />
<br />
How do we compute c? Recall that c can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. Note that this is different from maximizing <math> f(x) </math> and <math> g(x) </math> independently of each other and then taking the ratio to find c.<br />
:<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math><br />
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br><br />
Note: The U is independent from y in Step 2 and 3 above.<br />
The constant c is an indicator of the rejection rate (the acceptance rate is 1/c).<br />
<br />
For this pmf, the proposal is the discrete uniform distribution on the 5 values 1,2,3,4,5, so g(x)=0.2 for each x.<br />
<br />
* '''Code for example 1'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.15 .25 .3 .1 .2]; % a vector holding the pmf values<br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(5);<br />
u=rand;<br />
if u<= p(y)/0.3<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:ARM_disc_example.jpg|300px]]<br />
<br />
unidrnd(k) draws from the discrete uniform distribution of integers <math>1,2,3,...,k</math> If this function is not built in to your MATLAB then we can do simple transformation on the rand(k) function to make it work like the unidrnd(k) function. <br />
<br />
The acceptance rate is <math>1/c</math>, so the lower the c, the more efficient the algorithm. Theoretically, c=1 is the best case, because then all samples are accepted; this happens only when the proposal and target distributions are identical, which rarely occurs in practice. <br />
<br />
For example, if c = 1.5, the acceptance rate would be <math>1/1.5=2/3</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. <br />
<br />
The histogram of the 1000 accepted values approaches the target pmf as the number of samples grows.<br />
Recall that 1/c is the acceptance rate: the closer c is to 1, the better.<br />
<br />
* '''Example 2'''<br><br />
p(x=1)=0.1<br />p(x=2)=0.3<br />p(x=3)=0.6<br /><br />
Let g be the uniform distribution of 1, 2, or 3<br /><br />
<math>c=max(p_{x}/g(x))=0.6/(1/3)=1.8</math><br /><br />
1. y~g<br /><br />
2. u~U(0,1)<br /><br />
3. If <math>U \leq \frac{f(y)}{cg(y)}</math>, set x = y. Else go to 1.<br />
<br />
* '''Code for example 2'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>p=[.1 .3 .6]; <br />
>>ii=1;<br />
>>while ii < 1000<br />
y=unidrnd(3);<br />
u=rand;<br />
if u<= p(y)/1.8<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Example 3'''<br><br />
<math>p_{x}=e^{-3}3^{x}/x! , x>=0</math><br><br />
Try the first few p_{x}'s: .0498 .149 .224 .224 .168 .101 .0504 .0216 .0081 .0027<br><br />
<br />
Use the geometric distribution for <math>g(x)</math>;<br><br />
<math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br><br />
Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br><br />
We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br><br />
<br />
1. Generate <math>U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)</math><br><br />
2. <math>j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;</math><br><br />
3. if <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set X = j, else go to step 1.<br />
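The steps above can be sketched in Python (the scan over a wide range to locate c, the seed, and the sample size are our own shortcuts; the course uses MATLAB):

```python
import math
import random

def sample_poisson3(n, seed=5):
    """Poisson(3) pmf p_x = e^{-3} 3^x / x!, sampled with the geometric proposal
    g(x) = p(1-p)^x, p = 0.25, following the three steps above."""
    random.seed(seed)
    p = 0.25
    pmf = lambda x: math.exp(-3) * 3 ** x / math.factorial(x)
    g = lambda x: p * (1 - p) ** x
    # c = max over x of p_x/g(x); scanning a wide range is a simple way to locate it
    c = max(pmf(x) / g(x) for x in range(100))
    out = []
    while len(out) < n:
        u1, u2 = random.random(), random.random()
        j = int(math.log(1.0 - u1) / math.log(1 - p))  # geometric draw on {0, 1, 2, ...}
        if u2 < pmf(j) / (c * g(j)):
            out.append(j)
    return out, c

xs, c = sample_poisson3(20000)
print(round(c, 3), sum(xs) / len(xs))  # c near 2.12; sample mean near 3
```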
<br />
<br />
*'''Example 4''' (Hypergeometric & Binomial)<br> <br />
<br />
Suppose we are given f(x) that is hypergeometrically distributed: from 10 white balls and 5 red balls, we select 3 balls without replacement, and X is the number of red balls selected. <br />
<br />
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c<br />
<br />
Solution:<br />
For hypergeometric: <math>P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,</math><br><br><br />
<math>P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198</math><br />
<br />
<br />
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704<br />
<br />
Find the value of f/g for each X<br />
<br />
X=0: 0.8898; <br />
X=1: 1.1127; <br />
X=2: 0.9891; <br />
X=3: 0.5934<br />
<br />
Choose the maximum, which is '''c = 1.1127'''<br />
<br />
Here the maximum of f(x) (0.4945) and the maximum of g(x) (0.4444) both occur at X=1, so the maximum ratio works out to 0.4945/0.4444 = 1.1127; in general, however, c must be found by maximizing the ratio f(x)/g(x) itself, not by dividing the separate maxima.<br />
Since c is the maximum of the ratio, <math>c\cdot g(x) \geq f(x)</math> holds at every point, and any larger c would only increase the rejection ratio.<br />
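The pmfs and the ratio can be checked numerically; here is a short Python sketch using exact binomial coefficients:

```python
from math import comb

# Hypergeometric f: 10 white and 5 red balls, draw 3 without replacement; X = reds drawn
f = [comb(5, k) * comb(10, 3 - k) / comb(15, 3) for k in range(4)]
# Binomial proposal g: Bin(3, 1/3)
g = [comb(3, k) * (1 / 3) ** k * (2 / 3) ** (3 - k) for k in range(4)]

ratios = [fk / gk for fk, gk in zip(f, g)]
c = max(ratios)
print([round(r, 4) for r in ratios])  # ratios at X = 0..3; the maximum occurs at X = 1
print(round(c, 4))
```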
<br />
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations. <br><br />
Here is an example: <br />
[[File:ARM_Fail.jpg]]<br />
<br />
In the above example, we need to move c*g(x) to the peak of f to cover the whole f. Thus c will be very large and 1/c will be small.<br />
A higher rejection rate means more generated points are wasted.<br> <br />
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br><br />
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"><br />
<h2 style="text-align:center;">Acceptance-Rejection Method</h2><br />
<p><b>Problem:</b> The CDF is not invertible or it is difficult to find the inverse.</p><br />
<p><b>Plan:</b></p><br />
<ol><br />
<li>Draw y~g(.)</li><br />
<li>Draw u~Unif(0,1)</li><br />
<li>If <math>u\leq \frac{f(y)}{cg(y)}</math>then set x=y. Else return to Step 1</li><br />
</ol><br />
<p>x will have the desired distribution.</p><br />
<b>Matlab Example</b><br />
<pre style="font-size:16px">close all<br />
clear all<br />
ii=1;<br />
R=1;<br />
while ii&lt;1000<br />
u1 = rand;<br />
u2 = rand;<br />
y = R*(2*u2-1);<br />
if (1-u1^2)&gt;=(2*u2-1)^2<br />
x(ii) = y;<br />
ii = ii + 1;<br />
end<br />
end<br />
hist(x,20)<br />
</pre><br />
</div><br />
<br />
<br />
Recall that,<br />
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j), j&ge;0}. We can use this as the basis for simulating from the distribution having mass function {p(j), j&ge;0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).<br />
Specifically, let c be a constant such that <br />
p(j)/q(j)<=c for all j such that p(j)>0<br />
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.<br />
<br />
=== Sampling from commonly used distributions ===<br />
<br />
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.<br />
<br />
* '''Gamma'''<br /><br />
<br />
The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br><br />
<math> F(x) = \int_0^{\lambda x} \frac{e^{-y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br><br />
<br />
Neither Inverse Transformation nor Acceptance/Rejection Method can be easily applied to Gamma distribution.<br />
However, we can use additive property of Gamma distribution to generate random variables.<br />
<br />
* '''Additive Property'''<br /><br />
If <math>X_1, \dots, X_t</math> are independent exponential random variables with rate <math> \lambda </math> (in other words, <math> X_i \sim Exp(\lambda)</math>, and note that <math> Exp(\lambda)= Gamma(1, \lambda)</math>), then <math>\Sigma_{i=1}^t X_i \sim Gamma (t, \lambda) </math><br />
<br />
If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up.<br />
<br />
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br />
<math> x_1 </math>~Exp(<math>\lambda </math>)<br />
<math>x_2 </math>~Exp(<math> \lambda </math>)<br />
...<br />
<math>x_t </math>~Exp(<math> \lambda </math>)<br />
<math>x_1+x_2+...+x_t</math><br />
<br />
<pre style="font-size:16px"><br />
>>lambda=1;<br />
>>u=rand(1,1000);<br />
>>x=-(1/lambda)*log(u); <br />
>>hist(x)<br />
</pre><br />
<br />
<br />
* '''Procedure '''<br />
<br />
:#Sample independently from a uniform distribution <math>t</math> times, giving <math> U_1,\dots,U_t \sim~ U(0,1)</math> <br />
:#Use the Inverse Transform Method, <math> X_i = -\frac {1}{\lambda}\log(1-U_i)</math>, giving <math> X_1,\dots,X_t \sim~Exp(\lambda)</math><br />
:#Use the additive property,<math> X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math><br><br />
<br />
<br />
* '''Note for Procedure '''<br />
:#If <math>U\sim~U(0,1)</math>, then <math>U</math> and <math>1-U</math> will have the same distribution (both follows <math>U(0,1)</math>)<br />
:#This is because the range for <math>1-U</math> is still <math>(0,1)</math>, and their densities are identical over this range.<br />
:#Let <math>Y=1-U</math>, <math>Pr(Y<=y)=Pr(1-U<=y)=Pr(U>=1-y)=1-Pr(U<=1-y)=1-(1-y)=y</math>, thus <math>1-U\sim~U(0,1)</math><br />
<br />
<br />
<br />
* '''Some notes on matlab coding: '''<br /><br />
If X is a matrix; <br /><br />
:*: ''X(1,:)'' returns the first row <br /><br />
:*: ''X(:,1)'' returns the first column <br /><br />
:*: ''X(i,i)'' returns the (i,i)th entry <br /><br />
:*: ''sum(X,1)'' or ''sum(X)'' sums down each column (adds the rows together), returning a row vector of column sums <br /><br />
:*: ''sum(X,2)'' sums across each row (adds the columns together), returning a column vector of row sums <br /><br />
:*: ''rand(r,c)'' generates an r-by-c matrix of random numbers <br /><br />
:*: Matlab is very efficient with vectors and inefficient with loops. It is far better to use vector operations (with the . operator as necessary) than "for" loops when computing many values.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>u = rand(20, 1000);          % a 20x1000 matrix: 1000 draws for each of the t=20<br />
                               % exponentials; every element generated by rand<br />
>>x = (-1/lambda)*log(1-u);    % log(1-u) has the same distribution as log(u), since u~U(0,1)<br />
>>xx = sum(x)                  % sum(x) adds up each column; size(xx) can help you verify<br />
>>hist(xx)<br />
</pre><br />
[[File:Gamma_example.jpg|300px]]<br />
<br />
x and u are both 20-by-1000 matrices.<br />
Since u~unif(0, 1) implies that u and 1-u have the same distribution, we can substitute 1-u with u to simplify the equation.<br />
Alternatively, the following commands do the same thing as the previous ones.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>lambda = 1;<br />
>>xx = sum((-1/lambda)*log(rand(20, 1000)));   % a simple way to put the code in one line;<br />
                                               % either log(u) or log(1-u) works, since u~U(0,1)<br />
>>hist(xx)<br />
</pre><br />
<br />
Here rand(20,1000) generates a 20-by-1000 matrix: 20 rows with 1000 numbers in each. Summing each column adds 20 independent Exp(lambda) values, giving 1000 independent Gamma(20, lambda) samples, which the histogram confirms.<br />
<br />
=== Other Sampling Method: Coordinate System ===<br />
[[File:Unnamed_QQ_Screenshot20130521203625.png]]<br />
* From cartesian to polar coordinates <br /><br />
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/\sin(\theta)= x_{1}/\cos(\theta)</math> <br /><br />
<math> \tan(\theta)=x_{2}/x_{1} \rightarrow \theta=\tan^{-1}(x_{2}/x_{1})</math> <br /><br />
<br />
<br />
Conversely, given <math>R</math> and <math>\theta</math>, we recover the Cartesian coordinates as <math>x_{1}=R\cos(\theta)</math> and <math>x_{2}=R\sin(\theta)</math>.<br />
<br />
=== '''Matlab''' ===<br />
<br />
----<br />
<pre style="color:red; font-size:30px"><br />
THIS SECTION MAY BE REDUNDANT.<br />
PLEASE COMBINE WITH "Some notes on matlab coding"<br />
IN SECTION 6.2<br />
</pre><br />
<br />
'''X=rand(2,3)''' generates a 2 rows*3 columns matrix<br /><br />
Example:<br /><br />
0.1 0.2 0.3<br /><br />
0.4 0.5 0.6<br /><br />
'''sum(X)''' adds the columns up<br /><br />
Example:<br /><br />
0.5 0.7 0.9<br /><br />
'''sum(X,2)''' adds up the rows<br /><br />
Example:<br /><br />
0.6<br /><br />
1.5<br /><br />
<br />
== Class 6 - Thursday, May 23 ==<br />
<br />
=== Announcement ===<br />
1.On the day of each lecture, students from the morning section can only contribute the first half of the lecture (i.e. 8:30 - 9:10 am), so that the second half can be saved for the ones from the afternoon section. After the day of lecture, students are free to contribute anything.<br />
<br />
=== Standard Normal distribution ===<br />
If X ~ N(0,1), the Standard Normal Distribution, then its p.d.f. is of the form<br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
<br />
*Warning : the General Normal distribution is <br />
:<math><br />
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }<br />
</math><br />
where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is standard deviation <br /><br />
<br /><br />
*N(0,1) is standard normal. <math> \mu </math> =0 and <math> \sigma </math>=1 <br /><br />
<br /><br />
<br />
Let X and Y be independent standard normal.<br />
<br />
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) <br />
<br />
Note: R must satisfy two properties:<br />
<br />
:1. Be a positive number (as it is a length)<br />
<br />
:2. It must be from a distribution that has more data points closer to the origin so that as we go further from the origin, less points are generated (the two options are Chi-squared and Exponential distribution) <br />
<br />
The form of the joint distribution of R and <math>\theta</math> will show that the best choice for distribution of R<sup>2</sup> is exponential.<br />
<br />
<br />
We cannot use the Inverse Transformation Method since F(x) does not have a closed form solution. So we will use joint probability function of two independent standard normal random variables and polar coordinates to simulate the distribution:<br />
<br />
We know that <br />
<br />
:R<sup>2</sup>= X<sup>2</sup>+Y<sup>2</sup><br />
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math><br />
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math><br />
:<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since both the distributions are independent<br />
It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by,<br />
1-1 transformation:<br /><br />
Let <math>d=R^2</math><br /><br />
<math>x= \sqrt {d}\cos \theta </math><br />
<math>y= \sqrt {d}\sin \theta </math><br />
then <br />
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\dfrac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math><br />
It can be shown that the pdf of <math> d </math> and <math> \theta </math> is:<br />
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math><br />
<br />
<br />
<br />
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into the product of two density functions, Exponential and Uniform, so <math>d</math> and <math>\theta</math> are independent<br />
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math><br />
::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math><br />
::* <math> \tan(\theta) = \frac{y}{x} </math><br />
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> <br />
<br><br />
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math><br />
<br><br />
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /><br />
1) Generating their polar coordinates<br /><br />
2) Transforming back to rectangular (Cartesian) coordinates.<br /><br />
==== Expectation of a Standard Normal distribution ====<br />
The expectation of a standard normal distribution is 0<br />
:Below is the proof: <br />
<br />
:<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math><br />
:Let <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math><br />
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx</math><br />
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x''),<br />
:<math>=\;- \int_{-\infty}^{\infty} \phi'(x)\, dx</math><br />
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math><br />
:<math>= 0</math><br />
<br />
* '''Procedure (Box-Muller Transformation Method):''' <br /><br />
Pseudorandom approaches to generating normal random variables used to be limited. Inefficient methods such as inverting the Gaussian cdf, summing uniform random variables, and acceptance-rejection were used. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This technique had an ease of use and accuracy that grew more valuable as computers became more powerful.<br />
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br />
if Z = (Z<sub>1</sub>, Z<sub>2</sub>) has this distribution, then<br />
1. R<sup>2</sup>=Z<sub>1</sub><sup>2</sup>+Z<sub>2</sub><sup>2</sup> is exponentially distributed with mean 2, i.e.<br />
P(R<sup>2</sup> <= x) = 1-e<sup>-x/2</sup>.<br />
2. Given R<sup>2</sup>, the point (Z<sub>1</sub>,Z<sub>2</sub>) is uniformly distributed on the circle of radius R centered at the origin.<br />
We can use these properties to build the algorithm:<br />
<br />
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /><br />
2) Generate polar coordinates using the exponential distribution of d and uniform distribution of θ,<br />
<br />
<br />
<br />
<math> \begin{align} R^2 = d = -2\log(U_1), & \quad r = \sqrt{d} \\ & \quad \theta = 2\pi U_2 \end{align} </math><br />
<br />
<br />
<math> \begin{matrix} \ R^2 = d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> <br /><br />
<br />
<br />
3) Transform polar coordinates (i.e. R and θ) back to Cartesian coordinates (i.e. X and Y), <br> <math> \begin{align} x = R\cos(\theta) \\ y = R\sin(\theta) \end{align} </math> <br />.<br />
<br />
Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /><br />
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>u1=rand(1,1000);<br />
>>u2=rand(1,1000);<br />
>>d=-2*log(u1);<br />
>>tet=2*pi*u2;<br />
>>x=d.^0.5.*cos(tet);<br />
>>y=d.^0.5.*sin(tet);<br />
>>hist(tet) <br />
>>hist(d)<br />
>>hist(x)<br />
>>hist(y)<br />
</pre><br />
<br />
"''Remember'': For the above code to work the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise matlab will raise the entire matrix to the power of 0.5."<br />
<br />
[[File:Normal_theta.jpg|300px]][[File:Normal_d.jpg|300px]]<br />
[[File:normal_x.jpg|300x300px]][[File:normal_y.jpg|300x300px]]<br />
<br />
As seen in the histograms above, X and Y generated from this procedure have a standard normal distribution.<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>x=randn(1,1000);<br />
>>hist(x)<br />
>>hist(x+2)<br />
>>hist(x*2+2)<br />
</pre><br />
<br />
Note: randn is random sample from a standard normal distribution.<br /><br />
Note: hist(x+2) will be centered at 2 instead of at 0. <br /><br />
hist(x*2+2) is also centered at 2. The mean doesn't change, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /><br />
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]]<br />
<br /><br />
<br />
<b>Comment</b>: Box-Muller transformations are not computationally efficient because they require computing sine and cosine functions. A way around this time-consuming step is an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation, which generates U and then computes the sine and cosine of 2πU). <br /><br />
<br />
'''Alternative Methods of generating normal distribution'''<br /><br />
1. Even though we cannot use inverse transform method, we can approximate this inverse using different functions.One method would be '''rational approximation'''.<br /><br />
2. '''Central limit theorem''': if we sum 12 independent U(0,1) random variables and subtract 6 (which is E(u<sub>i</sub>)*12), we get approximately a standard normal distribution.<br /><br />
3. '''Ziggurat algorithm''' which is known to be faster than Box-Muller transformation and a version of this algorithm is used for the randn function in matlab.<br /><br />
<br />
If Z~N(0,1) and X= μ +Zσ then X~<math> N(\mu, \sigma^2)</math><br />
<br />
If Z<sub>1</sub>, Z<sub>2</sub>... Z<sub>d</sub> are independent identically distributed N(0,1),<br />
then Z=(Z<sub>1</sub>,Z<sub>2</sub>...Z<sub>d</sub>)<sup>T</sup> ~N(0, I<sub>d</sub>), where 0 is the zero vector and I<sub>d</sub> is the identity matrix.<br />
<br />
For the histogram, the constant added is the parameter that shifts the center of the graph.<br />
<br />
=== Proof of Box Muller Transformation ===<br />
<br />
Definition:<br />
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution).<br />
<br />
Let U<sub>1</sub> and U<sub>2</sub> be independent Unif(0,1) random variables. Then<br />
<math>X_{1} = \sqrt{-2\ln U_{1}}\,\cos(2\pi U_{2})</math><br />
<br />
<math>X_{2} = \sqrt{-2\ln U_{1}}\,\sin(2\pi U_{2})</math><br />
are '''independent''' N(0,1) random variables.<br />
<br />
This is a standard transformation problem. The joint distribution is given by <br />
<math>f_{X_1,X_2}(x_1,x_2) = f_{U_1,U_2}\big(g_1^{-1}(x_1,x_2),\,g_2^{-1}(x_1,x_2)\big)\,|J|</math><br />
<br />
where J is the Jacobian of the transformation,<br />
<br />
<math>J=\begin{vmatrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2 \\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{vmatrix}</math><br />
where <br />
<math>u_1 = g_1^{-1}(x_1,x_2),\quad u_2 = g_2^{-1}(x_1,x_2)</math><br />
<br />
Inverting the above transformations, we have<br />
<math>u_1 = e^{-(x_1^2+x_2^2)/2}</math><br />
<math>u_2 = \frac{1}{2\pi}\tan^{-1}(x_2/x_1)</math><br />
<br />
Finally we get<br />
<math>f(x_1,x_2) = \frac{1}{2\pi}e^{-(x_1^2+x_2^2)/2}</math><br />
which factors into two standard normal pdfs.<br />
<br />
=== General Normal distributions ===<br />
The general normal distribution generalizes the standard normal: its density is translated by the mean <math>\mu</math> and scaled by the standard deviation <math>\sigma</math>. The pdf of the general normal distribution is <br />
<math>f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)</math>, where <math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}</math> is the standard normal pdf<br />
<br />
The standard normal distribution is the special case with mean zero and variance 1. If X is a general normal deviate, then Z = (X − μ)/σ has a standard normal distribution.<br />
<br />
If Z ~ N(0,1), and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma\cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2\cdot 1 = \sigma^2</math><br />
<br />
If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math><br />
ie.<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>z1=randn(1,1000);   % generate a sample from the standard normal distribution<br />
>>z2=randn(1,1000);<br />
>>z=[z1;z2];<br />
>>plot(z(1,:),z(2,:),'.')<br />
</pre><br />
[[File:Nonstdnormal_example.jpg|300px]]<br />
<br />
If Z~N(0,I<sub>d</sub>) and <math>\underline{X}= \underline{\mu} + \,\Sigma^{\frac{1}{2}}Z </math> then <math>\underline{X}</math> ~<math>N(\underline{\mu},\Sigma)</math><br />
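This transformation can be checked numerically. Below is a small sketch in Python/NumPy rather than the course's Matlab (in Matlab, chol(Sigma) would play the role of <math>\Sigma^{\frac{1}{2}}</math>); the mean vector and covariance matrix are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])           # example mean vector (assumed values)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])       # example covariance (assumed, positive definite)

# Sigma^{1/2} via the Cholesky factor L, which satisfies L @ L.T == Sigma
L = np.linalg.cholesky(Sigma)

# Z ~ N(0, I_2): each column is one independent 2-dimensional standard normal draw
Z = rng.standard_normal((2, 100000))

# X = mu + Sigma^{1/2} Z, so each column of X is a draw from N(mu, Sigma)
X = mu[:, None] + L @ Z

sample_mean = X.mean(axis=1)
sample_cov = np.cov(X)
```

The sample mean and sample covariance should be close to mu and Sigma respectively.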
<br />
=== Bernoulli Distribution ===<br />
The Bernoulli distribution is a discrete probability distribution that describes an event with only two possible outcomes, i.e. success or failure. If the event succeeds, X takes value 1 with success probability p; otherwise X takes value 0 with failure probability q = 1 - p. <br />
<br />
P ( x = 0) = q = 1 - p<br />
P ( x = 1) = p <br />
P ( x = 0) + P (x = 1) = p + q = 1<br />
<br />
If X~Ber(p), its pmf is of the form <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1<br />
<br> P is the success probability.<br />
<br />
The Bernoulli distribution is a special case of the binomial distribution in which the variate x has only two outcomes; the Bernoulli pmf is the binomial pmf with n = 1, so x takes only the values 0 and 1.<br />
<br />
<pre style="font-size:16px"><br />
<br />
Procedure:<br />
<br />
To simulate the event of flipping a coin, let P be the probability of flipping head and X = 1 and 0 represent<br />
flipping head and tail respectively:<br />
<br />
1. Draw U ~ Uniform(0,1)<br />
<br />
2. If U <= P<br />
<br />
X = 1<br />
<br />
Else<br />
<br />
X = 0<br />
<br />
3. Repeat as necessary<br />
<br />
</pre><br />
<br />
An intuitive way to think of this is in the coin flip example we discussed in a previous lecture. In this example we set p = 1/2 and this allows for 50% of points to be heads or tails.<br />
<br />
* '''Code to Generate Bernoulli(p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
i = 1;<br />
<br />
while (i <=1000)<br />
u =rand();<br />
p = 0.3;<br />
if (u <= p)<br />
x(i) = 1;<br />
else<br />
x(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
hist(x)<br />
</pre><br />
<br />
However, we know that if <math>\begin{align} X_i \sim Bernoulli(p) \end{align}</math> where each <math>\begin{align} X_i \end{align}</math> is independent,<br /><br />
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /><br />
So we can sample from the binomial distribution using this property.<br />
Note: a Binomial(n,p) random variable can be viewed as the sum of n independent Bernoulli(p) trials.<br />
<br />
<br />
* '''Code to Generate Binomial(n = 10,p = 0.3)'''<br /><br />
<pre style="font-size:16px"><br />
p = 0.3;<br />
n = 10;<br />
<br />
for k=1:5000<br />
i = 1;<br />
while (i <= n)<br />
u=rand();<br />
if (u <= p)<br />
y(i) = 1;<br />
else<br />
y(i) = 0;<br />
end<br />
i = i + 1;<br />
end<br />
<br />
x(k) = sum(y==1);<br />
end<br />
<br />
hist(x)<br />
<br />
</pre><br />
Note: The Bernoulli pmf can be written compactly as <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1.<br />
<br />
Comments on Matlab:<br />
When doing element-wise operations on matrices, put a dot before the operator (.*, ./, .^) so the operation is applied to every element. <br />
Example: let V be a 2*4 matrix and suppose you want each element multiplied by 3. <br />
The Matlab code is 3.*V (for multiplication by a scalar, 3*V works as well)<br />
<br />
The examples above illustrate how to use code to generate samples from these distributions.<br />
<br />
== Class 7 - Tuesday, May 28 ==<br />
<br />
===Universality of the Uniform Distribution/Inverse Method===<br />
Procedure:<br />
<br />
1. Generate U~Unif [0, 1)<br><br />
2. Set <math>x=F^{-1}(u)</math><br><br />
3. Then X~f(x)<br><br />
<br />
Example:<br />
<br />
Let x<sub>1</sub>,x<sub>2</sub> denote the lifetimes of 2 independent particles, <math>X</math><sub>1</sub>~<math>Exp(\lambda_1)</math>, <math>X</math><sub>2</sub>~<math>Exp(\lambda_2)</math>.<br><br />
We are interested in Y=min(<math>X_1, X_2</math>).<br><br />
Design an algorithm based on the inverse method to generate samples according to f<sub>Y</sub>.<br><br />
<br />
Why the Inversion Method works:<br />
<br />
<math>P(X\leq x) = P(F^{-1}(U)\leq x) = P(U\leq F(x)) = F(x)</math><br />
since <math>U\sim Unif[0,1)</math>; so setting <math>u = F(x)</math> and solving gives <math>x=F^{-1}(u)</math><br><br />
<br />
<br />
<br />
'''Example 1'''<br><br />
Let <math>X</math><sub>1</sub>,<math>X</math><sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>)<br><br />
<math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>)<br><br />
<br />
We are interested in <math>y=min(X</math><sub>1</sub><math>,X</math><sub>2</sub><math>)</math><br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to <math>f</math><sub>y</sub><math>(y)</math><br><br />
<br />
'''Solution:'''<br />
x~exp(<math>\lambda</math>)<br><br />
<math>f_{x}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br><br />
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P(X_1>y) P(X_2>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br><br />
<math>F_Y(y)=1-e^{\, -(\lambda_1 + \lambda_2) y}, y\geq 0</math><br><br />
<math>U=1-e^{\, -(\lambda_1 + \lambda_2) y}</math> => <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br><br />
<br />
'''Procedure:'''<br />
<br />
Step1: Generate u~unif [0, 1)<br><br />
Step2: set <math>x=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(u)</math><br><br />
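As a cross-check on this procedure (not from the lecture), here is a sketch in Python/NumPy rather than the course's Matlab; the rates lam1 and lam2 are arbitrary example values. It draws Y both by the inverse transform above and by directly taking the minimum of two exponentials; both samples should have mean <math>1/(\lambda_1+\lambda_2)</math>:

```python
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2 = 2.0, 3.0                  # example rates (assumed values)
n = 100000

# Inverse transform for Y = min(X1, X2) ~ Exp(lam1 + lam2)
u = rng.random(n)
y = -np.log(1.0 - u) / (lam1 + lam2)

# Direct check: generate both exponentials and take the minimum
x1 = -np.log(rng.random(n)) / lam1
x2 = -np.log(rng.random(n)) / lam2
y_direct = np.minimum(x1, x2)
```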
<br />
If we extend the setup from two independent particles to n independent particles,<br />
<br />
<math>X</math><sub>1</sub>~exp(<math>\lambda</math><sub>1</sub>), <math>X</math><sub>2</sub>~exp(<math>\lambda</math><sub>2</sub>), ..., <math>X</math><sub>n</sub>~exp(<math>\lambda</math><sub>n</sub>),<br><br />
<br />
then the '''minimum''' <math>Y=min(X_1,\ldots,X_n)</math> generalizes directly: <math>Y \sim exp(\sum\lambda_i)</math>, so<br />
<br />
<math>y=\, {-\frac {1}{{ \sum\lambda_i}}} ln(1-u)</math><br><br />
<br />
The '''maximum''' does not simplify this way: its cdf is the product <math>\prod_i (1-e^{-\lambda_i y})</math>, which in general has no closed-form inverse (see Example 4 below for the equal-rate case). The inverse-transform recipe is always the same: work out the cdf of the quantity of interest, then invert it.<br />
<br />
'''Example 2'''<br><br />
Consider U~Unif[0,1)<br><br />
<math>X=\, a (1-\sqrt{1-u})</math>, <br />
where a>0<br><br />
What is the distribution of X?<br><br />
<math>X=\, a (1-\sqrt{1-u})</math><br><br />
=><math>1-\frac {x}{a}=\sqrt{1-u}</math><br><br />
=><math>u=1-(1-\frac {x}{a})^2</math><br><br />
=><math>u=\, {\frac {x}{a}} (2-\frac {x}{a})</math><br><br />
<math>f(x)=\frac {dF(x)}{dx}=\frac {2}{a}-\frac {2x}{a^2}=\, \frac {2}{a} (1-\frac {x}{a})</math><br><br />
<br />
So <math>f(x)=\frac {2}{a} (1-\frac {x}{a})</math> for <math>0\leq x\leq a</math>: X follows a triangular distribution on [0,a].<br />
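A quick numerical sketch of this example (not from the lecture; Python/NumPy used here instead of Matlab, and a = 2 is an arbitrary example value). The triangular density <math>\frac{2}{a}(1-\frac{x}{a})</math> on [0,a] has mean a/3:

```python
import numpy as np

rng = np.random.default_rng(2)
a = 2.0                                # example value of the parameter a > 0 (assumed)
n = 200000

u = rng.random(n)
x = a * (1.0 - np.sqrt(1.0 - u))       # X = a(1 - sqrt(1 - U))
```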
<br />
'''Example 3'''<br><br />
Suppose F<sub>X</sub>(x) = x<sup>n</sup>, 0 ≤ x ≤ 1, n ∈ N > 0. We want to generate X.<br><br />
<br><br />
1. generate u ~ Unif[0, 1)<br><br />
2. Set x <- U<sup>1/n</sup><br><br />
<br><br />
For example, when n = 20,<br><br />
u = 0.6 => x = u<sup>1/20</sup> = 0.974<br><br />
u = 0.5 => x = u<sup>1/20</sup> = 0.966<br><br />
u = 0.2 => x = u<sup>1/20</sup> = 0.923<br><br />
<br><br />
Recall that<br />
If Y = max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>), where X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub> are independent, <br><br />
F<sub>Y</sub>(y) = P(Y ≤ y) = P(max (X<sub>1</sub>, X<sub>2</sub>, ... , X<sub>n</sub>) ≤ y) = P(X<sub>1</sub> ≤ y, X<sub>2</sub> ≤ y, ... , X<sub>n</sub> ≤ y) = F<sub>x<sub>1</sub></sub>(y) F<sub>x<sub>2</sub></sub>(y) ... F<sub>x<sub>n</sub></sub>(y)<br><br />
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> <br />
<br><br />
Method 1: Following the above result, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x, so the max has cdf x<sup>n</sup>).<br />
Method 2: Generate X by drawing n independent samples U~Unif(0, 1) and taking their max. However, the inverse-transform solution given above requires generating only one uniform random number instead of n of them, so it is the more efficient method.<br />
<br><br />
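The two methods can be compared empirically. This is a sketch (not from the lecture) in Python/NumPy rather than the course's Matlab; n = 20 matches the worked example above. Both samples target the distribution F(x) = x<sup>n</sup>, whose mean is n/(n+1):

```python
import numpy as np

rng = np.random.default_rng(3)
n_power, n_samples = 20, 100000

# Method 1: one uniform per sample, pushed through the inverse cdf x = u^(1/n)
x_inverse = rng.random(n_samples) ** (1.0 / n_power)

# Method 2: the max of n independent uniforms per sample (n times as many draws)
x_max = rng.random((n_power, n_samples)).max(axis=0)
```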
<br />
More generally, the same approach yields the pdf and cdf of Y = max(X<sub>1</sub>, ..., X<sub>n</sub>) or Y = min(X<sub>1</sub>, ..., X<sub>n</sub>) whenever the X<sub>i</sub> are independent.<br />
<br />
'''Example 4 (New)'''<br><br />
Let X<sub>1</sub>,X<sub>2</sub> denote the lifetime of two independent particles:<br><br />
<math>\, X_1, X_2 \sim exp(\lambda)</math><br><br />
<br />
We are interested in Z=max(X<sub>1</sub>,X<sub>2</sub>)<br><br />
Design an algorithm based on the Inverse-Transform Method to generate samples according to f<sub>Z</sub>(z)<br><br />
<br />
<math>\, F_Z(z)=P[Z<=z] = F_{X_1}(z) \cdot F_{X_2}(z) = (1-e^{-\lambda z})^2</math><br><br />
<math> \text{thus } F^{-1}(u) = -\frac{1}{\lambda}\log(1-\sqrt u)</math><br><br />
<br />
To sample Z: <br><br />
<math>\, \text{Step 1: Generate } U \sim U[0,1)</math><br><br />
<math>\, \text{Step 2: Let } Z = -\frac{1}{\lambda}\log(1-\sqrt U)</math>, therefore we can generate random variable of Z.<br />
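A numerical sketch of Example 4 (not from the lecture; Python/NumPy instead of the course's Matlab, and the rate lam is an arbitrary example value). It samples Z by the inverse transform above and also directly as the max of two exponentials; for two i.i.d. Exp(λ) lifetimes, E[max] = 3/(2λ):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 1.5                              # example rate (assumed value)
n = 100000

# Inverse transform: Z = -(1/lam) * log(1 - sqrt(U))
u = rng.random(n)
z = -np.log(1.0 - np.sqrt(u)) / lam

# Direct check: the max of two independent Exp(lam) draws
z_direct = np.maximum(-np.log(rng.random(n)) / lam,
                      -np.log(rng.random(n)) / lam)
```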
<br />
===Decomposition Method===<br />
<br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math><br />
<br />
<math>f_{X} = \sum_{i=1}^{n}p_{i}f_{X_{i}}(x)</math><br />
<br />
where p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>n</sub> > 0 and sum of p<sub>i</sub> = 1.<br />
<br />
The same decomposition applies to a discrete distribution: its cdf and pmf can be written as a mixture of component cdfs and pmfs with weights p<sub>i</sub>.<br />
<br />
=== Examples of Decomposition Method ===<br />
<b>Example 1</b> <br><br />
f(x) = 5/12(1+(x-1)<sup>4</sup>), 0<=x<=2 <br><br />
f(x) = 5/12+5/12(x-1)<sup>4</sup> = 5/6*(1/2)+1/6*(5/2)(x-1)<sup>4</sup> <br><br />
Let f<sub>x1</sub> = 1/2 and f<sub>x2</sub> = 5/2(x-1)<sup>4</sup>, both on 0<=x<=2 <br><br />
<br />
Algorithm: <br />
Generate U~Unif(0,1) <br><br />
If 0<u<5/6, then we sample from f<sub>x1</sub> <br><br />
Else if 5/6<u<1, we sample from f<sub>x2</sub> <br><br />
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br><br />
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br><br />
<br />
Here f(x) is decomposed into a uniform component f<sub>x1</sub> and a polynomial component f<sub>x2</sub>, selected with probabilities 5/6 and 1/6 respectively.<br />
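A sketch of this sampler (not from the lecture) in Python/NumPy rather than Matlab. One detail below is derived rather than stated in the notes: integrating f<sub>x2</sub> = (5/2)(x-1)<sup>4</sup> gives the cdf ((x-1)<sup>5</sup>+1)/2, whose inverse is x = 1 + (2v-1)<sup>1/5</sup> (taking the real odd fifth root). Since f(x) is symmetric about x = 1, the sample mean should be close to 1:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200000

u = rng.random(n)                      # selects the component
v = rng.random(n)                      # fed to the chosen component's inverse cdf

# Component 1 (prob 5/6): density 1/2 on [0, 2], so x = 2v
x = 2.0 * v

# Component 2 (prob 1/6): density (5/2)(x-1)^4 on [0, 2]
# Its cdf is ((x-1)^5 + 1)/2, so x = 1 + (2v - 1)^(1/5) (real odd 5th root)
t = 2.0 * v - 1.0
mask = u >= 5.0 / 6.0
x[mask] = 1.0 + np.sign(t[mask]) * np.abs(t[mask]) ** 0.2
```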
<br />
<b>Example 2</b> <br><br />
<math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} \quad for \quad 0\leq x \leq 3 </math> <br><br />
We can rewrite f(x) as <math>f(x)=(\frac{1}{4})*e^{-x}+(\frac{2}{4})*4x+(\frac{1}{4})*\frac{1}{3}</math> <br><br />
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br><br />
Generate U~Unif(0,1)<br><br />
If <math>0<u<\frac{1}{4}</math>, we sample from f<sub>x1</sub> <br><br><br />
If <math>\frac{1}{4}\leq u < \frac{3}{4}</math>, we sample from f<sub>x2</sub> <br><br><br />
Else if <math>\frac{3}{4} \leq u < 1</math>, we sample from f<sub>x3</sub> <br><br />
We can find the inverse CDFs of f<sub>x1</sub> and f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x1</sub> and f<sub>x2</sub> <br><br><br />
We find F<sub>x1</sub> = <math> 1-e^{-x}</math> and F<sub>x2</sub> = <math>2x^{2}</math> <br><br />
We find the inverses are <math> X = -ln(1-u)</math> for F<sub>x1</sub> and <math> X = \sqrt{\frac{U}{2}}</math> for F<sub>x2</sub> <br><br />
Sampling from f<sub>x3</sub> is more straightforward since it is uniform over the interval (0,3) <br><br />
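The decomposition in Example 2 can be sketched directly from the pieces above (not from the lecture; Python/NumPy used instead of Matlab). It follows the notes' components as stated: weight 1/4 on f<sub>x1</sub> = e<sup>-x</sup> (inverse cdf -ln(1-v)), weight 1/2 on f<sub>x2</sub> = 4x (inverse cdf sqrt(v/2), supported on [0, 1/√2]), and weight 1/4 on the uniform f<sub>x3</sub> = 1/3 on (0,3):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100000

u = rng.random(n)                     # picks the component
v = rng.random(n)                     # fed to the chosen inverse cdf

x = np.empty(n)
c1 = u < 0.25                         # prob 1/4: f1(x) = e^{-x},  F1^{-1}(v) = -ln(1 - v)
c2 = (u >= 0.25) & (u < 0.75)         # prob 1/2: f2(x) = 4x,      F2^{-1}(v) = sqrt(v/2)
c3 = u >= 0.75                        # prob 1/4: f3(x) = 1/3,     F3^{-1}(v) = 3v

x[c1] = -np.log(1.0 - v[c1])
x[c2] = np.sqrt(v[c2] / 2.0)
x[c3] = 3.0 * v[c3]
```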
<br />
In general, to write an <b>efficient </b> algorithm for: <br><br />
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br><br />
We would first rearrange <math> {p_i} </math> such that <math> p_i > p_j </math> for <math> i < j </math> <br> <br><br />
Then generate <math> U</math>~<math>Unif(0,1) </math> <br><br />
If <math> u < p_1 </math> sample from <math> f_1 </math> <br><br />
else if <math> u < p_1 + p_2 + \cdots + p_i </math> sample from <math> f_i </math> for <math> 1<i < n </math> (comparing against the cumulative sums of the <math>p_i</math>)<br><br />
else sample from <math> f_n </math> <br><br />
<br />
That is, we split the pdf into component densities f<sub>x1</sub>, f<sub>x2</sub>, f<sub>x3</sub>, pick a component according to its weight using U~U(0,1), and then sample from the chosen component (e.g. by the inverse transform method).<br />
<br />
== Example of Decomposition Method ==<br />
<br />
F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup>, 0<= x<=1<br />
<br />
Solving U = F<sub>x</sub>(x) = 1/3*x+1/3*x<sup>2</sup>+1/3*x<sup>3</sup> for x directly would require inverting a cubic, so we use decomposition instead.<br />
<br />
P<sub>1</sub>=1/3, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/3,F<sub>x2</sub>(x)= x<sup>2</sup>, <br />
P<sub>3</sub>=1/3,F<sub>x3</sub>(x)= x<sup>3</sup><br />
<br />
Generate U ~ Unif [0,1), V~ Unif [0,1)<br />
<br />
if 0<u<1/3, x=v<br />
else if u<2/3, x=v<sup>1/2</sup><br />
else x= v<sup>1/3</sup><br><br />
<br />
<br />
Matlab Code: <br />
<pre style="font-size:16px"><br />
u=rand<br />
v=rand<br />
if u<1/3<br />
x=v<br />
elseif u<2/3<br />
x=sqrt(v)<br />
else<br />
x=v^(1/3)<br />
end<br />
</pre><br />
===Fundamental Theorem of Simulation===<br />
Consider two shapes, A and B, where B is a sub-shape (subset) of A. <br />
We want to sample uniformly from inside the shape B.<br />
Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. <br />
(Basis of the Accept-Reject algorithm)<br />
<br />
The advantage of this method is that we can sample from an unknown distribution using an easy distribution. The disadvantage is that it may reject many points, which is inefficient.<br />
<br />
Equivalently, a uniform sample from A, conditioned on landing inside B, is a uniform sample from B.<br />
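A classic illustration of this idea (a sketch, not from the lecture; Python/NumPy used instead of Matlab): sample uniformly from the square A = [-1,1]&times;[-1,1] and keep only the points inside the unit disk B. The accepted points are uniform on the disk, and the acceptance rate estimates area(B)/area(A) = &pi;/4:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100000

# A = the square [-1, 1] x [-1, 1]; B = the unit disk inside it
x = rng.uniform(-1.0, 1.0, n)
y = rng.uniform(-1.0, 1.0, n)

inside = x**2 + y**2 <= 1.0            # keep only points that land in B
xb, yb = x[inside], y[inside]

# The acceptance rate estimates area(B) / area(A) = pi / 4
accept_rate = inside.mean()
```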
<br />
== Practice Example from Lecture 7 ==<br />
<br />
Let X1, X2 denote the lifetimes of 2 independent particles, X1 ~ exp(lambda1), X2 ~ exp(lambda2) <br />
<br />
We are interested in Y = min(X1, X2)<br />
<br />
Design an algorithm based on the Inverse Method to generate Y<br />
<br />
<math>f_{x_{1}}(x)=\lambda_{1} e^{(-\lambda_{1}x)},x\geq0 \Rightarrow F(x1)=1-e^{(-\lambda_{1}x)}</math><br /><br />
<math>f_{x_{2}}(x)=\lambda_{2} e^{(-\lambda_{2}x)},x\geq0 \Rightarrow F(x2)=1-e^{(-\lambda_{2}x)}</math><br /><br />
<math>\text{then } 1-F(y)=P(min(x_{1},x_{2}) > y)=e^{(-(\lambda_{1}+\lambda_{2})y)}, \text{ so } F(y)=1-e^{(-(\lambda_{1}+\lambda_{2}) y)}</math><br /><br />
<math>u \sim unif[0,1),\ u = F(y) \Rightarrow y = -\frac{1}{\lambda_{1}+\lambda_{2}}\log(1-u)</math><br />
<br />
==Question 2==<br />
<br />
Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math><br />
<br />
Solution:<br />
This is a Beta distribution; b is the normalizing constant, chosen so that <math>\int _{0}^{1}b\,x^{n}(1-x)^{n}dx=1</math>.<br />
<br />
The density is maximized at x = 1/2, where its value is <math>b(1/4)^n</math>.<br />
Using the Unif(0,1) density g(x)=1 as the proposal, take <math>c=b(1/4)^n</math>.<br />
<br />
Algorithm:<br />
1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math> U_2</math> from <math>U(0, 1)</math>.<br />
2. If <math>U_2\leq \frac{b\,U_1^n(1-U_1)^n}{b(1/4)^n}=(4U_1(1-U_1))^n</math>,<br />
then set <math>X=U_1</math>;<br />
else return to step 1.<br />
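A sketch of this accept-reject sampler (not from the lecture; Python/NumPy instead of the course's Matlab, with n = 3 as an arbitrary example value). Note that the accepted X has density proportional to x<sup>n</sup>(1-x)<sup>n</sup>, i.e. Beta(n+1, n+1), which is symmetric about 1/2:

```python
import numpy as np

rng = np.random.default_rng(8)
n_exp = 3                              # example value of the exponent n (assumed)
n_draws = 200000

u1 = rng.random(n_draws)
u2 = rng.random(n_draws)

# Accept u1 when u2 <= f(u1) / (c g(u1)) = (4 u1 (1 - u1))^n; the constant b cancels
accepted = u2 <= (4.0 * u1 * (1.0 - u1)) ** n_exp
x = u1[accepted]
```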
<br />
Discrete Case:<br />
Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /><br />
<br />
Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br><br />
To sample from <math>X</math>, we use the partition method below: <br><br />
<br />
<math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br><br />
<math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /><br />
<math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /><br />
<math>\, \text{Step 4: Return } x</math><br /></div>
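The partition procedure above can be sketched as follows (not from the lecture; Python/NumPy instead of Matlab, and the pmf below is a hypothetical example, not one from the notes):

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical pmf on the support {0, 1, 2, 3} (example values, not from the notes)
support = np.array([0, 1, 2, 3])
pmf = np.array([0.1, 0.2, 0.3, 0.4])

def sample_discrete(u):
    """Steps 2-3 of the procedure: walk the partition until the running sum passes u."""
    x = 0
    s = pmf[0]
    while u > s:
        x += 1
        s += pmf[x]
    return support[x]

samples = np.array([sample_discrete(u) for u in rng.random(100000)])
```

The empirical frequency of each support point should be close to its pmf value.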
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case except y is discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning)<br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important, as the Quest ID will be used to identify the students who make contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After requesting an account, please wait several hours before logging in with the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks.''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission; cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on the honour system, although there will be random verifications. If you are caught claiming a contribution you did not make, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are deterministic, the sequence of values behaves like a sequence of independent uniform random variables. Being deterministic also makes pseudo random numbers valuable in practice: they are easy to generate and easy to reproduce.<br />
<br />
When an experiment is repeated many times, the average result approaches the expected value, which can make the aggregate look deterministic; nevertheless, the result of each individual trial is random. Pseudo random numbers mimic this behaviour.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
If y = ax + b, where x is an integer and 0 ≤ b < a, then <math>b:=y \mod a</math>. <br /><br />
For example, 4.2 = 2 * 2 + 0.2, so 4.2 mod 2 = 0.2<br /><br />
<br /><br />
More examples:<br /><br />
30 = 4 * 7 + 2, so 30 mod 7 = 2<br /><br />
25 = 8 * 3 + 1, so 25 mod 3 = 1<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can determine whether one integer divides another with no remainder. The integers involved satisfy n = mq + r, where m, q, r, n are all integers and 0 ≤ r < m.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math> (<math>\mod m</math> means taking the remainder after division by m). Given an initial integer value <math>x_0 \in \N</math>, called the '''seed''', we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method refers to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications.<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{0}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the parameters properly, we get a sequence of apparently "random" numbers. But how do we find good values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a very '''large''', preferably prime number: the larger <math>m</math> is, the longer the period of the sequence can be. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – values recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller. (The important part is that <math>m</math> should be '''large and prime'''.)<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a histogram of the sample. Use it after running the code to check the actual sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1. Why in the example above is the range 1 to 30 and not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2. Will the number 31 ever appear? Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
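The sequence above is easy to verify with a short script (Python here for illustration; the course examples use Matlab):

```python
def lcg(x0, a, b, m, n):
    """Return the first n values of x_{k+1} = (a*x_k + b) mod m."""
    xs, x = [], x0
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs

seq = lcg(x0=3, a=5, b=7, m=200, n=10)
print(seq)  # [22, 117, 192, 167, 42, 17, 92, 67, 142, 117]
```

Note that <math>x_{10} = x_2 = 117</math>, so with these parameters the sequence has already entered a cycle of period 8 — illustrating why a much larger <math>m</math> is preferred.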
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose parameters that yield uniform-looking sequences. Many programs give you the option to choose the seed; sometimes the seed is chosen by the CPU clock.<br /><br />
<br />
<br />
<br />
<br />
In this part we learned how to use code to explore the relationship between integer division and remainders. When we run the congruential generator over a range such as (1:1000), the histogram of the output resembles a uniform distribution.<br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating distributions other than the uniform, such as the exponential and normal distributions. However, to use this method easily in generating pseudorandom numbers, the target probability distribution must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then apply the transformation <math>x = F^{-1}(U)</math>. <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf of X is strictly increasing (and hence invertible); otherwise the generalized inverse must be used. <br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\, dt</math><br /><br />
<math> = \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, \Big|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
Setting <math> y=1-e^{- \lambda x} </math> and solving for x:<br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\ln(1-y)/{\lambda}</math><br /><br />
Swapping the roles of x and y gives the inverse:<br /><br />
<math>F^{-1}(x)=-\ln(1-x)/{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
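These two steps can be sketched in Python (illustrative; the notes give a Matlab version below, and λ = 2 matches that later example):

```python
import math
import random

def exp_inverse_transform(lam):
    """One Exponential(lam) draw via x = -ln(1-U)/lam."""
    u = random.random()              # Step 1: U ~ Unif(0,1)
    return -math.log(1.0 - u) / lam  # Step 2: invert the CDF

random.seed(0)
lam = 2.0
xs = [exp_inverse_transform(lam) for _ in range(100000)]
m = sum(xs) / len(xs)
print(m)  # should be near the exponential mean 1/lam = 0.5
```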
<br />
'''Example''': <br />
If <math>U</math> is uniform on <math>[0,1]</math>, then <math> X= a + (b-a)U</math> is uniform on <math>[a, b]</math>. <br /><br />
Similarly, <math> x=\frac{-\ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math>, since <math>1-U</math> is also uniform on <math>[0,1]</math>. <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^{1/5}</math>. Therefore, <math>F^{-1} (x) = x^{1/5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first set u as an uniform distribution, then obtain the inverse function of F(x), and set<br />
<math>x= u^{1/5}</math><br /><br /><br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br /><br />
<math>F(x)= 1-(1-x)^\beta</math><br /><br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x:<br /><br />
<math>(1-x)^\beta = 1-u</math><br /><br />
<math>1-x = (1-u)^{1/\beta}</math><br /><br />
<math>x = 1-(1-u)^{1/\beta}</math><br /><br /><br />
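A quick numerical check of this transform in Python (the value β = 3 is an arbitrary choice for illustration):

```python
import random

def beta_1_beta(beta):
    """Sample Beta(1, beta) via x = 1 - (1-u)^(1/beta)."""
    u = random.random()
    return 1.0 - (1.0 - u) ** (1.0 / beta)

random.seed(42)
beta = 3.0
xs = [beta_1_beta(beta) for _ in range(100000)]
m = sum(xs) / len(xs)
print(m)  # Beta(1, 3) has mean 1/(1 + 3) = 0.25
```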
<br />
'''Example 4 - Estimating <math>\pi</math>''':<br />
Let's use rand() and the Monte Carlo Method to estimate <math>\pi</math>. <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2 with an inscribed circle of radius 1, the circle has area <math>\pi</math> while the square has area 4.<br /><br />
Thus <math>\pi \approx 4\,(N_c/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) % will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
% let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) % 1000 in size <br />
>>figure<br />
>>hist(x) % exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. This method is flawed since not all functions are invertible or monotonic: generalized inverse is hard to work on.<br /><br />
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /><br />
<br />
We learned how to prove that applying the inverse cdf to a uniform random variable produces a sample from F, and how to use a uniform draw to obtain a value of x from F(x).<br />
We can also use the uniform distribution with the inverse method to generate other distributions.<br />
In the Monte Carlo example, the points are uniformly distributed over the square, so each point is equally likely to land anywhere in it; the proportion of points falling inside the circle estimates the ratio of the areas.<br />
Plotting a histogram of the generated values lets us judge which distribution the sample follows.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
>>disttool % shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma shifts and rescales the plotted distribution.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for U~U(0,1), we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for the uniform distribution <math> U \sim Unif [0,1] </math>'''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U\leq a)=a</math><br />
<br />
'''Limitations of the Inverse Transform Method'''<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f function F<sup>-1</sup>(.) and make sure it is monotonically increasing, in some cases this function does not exist<br />
<br />
2. For many distributions such as Gaussian, it is too difficult to find the inverse cdf function , making this method inefficient<br />
<br />
We used the inverse method to show that <math>X = F^{-1}(U)</math> has cdf <math>F</math> when <math>U \sim U(0,1)</math>: the proof shows that <math>P(X \leq x) = F(x)</math>.<br />
The example uses Unif(0,1) to demonstrate how the inverse method recovers the desired cdf.<br />
<br />
=== Discrete Case ===<br />
The same technique can be used for discrete case. We want to generate a discrete random variable x, that has probability mass function:<br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1: Generate <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math> X=x_i, </math> if <math> F(x_{i-1})<U\leq F(x_i) </math><br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<=0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from the pdf <br><br />
:<math><br />
f_{X}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{otherwise}<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{X}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2t\,dt = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \quad X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
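The transform <math>X = U^{1/2}</math> can be checked empirically (Python sketch for illustration):

```python
import random

random.seed(7)
# X = F^{-1}(U) = U^(1/2) should follow the density f(x) = 2x on [0,1]
xs = [random.random() ** 0.5 for _ in range(100000)]
m = sum(xs) / len(xs)
print(m)  # E[X] = integral of x * 2x dx over [0,1] = 2/3
```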
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
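These two steps in Python (illustrative; p = 0.3 is an arbitrary choice):

```python
import random

def bernoulli(p):
    """Return 1 if U <= p, else 0."""
    return 1 if random.random() <= p else 0

random.seed(3)
p = 0.3
xs = [bernoulli(p) for _ in range(100000)]
phat = sum(xs) / len(xs)
print(phat)  # close to p = 0.3
```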
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The PDF of a poisson is:<br />
:<math>\begin{align} f(x) = e^{-u}*u^x/x! \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = e^{-u}*u^{x+1}/{x+1}! \end{align}</math><br />
The ratio is <math>\begin{align} P_{x+1}/P_x = ... = u/{x+1} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = P_x * {u/{x+1}} \end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) Set <math>\begin{align} x = 0 \end{align}</math><br />
<math>\begin{align} F = p = P(X = 0) = e^{-u}\cdot u^0/0! = e^{-u} \end{align}</math><br />
3) If U<F, output x <br><br />
Else, <math>\begin{align} p = p \cdot u/(x+1) \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
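The Poisson algorithm above can be sketched in Python (in place of the course's MATLAB; the function name is ours). It uses the recursion P_{x+1} = P_x · u/(x+1) to build the CDF on the fly:

```python
import math
import random

def poisson_inverse_transform(mu, rng):
    """Inverse transform for Poisson(mu) via the recursion p_{x+1} = p_x * mu/(x+1)."""
    u = rng.random()
    x = 0
    p = math.exp(-mu)  # P(X = 0)
    F = p              # running CDF
    while u >= F:      # step 3: if u < F output x, else advance
        p = p * mu / (x + 1)
        F += p
        x += 1
    return x

rng = random.Random(3)
mu = 4.0
n = 50_000
samples = [poisson_inverse_transform(mu, rng) for _ in range(n)]
mean = sum(samples) / n  # should be close to mu
```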
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p) where p is the probability of success, and define the random variable X as the number of trials until the first success, so x=1,2,3,.... We have pmf:<br />
<math>P(X=x) = p(1-p)^{x-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where <math>P(X>x)=(1-p)^x</math> is the probability that the first x trials are all failures.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
\vdots \\<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k \\<br />
\vdots<br />
\end{cases}</math><br />
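Solving the inequalities above for k gives a closed form, X = ⌈log(1-U)/log(1-p)⌉, which the following Python sketch uses (Python in place of MATLAB; the function name is ours):

```python
import math
import random

def geometric_inverse_transform(p, rng):
    """Inverse transform for Geo(p): X = k iff 1-(1-p)^(k-1) < U <= 1-(1-p)^k."""
    u = rng.random()
    # Solving the inequalities for k gives k = ceil(log(1-u)/log(1-p));
    # max(1, ...) guards the measure-zero case u == 0.
    return max(1, math.ceil(math.log(1 - u) / math.log(1 - p)))

rng = random.Random(4)
p = 0.25
n = 100_000
samples = [geometric_inverse_transform(p, rng) for _ in range(n)]
mean = sum(samples) / n  # should be close to E[X] = 1/p = 4
```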
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
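The general procedure above can be sketched in Python (a stand-in for the course's MATLAB; the function name is ours), walking the cumulative sums P_0, P_0+P_1, ... until U fits:

```python
import random

def discrete_inverse_transform(values, probs, rng):
    """General discrete inverse transform: accumulate the CDF until U <= F."""
    u = rng.random()
    F = 0.0
    for x, p in zip(values, probs):
        F += p
        if u <= F:
            return x
    return values[-1]  # guard against floating-point round-off in the last sum

rng = random.Random(5)
values = [0, 1, 2]
probs = [0.3, 0.2, 0.5]  # the three-point example at the start of this section
n = 100_000
samples = [discrete_inverse_transform(values, probs, rng) for _ in range(n)]
freq0 = samples.count(0) / n  # should be close to 0.3
```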
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math>.<br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
<br />
Summary: flipping a coin is a discrete case of the uniform distribution; in the code the coin is flipped 1000 times, and the observed proportion is close to the expected value (0.5). The second example is another discrete distribution, taking the three values 0, 1, 2. The third example uses the inverse method to work out the range of U corresponding to each value of the random variable.<br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transformation method does allow us to transform samples from the uniform distribution, it has two limitations:<br />
# The CDF must be invertible, and not every CDF has a closed-form inverse.<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse.<br />
<br />
To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) rather than <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x would force g and f to be the same function: both are densities integrating to 1, so g cannot dominate f everywhere unless they are equal. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), otherwise there is a high chance we will end up rejecting our sample.<br />
<br />
*Values around x<sub>1</sub> are sampled more often under cg(x) than under f(x), giving more samples than we actually need. Since <math>\frac{f(y)}{\, c g(y)}</math> is small there, the acceptance-rejection step thins these points down to the right amount: in the region around x<sub>1</sub>, we accept less and reject more.<br />
*Around x<sub>2</sub>, the number of samples drawn and the number we need are much closer, so in that region we accept more. There, g(x) and f(x) are comparable.<br />
<br />
Another way to understand why the the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math>, is by thinking of areas. From the graph above, we see that the target function in under the proposed function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion or the area under c g(y) that also contains f(y). Therefore we say we accept sample points for which u is less then <math>\frac{f(y)}{\, c g(y)}</math> because then the sample points are guaranteed to fall under the area of c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: Since <math>U \sim U(0,1)</math>, the acceptance condition <math>u\leq \frac{f(y)}{c g(y)}</math> is only a valid probability if <math>\frac{f(y)}{c g(y)} \leq 1</math> for all y, which is exactly the requirement that <math>c\geq \frac{f(y)}{g(y)}</math>.<br />
<br />
<br />
These notes introduce the relationship between c g(x) and f(x), prove why that relationship holds, and show where this rule is used to reject candidate points. They also show how to read the graph to decide whether to accept or reject in the region above a given x: in the example, points near x<sub>1</sub> are mostly rejected, while points near x<sub>2</sub> are mostly accepted.<br />
<br />
=== Proof ===<br />
<br />
We want to show that the target distribution f(x) can be obtained/sampled using a known distribution g(y).<br />
Therefore, mathematically we want to show that the density of an accepted sample equals the target:<br /><br />
<math>P(x) = P(y|accepted) = f(y)</math> <br/><br /><br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>P(a|b)=\frac{P(a,b)}{P(b)}</math>, or <math>P(a|b)=\frac{P(b|a)P(a)}{P(b)}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int\ P(accepted|s)P(s)\,ds\\<br />
&=\int_y\ \frac{f(s)}{cg(s)}g(s)ds\\<br />
&=\frac{1}{c} \int_y\ f(s) ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
Comments:<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c. Usually, c should be small; otherwise the amount of work when applying the method can be huge.<br />
<br />-Note: When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br />
<br />
Acceptance-Rejection Method<br/><br />
Example 1 (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)=\binom{2}{x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(c*g(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need c>=f(x)/g(x)<br/><br />
We need c=3/2<br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate u,v~U(0,1)<br/><br />
2. Set y=floor(3*u) (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If (y=0) and (v<1/2), output=0 <br/><br />
If (y=2) and (v<1/2), output=2 <br/><br />
Else if y=1, output=1<br/><br />
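The discrete A-R algorithm above can be sketched in Python (in place of the course's MATLAB; the function name is ours), with c = 3/2 and the acceptance probabilities from the table:

```python
import random

def binomial_ar(rng):
    """A-R draw of Bi(2, 0.5) using a DU[0,2] proposal and c = 3/2, as in the table."""
    while True:
        u, v = rng.random(), rng.random()
        y = int(3 * u)      # y ~ DU[0, 2] via floor(3u)
        if y == 1:
            return 1        # acceptance probability f(1)/(c*g(1)) = 1
        if v < 0.5:         # f(0)/(c*g(0)) = f(2)/(c*g(2)) = 1/2
            return y        # otherwise reject and loop

rng = random.Random(6)
n = 100_000
samples = [binomial_ar(rng) for _ in range(n)]
freq1 = samples.count(1) / n  # should be close to f(1) = 1/2
```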
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when u < f(x)/(c*g(x)) is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let f(x) be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let g(x) be the helper function <br/><br />
Let c satisfy c*g(x)>=f(x) for all x<br/><br />
Since we need to generate y from g(x),<br/><br />
Pr(select y)=g(y)<br/><br />
Pr(output y|selected y)=Pr(u<f(y)/(c*g(y)))= f(y)/(c*g(y)) (Since u~Unif(0,1))<br/><br />
Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c <br/><br />
Since we are asking for the expected number of iterations until the first success, this follows a geometric distribution with probability of success 1/c<br/><br />
Therefore, E(X)=1/(1/c)=c <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
These notes use conditional probability to prove that, conditional on acceptance, the output follows the pdf of the original target distribution. The example shows how to choose c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/(c \cdot g(x))=(256/27) x(1-x)^3</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
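This example can be sketched in Python (a substitute for the course's MATLAB; the function name is ours). Note that the accepted value is the proposal U1, not U2:

```python
import random

def beta_2_4_ar(rng):
    """A-R for f(x) = 20x(1-x)^3 on (0,1) with g = U(0,1) and c = 135/64."""
    while True:
        u1, u2 = rng.random(), rng.random()
        # f(u1)/(c*g(u1)) = (256/27) u1 (1-u1)^3, which has maximum 1 at u1 = 1/4
        if u2 <= (256 / 27) * u1 * (1 - u1) ** 3:
            return u1

rng = random.Random(7)
n = 100_000
samples = [beta_2_4_ar(rng) for _ in range(n)]
mean = sum(samples) / n  # Beta(2,4) has mean 2/(2+4) = 1/3
```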
<br />
Here we use the derivative to find the local maximum of f(x)/g(x), from which we can calculate the best constant c for the acceptance-rejection method.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (the density of <math>U[0,0.5]</math>)<br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1)} = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1)} = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example showing why we reject certain points when using the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is max(f(x)/g(x)); this choice makes the area between f(x) and c*g(x) as small as possible.<br />
Because g(.) is uniform on (0,1), g(x)=1, so c = max(3x^2) = 3.<br />
<math>f(x)/(c*g(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
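A Python sketch of this example (in place of the course's MATLAB; the function name is ours):

```python
import random

def cubic_ar(rng):
    """A-R for f(x) = 3x^2 on (0,1) with uniform proposal and c = 3."""
    while True:
        u1, u2 = rng.random(), rng.random()
        if u2 <= u1 ** 2:   # f(u1)/(c*g(u1)) = 3*u1^2 / 3 = u1^2
            return u1

rng = random.Random(8)
n = 100_000
samples = [cubic_ar(rng) for _ in range(n)]
mean = sum(samples) / n  # E[X] = integral of x * 3x^2 = 3/4
```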
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to find a target distribution, denoted as <math>f(x)</math>; we need to first find a proposal distribution <math>g(x)</math> which is easy to sample from. <br><br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*The chance of acceptance is lower when the distance between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice-versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it will not make sense if <math>c</math> is simply chosen to be arbitrarily large. We need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want cg(x)>=f(x).<br />
And it means c has to be greater or equal to f(x)/g(x). So the smallest possible c that satisfies the condition is the maximum value of f(x)/g(x) <br />. If c is made to be too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the UNIFORM distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(y)}{c \cdot g(y)}</math> then X=Y; else return to step 1 (This is not the way to find c. This is the general procedure.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma; (n)=(n-1)! if n is positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=U(0,1), \; g(x)=1</math><br><br />
<math>y \sim g</math><br><br />
<math>f(x)\leq c \cdot g(x)</math><br><br />
<math>c\geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = \max (2x/1), 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that c*g can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distritubion with pdf f(x), we need to scan approximately n*c points from the proposal distribution (g(x)) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>If <math>u \leq \frac{2y}{2 \cdot 1}=y</math>, then x=y</li><br />
<li>Else go to 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: index of accepted samples<br />
>>jj=1; % jj: number of generated candidates<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= 3/4 (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g be U(-1,1), so g(x)=1/2 on <math>-1 \leq x \leq 1</math> (the proposal must cover the support of f)<br />
<br />
Let y ~ g, <br />
<math> cg(x)\geq f(x),\;<br />
c \geq \frac{(3/4)(1-x^2)}{1/2},\;<br />
c=\max_x \frac{(3/4)(1-x^2)}{1/2} = \frac{3}{2} </math><br />
<br />
The process:<br />
<br />
:1: Draw u ~ U(0,1) and set y = 2u-1, so that y ~ U(-1,1) <br><br />
:2: Draw U ~ U(0,1) <br><br />
:3: If <math>U \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2} \cdot \frac{1}{2}} = 1-y^2</math>, then x=y ('''note that''' <math>1-y^2</math> is <math>f(y)/(c\,g(y))</math>)<br />
:4: Else return to '''step 1''' <br />
<br />
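A Python sketch of this A-R example (in place of the course's MATLAB; the function name is ours), using a U(-1,1) proposal with g(x)=1/2 and c=3/2, which covers the full support [-1,1]:

```python
import random

def quadratic_ar(rng):
    """A-R for f(x) = (3/4)(1-x^2) on [-1,1]; proposal U(-1,1) with g = 1/2, c = 3/2."""
    while True:
        y = 2 * rng.random() - 1   # y ~ U(-1, 1)
        u = rng.random()
        if u <= 1 - y ** 2:        # f(y)/(c*g(y)) = (3/4)(1-y^2) / ((3/2)(1/2))
            return y

rng = random.Random(9)
n = 100_000
samples = [quadratic_ar(rng) for _ in range(n)]
mean = sum(samples) / n  # the density is symmetric, so E[X] = 0
```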
----<br />
'''Use Inverse Method for this Example''' (the earlier example with f(x)=2x)<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
A period, ".", makes an operator element-wise: the operation is applied to each element of a vector or matrix. In the example above, U.^0.5 takes the square root of every element of U. Without the period, * and ^ are matrix operations: for example, if a=[1 2 3] and b=[2 3 4] are row vectors, then a.*b=[2 6 12] (element-wise product), but a*b gives an error since the matrix dimensions do not agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = max \frac{f(x)}{g(x)} = max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers u<sub>1</sub> and u<sub>2</sub><br />
2. If <math>u_2 \leqslant (u_1)^2</math>, accept u<sub>1</sub> as the random variable from f, if not return to Step 1<br />
<br />
We can also use g(x)=2x for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math><br />
Use the inverse method to sample from g(x):<br />
<math>G(x)=x^2</math><br />
Generate U from U(0,1) and set <math>x=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers u<sub>1</sub> and u<sub>2</sub>, and set <math>y=\sqrt{u_1}</math> (a draw from g)<br />
2. If <math>u_2 \leq \frac{f(y)}{c\,g(y)} = \frac{3y^2}{(3/2)\,2y} = y = \sqrt{u_1}</math>, accept <math>y=\sqrt{u_1}</math> as the random variable from f; if not, return to Step 1<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. For example in the example we did in class relating the f(x)=2x, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
X ∼ N(μ,σ<sup>2</sup>); X = σZ + μ; Z ~ N(0,1) <br><br />
|Z| has probability function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
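The half-normal A-R scheme above, followed by a random sign to recover Z ~ N(0,1), can be sketched in Python (in place of the course's MATLAB; the function name is ours):

```python
import math
import random

def standard_normal_ar(rng):
    """Sample |Z| by A-R with an Exp(1) proposal (c = sqrt(2e/pi)), then attach a sign."""
    while True:
        y = -math.log(1 - rng.random())        # y ~ Exp(1) by inverse transform
        u = rng.random()
        if u <= math.exp(-(y - 1) ** 2 / 2):   # f(y)/(c*g(y)) from the derivation above
            return y if rng.random() < 0.5 else -y

rng = random.Random(10)
n = 100_000
samples = [standard_normal_ar(rng) for _ in range(n)]
mean = sum(samples) / n
var = sum(s * s for s in samples) / n - mean ** 2  # should be close to 1
```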
<br />
<br />
This example shows how to calculate the constant c relating f(x) and g(x).<br />
<br />
===How to transform <math>U(0,1)</math> to <math>U(a, b)</math>===<br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
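The three steps above amount to a one-line transform, sketched here in Python (in place of MATLAB; the function name is ours):

```python
import random

def uniform_ab(a, b, rng):
    """Transform U(0,1) to U(a,b) via Y = (b-a)*U + a."""
    return (b - a) * rng.random() + a

rng = random.Random(11)
samples = [uniform_ab(-3.0, 5.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # should be close to (a+b)/2 = 1
```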
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math> Let <math> Y= 2RU-R=R(2U-1) </math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>R^2-x^2</math>, which is maximized at x=0.<br />
Therefore, <math>c=4/\pi</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a candidate point with probability f(y)/[c*g(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}</math>, deliver <math>x = R(2U-1)</math>; else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
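The semicircular sampler above can be sketched in Python (in place of the course's MATLAB; the function name is ours):

```python
import math
import random

def semicircular_ar(R, rng):
    """A-R for the semicircular density on [-R,R] with a U(-R,R) proposal and c = 4/pi."""
    while True:
        u = rng.random()
        u1 = rng.random()
        if u1 <= math.sqrt(1 - (2 * u - 1) ** 2):  # f(y)/(c*g(y)) with y = R(2u-1)
            return R * (2 * u - 1)

rng = random.Random(12)
R = 2.0
n = 100_000
samples = [semicircular_ar(R, rng) for _ in range(n)]
mean = sum(samples) / n  # the density is symmetric about 0
```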
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a*e^{-a*x}</math>to generate random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>c*g(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)} = \frac {e^{-1}}{a*(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\frac {f(infinity)}{g(infinity)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get a=1/2 <br/><br />
<b>c=4/e</b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u, v ~ Unif(0,1) <br/><br />
2. Generate y from g; since g is exponential with rate a=1/2, let y=-2ln(u) <br/><br />
3. If v<f(y)/(c*g(y)), output y<br/><br />
Else, go to 1<br/><br />
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
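This example can be sketched in Python (in place of the course's MATLAB; the function name is ours), with a = 1/2 and c = 4/e as derived above:

```python
import math
import random

def gamma21_ar(rng):
    """A-R for f(x) = x e^{-x} with proposal g(x) = (1/2) e^{-x/2} (a = 1/2, c = 4/e)."""
    c = 4 / math.e
    while True:
        u, v = rng.random(), rng.random()
        y = -2 * math.log(1 - u)  # inverse transform for Exp(rate 1/2); 1-u avoids log(0)
        ratio = (y * math.exp(-y)) / (c * 0.5 * math.exp(-y / 2))  # f(y)/(c*g(y)), max 1 at y=2
        if v < ratio:
            return y

rng = random.Random(13)
n = 100_000
samples = [gamma21_ar(rng) for _ in range(n)]
mean = sum(samples) / n  # f is the Gamma(2,1) density, so E[X] = 2
```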
<br />
'''Summary of how to find the value of c''' <br/><br />
Let h(x)=f(x)/g(x), and then we have the following:<br /><br />
1. First, take derivative of h(x) with respect to x, get x1;<br /><br />
2. Plug x1 into h(x) and get the value(or a function) of c, denote as c1;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (If c1 is a value, we can skip this step.) Since we want the smallest value of c such that f(x)<= c*g(x) for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c1 with respect to the unknown parameter (say k) to find the minimizing value of k. <br />Then we substitute k back in to get the value of c1. (Double check that c1>=1)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For those two examples, we first pair the target density with a uniform proposal distribution,<br />
then figure out c=max(f(y)/g(y)),<br />
and accept y whenever v<f(y)/(c*g(y)).<br />
<br />
<br />
'''Summary of when to use the Accept Rejection Method''' <br/><br />
1) When the inverse cdf cannot be computed or is too difficult to compute. <br/><br />
2) When f(x) can be evaluated at least up to a normalizing constant. <br/><br />
3) When a constant c with f(x)<math>\leq</math> cg(x) can be found<br/><br />
4) A uniform draw<br/></div>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which takes this input X and identifies which discrete 'class (Y)' it belongs to; <br /><br />
i.e., from the value of x, we predict y.<br />
(For example, an image of a fruit can be classified, through some algorithm, as a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification, but y is continuous rather than discrete. <br /><br />
3. Clustering: Use common features of objects to group them into clusters (in this case, x is given, y is unknown). <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning)<br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important, as the Quest id will be used to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission; cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on the honour system, although there will be random verifications. If you are caught claiming a contribution you did not make, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are deterministic, the sequence of values they produce has the appearance of independent uniform random variables. Being deterministic, pseudo random numbers are valuable because they are easy to generate and manipulate.<br />
<br />
When a test is repeated many times, the average result approaches the expected value, which makes the overall behaviour look deterministic; each individual trial, however, is random.<br />
In this sense the output resembles pseudo random numbers.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r, where q is an integer. <br /><br />
In the form y = ax + b with 0 ≤ b < a, we have <math>b:=y \mod a</math>. <br /><br />
Although the formal definition above is for natural numbers, the same idea extends to reals: 4.2 = 2 * 2 + 0.2, so 4.2 mod 2 = 0.2.<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 * 7 + 2, so 30 mod 7 = 2<br /><br />
25 = 8 * 3 + 1, so 25 mod 3 = 1<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can tell us whether one integer divides another with no remainder: n is divisible by m exactly when n mod m = 0. Here n = mq + r, where m, q, r, n are all integers and r is smaller than m.<br />
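The Division Algorithm above can be checked directly in code; here is a short Python sketch (the course uses Matlab, where the analogous call is mod(n, m)):

```python
# Division Algorithm: for n in N and m > 0 there are unique q, r with
# n = m*q + r and 0 <= r < m; r is exactly "n mod m".
n, m = 30, 7
q, r = divmod(n, m)        # quotient and remainder in one call
assert n == m * q + r      # the defining identity
# r agrees with the % operator: 30 mod 7 = 2, and 25 mod 3 = 1
```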
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math>. (<math>\mod m</math> means taking the remainder after division by m.) Given a '''seed''', i.e. an initial value <math>x_0 \in \N</math>, we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications.<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{k}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If we choose the numbers properly, we can get a sequence of "random" numbers. However, how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least <math>m</math> should be a very '''large''', preferably prime number: the larger <math>m</math> is, the longer the period of the sequence can be before it repeats. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> should be '''large and prime''').<br /> <br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) will generate a histogram of the distribution. Use it after running the code to check the actual sample distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
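The hand computation above, and the claim that the first 30 numbers form a permutation of 1 to 30, can be checked with a short sketch (Python here, though the course uses Matlab):

```python
# Reproduce x_{k+1} = 13 * x_k mod 31 with seed x_0 = 1 and inspect the period.
def lcg_sequence(a, b, m, seed, n):
    xs = [seed]
    for _ in range(n):
        xs.append((a * xs[-1] + b) % m)
    return xs

xs = lcg_sequence(13, 0, 31, 1, 30)
# x_1, x_2, x_3 match the hand computation: 13, 14, 27
# x_0 .. x_29 are a permutation of 1..30 and x_30 = x_0, so the period is 30
```

Since 31 is prime and 13 is a primitive root mod 31, the generator achieves the maximal period m − 1 = 30 for this choice of parameters.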
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
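The textbook solution above can be verified with a few lines of code (a Python sketch; the notes' Matlab loop earlier follows the same pattern):

```python
# Check the textbook recursion x_n = (5*x_{n-1} + 7) mod 200 with x_0 = 3.
x = 3
seq = []
for _ in range(10):
    x = (5 * x + 7) % 200   # reducing mod 200 at each step gives the same values
    seq.append(x)
# seq should be [22, 117, 192, 167, 42, 17, 92, 67, 142, 117]
```

Note that reducing modulo 200 at every step produces the same values as the textbook's approach of reducing the un-reduced products (e.g. 2967 mod 200), since mod is compatible with addition and multiplication.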
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose parameters that give uniform-looking sequences. Many programs give you the option to choose the seed; sometimes the seed is chosen by the system (e.g., from the clock).<br /><br />
<br />
<br />
<br />
<br />
In this part we saw how to compute, in code, the quotient and remainder of integer division, and that when the generator is run over a range such as 1:1000, the histogram of the generated values looks approximately like a uniform distribution.<br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than the uniform distribution, such as the exponential and normal distributions. However, to easily use this method in generating pseudorandom numbers, the probability distribution used must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate the value of a random variable X with cdf F, we first generate a random number U, uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation x=<math> F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the ordinary inverse on both sides in the proof of the inverse transform only if the cdf of X is strictly increasing, and hence invertible; otherwise the generalized inverse must be used. (A monotonic function is one that is either non-decreasing for all x or non-increasing for all x; a cdf is always non-decreasing.) <br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\ dt</math><br /><br />
<math> = \Big[-e^{-\lambda t}\Big]_{0}^{x} </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
Set <math> y=1-e^{- \lambda x} </math> and solve for x:<br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\ln(1-y)/{\lambda}</math><br /><br />
Swapping the variable names, <math>F^{-1}(x)=-\ln(1-x)/{\lambda}</math><br /><br />
<br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
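Steps 1 and 2 can be sketched directly; this is a Python sketch of the inverse transform for the exponential distribution (a Matlab version appears later in these notes):

```python
import math
import random

def exp_inverse_transform(lam, rng):
    u = rng.random()                    # Step 1: draw U ~ Unif(0,1)
    return -math.log(1.0 - u) / lam     # Step 2: x = -ln(1 - U)/lambda

rng = random.Random(0)
xs = [exp_inverse_transform(2.0, rng) for _ in range(100000)]
mean = sum(xs) / len(xs)                # should be near 1/lambda = 0.5
```

With λ = 2, the sample mean of many draws approaches E[X] = 1/λ = 0.5, a quick check that the transformed values follow the exponential distribution.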
<br />
'''Example''': <br />
<math> X= a + (b-a)U</math> is uniform on [a, b], where <math>U \sim U[0,1]</math>. <br /><br />
<math> x=\frac{-ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math> <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math>, solve for x: <math>x=y^{1/5}</math>. Therefore, <math>F^{-1} (x) = x^{1/5}</math><br /><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x), and set<br />
<math>x= u^{1/5}</math><br /><br /><br />
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br /><br />
<math>F(x)= 1-(1-x)^\beta</math><br /><br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x:<br /><br />
<math>(1-x)^\beta = 1-u</math><br /><br />
<math>1-x = (1-u)^{1/\beta}</math><br /><br />
<math>x = 1-(1-u)^{1/\beta}</math><br /><br /><br />
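Examples 2 and 3 both invert the cdf in closed form; a hedged Python sketch (the course uses Matlab) with a round-trip check that applying F to F<sup>-1</sup>(u) recovers u:

```python
# Inverse-transform sketches for Examples 2 and 3: F(x) = x^5 and Beta(1, beta).
power_inv = lambda u: u ** (1.0 / 5.0)                # F^{-1}(u) = u^{1/5}
beta_inv = lambda u, beta: 1.0 - (1.0 - u) ** (1.0 / beta)

# Round trip: applying F to F^{-1}(u) recovers u.
u = 0.3
assert abs(power_inv(u) ** 5 - u) < 1e-12
assert abs(1.0 - (1.0 - beta_inv(u, 4.0)) ** 4.0 - u) < 1e-12
```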
<br />
'''Example 4-Estimating pi''':<br />
Let's use rand() and the Monte Carlo Method to estimate <math>\pi</math> <br /><br />
N = total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = <math>\frac{\text{Area of circle}}{\text{Area of square}}</math><br /><br />
If we take a square of side 2 with an inscribed circle of radius 1, the circle has area <math>\pi</math> and the square has area 4.<br /><br />
Thus <math>\pi \approx 4\cdot(Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method requires an invertible (or at least monotone) cdf: when only the generalized inverse is available, it can be hard to work with.<br /><br />
2. It may be impractical since some cdfs and/or integrals are not easy to compute, such as for the Gaussian distribution.<br /><br />
<br />
We proved that applying the inverse cdf to a uniform random variable yields a draw from F, and used a uniform distribution to obtain a value of x from F(x); the inverse method thus lets uniform draws produce other distributions.<br />
In the Monte Carlo example, points are generated uniformly over the square, so each point is equally likely to land anywhere in it, and the fraction falling inside the circle estimates the ratio of the areas.<br />
Plotting a histogram of the generated values shows which distribution they follow.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
>>disttool %shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma changes the location and spread of the plotted distribution.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br />
<math>= P((F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) </math> <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for U~U(0,1), we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for uniform distribution <math> U~ \sim~ Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U<=a)=a</math><br />
<br />
LIMITATIONS OF THE INVERSE TRANSFORM METHOD<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f function F<sup>-1</sup>(.) and make sure it is monotonically increasing; in some cases a closed form for this function does not exist<br />
<br />
2. For many distributions such as Gaussian, it is too difficult to find the inverse cdf function , making this method inefficient<br />
<br />
We used the inverse method to show that X = F<sup>-1</sup>(U) with U~U(0,1) has cdf F, i.e. P(X<=x) = F(x);<br />
the Unif(0,1) example illustrates how the inverse method recovers the cdf.<br />
<br />
=== Discrete Case ===<br />
The same technique can be used in the discrete case. We want to generate a discrete random variable X with probability mass function:<br />
In general, in the discrete case we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1: Generate <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math> X=x_i, </math> if <math> F(x_{i-1})<U\leq F(x_i) </math><br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } x < 1 \\<br />
0.5, & \text{if } x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{otherwise}<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2t\,dt = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2},\; X = F_{x}^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
1-p, & \text{if } x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(<math>\lambda</math>). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} p_x = P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} p_{x+1} = \frac{e^{-\lambda}\lambda^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} p_{x+1}/p_x = \lambda/(x+1) \end{align}</math><br />
Therefore, <math>\begin{align} p_{x+1} = p_x \cdot \lambda/(x+1) \end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) Set <math>\begin{align} x = 0 \end{align}</math> and <math>\begin{align} F = p = P(X = 0) = e^{-\lambda}\lambda^0/0! = e^{-\lambda} \end{align}</math> <br><br />
3) If U < F, output x <br><br />
Else, <math>\begin{align} p = p \cdot \lambda/(x+1) \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
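A sketch of this algorithm in Python (the course uses MATLAB; the Poisson mean is written `lam` and names are illustrative):

```python
import math
import random

def poisson(lam):
    """Inverse transform for Poisson(lam), using p_{x+1} = p_x * lam/(x+1)."""
    u = random.random()
    x = 0
    p = math.exp(-lam)   # P(X = 0)
    F = p                # running CDF
    while u >= F:
        p = p * lam / (x + 1)
        F += p
        x += 1
    return x

samples = [poisson(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # should be near lam = 2
```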
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X as the number of trials until (and including) the first success, so x=1,2,3,..... We have pmf:<br />
<math>P(X=x) = p(1-p)^{x-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) is the probability of getting at least x failures before we observe the first success.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
....<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k <br />
....<br />
\end{cases}</math><br />
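The piecewise condition above collapses to a closed form: X = x exactly when <math>x-1 < \ln(1-U)/\ln(1-p) \leq x</math>, i.e. X = ⌈ln(1-U)/ln(1-p)⌉. A Python sketch (illustrative; the course uses MATLAB):

```python
import math
import random

def geometric(p):
    """Closed-form inverse transform for Geo(p) on {1, 2, 3, ...}:
    X = ceil(log(1-U) / log(1-p))."""
    u = random.random()
    # max(...) guards the measure-zero case u == 0, where ceil(0.0) would be 0
    return max(1, math.ceil(math.log(1 - u) / math.log(1 - p)))

samples = [geometric(0.25) for _ in range(100_000)]
print(sum(samples) / len(samples))  # should be near 1/p = 4
```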
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
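The general procedure above can be sketched in Python (the course uses MATLAB; this translation and the function name are illustrative):

```python
import random

def discrete_inverse_transform(xs, ps):
    """General discrete inverse transform: deliver xs[k] when U falls in the
    k-th slab of the cumulative probabilities P_0, P_0+P_1, ..."""
    u = random.random()
    F = 0.0
    for x, p in zip(xs, ps):
        F += p
        if u <= F:
            return x
    return xs[-1]  # guard against floating-point round-off in the last slab

# The class example: P(X=0)=0.3, P(X=1)=0.2, P(X=2)=0.5
samples = [discrete_inverse_transform([0, 1, 2], [0.3, 0.2, 0.5])
           for _ in range(100_000)]
```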
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
Recap of the examples above: flipping a coin is a discrete case of the uniform distribution; in the code the coin is flipped 1000 times, and the observed proportion is close to the expected value (0.5).<br />
The second example is another discrete distribution: the unit interval is divided into three parts corresponding to x = 0, 1, 2, and the value of U determines which part, and hence which value, is delivered.<br />
Example 3 uses the inverse method to work out the range of U corresponding to each value of the random variable.<br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transformation method does allow us to change our uniform distribution, it has two limits;<br />
# Not all cumulative distribution functions have a usable inverse (the inverse may not exist in closed form)<br />
# For some distributions, such as Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
{{Cleanup|reason= Do not write <math>c*g(x)</math>. Instead write <math>c \times g(x)</math> or <math>\,c g(x)</math><br />
}}<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) rather than <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x would force g and f to be the same function: both are pdfs integrating to 1, so neither can strictly dominate the other everywhere. <br>
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), otherwise there is a high chance we will end up rejecting our sample.<br />
<br />
<ul><li>Values around x<sub>1</sub> are sampled more often under cg(x) than under f(x), i.e. more samples than we actually need. Since <math>\frac{f(y)}{\, c g(y)}</math> is small there, the acceptance-rejection step thins these points down to the right amount: in the region around x<sub>1</sub>, we accept less and reject more.</li><br />
<li>Around x<sub>2</sub>, the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more; there g(x) and f(x) are comparable.</li></ul><br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function f lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that also lies under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those sample points are guaranteed to fall in the part of the area under c g(y) that contains f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
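The three-step procedure above can be sketched as a generic sampler in Python (the course uses MATLAB; names and the worked target are illustrative):

```python
import random

def accept_reject(f, g_sample, g_pdf, c):
    """Acceptance-rejection: draw Y ~ g and U ~ Unif(0,1); accept Y when
    U <= f(Y) / (c * g(Y)), otherwise redraw."""
    while True:
        y = g_sample()
        u = random.random()
        if u <= f(y) / (c * g_pdf(y)):
            return y

# Illustration with target f(x) = 2x on [0,1], proposal U(0,1), c = 2.
samples = [accept_reject(lambda x: 2 * x, random.random, lambda x: 1.0, 2.0)
           for _ in range(50_000)]
print(sum(samples) / len(samples))  # should be near E[X] = 2/3
```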
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: Since u is drawn from the uniform distribution between 0 and 1, the condition <math>u\leq \frac{f(y)}{c\,g(y)}</math> only makes sense as a probability if <math>\frac{f(y)}{c\,g(y)}\leq 1</math>; this depends on the constant c, which must satisfy <math>c\geq \frac{f(y)}{g(y)}</math> for all y.<br />
<br />
<br />
Recap: we introduced the relationship between cg(x) and f(x), proved why acceptance with probability f(y)/(cg(y)) recovers f, and saw how to read the graph to decide where to accept or reject points.<br />
In the example, x<sub>1</sub> lies in a region where most points are rejected, while x<sub>2</sub> lies in a region where most points are accepted.<br />
<br />
=== Proof ===<br />
<br />
We want to show that P(x)(which is original distribution) can be obtained/sampled using a known distribution g(y).<br />
Therefore, mathematically we want to show that:<br /><br />
<math>P(x) = P(y|accepted) = f(y)</math> <br/><br /><br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>P(a|b)=\frac{P(a,b)}{P(b)}</math>, or <math>P(a|b)=\frac{P(b|a)P(a)}{P(b)}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int_y\ P(accepted|y)P(y)\\<br />
&=\int_y\ \frac{f(s)}{cg(s)}g(s)ds\\<br />
&=\frac{1}{c} \int_y\ f(s) ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
Comments:<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number, otherwise the amount of work when applying the method can be huge.<br />
<br />-Note: When f(y) is very different than g(y), it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for u to be less than this small value. <br />
<br />
Acceptance-Rejection Method<br/><br />
Example 1 (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
<math>f(x)=Pr(X=x)={2 \choose x}(0.5)^2</math><br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(c*g(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need c>=f(x)/g(x)<br/><br />
We need c=3/2<br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate u,v~U(0,1)<br/><br />
2. Set y=floor(3*u) (this uses the uniform distribution to generate DU[0,2])<br/><br />
3. If (y=0) and (v<1/2), output 0 <br/><br />
Else if (y=2) and (v<1/2), output 2 <br/><br />
Else if (y=1), output 1 <br/><br />
Otherwise, return to step 1<br/><br />
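A sketch of this discrete A-R example in Python (the course uses MATLAB; the function name is illustrative), using the table's acceptance probabilities f(x)/(c*g(x)) = 1/2, 1, 1/2 with c = 3/2:

```python
import math
import random

def binomial_ar():
    """A-R draw of X ~ Bi(2, 0.5) using the DU[0,2] proposal with c = 3/2."""
    f = {0: 0.25, 1: 0.5, 2: 0.25}   # target pmf
    g = 1.0 / 3.0                    # proposal pmf: discrete uniform on {0,1,2}
    c = 1.5
    while True:
        y = math.floor(3 * random.random())   # y ~ DU[0,2]
        v = random.random()
        if v <= f[y] / (c * g):
            return y

samples = [binomial_ar() for _ in range(100_000)]
```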
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when u < f(x)/(c*g(x)) is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let f(x) be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let g(x) be the helper function <br/><br />
Let c be a constant such that cg(x)>=f(x)<br/><br />
Since we need to generate y from g(x),<br/><br />
Pr(select y)=g(y)<br/><br />
Pr(output y|selected y)=Pr(u<f(y)/(c*g(y)))= f(y)/(c*g(y)) (since u~Unif(0,1))<br/><br />
Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c <br/><br />
Consider that we are asking for the expected number of iterations until the first success; this is a geometric distribution with probability of success 1/c<br/><br />
Therefore, E(X)=1/(1/c)=c <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
Recap: conditional probability is used to prove that, given acceptance, the output follows the pdf of the original distribution.<br />
The example shows how to choose c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/(c\,g(x))=(256/27)\,x(1-x)^3</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
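A sketch of this Beta(2,4) sampler in Python (illustrative translation; the acceptance test uses the equivalent form f(y)/c with g(y)=1):

```python
import random

def beta_2_4():
    """A-R draw from f(x) = 20x(1-x)^3 with a U(0,1) proposal and c = 135/64."""
    c = 135.0 / 64.0
    while True:
        y = random.random()                   # Y ~ g = U(0,1)
        u = random.random()
        if u <= 20 * y * (1 - y) ** 3 / c:    # f(y) / (c * g(y))
            return y

samples = [beta_2_4() for _ in range(50_000)]
print(sum(samples) / len(samples))  # should be near E[X] = 2/(2+4) = 1/3
```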
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
Recap: use the derivative to find the local maximum of f(x)/g(x);<br />
this maximum gives the best (smallest) constant c for the acceptance-rejection method.<br />
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (and 0 elsewhere)<br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1)} = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1)} = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example showing why we reject some points when using the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area between f(x) and c*g(x) as small as possible.<br />
Because g(.) is uniform on (0,1), g(x) = 1.<br />
<math>f(x)/(c*g(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to find a target distribution, denoted as <math>f(x)</math>; we need to first find a proposal distribution <math>g(x)</math> which is easy to sample from. <br><br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*Chance of acceptance is less if the distance between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice-versa; <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose the constant <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it will not make sense if <math>c</math> is simply chosen to be arbitrarily large. We need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want cg(x)>=f(x).<br />
This means c has to be greater than or equal to f(x)/g(x) for all x, so the smallest possible c satisfying the condition is the maximum value of f(x)/g(x).<br /> If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the UNIFORM distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1.<br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(y)}{c \cdot g(y)}</math> then X=Y; else return to step 1 (This is not the way to find c. This is the general procedure.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma; (n)=(n-1)! if n is positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}\,dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=u(0,1)</math><br><br />
<math>y=g</math><br><br />
<math>f(x)\leq c\,g(x)</math><br><br />
<math>c>=\frac{f(x)}{g(x)}</math><br><br />
<math>c = max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = max (2x/1), 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that c*g can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distribution with pdf f(x), we need to sample approximately n*c points from the proposal distribution (g(x)) in total. <br>
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{2y}{2 \cdot 1}=y</math>, then x=y</li><br><br />
<li>else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: count of accepted numbers<br />
>>jj=1; % jj: count of generated numbers<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need to continue looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know in advance how many y we are going to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= 3/4 (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g be U(-1,1), so g(x)=1/2 on [-1,1] (the proposal must cover the full support of f)<br />
<br />
Let y ~ g, <br />
<math> cg(x)\geq f(x),<br />
c \geq \frac{(3/4)(1-x^2)}{1/2}, <br />
c=\max_x \frac{(3/4)(1-x^2)}{1/2} = \frac{3}{2} </math><br />
<br />
The process:<br />
<br />
:1: Draw u ~ U(0,1) and set y = 2u - 1, so y ~ U(-1,1) <br><br />
:2: Draw U~U(0,1) <br><br />
:3: if <math>U \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2} \cdot \frac{1}{2}} = 1-y^2</math>, then x=y ('''note that''' <math>\frac{(3/4)(1-y^2)}{(3/2)(1/2)}</math> is <math>f(y)/(c\,g(y))</math>)<br />
:4: else: return to '''step 1''' <br />
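A Python sketch of this example (illustrative; note the proposal here is U(-1,1), which covers the full support [-1,1] of f, with c = 3/2, so the acceptance probability is 1 - y²):

```python
import random

def semiparabola():
    """A-R draw from f(x) = (3/4)(1 - x^2) on [-1, 1], using a U(-1, 1)
    proposal (g(x) = 1/2) and c = 3/2, so f(y)/(c g(y)) = 1 - y^2."""
    while True:
        y = 2 * random.random() - 1      # Y ~ U(-1, 1)
        u = random.random()
        if u <= 1 - y * y:               # accept with probability 1 - y^2
            return y

samples = [semiparabola() for _ in range(50_000)]
```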
<br />
----<br />
'''Use Inverse Method for the earlier example <math>f(x)=2x</math>'''<br>
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;color:green;">Matlab Tip:</span><br />
Periods, ".", meaning "element-wise", are used to describe an operation performed on each element of a vector. In the above example, U.^0.5 takes the square root of every element in U, whereas U^0.5 (without the period) is a matrix power. For example, if a=[1 2 3] and b=[2 3 4] are row vectors, then a.*b=[2 6 12], but a*b gives an error since the matrix dimensions do not agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = max \frac{f(x)}{g(x)} = max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers u<sub>1</sub> and u<sub>2</sub><br />
2. If <math>u_2 \leqslant (u_1)^2</math>, accept u<sub>1</sub> as the random variable from f, if not return to Step 1<br />
<br />
We can also use g(x)=2x for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math><br />
Use the inverse method to sample from g(x):<br />
<math>G(x)=x^2</math>, so generate u<sub>1</sub> from U(0,1) and set <math>y=\sqrt{u_1}</math><br />
<br />
1. Generate two uniform numbers u<sub>1</sub> and u<sub>2</sub><br />
2. If <math>u_2 \leq \frac{f(y)}{c\,g(y)} = \frac{3y^2}{3y} = y = \sqrt{u_1}</math>, accept <math>y=\sqrt{u_1}</math> as the random variable from f; if not return to Step 1<br />
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get 1000 accepted points. For example, in the class example with f(x)=2x, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
X ∼ N(μ,σ<sup>2</sup>); X = σZ + μ; Z ~ N(0,1) <br><br />
|Z| has probability function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
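The half-normal construction above can be sketched in Python (illustrative; the course uses MATLAB) by sampling |Z| via A-R with an Exp(1) proposal and then attaching a random sign:

```python
import math
import random

def standard_normal():
    """A-R draw of |Z| using an Exp(1) proposal with c = sqrt(2e/pi),
    then a random sign to recover Z ~ N(0, 1)."""
    while True:
        y = -math.log(1 - random.random())     # Y ~ Exp(1) by inverse transform
        u = random.random()
        if u <= math.exp(-(y - 1) ** 2 / 2):   # f(y) / (c g(y))
            break
    return y if random.random() < 0.5 else -y

samples = [standard_normal() for _ in range(50_000)]
```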
<br />
<br />
Recap: this shows how to calculate the constant c relating f(x) and g(x).<br />
<br />
===How to transform <math>U(0,1)</math> to <math>U(a, b)</math>===<br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
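The three steps above amount to one line of code (a Python sketch; the function name is illustrative):

```python
import random

def uniform_ab(a, b):
    """Transform U ~ Unif(0,1) into Y ~ Unif(a,b) via Y = (b - a)U + a."""
    return (b - a) * random.random() + a

samples = [uniform_ab(-3, 5) for _ in range(100_000)]
```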
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math> Let <math> Y= 2RU-R=R(2U-1) </math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize R^2-x^2.<br />
=> When x=0, it will be maximized.<br />
Therefore, c=4/pi. * Note: This also means that the probability of accepting a point is pi/4.<br />
<br />
We accept a point y with probability f(y)/[c*g(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}</math>, then x = y; else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ U_{1}^2 - 1 \leq -(2U - 1)^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
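A Python sketch of the semicircular sampler (illustrative; the course uses MATLAB):

```python
import random

def semicircular(R):
    """A-R draw from f(x) = 2/(pi R^2) * sqrt(R^2 - x^2) on [-R, R],
    using a U(-R, R) proposal and c = 4/pi."""
    while True:
        u = random.random()
        y = R * (2 * u - 1)                        # Y ~ U(-R, R)
        u1 = random.random()
        if u1 <= (1 - (2 * u - 1) ** 2) ** 0.5:    # f(y) / (c g(y))
            return y

samples = [semicircular(2.0) for _ in range(50_000)]
```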
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a*e^{-a*x}</math>to generate random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>c*g(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)} \bigg|_{x=\frac{1}{1-a}} = \frac {e^{-1}}{a(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\lim_{x \to \infty} \frac {f(x)}{g(x)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get a=1/2 <br/><br />
<b>c=4/e</b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u v ~unif(0,1) <br/><br />
2. Generate y from g: since g is exponential with rate a=1/2, let y=-2ln(u) <br/><br />
3. If v<f(y)/(c*g(y)), output y<br/><br />
Else, go to 1<br/><br />
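A Python sketch of this procedure (illustrative; it uses the exponential proposal with a = 1/2, sampled by y = -2 ln(u), and c = 4/e):

```python
import math
import random

def gamma21():
    """A-R draw from f(x) = x e^{-x} (a Gamma(2,1) density), using the proposal
    g(x) = (1/2) e^{-x/2} (exponential with rate a = 1/2) and c = 4/e."""
    c = 4 / math.e
    while True:
        u = random.random()
        y = -2 * math.log(1 - u)          # inverse transform for Exp(rate 1/2)
        v = random.random()
        f = y * math.exp(-y)
        g = 0.5 * math.exp(-0.5 * y)
        if v <= f / (c * g):
            return y

samples = [gamma21() for _ in range(50_000)]
print(sum(samples) / len(samples))  # should be near E[X] = 2
```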
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
<br />
'''Summary of how to find the value of c''' <br/><br />
Let h(x)=f(x)/g(x), and then we have the following:<br /><br />
1. First, take the derivative of h(x) with respect to x and set it equal to zero; solve for x to get x1;<br /><br />
2. Plug x1 into h(x) and get the value(or a function) of c, denote as c1;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (if c1 is a value, then we can ignore this step) Since we want the smallest value of c such that f(x)<= c*g(x) for all x, we want the unknown parameter that minimizes c. <br />So we take derivative of c1 with respect to the unknown parameter (ie k=unknown parameter) to get the value of k. <br />Then we submit k to get the value of c1. (Double check that c1>=1)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For these two examples, we first relate the target density f to a proposal distribution g that we can sample from (e.g. uniform or exponential),<br />
then figure out c=max(f(y)/g(y));<br />
if v<f(y)/(c*g(y)), output y.</div>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
{{Cleanup|reason= use math environment and LaTex notations for formulas. For example instead of y=1-e^(-λx) write <math>y=1-e^{-\lambda x}</math><br />
}}<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
i.e taking value from x, we could predict y.<br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case, i.e. y is continuous rather than discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning)<br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''Prerequisite:''' (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
'''Antirequisite:''' CM 361/STAT 341, CS 437, 457 <!--- Moved these down to declutter table of contents ---><br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important, as the Quest id will be used to identify the students who make contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After making the account request, wait several hours before logging in to the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
<br />
'''The final exam is going to be closed book and only non-programmable calculators are allowed'''<br />
'''A passing mark must be achieved in the final to pass the course'''<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that sampling activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''', since the result can be reliably calculated using things such as physics and math. In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events.<br />
<br />
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are produced by a deterministic sequence, they have the appearance of independent uniform random variables. Being deterministic also makes pseudo random numbers valuable, because they are easy to generate and to reproduce.<br />
<br />
When an experiment is repeated many times, the average result approaches the expected value, which makes the overall behaviour look deterministic; each individual trial, however, is still random.<br />
Pseudo random numbers mimic this behaviour.<br />
<br />
=== Mod ===<br />
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, <br />
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, <br />
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function<br />
<math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which means take the remainder after division by m. <br />
<br /><br />
<br /><br />
We say that n is congruent to r mod m if n = mq + r for some integer q. <br /><br />
If y = aq + b with q an integer and 0 &le; b &lt; a, then <math>b:=y \mod a</math>. <br /><br />
For example, 4.2 = 2 &times; 2 + 0.2,<br /><br />
so 4.2 mod 2 = 0.2 (Matlab's mod also accepts non-integer arguments).<br /><br />
<br /><br />
For example:<br /><br />
30 = 4 &times; 7 + 2, so 30 mod 7 = 2<br /><br />
25 = 8 &times; 3 + 1, so 25 mod 3 = 1<br /><br />
<br />
<br />
'''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function.<br />
<br />
The mod operation can tell us whether one integer is divisible by another: write n = mq + r with m, q, r, n all integers and 0 &le; r &lt; m; then m divides n exactly when r = 0.<br />
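The arithmetic above can be checked by machine; a minimal sketch (in Python rather than Matlab, since the idea is language-independent — Matlab's equivalent is mod(n,m); the function name here is ours):<br />

```python
# Division Algorithm: n = m*q + r with 0 <= r < m.
def div_mod(n, m):
    q, r = divmod(n, m)  # built-in quotient/remainder pair
    assert n == m * q + r and 0 <= r < m
    return q, r

print(div_mod(30, 7))  # 30 = 7*4 + 2, so 30 mod 7 = 2
print(div_mod(25, 3))  # 25 = 3*8 + 1, so 25 mod 3 = 1
```

n is divisible by m exactly when the returned remainder is 0.<br />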
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform pseudo random numbers. It is also referred to as the '''Linear Congruential Method''' or '''Mixed Congruential Method'''. We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m > 0</math>. (<math>\mod m</math> means taking the remainder after division by m.) Given a '''seed''' <math>x_0 \in \N</math> (an initial integer value), we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method may also refer to the special case where <math>b=0</math>.<br /><br />
<br />
An interesting fact about '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudorandom number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications where high-quality randomness is required. They should not be used for Monte Carlo simulation and cryptographic applications.<br /><br />
<br />
<br />
<br />
'''First consider the following algorithm'''<br /><br />
<math>x_{k+1}=x_{k} \mod m</math><br />
<br />
<br />
'''Example'''<br /><br />
<math>\text{Let }x_{0}=10,\,m=3</math><br /><br />
<br />
:<math>\begin{align}<br />
<br />
x_{1} &{}= 10 &{}\mod{3} = 1 \\<br />
<br />
x_{2} &{}= 1 &{}\mod{3} = 1 \\<br />
<br />
x_{3} &{}= 1 &{}\mod{3} =1 \\<br />
\end{align}</math><br />
<math>\ldots</math><br /><br />
<br />
Excluding x0, this example generates a series of ones. In general, excluding x0, the algorithm above will always generate a series of the same number less than m. Hence, it has a period of 1. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /><br />
<br />
<br />
'''Multiplicative Congruential Algorithm'''<br /><br />
<math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math><br />
<br />
'''Example'''<br /><br />
<math>\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10</math><br /><br />
<math>\begin{align}<br />
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\<br />
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\<br />
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\<br />
\end{align}</math><br /><br />
<math>\ldots</math><br /><br />
<br />
This example generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
If we choose the numbers properly, we can get a sequence of "random" numbers. However, how do we find the values of <math>a,b,</math> and <math>m</math>? At the very least, <math>m</math> should be a very '''large''', preferably prime number: the larger <math>m</math> is, the longer the period of the sequence can be. In Matlab, the command rand() generates random numbers which are uniformly distributed in the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math> – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that <math>m</math> is '''large and prime''').<br />
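As an illustration, the Park–Miller constants quoted above can be dropped straight into the recurrence; a Python sketch (the function name and the normalization choice are ours, not part of any library):<br />

```python
# Multiplicative congruential generator x_{k+1} = a*x_k mod m
# with the Park-Miller constants a = 7^5, b = 0, m = 2^31 - 1.
A, M = 7 ** 5, 2 ** 31 - 1

def lcg(seed, n):
    """Return n pseudo random numbers normalized to the unit interval."""
    x, out = seed, []
    for _ in range(n):
        x = (A * x) % M
        out.append(x / (M - 1))  # normalize by m-1, as in the notes
    return out

u = lcg(seed=1, n=5)  # first raw value from seed 1 is 7^5 = 16807
```

The normalized values can then be fed to hist-style checks exactly as with Matlab's rand.<br />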
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start, you need to clear all existing defined variables and operations:<br /><br />
<pre style="font-size:16px"><br />
>>clear all<br />
>>close all<br />
</pre><br />
<br />
<pre style="font-size:16px"><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /><br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
''(Note on MATLAB commands: <br /><br />
1. clear all: clears all variables.<br /><br />
2. close all: closes all figures.<br /><br />
3. who: displays all defined variables.<br /><br />
4. clc: clears screen.)<br /><br /><br />
<br />
<pre style="font-size:16px"><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <math>\mod m</math> means taking the remainder after division by <math>m</math>. <!-- This paragraph seems redundant as it is mentioned above. --><br /><br />
<br />
Note: For some bad choices of <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /><br />
<br />
Note: hist(x) plots a histogram of the generated values. Use it after running the code to check the empirical distribution.<br />
<br />
'''Example''': <math>a=13, b=0, m=31</math><br /><br />
The first 30 numbers in the sequence are a permutation of integers from 1 to 30, and then the sequence repeats itself so '''it is important to choose <math>m</math> large''' to decrease the probability of each number repeating itself too early. Values are between <math>0</math> and <math>m-1</math>. If the values are normalized by dividing by <math>m-1</math>, then the results are '''approximately''' numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use function "hist(x)" to see if it looks uniformly distributed. <br /><br />
<br />
If <math>x_0=1</math>, then <br /><br />
:<math>x_{k+1} = 13x_{k}\mod{31}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align} x_{0} &{}= 1 \\<br />
<br />
x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\<br />
<br />
x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\<br />
<br />
x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\<br />
\end{align}</math><br />
<br />
etc.<br />
<br />
For example, with <math>a = 3, b = 2, m = 4, x_0 = 1</math>, we have:<br />
<br />
:<math>x_{k+1} = (3x_{k} + 2)\mod{4}</math><br />
<br />
So,<br />
<br />
:<math>\begin{align}<br />
x_{0} &{}= 1 \\<br />
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\<br />
\end{align}</math><br /><br />
<br />
etc. Here the sequence is stuck at 1: with a poor choice of <math>a, b, m</math> the period can collapse to 1.<br />
<hr/><br />
<p style="color:red;font-size:16px;">FAQ:</P><br />
1.Why in the example above is 1 to 30 not 0 to 30?<br><br />
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br><br />
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br><br />
2.Will the number 31 ever appear?Is there a probability that a number never appears? <br><br />
''The number 31 will never appear. When you perform the operation <math>\mod m</math>, the largest possible answer that you could receive is <math>m-1</math>. Whether or not a particular number in the range from 0 to <math>m - 1</math> appears in the above algorithm will be dependent on the values chosen for <math>a, b</math> and <math>m</math>. ''<br />
<hr/><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If <math>x_0=3</math> and <math>x_n=(5x_{n-1}+7)\mod 200</math>, find <math>x_1,\cdots,x_{10}</math>.<br /><br />
'''Solution:'''<br /><br />
<math>\begin{align}<br />
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\<br />
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\<br />
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\<br />
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\<br />
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\<br />
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\<br />
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\<br />
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\<br />
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\<br />
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\<br />
\end{align}</math><br /><br />
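The textbook sequence above is easy to verify by machine; a short Python check (note that <math>x_{10} = x_2 = 117</math>, so this generator repeats with period 8):<br />

```python
# x_n = (5*x_{n-1} + 7) mod 200, starting from x_0 = 3.
def lcg_sequence(x0, n, a=5, b=7, m=200):
    xs, x = [], x0
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs

print(lcg_sequence(3, 10))
# [22, 117, 192, 167, 42, 17, 92, 67, 142, 117]
```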
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /><br />
<br />
From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values generated will not repeat after several iterations.<br /><br />
<br />
There has been research on how to choose good generator parameters. Many programs give you the option to choose the seed; otherwise it is often chosen automatically by the system.<br /><br />
<br />
<br />
<br />
<br />
In this part we saw how the quotient and remainder are related in integer division, and that when the generator is run over a range such as 1:1000, the histogram of its output looks approximately uniform.<br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than the uniform distribution, such as the exponential and normal distributions. However, to use it easily for generating pseudorandom numbers, the target distribution must have a cumulative distribution function (cdf) <math>F</math> with a tractable inverse <math>F^{-1}</math>.<br /><br />
<br />
'''Theorem''': <br /><br />
To generate a random variable X with a given cdf F, we first generate a random number U uniformly distributed over (0,1).<br />
Let <math>F:\R \rightarrow \left[0,1\right]</math> be a cdf. If <math>U \sim U\left[0,1\right]</math>, then the random variable given by <math>X:=F^{-1}\left(U\right)</math><br />
follows the distribution function <math>F\left(\cdot\right)</math>,<br />
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /><br />
'''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case.<br />
<br />
'''Proof of the theorem:'''<br /><br />
The generalized inverse satisfies the following: <br /><br />
<math>\begin{align}<br />
\forall u \in \left[0,1\right], \, x \in \R, \\<br />
&{} F^{-1}\left(u\right) \leq x &{} \\<br />
\Rightarrow &{} F\Big(F^{-1}\left(u\right)\Big) \leq F\left(x\right) &&{} F \text{ is non-decreasing} \\<br />
\Rightarrow &{} F\Big(\inf \{y \in \R | F(y)\geq u \}\Big) \leq F\left(x\right) &&{} \text{by definition of } F^{-1} \\<br />
\Rightarrow &{} \inf \{F(y) \in [0,1] | F(y)\geq u \} \leq F\left(x\right) &&{} F \text{ is right continuous and non-decreasing} \\<br />
\Rightarrow &{} u \leq F\left(x\right) &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \in \{y \in \R | F(y) \geq u\} &&{} \\<br />
\Rightarrow &{} x \geq \inf \{y \in \R | F(y)\geq u \} &&{} \text{by definition of } \inf \\<br />
\Rightarrow &{} x \geq F^{-1}(u) &&{} \text{by definition of } F^{-1} \\<br />
\end{align}</math><br />
<br />
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /><br />
<br />
Finally, <math>P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)</math>, since <math>U</math> is uniform on the unit interval.<br /><br />
<br />
This completes the proof.<br /><br />
<br />
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation x=<math> F^{-1}(U) </math> <br /><br />
<br />
Note that we can apply the inverse on both sides in the proof of the inverse transform only if the cdf <math>F</math> is strictly increasing (and hence invertible). When it is not, the generalized inverse defined above must be used. <br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\, dt</math><br /><br />
<math> = \left. -e^{-\lambda t} \right|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
Set <math> y=1-e^{- \lambda x} </math> and solve for x:<br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\ln(1-y)/{\lambda}</math><br /><br />
so, swapping the variable names, <math>F^{-1}(u)=-\ln(1-u)/{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
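The two steps can be sketched in Python (Python's random.random() standing in for Matlab's rand; the function name is ours, and the sample mean should be close to <math>1/\lambda</math>):<br />

```python
import math
import random

def sample_exponential(lam, n, seed=0):
    """Inverse transform: draw U ~ Unif(0,1), return X = -ln(1-U)/lam."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

xs = sample_exponential(lam=2.0, n=100000)
mean = sum(xs) / len(xs)  # should be close to 1/lambda = 0.5
```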
<br />
'''Example''': <br />
If <math>U \sim U[0,1]</math>, then <math> X= a + (b-a)U</math> is uniform on [a, b]. <br /><br />
<math> x=\frac{-\ln(U)}{\lambda}</math> is exponential with parameter <math> {\lambda} </math>, since <math>1-U</math> and <math>U</math> have the same distribution. <br /><br /><br />
'''Example 2''':<br />
Given a CDF of X: <math>F(x) = x^5</math>, transform U~U[0,1]. <br /><br />
Sol: <br />
Let <math>y=x^5</math> and solve for x: <math>x=y^{1/5}</math>. Therefore, <math>F^{-1} (u) = u^{1/5}</math>.<br /><br />
Hence, to obtain a value of x from F(x), we draw u from the uniform distribution U[0,1] and set<br />
<math>x= u^{1/5}</math><br /><br /><br />
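A quick numerical check of Example 2 (a Python sketch with a function name of our choosing; the empirical probability <math>P(X \le 0.9)</math> should match <math>F(0.9)=0.9^5</math>):<br />

```python
import random

def sample_x5(n, seed=1):
    """Inverse transform for F(x) = x^5 on [0,1]: X = U^(1/5)."""
    rng = random.Random(seed)
    return [rng.random() ** 0.2 for _ in range(n)]

xs = sample_x5(100000)
p_hat = sum(x <= 0.9 for x in xs) / len(xs)  # close to 0.9^5 = 0.59049
```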
'''Example 3''':<br />
Given u~U[0,1], generate x from BETA(1,β)<br /><br />
Solution:<br /><br />
<math>F(x)= 1-(1-x)^\beta</math><br /><br />
<math>u= 1-(1-x)^\beta</math><br /><br />
Solve for x:<br /><br />
<math>(1-x)^\beta = 1-u</math><br /><br />
<math>1-x = (1-u)^{1/\beta}</math><br /><br />
<math>x = 1-(1-u)^{1/\beta}</math><br /><br /><br />
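Example 3 as a Python sketch (function name ours; BETA(1, β) has mean 1/(1+β), which the sample mean should approximate):<br />

```python
import random

def sample_beta_1_b(beta, n, seed=2):
    """Inverse transform for BETA(1, beta): X = 1 - (1-U)^(1/beta)."""
    rng = random.Random(seed)
    return [1.0 - (1.0 - rng.random()) ** (1.0 / beta) for _ in range(n)]

xs = sample_beta_1_b(beta=3.0, n=100000)
mean = sum(xs) / len(xs)  # should be close to 1/(1+3) = 0.25
```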
<br />
'''Example 4-Estimating pi''':<br />
Let's use rand() and the Monte Carlo Method to estimate <math>\pi</math> <br /><br />
N= total number of points <br /><br />
Nc = total number of points inside the circle<br /><br />
Prob[(x,y) lies in the circle] = Area of circle / Area of square<br /><br />
If we take a square of side 2, the inscribed circle has radius 1 and area <math>\pi</math>, so this probability is <math>\pi/4</math>.<br /><br />
Thus <math>\pi \approx 4\cdot(Nc/N)</math><br /><br />
<br />
'''Matlab Code''':<br />
<br />
<pre style="font-size:16px"><br />
>>N=10000;<br />
>>Nc=0;<br />
>>a=0;<br />
>>b=2;<br />
>>for t=1:N<br />
x=a+(b-a)*rand();<br />
y=a+(b-a)*rand();<br />
if (x-1)^2+(y-1)^2<=1<br />
Nc=Nc+1;<br />
end<br />
end<br />
>>4*(Nc/N)<br />
ans = 3.1380<br />
</pre><br />
<br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre style="font-size:16px"><br />
>>u=rand(1,1000);<br />
>>hist(u) % will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre style="font-size:16px"><br />
% let λ=2 in this example; however, you can make another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) % 1000 in size <br />
>>figure<br />
>>hist(x) % exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. Not every cdf is invertible (strictly increasing), and the generalized inverse can be hard to work with.<br /><br />
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /><br />
<br />
We learned how to prove that applying the inverse cdf to a uniform random variable produces a draw from F, and how to use the uniform distribution to obtain a value of x from F(x).<br />
The same uniform sampler drives the inverse method for other distributions as well.<br />
In the <math>\pi</math> example, points are uniform over the square, so the chance that a point lands inside the circle is proportional to the circle's area,<br />
and plotting a histogram of generated values shows which distribution they follow.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre style="font-size:16px"><br />
>>disttool % shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
<br />
[[File:Disttool.jpg|450px]]<br />
Changing the values of mu and sigma shifts and rescales the plotted curve.<br />
<br />
== (Generating random numbers continue) Class 3 - Tuesday, May 14 ==<br />
=== Recall the Inverse Transform Method ===<br />
'''1. Draw U~U(0,1) ''' <br /><br />
'''2. X = F<sup>-1</sup>(U) '''<br /><br />
<br />
<br />
'''Proof''' <br /><br />
First note that<br />
<math>P(U\leq a)=a, \forall a\in[0,1]</math> <br /><br />
<br />
:<math>P(X\leq x)</math> <br /><br />
<math>= P(F^{-1}(U)\leq x)</math> (by the definition of <math>X</math>)<br /><br />
<math>= P(F(F^{-1}(U))\leq F(x))</math> (since <math>F(\cdot )</math> is monotonically increasing) <br /><br />
<math>= P(U\leq F(x)) </math> <br /><br />
<math>= F(x) </math> (since <math>U</math> has a uniform distribution on [0,1]) <br /><br />
<br />
This is the c.d.f. of X. <br /><br />
<br /><br />
<br />
'''Note''': that the CDF of a U(a,b) random variable is:<br />
:<math><br />
F(x)= \begin{cases}<br />
0 & \text{for }x < a \\[8pt]<br />
\frac{x-a}{b-a} & \text{for }a \le x < b \\[8pt]<br />
1 & \text{for }x \ge b<br />
\end{cases}<br />
</math> <br />
<br />
Thus, for U~U(0,1), we have <math>P(U\leq 1) = 1</math> and <math>P(U\leq 1/2) = 1/2</math>.<br /><br />
More generally, we see that <math>P(U\leq a) = a</math>.<br /><br />
For this reason, we had <math>P(U\leq F(x)) = F(x)</math>.<br /><br />
<br />
'''Reminder: ''' <br /> <br />
'''This is only for uniform distribution <math> U~ \sim~ Unif [0,1] </math> '''<br /><br />
<math> P (U \le 1) = 1 </math> <br /><br />
<math> P (U \le 0.5) = 0.5 </math> <br /><br />
<br />
[[File:2.jpg]] <math>P(U<=a)=a</math><br />
<br />
LIMITATIONS OF THE INVERSE TRANSFORM METHOD<br />
<br />
Though this method is very easy to use and apply, it does have disadvantages/limitations:<br />
<br />
1. We have to find the inverse c.d.f. function F<sup>-1</sup>(.), which requires the c.d.f. to be strictly increasing; in some cases a closed-form inverse does not exist<br />
<br />
2. For many distributions such as Gaussian, it is too difficult to find the inverse cdf function , making this method inefficient<br />
<br />
We used the inverse method to show that if <math>u \sim U(0,1)</math> and <math>X = F^{-1}(u)</math>, then <math>P(X \leq x) = F(x)</math>, so X has cdf F.<br />
The Unif(0,1) example illustrates how the inverse method recovers the cdf.<br />
<br />
=== Discrete Case ===<br />
The same technique can be used for discrete case. We want to generate a discrete random variable x, that has probability mass function:<br />
In general in the discrete case, we have <math>x_0, \dots , x_n</math> where:<br />
<br />
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math><br />
:<math>x_0 \leq x_1 \leq x_2 \dots \leq x_n</math><br />
:<math>\sum p_i = 1</math><br />
<br />
Algorithm for applying Inverse Transformation Method in Discrete Case:<br><br />
1. Generate <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math> X=x_i, </math> if <math> F(x_{i-1})<U\leq F(x_i) </math><br><br />
<br />
<br />
'''Example in class:''' (Coin Flipping Example)<br /><br />
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. <br />
<br />
We can define the U function so that: <br />
<br />
If U <= 0.5, then X = 0<br />
<br />
and if 0.5 < U <= 1, then X =1. <br />
<br />
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.<br />
<br />
<math> U~ \sim~ Unif [0,1] </math> <br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.5\\<br />
P(X = 1) &{}= 0.5\\<br />
\end{align}</math><br />
<br />
<br />
* '''Code'''<br /><br />
<pre style="font-size:16px"><br />
>>for ii=1:1000<br />
u=rand;<br />
if u<0.5<br />
x(ii)=0;<br />
else<br />
x(ii)=1;<br />
end<br />
end<br />
>>hist(x)<br />
</pre><br />
[[File:Coin_example.jpg|300px]]<br />
<br />
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.<br />
<br />
'''Example in class:'''<br />
<br />
Suppose we have the following discrete distribution:<br />
<br />
:<math>\begin{align}<br />
P(X = 0) &{}= 0.3 \\<br />
P(X = 1) &{}= 0.2 \\<br />
P(X = 2) &{}= 0.5<br />
\end{align}</math><br />
[[File:33.jpg]]<br />
<br />
The cumulative distribution function (cdf) for this distribution is then:<br />
<br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
0.3, & \text{if } 0 \le x < 1 \\<br />
0.5, & \text{if } 1 \le x < 2 \\<br />
1, & \text{if } x \ge 2<br />
\end{cases}</math><br />
<br />
Then we can generate numbers from this distribution like this, given <math>U \sim~ Unif[0, 1]</math>:<br />
<br />
:<math><br />
x = \begin{cases}<br />
0, & \text{if } U\leq 0.3 \\<br />
1, & \text{if } 0.3 < U \leq 0.5 \\<br />
2, & \text{if } 0.5 <U\leq 1<br />
\end{cases}</math><br />
<br />
"Procedure"<br /><br />
1. Draw U~u (0,1)<br /><br />
2. if U<=0.3 deliver x=0<br /><br />
3. else if 0.3<U<=0.5 deliver x=1<br /><br />
4. else 0.5<U<=1 deliver x=2<br />
<br />
<br />
* '''Code''' (as shown in class)<br /><br />
Use Editor window to edit the code <br /><br />
<pre style="font-size:16px"><br />
>>close all<br />
>>clear all<br />
>>for ii=1:1000<br />
u=rand;<br />
if u<=0.3<br />
x(ii)=0;<br />
elseif u<=0.5<br />
x(ii)=1;<br />
else<br />
x(ii)=2;<br />
end<br />
end<br />
>>size(x)<br />
>>hist(x)<br />
</pre><br />
[[File:Discrete_example.jpg|300px]]<br />
<br />
'''Example''': Generating a random variable from pdf <br><br />
:<math><br />
f_{x}(x) = \begin{cases}<br />
2x, & \text{if } 0\leq x \leq 1 \\<br />
0, & \text{otherwise}<br />
\end{cases}</math><br />
<br />
:<math><br />
F_{x}(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
\int_{0}^{x}2t\,dt = x^{2}, & \text{if } 0\leq x \leq 1 \\<br />
1, & \text{if } x > 1 <br />
\end{cases}</math><br />
<br />
:<math>\begin{align} U = x^{2}, \quad X = F_{x}^{-1}(U)= U^{\frac{1}{2}}\end{align}</math><br />
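A numerical check of this example (a Python sketch with a function name of our choosing; with <math>X=U^{1/2}</math> we should see <math>P(X \le 0.5) \approx F(0.5) = 0.25</math>):<br />

```python
import random

def sample_linear_density(n, seed=6):
    """Inverse transform for f(x) = 2x on [0,1]: F(x) = x^2, so X = U^(1/2)."""
    rng = random.Random(seed)
    return [rng.random() ** 0.5 for _ in range(n)]

xs = sample_linear_density(100000)
p_hat = sum(x <= 0.5 for x in xs) / len(xs)  # close to F(0.5) = 0.25
```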
<br />
'''Example''': Generating a Bernoulli random variable <br><br />
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math><br />
:<math><br />
F(x) = \begin{cases}<br />
0, & \text{if } x < 0 \\<br />
1-p, & \text{if } 0 \le x < 1 \\<br />
1, & \text{if } x \ge 1<br />
\end{cases}</math><br />
1. Draw <math> U~ \sim~ Unif [0,1] </math><br><br />
2. <math><br />
X = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
0, & \text{if } U > p<br />
\end{cases}</math><br />
<br />
<br />
'''Example''': Generating a Poisson random variable <br><br />
<br />
Let X ~ Poi(u). Write an algorithm to generate X.<br />
The pmf of a Poisson is:<br />
:<math>\begin{align} f(x) = \frac{e^{-u}u^x}{x!} \end{align}</math><br />
We know that<br />
:<math>\begin{align} P_{x+1} = \frac{e^{-u}u^{x+1}}{(x+1)!} \end{align}</math><br />
The ratio is <math>\begin{align} P_{x+1}/P_x = \frac{u}{x+1} \end{align}</math><br />
Therefore, <math>\begin{align} P_{x+1} = P_x \cdot \frac{u}{x+1} \end{align}</math><br />
<br />
Algorithm: <br><br />
1) Generate U ~ U(0,1) <br><br />
2) Set <math>\begin{align} x = 0 \end{align}</math><br />
<math>\begin{align} F = p = P(X = 0) = e^{-u}u^0/{0!} = e^{-u} \end{align}</math><br />
3) If U<F, output x <br><br />
Else, <math>\begin{align} p = p \cdot u/(x+1) \end{align}</math> <br><br />
<math>\begin{align} F = F + p \end{align}</math> <br><br />
<math>\begin{align} x = x + 1 \end{align}</math> <br><br />
4) Go to step 3 <br><br />
<br />
Acknowledgements: This is from Stat 340 Winter 2013<br />
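The Poisson algorithm can be written out as a Python sketch (u is the Poisson mean, as in the notes; the helper name is ours):<br />

```python
import math
import random

def sample_poisson(u, rng):
    """Inverse transform for Poi(u) using the recursion P_{x+1} = P_x * u/(x+1)."""
    U = rng.random()
    x, p = 0, math.exp(-u)  # p = P(X = 0)
    F = p                   # running cdf
    while U >= F:
        p *= u / (x + 1)    # next pmf value via the ratio
        F += p
        x += 1
    return x

rng = random.Random(3)
xs = [sample_poisson(4.0, rng) for _ in range(50000)]
mean = sum(xs) / len(xs)  # should be close to u = 4
```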
<br />
<br />
'''Example''': Generating Geometric Distribution:<br />
<br />
Consider Geo(p), where p is the probability of success, and define the random variable X as the number of trials needed to obtain the first success, so x = 1, 2, 3, .... We have pmf:<br />
<math>P(X=x_i) = p(1-p)^{x_i-1}</math><br />
We have CDF:<br />
<math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, where P(X>x) is the probability of getting at least x failures before we observe the first success.<br />
Now consider the inverse transform:<br />
:<math><br />
x = \begin{cases}<br />
1, & \text{if } U\leq p \\<br />
2, & \text{if } p < U \leq 1-(1-p)^2 \\<br />
3, & \text{if } 1-(1-p)^2 <U\leq 1-(1-p)^3 \\<br />
....<br />
k, & \text{if } 1-(1-p)^{k-1} <U\leq 1-(1-p)^k <br />
....<br />
\end{cases}</math><br />
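The case analysis above amounts to taking the smallest k with <math>U \leq 1-(1-p)^k</math>, which has the closed form <math>k = \lceil \ln(1-U)/\ln(1-p) \rceil</math>. A Python sketch of this (function name ours; the course uses Matlab):

```python
import math
import random

def geometric(p, rng):
    # Smallest k with U <= 1 - (1-p)^k, i.e. k = ceil(log(1-U)/log(1-p));
    # max(1, ...) guards the measure-zero case U = 0
    u = rng.random()
    return max(1, math.ceil(math.log(1 - u) / math.log(1 - p)))

rng = random.Random(3)
draws = [geometric(0.25, rng) for _ in range(100000)]
```

For p = 0.25 the sample mean should be close to E[X] = 1/p = 4.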
<br />
<br />
'''Note''': Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach) <br><br />
<br />
<br />
<br />
'''General Procedure'''<br /><br />
1. Draw U ~ U(0,1)<br /><br />
2. If <math>U \leq P_{0}</math> deliver <math>x = x_{0}</math><br /><br />
3. Else if <math>U \leq P_{0} + P_{1}</math> deliver <math>x = x_{1}</math><br /><br />
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /><br />
... <br />
Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /><br />
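The general procedure above, walking the cumulative sums until U falls below one of them, can be sketched as follows in Python (hypothetical helper name; the course's code is Matlab):

```python
import random

def discrete_inverse(xs, ps, rng):
    """Return the first x_k whose cumulative probability P_0+...+P_k reaches U."""
    u = rng.random()
    F = 0.0
    for x, p in zip(xs, ps):
        F += p
        if u <= F:
            return x
    return xs[-1]   # guard against floating-point round-off in the last sum

rng = random.Random(4)
draws = [discrete_inverse([0, 1, 2], [0.25, 0.5, 0.25], rng) for _ in range(100000)]
```

With probabilities (1/4, 1/2, 1/4), the value 1 should appear about half the time.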
<br />
'''Problems'''<br /><br />
1. We have to find <math> F^{-1} </math><br />
<br />
2. For many distributions, such as the Gaussian, it is too difficult to find the inverse of <math> F(x) </math>.<br />
Flipping a coin is a discrete case of the uniform distribution; in the code the coin is flipped 1000 times, and the observed proportion of heads is close to the expected value (0.5).<br />
Example 2 is another discrete distribution: a discrete uniform over the three values 0, 1 and 2, where each outcome has the same probability.<br />
Example 3 uses the inverse method to work out the probability range corresponding to each value of the random variable.<br />
<br />
===Acceptance-Rejection Method===<br />
<br />
Although the inverse transform method does allow us to transform a uniform distribution, it has two limitations:<br />
# Not all cumulative distribution functions have a closed-form inverse<br />
# For some distributions, such as the Gaussian, it is too difficult to find the inverse<br />
<br />
To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''.<br />
<br />
Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''.<br />
<br />
[[File:AR_Method.png]]<br />
<br />
<br />
{{Cleanup|reason= Do not write <math>c*g(x)</math>. Instead write <math>c \times g(x)</math> or <math>\,c g(x)</math><br />
}}<br />
<br />
The main logic behind the Acceptance-Rejection Method is that:<br><br />
1. We want to generate sample points from an unknown distribution, say f(x).<br><br />
2. We use cg(x) to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br><br />
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br><br />
<br />
Note: If the red line were only g(x) rather than <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x would hold only if g and f were the same function. This is because both pdfs integrate to 1, so one cannot strictly dominate the other everywhere; hence for distinct f and g, <math>g(x) \ngeqq f(x)</math> &forall;x. <br><br />
<br />
Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br><br />
<br />
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br><br />
Either use a software package to test if <math>f(x)\leqslant c g(x)</math> for an arbitrarily chosen c > 0, or:<br><br />
1. Find first and second derivatives of f(x) and g(x).<br><br />
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br><br />
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br><br />
4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br />
<br />
c should be close to the maximum of f(x)/g(x), otherwise there is a high chance we will end up rejecting our sample.<br />
<br />
<ul><li>Values around x<sub>1</sub> will be sampled more often under cg(x) than under f(x), giving more samples than we actually need. Because <math>\frac{f(y)}{\, c g(y)}</math> is small there, the acceptance-rejection step compensates: in the region around x<sub>1</sub>, we accept less and reject more.</li></ul><br />
<ul><li>Around x<sub>2</sub>, the number of samples drawn and the number we need are much closer, so in the region around x<sub>2</sub> we accept more. As a result, g(x) and f(x) are comparable there.</li></ul><br />
<br />
Another way to understand why the acceptance probability is <math>\frac{f(y)}{\, c g(y)}</math> is by thinking of areas. From the graph above, we see that the target function f lies under the proposal function c g(y). Therefore, <math>\frac{f(y)}{\, c g(y)}</math> is the proportion of the area under c g(y) that is also under f(y). We accept sample points for which u is less than <math>\frac{f(y)}{\, c g(y)}</math> because those sample points are guaranteed to fall in the part of the area under c g(y) that also lies under f(y). <br />
<br />
'''Procedure'''<br />
<br />
#Draw Y~g(.)<br />
#Draw U~u(0,1) (Note: U and Y are independent)<br />
#If <math>u\leq \frac{f(y)}{cg(y)}</math> (which is <math>P(accepted|y)</math>) then x=y, else return to Step 1<br><br />
<br />
<br />
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br><br />
ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br><br />
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution.<br />
<br />
Note: Since U follows the uniform distribution between 0 and 1, the acceptance condition <math>u\leq \frac{f(y)}{cg(y)}</math> is satisfied with probability exactly <math>\frac{f(y)}{cg(y)}</math>; at points where <math>f(y) = cg(y)</math> the candidate is always accepted. The tighter the constant c makes <math>cg(y)</math> hug <math>f(y)</math>, the higher the overall acceptance rate.<br />
<br />
<br />
This section introduces the relationship between <math>\,c g(x)</math> and f(x), shows why that relationship holds, and shows how the rule is used to reject candidate points.<br />
It also shows how to read the graph to decide whether to accept or reject in the region around a given value of x.<br />
In the example, x<sub>1</sub> is a point with a low acceptance probability and x<sub>2</sub> is a point with a high acceptance probability.<br />
<br />
=== Proof ===<br />
<br />
We want to show that the target distribution f(x) can be obtained/sampled using a known distribution g(y).<br />
Therefore, mathematically we want to show that:<br /><br />
<math>P(x) = P(y|accepted) = f(y)</math> <br/><br /><br />
<br />
<math>P(y|accepted)=f(y)</math><br /><br />
<br />
<math>P(y|accepted)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> <br />
<br />
Recall the conditional probability formulas:<br /><br />
<br />
<math>P(a|b)=\frac{P(a,b)}{P(b)}</math>, or <math>P(a|b)=\frac{P(b|a)P(a)}{P(b)}</math><br /><br />
<br />
<br />based on the concept from '''procedure-step1''':<br /><br />
<math>P(y)=g(y)</math><br /><br />
<br />
<math>P(accepted|y)=\frac{f(y)}{cg(y)}</math> <br /><br />
(the larger the value is, the larger the chance it will be selected) <br /><br /><br />
<br />
<br />
<math><br />
\begin{align}<br />
P(accepted)&=\int_y\ P(accepted|y)P(y)dy\\<br />
&=\int_y\ \frac{f(s)}{cg(s)}g(s)ds\\<br />
&=\frac{1}{c} \int_y\ f(s) ds\\<br />
&=\frac{1}{c}<br />
\end{align}</math><br /><br />
<br />
Therefore:<br /><br />
<math>\begin{align}<br />
P(x)&=P(y|accepted)\\<br />
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\<br />
&=\frac{\frac{f(y)}{c}}{1/c}\\<br />
&=f(y)\end{align}</math><br /><br /><br /><br />
<br />
'''''Here is an alternative introduction of Acceptance-Rejection Method'''''<br />
<br />
Comments:<br />
<br />
-The Acceptance-Rejection Method is not good for all cases. One obvious drawback is that it can be very hard to pick g(y) and the constant c in some cases. Usually, c should be a small number; otherwise the amount of work when applying the method can be huge.<br />
<br />-Note: When f(y) is very different from g(y), it is less likely that a point will be accepted, as the ratio above would be very small and it would be difficult for u to be less than this small value. <br />
<br />
Acceptance-Rejection Method<br/><br />
Example 1 (discrete case)<br/><br />
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/><br />
We use a discrete distribution DU[0,2] to approximate this.<br/><br />
f(x)=Pr(X=x)=2Cx*(0.5)^2<br/><br />
<br />
{| class=wikitable align=left<br />
|x||0||1||2 <br />
|-<br />
|f(x)||1/4||1/2||1/4 <br />
|-<br />
|g(x)||1/3||1/3||1/3 <br />
|-<br />
|c=f(x)/g(x)||3/4||3/2||3/4<br />
|-<br />
|f(x)/(c*g(x))||1/2||1||1/2<br />
|}<br />
<br />
<br />
Since we need c>=f(x)/g(x)<br/><br />
We need c=3/2<br/><br />
<br />
Therefore, the algorithm is:<br/><br />
1. Generate u,v~U(0,1)<br/><br />
2. Set y=floor(3*u) (This is using uniform distribution to generate DU[0,2]<br/><br />
3. If (y=0) and (v<1/2), output=0 <br/><br />
Else if (y=2) and (v<1/2), output=2 <br/><br />
Else if y=1, output=1<br/><br />
Else, the candidate is rejected; return to step 1<br/><br />
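A Python sketch of this discrete A-R algorithm (the course's examples are in Matlab; the dictionary below just encodes the table above, with c = 3/2):

```python
import random

def binomial_2_half(rng):
    """A-R for Bi(2, 0.5) with a DU[0,2] proposal and c = 3/2."""
    f = {0: 0.25, 1: 0.5, 2: 0.25}   # target pmf from the table
    cg = 1.5 * (1.0 / 3.0)           # c * g(y) = 1/2 for every y
    while True:
        y = int(3 * rng.random())    # y ~ DU[0,2] via floor(3u)
        v = rng.random()
        if v <= f[y] / cg:           # accept with probability f(y)/(c g(y))
            return y

rng = random.Random(5)
draws = [binomial_2_half(rng) for _ in range(90000)]
```

The accepted draws should show frequencies near 1/4, 1/2, 1/4 for the values 0, 1, 2.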
<br />
<br />
An elaboration of “c”<br/><br />
c is the expected number of times the code runs to output 1 random variable. Remember that when u < f(x)/(c*g(x)) is not satisfied, we need to go over the code again.<br/><br />
<br />
Proof<br/><br />
<br />
Let f(x) be the function we wish to generate from, but we cannot use inverse transform method to generate directly.<br/><br />
Let g(x) be the helper function <br/><br />
Let cg(x)>=f(x)<br/><br />
Since we need to generate y from g(x),<br/><br />
Pr(select y)=g(y)<br/><br />
Pr(output y|selected y)=Pr(u<f(y)/(c*g(y)))= f(y)/(c*g(y)) (Since u~Unif(0,1))<br/><br />
Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c <br/><br />
Since we are asking for the expected number of trials until the first success, this is a geometric distribution with probability of success 1/c<br/><br />
Therefore, E(X)=1/(1/c)=c <br/><br />
<br />
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.<br />
<br />
The proof uses conditional probability to show that, conditional on acceptance, the output follows the pdf of the original target distribution.<br />
The example shows how to choose the constant c for the two functions g(x) and f(x).<br />
<br />
=== Example of Acceptance-Rejection Method===<br />
<br />
Generating a random variable having p.d.f. <br />
<math>f(x) = 20x(1 - x)^3, 0< x <1 </math> <br />
Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br />
g(x) = 1, 0 < x < 1<br />
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br />
<math> f(x)/g(x) = 20x(1 - x)^3 </math><br />
Differentiation of this quantity yields <br />
<math>d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br />
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, <br />
and thus, <br />
<math>f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math> <br />
Hence,<br />
<math>f(x)/(c \cdot g(x))=(256/27) \, x(1-x)^3</math> <br />
and thus the simulation procedure is as follows:<br />
<br />
1) Generate two random numbers U1 and U2 .<br />
<br />
2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop<br />
Otherwise return to step 1). <br />
The average number of times that step 1) will be performed is c = 135/64.<br />
<br />
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)<br />
<br />
This example uses the derivative to find the local maximum of f(x)/g(x),<br />
which gives the best (smallest) constant c for the acceptance-rejection method.<br />
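The simulation procedure above can also be written out as a short Python sketch (Python instead of the course's Matlab, function name ours). U<sub>1</sub> is accepted with probability (256/27)U<sub>1</sub>(1-U<sub>1</sub>)<sup>3</sup>:

```python
import random

def beta24_sample(rng):
    """A-R for f(x) = 20x(1-x)^3 (Beta(2,4)) with a U(0,1) proposal and c = 135/64."""
    while True:
        u1 = rng.random()   # candidate from g = U(0,1)
        u2 = rng.random()
        if u2 <= (256.0 / 27.0) * u1 * (1.0 - u1) ** 3:
            return u1

rng = random.Random(6)
draws = [beta24_sample(rng) for _ in range(50000)]
```

Beta(2,4) has mean 2/(2+4) = 1/3, so the sample mean should be close to 1/3.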
<br />
=== Simple Example of Acceptance-Rejection Method===<br />
Consider the random variable X, with distribution <math> X </math> ~ <math> U[0,0.5] </math><br />
<br />
So we let <math> f(x) = 2 </math> on <math> [0, 1/2] </math> (the density of U[0,0.5])<br />
<br />
Let <math>g(.)</math> be <math>U[0,1]</math> distributed. So <math>g(x) = 1</math> on <math>[0,1]</math><br />
<br />
Then take <math>c = 2</math><br />
<br />
So <math>f(x)/cg(x) = 2 / {(2)(1) } = 1</math> on the interval <math>[0, 1/2]</math> and<br />
<br />
<math>f(x)/cg(x) = 0 / {(2)(1) } = 0</math> on the interval <math>(1/2, 1]</math><br />
<br />
So we reject:<br />
<br />
None of the numbers generated in the interval <math>[0, 1/2]</math><br />
<br />
All of the numbers generated in the interval <math>(1/2, 1]</math><br />
<br />
And this results in the distribution <math>f(.)</math> which is <math>U[0,1/2]</math><br />
<br />
An example to show why we reject certain cases when using the acceptance-rejection method.<br />
<br />
===Another Example of Acceptance-Rejection Method===<br />
Generate a random variable from:<br /> <br />
<math>f(x)=3*x^2</math>, 0< x <1<br /><br />
Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /><br />
Therefore:<br /><br />
<math>c = max(f(x)/(g(x)))= 3</math><br /> <br />
<br />
The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area above f(x) and below <math>c \cdot g(x)</math> as small as possible.<br />
Because g(.) is uniform over (0,1), g(x) = 1.<br />
<math>f(x)/(c \cdot g(x))= x^2</math><br /><br />
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm<br />
<br />
== Class 4 - Thursday, May 16 == <br />
*When we want to find a target distribution, denoted as <math>f(x)</math>; we need to first find a proposal distribution <math>g(x)</math> which is easy to sample from. <br><br />
*The relationship between the proposal distribution and target distribution is: <math> c \cdot g(x) \geq f(x) </math>. <br><br />
*Chance of acceptance is less if the distance between <math>f(x)</math> and <math> c \cdot g(x)</math> is big, and vice versa. The constant <math> c </math> keeps <math> \frac {f(x)}{c \cdot g(x)} </math> below 1 (so <math>f(x) \leq c \cdot g(x)</math>), and we must choose <math> c </math> to achieve this.<br /><br />
*In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it will not make sense if <math>c</math> is simply chosen to be arbitrarily large. We need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible.<br /><br />
<br />
'''How to find C''':<br /><br />
<math>\begin{align}<br />
&c \cdot g(x) \geq f(x)\\<br />
&c\geq \frac{f(x)}{g(x)} \\<br />
&c= \max \left(\frac{f(x)}{g(x)}\right) <br />
\end{align}</math><br><br />
*The logic behind this:<br />
The Acceptance-Rejection method involves finding a distribution that we know how to sample from (g(x)) and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want cg(x)>=f(x).<br />
And it means c has to be greater than or equal to f(x)/g(x). So the smallest possible c that satisfies the condition is the maximum value of f(x)/g(x). If c is made too large, the chance of acceptance of generated values will be small, and the algorithm will lose its purpose.<br />
<br />
*For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br><br />
*It is easy to show that the expected number of trials for an acceptance is c. Thus, the smaller c is, the lower the rejection rate, and the better the algorithm:<br><br />
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br><br />
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math><br />
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br><br />
*So far, the only distribution we know how to sample from is the UNIFORM distribution. <br><br />
<br />
'''Procedure''': <br><br />
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br><br />
The easiest case is UNIF(0,1). However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the UNIF(0,1) variable. <br><br />
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math><br />
<br />
'''Recall the general procedure of Acceptance-Rejection Method'''<br />
#Let <math>Y \sim~ g(y)</math> <br />
#Let <math>U \sim~ Unif [0,1] </math><br />
#If <math>U \leq \frac{f(y)}{c \cdot g(y)}</math> then X=Y; else return to step 1 (This is not the way to find c. This is the general procedure.)<br />
<br />
<hr><b>Example: Generate a random variable from the pdf</b><br><br />
<math> f(x) = <br />
\begin{cases} <br />
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\<br />
0, & \mbox{otherwise}<br />
\end{cases} </math><br />
<br />
We can note that this is a special case of Beta(2,1), where, <br />
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br><br />
<br />
Where &Gamma; (n)=(n-1)! if n is positive integer<br />
<br />
<math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math><br />
<br />
Aside: Beta function<br />
<br />
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by<br />
<math>B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt</math><br><br />
<br />
<br />
<math>beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x</math><br><br />
<br />
<hr><br />
<math>g=U(0,1)</math><br><br />
<math>y \sim g</math><br><br />
<math>f(x) \leq c \cdot g(x)</math><br><br />
<math>c \geq \frac{f(x)}{g(x)}</math><br><br />
<math>c = \max \frac{f(x)}{g(x)} </math><br><br />
<br><math>c = max (2x/1), 0 \leq x \leq 1</math><br><br />
Taking x = 1 gives the highest possible c, which is c=2<br />
<br />Note that c is a scalar greater than 1.<br />
<br />
[[File:Beta(2,1)_example.jpg|750x750px]]<br />
<br />
Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that c*g can cover entire f(x) area. In this case, c=2, so that makes g runs from 0 to 2 on y-axis which covers f(x).<br />
<br />
Comment:<br />
From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1).<br />
And in general, if we want to sample n points from a distritubion with pdf f(x), we need to scan approximately n*c points from the proposal distribution (g(x)) in total. <br><br />
<b>Step</b><br />
<ol><br />
<li>Draw y~u(0,1)</li><br />
<li>Draw u~u(0,1)</li><br />
<li>if <math>u \leq \frac{2y}{2 \cdot 1}</math>, set x=y</li><br />
<li>else go to step 1</li><br />
</ol><br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre><br />
>>close all<br />
>>clear all<br />
>>ii=1; % ii: numbers that are accepted<br />
>>jj=1; % jj: numbers that are generated<br />
>>while ii<1000<br />
y=rand;<br />
u=rand;<br />
jj=jj+1;<br />
if u<=y<br />
x(ii)=y;<br />
ii=ii+1;<br />
end<br />
end<br />
>>hist(x)<br />
>>jj<br />
jj = 2024 % should be around 2000<br />
</pre><br />
[[File:ARM_Example.jpg|300px]]<br />
<br />
:'''*Note:''' The reason that a for loop is not used is that we need continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate.<br />
<br />
:'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.<br />
<br />
:'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples.<br />
<br />
<br />
'''Example for A-R method:'''<br />
<br />
Given <math> f(x)= 3/4 (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number<br />
<br />
<br />
'''Solution:'''<br />
<br />
Let g=U(-1,1), so g(x)=1/2 on <math>-1 \leq x \leq 1</math> (the proposal must cover the support of f)<br />
<br />
let y ~ g, <br />
<math> cg(x)\geq f(x),<br />
c \geq \frac{(3/4)(1-x^2)}{1/2} = \frac{3}{2}(1-x^2), <br />
c=\max \tfrac{3}{2}(1-x^2) = \tfrac{3}{2} </math><br />
<br />
The process:<br />
<br />
:1: Draw <math>U_1</math> ~ U(0,1) and set <math>y = 2U_1 - 1</math>, so that y ~ U(-1,1) <br><br />
:2: Draw U~U(0,1) <br><br />
:3: if <math>U \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{2} \cdot \frac{1}{2}} = 1-y^2</math>, then x=y, '''note that''' <math>\frac{(3/4)(1-y^2)}{(3/2)(1/2)}</math> comes from f(y) / (cg(y))<br />
:4: else: return to '''step 1''' <br />
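A Python sketch of this A-R scheme (function name ours; the proposal is taken as U(-1,1) so that it covers the support of f, which makes the acceptance ratio f(y)/(cg(y)) = 1-y²):

```python
import random

def parabolic_sample(rng):
    """A-R for f(x) = (3/4)(1 - x^2) on [-1, 1] with a U(-1, 1) proposal."""
    while True:
        y = 2 * rng.random() - 1      # candidate y ~ U(-1, 1)
        u = rng.random()
        if u <= 1 - y * y:            # accept with probability f(y)/(c g(y))
            return y

rng = random.Random(12)
draws = [parabolic_sample(rng) for _ in range(50000)]
```

The density is symmetric about 0, so the sample mean should be near 0.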
<br />
----<br />
'''Use the Inverse Method for the earlier Example, f(x)=2x'''<br><br />
:<math>F(x)=\int_0^x \! 2s\,ds={x^2} -0={x^2}</math><br><br />
:<math>y=x^2</math><br><br />
:<math>x=\sqrt y</math><br />
:<math> F^{-1}\left (\, x \, \right) =\sqrt x</math><br />
<br />
:*Procedure<br />
:1: Draw <math> U \sim~ Unif [0,1] </math><br><br />
:2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math><br />
<br />
<span style="font-weight:bold;color:green;">Matlab Code</span><br />
<pre><br />
>>u=rand(1,1000);<br />
>>x=u.^0.5;<br />
>>hist(x)<br />
</pre><br />
[[File:ARM(IFM)_Example.jpg|300px]]<br />
<br />
<span style="font-weight:bold;colour:green;">Matlab Tip:</span><br />
Periods, ".",meaning "element-wise", are used to describe the operation you want performed on each element of a vector. In the above example, to take the square root of every element in U, the notation U.^0.5 is used. However if you want to take the Square root of the entire matrix U the period, "*.*" would be excluded. i.e. Let matrix B=U^0.5, then <math>B^T*B=U</math>. For example if we have a two 1 X 3 matrices and we want to find out their product; using "." in the code will give us their product; however, if we don't use "." it will just give us an error. For example, a =[1 2 3] b=[2 3 4] are vectors, a.*b=[2 6 12], but a*b does not work since matrix dimensions must agree.<br />
<br />
=====Example of Acceptance-Rejection Method=====<br />
<br />
<math>f(x)=3x^2, 0<x<1; </math><br />
<math>g(x)=1, 0<x<1</math><br />
<br />
<math>c = max \frac{f(x)}{g(x)} = max \frac{3x^2}{1} = 3 </math><br><br />
<math>\frac{f(x)}{c \cdot g(x)} = x^2</math><br />
<br />
1. Generate two uniform numbers u<sub>1</sub> and u<sub>2</sub><br />
2. If <math>u_2 \leqslant (u_1)^2</math>, accept u<sub>1</sub> as the random variable from f, if not return to Step 1<br />
<br />
We can also use g(x)=2x for a more efficient algorithm<br />
<br />
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math>, attained at x=1<br />
Use the inverse method to sample from g(x)<br />
<math>G(x)=x^2</math><br />
Generate U from U(0,1) and set <math>x=\sqrt{u}</math><br />
<br />
1. Generate two uniform numbers u<sub>1</sub> and u<sub>2</sub>, and set <math>y=\sqrt{u_1}</math> (a draw from g by the inverse method)<br />
2. If <math>u_2 \leq \frac{3y^2}{(3/2)(2y)} = y = \sqrt{u_1}</math>, accept y as the random variable from f, if not return to Step 1<br />
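A Python sketch of this more efficient version (names ours): the candidate y = sqrt(u1) is a draw from g(x)=2x by the inverse method, and the acceptance ratio simplifies to f(y)/(cg(y)) = 3y²/((3/2)·2y) = y:

```python
import math
import random

def cubic_sample(rng):
    """A-R for f(x) = 3x^2 on (0,1) with proposal g(x) = 2x and c = 3/2."""
    while True:
        y = math.sqrt(rng.random())   # y ~ g via inverse transform, G(x) = x^2
        u2 = rng.random()
        if u2 <= y:                   # f(y)/(c g(y)) = y
            return y

rng = random.Random(7)
draws = [cubic_sample(rng) for _ in range(50000)]
```

The target has mean E[X] = ∫₀¹ 3x³ dx = 3/4, so the sample mean should be near 0.75.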
<br />
'''Possible Limitations'''<br />
<br />
This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before<br> <br />
we get the 1000 accepted points. For example in the example we did in class relating the f(x)=2x, <br><br />
we had to sample around 2070 points before we finally accepted 1000 sample points.<br><br />
<br />
'''Acceptance - Rejection Method Application on Normal Distribution''' <br><br />
<br />
X ∼ N(μ,σ<sup>2</sup>); X = σZ + μ; Z ~ N(0,1) <br><br />
|Z| has probability function of <br><br />
<br />
f(x) = (2/<math>\sqrt{2\pi}</math>) e<sup>-x<sup>2</sup>/2</sup><br />
<br />
g(x) = e<sup>-x</sup><br />
<br />
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum. <br />
<br />
Hence x=1 maximizes h(x) => c = <math>\sqrt{2e/\pi}</math><br />
<br />
Thus f(y)/cg(y) = e<sup>-(y-1)<sup>2</sup>/2</sup><br />
<br />
<br />
This section shows how to work out the constant c relating f(x) and g(x) for the normal case.<br />
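A Python sketch of this scheme for |Z| (names ours; the course's code is Matlab): draw y from g(x)=e<sup>-x</sup> by the inverse transform and accept with probability f(y)/(cg(y)) = e<sup>-(y-1)²/2</sup>:

```python
import math
import random

def half_normal(rng):
    """A-R for |Z| with exponential proposal g(x) = e^{-x} and c = sqrt(2e/pi)."""
    while True:
        y = -math.log(1.0 - rng.random())          # y ~ Exp(1) by inverse transform
        u = rng.random()
        if u <= math.exp(-((y - 1.0) ** 2) / 2):   # f(y)/(c g(y))
            return y

rng = random.Random(11)
draws = [half_normal(rng) for _ in range(50000)]
```

E|Z| = sqrt(2/π) ≈ 0.798, so the sample mean should be close to that.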
<br />
===How to transform <math>U(0,1)</math> to <math>U(a, b)</math>===<br />
<br />
1. Draw U from <math>U(0,1)</math><br />
<br />
2. Take <math>Y=(b-a)U+a</math><br />
<br />
3. Now Y follows <math>U(a,b)</math><br />
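For instance, a minimal Python version of this transform (names ours):

```python
import random

def unif_ab(a, b, rng):
    # Linear transform of U(0,1): Y = (b - a)U + a follows U(a, b)
    return (b - a) * rng.random() + a

rng = random.Random(8)
draws = [unif_ab(-3.0, 5.0, rng) for _ in range(100000)]
```

For a = -3 and b = 5 the sample mean should be near (a+b)/2 = 1.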
<br />
'''Example''': Generate a random variable z from the Semicircular density <math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R</math>.<br />
<br />
-> Proposal distribution: UNIF(-R, R)<br />
<br />
-> We know how to generate using <math> U \sim UNIF (0,1) </math> Let <math> Y= 2RU-R=R(2U-1) </math><br />
<br />
Now, we need to find c:<br />
Since c=max[f(x)/g(x)], where <br /><br />
<math>f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}</math>, <math>g(x)=\frac{1}{2R}</math>, <math>-R\leq x\leq R</math><br /><br />
Thus, we have to maximize <math>R^2-x^2</math>,<br />
which is maximized when x=0.<br />
Therefore, <math>c=4/\pi</math>. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>.<br />
<br />
We accept a candidate point with probability f(y)/[c*g(y)].<br />
Since <math>\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}</math> <br />
<br />
* Note: Y= R(2U-1)<br />
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, to transform U~(0,1) to U~(a,b). Letting a=-R and b=R, and substituting it in the formula y = a+(b-a)*u, we get Y= R(2U-1).<br />
<br />
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points<br />
<br />
<br />
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math><br />
<br />
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math><br />
<br />
3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}</math>, set x = y; else return to step 1.<br />
<br />
<br />
<br />
The condition is <br /><br />
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br><br />
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br><br />
<Math>\ (2U - 1)^2 \leq 1 - U_{1}^2</Math><br><br />
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math><br />
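Putting the whole semicircular example together as a Python sketch (names ours): y = R(2U-1) is the U(-R,R) candidate, and the acceptance probability is sqrt(1-(2U-1)²):

```python
import math
import random

def semicircular(R, rng):
    """A-R for f(x) = (2/(pi R^2)) sqrt(R^2 - x^2) with a U(-R, R) proposal, c = 4/pi."""
    while True:
        u = rng.random()
        y = R * (2.0 * u - 1.0)                          # y ~ U(-R, R)
        u1 = rng.random()
        if u1 <= math.sqrt(1.0 - (2.0 * u - 1.0) ** 2):  # f(y)/(c g(y))
            return y

rng = random.Random(9)
draws = [semicircular(2.0, rng) for _ in range(50000)]
```

The density is symmetric about 0, so the sample mean should be near 0; the expected acceptance rate is π/4.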
<br />
<br />
<br />
<br />
'''One more example about AR method''' <br/><br />
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)<br />
Let <math>f(x)=x*e^{-x}, x>0 </math> <br/><br />
Use <math>g(x)=a*e^{-a*x}</math>to generate random variable <br/><br />
<br/><br />
Solution: First of all, we need to find c<br/><br />
<math>c*g(x)>=f(x)</math> <br/><br />
<math>c>=\frac{f(x)}{g(x)}</math> <br/><br />
<math>\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}</math> <br/><br />
take derivative with respect to x, and set it to 0 to get the maximum, <br/><br />
<math>\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 </math><br/><br />
<math>x=\frac {1}{1-a}</math> <br/><br />
<br />
<math>\frac {f(x)}{g(x)} = \frac {e^{-1}}{a*(1-a)} </math><br/><br />
<math>\frac {f(0)}{g(0)} = 0</math><br/><br />
<math>\frac {f(infinity)}{g(infinity)} = 0</math><br/><br />
<br/><br />
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/><br />
<br/><br />
<b>In order to minimize c, we need to find the appropriate a</b> <br/><br />
Take derivative with respect to a and set it to be zero, <br/><br />
We could get a=1/2 <br/><br />
<b>c=4/e</b><br />
<br/><br />
Procedure: <br/><br />
1. Generate u, v ~ Unif(0,1) <br/><br />
2. Generate y from g: since g is exponential with rate a=1/2, let y=-2ln(u) <br/><br />
3. If v<f(y)/(c*g(y)), output y<br/><br />
Else, go to 1<br/><br />
<br />
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.<br />
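A Python sketch of this procedure with a = 1/2 and c = 4/e (names ours); the exponential candidate with rate 1/2 is generated as y = -2 ln(1-u):

```python
import math
import random

def gamma21_sample(rng):
    """A-R for f(x) = x e^{-x} with proposal g(x) = (1/2) e^{-x/2} and c = 4/e."""
    a = 0.5
    c = 4.0 / math.e
    while True:
        u = rng.random()
        v = rng.random()
        y = -2.0 * math.log(1.0 - u)   # y ~ Exp(rate 1/2) by inverse transform
        ratio = (y * math.exp(-y)) / (c * a * math.exp(-a * y))
        if v <= ratio:                 # accept with probability f(y)/(c g(y))
            return y

rng = random.Random(10)
draws = [gamma21_sample(rng) for _ in range(50000)]
```

f here is the Gamma(2,1) density, so the sample mean should be close to 2.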
<br />
'''Summary of how to find the value of c''' <br/><br />
Let h(x)=f(x)/g(x), and then we have the following:<br /><br />
1. First, take derivative of h(x) with respect to x, get x1;<br /><br />
2. Plug x1 into h(x) and get the value(or a function) of c, denote as c1;<br /><br />
3. Check the endpoints of x and sub the endpoints into h(x);<br /><br />
4. (if c1 is a value, then we can ignore this step) Since we want the smallest value of c such that f(x)<= c*g(x) for all x, we want the unknown parameter that minimizes c. <br />So we take the derivative of c1 with respect to the unknown parameter (i.e. k=unknown parameter) to get the value of k, <br />then substitute k back in to get the value of c1. (Double check that c1>=1)<br /><br />
5. Pick the maximum value of h(x) to be the value of c.<br /><br />
<br />
For these two examples, we first generate candidates from a uniform proposal distribution,<br />
and then find c=max(f(y)/g(y)).<br />
If v<f(y)/(c*g(y)), output y.</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=16301stat340s132013-05-13T21:48:30Z<p>Ysyap: /* Inverse Transform Method */</p>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
{{Cleanup|reason= use math environment and LaTex notations for formulas. For example instead of y=1-e^(-λx) write <math>y=1-e^{-\lambda x}</math><br />
}}<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
Lu Cheng Monday 3:30-5:30 pm M3 3108 space 2 <br /><br />
Han ShengSun Tuesday 4-6 pm M3 3108 space 2 <br /><br />
Yizhou Fang Wednesday 1-3 pm M3 3108 space 1 <br /><br />
Huan Cheng Thursday 3-5 pm M3 3111 space 1 <br /><br />
Wu Lin Friday 11am - 1pm M3 3108 space 1 <br /><br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification, but in the continuous case, i.e. y is continuous <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning)<br /><br />
<br />
=== Prerequisite ===<br />
(One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
=== Antirequisite ===<br />
CM 361/STAT 341, CS 437, 457<br />
<br />
=== Applications ===<br />
<br />
Most useful when the structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your quest account as the login name and your uwaterloo email as the registered email. This is important as the quest id will be used to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' : [https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]<br />
<br />
- you can submit your contributions multiple times.<br /><br />
- you will be able to edit the response right after submitting<br /><br />
- send email to make changes to an old response : uwstat340@gmail.com<br /><br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
=== Comments ===<br />
To pass the course, you need to pass the final exam<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
There are camps debating about sampling. Some people believe that activities such as rolling a die and flipping a coin are not truly random but '''deterministic''' – the result can, in principle, be calculated using things such as physics and math.<br />
<br />
A computer cannot generate truly random numbers since computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are produced by a deterministic sequence, they have the appearance of being independent uniform random variables.<br />
<br />
=== Mod ===<br />
Mod m: a = b mod m means that (b − a) is divisible by m, i.e. m | (b − a).<br />
For example, 1 = 10 mod 3, since 3 | (10 − 1).<br />
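As a quick sanity check, the mod operation in most programming languages behaves exactly this way (a small illustrative sketch in Python; the course itself uses Matlab, where the equivalent is <code>mod(b,m)</code>):

```python
# b mod m is the remainder after dividing b by m; it always lies in 0..m-1
b, m = 10, 3
a = b % m                  # 10 mod 3 = 1
print(a)                   # -> 1
assert (b - a) % m == 0    # the defining property: m divides (b - a)
```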
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform, pseudo random numbers. It is also referred to as the Linear or Mixed Congruential Methods. We define the Linear Congruential Method to be x<sub>k+1</sub>=(a*x<sub>k</sub> + b) mod m. Given a "seed" x<sub>0</sub>, we can obtain values for x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub> recursively. The Multiplicative Congruential Method may also refer to the special case where b=0.<br /><br />
<br />
Note that if you take a number x and divide it by m, the remainder (please see the graph below) is a number between 0 and m-1. <br /><br />
We use operator "mod" to represent the expression, which is x mod m. <br /><br />
[[File:1.png]]<br />
<br />
<br />
'''Algorithm 1'''<br /><br />
x<sub>k+1</sub> = x<sub>k</sub> mod m<br /><br />
<br />
'''Example'''<br /><br />
Let x<sub>0</sub> = 10, m = 3<br /><br />
Step 1: 1 = 10 mod 3<br /><br />
Step 2: 1 = 1 mod 3<br /><br />
Step 3: 1 = 1 mod 3<br /><br />
... <br />
This method generates a sequence of identical integers, hence we need a better algorithm.<br /><br />
<br />
'''Algorithm 2 (Multiplicative Congruential Algorithm)'''<br /><br />
x<sub>k+1</sub> = (a*x<sub>k</sub>+b) mod m<br /><br />
<br />
'''Example'''<br /><br />
Let a = 2, b = 1, m = 3, x<sub>0</sub> = 10<br /><br />
Step 1: 0 = (2*10+1) mod 3<br /><br />
Step 2: 1 = (2*0+1) mod 3<br /><br />
Step 3: 0 = (2*1+1) mod 3<br /><br />
This method generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(If the parameters are chosen properly, the output resembles a random sequence of numbers. However, how do we find good values of a, b, and m? At the very least, m should be a very '''large''' number: the larger m is, the longer the sequence can run before repeating, and the more random it appears. This is easy to experiment with in Matlab.) <br /><br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start:<br /><br />
<pre><br />
>>clear all<br />
>>close all<br />
</pre><br />
<pre><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. Matlab's built-in function '''rand''' generates a number between 0 and 1. <br />
3. If we would like to generate 1000 or more numbers, we can use a '''for''' loop)<br /><br /><br />
<br />
<pre><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not show the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters a, b, and m and an initial value, x<sub>0</sub> called '''seed'''. A sequence of numbers is defined by x<sub>k+1</sub> = a*x<sub>k</sub> + b mod m. Mod m means take the remainder after division by m. <br /><br />
<br />
Note: For some bad a and b, the histogram may not be uniformly distributed.<br /><br />
<br />
<br />
'''Example''': a=13, b=0, m=31<br /><br />
The first 30 numbers in the sequence are a permutation of the integers from 1 to 30, and then the sequence repeats itself, so '''it is important to choose m large'''. Values are between 0 and m-1. If the values are normalized by dividing by m-1, then the result is numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In Matlab, you can use the function "hist(x)" to see if the values look uniformly distributed. <br /><br />
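The full-period claim above can be checked directly. The sketch below is in Python for illustration (the course uses Matlab): it generates the sequence with a=13, b=0, m=31 and verifies that the first 30 values are a permutation of 1 to 30 before the cycle repeats.

```python
def lcg(a, b, m, seed, n):
    """Generate n values of the sequence x_{k+1} = (a*x_k + b) mod m."""
    xs, x = [], seed
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs

seq = lcg(13, 0, 31, 1, 60)
print(sorted(seq[:30]) == list(range(1, 31)))  # True: a permutation of 1..30
print(seq[:30] == seq[30:])                    # True: the sequence then repeats
```

With a small m such as 3, the same function shows the cycle repeating after only a couple of values, which is why a large m matters.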
<br />
'''Examples:[From Textbook]'''<br /><br />
If x<sub>0</sub>=3 and x<sub>n</sub>=(5x<sub>n-1</sub>+7)mod 200, find x<sub>1</sub>,...,x<sub>10</sub>.<br /><br />
'''Solution:'''<br /><br />
x<sub>1</sub>= (15+7) mod 200= 22<br /> x<sub>2</sub>= 117 mod 200= 117 <br /> x<sub>3</sub>= 592 mod 200 = 192<br /><br />
x<sub>4</sub>= 2967 mod 200= 167 <br /> x<sub>5</sub>= 14842 mod 200= 42 <br /> x<sub>6</sub>= 74217 mod 200 = 17<br /><br />
x<sub>7</sub>= 371092 mod 200= 92 <br /> x<sub>8</sub>= 1855467 mod 200= 67 <br /> x<sub>9</sub>= 9277342 mod 200 = 142<br /><br />
x<sub>10</sub>= 46386717 mod 200 = 117<br /><br />
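The recursion above is easy to verify programmatically; here is a short illustrative sketch in Python (the course itself uses Matlab):

```python
x = 3                       # seed x_0
values = []
for _ in range(10):
    x = (5 * x + 7) % 200   # x_n = (5*x_{n-1} + 7) mod 200
    values.append(x)
print(values)  # [22, 117, 192, 167, 42, 17, 92, 67, 142, 117]
```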
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose m such that m is large, and m is prime. Careful selection of parameters 'a' and 'b' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a non prime number such as 40 for m, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and m-1. If the values are normalized by dividing by '''m-1''', their result is numbers uniformly distributed on the interval [0,1].<br /><br />
<br />
From the example shown above, we can see that to create a good random number generator we need to select a large m. Since x<sub>n</sub> depends only on (5x<sub>n-1</sub>+7) mod m, there are at most m distinct values before some value repeats; once this happens, the whole sequence begins to repeat. Thus, if we want to generate a large set of random numbers, it is better to choose a large m so that the generated values do not repeat after only a few recursions.<br /><br />
''Example:'' <br /><br />
For x<sub>n</sub> = (2x<sub>n-1</sub>+1) mod 3 where x<sub>0</sub>=2, x<sub>1</sub> = 5 mod 3 = 2<br /><br />
Notice that with a small m, the generated sequence repeats itself much sooner than with a sufficiently large m.<br /><br />
<br />
<br />
For many years the “rand” function in Matlab used this algorithm with the parameters<br />
a=7<sup>5</sup>=16807, b=0, m=2<sup>31</sup> -1=2147483647 – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that m should be large)<br /><br />
<br />
There has been research on how to choose parameters that produce good uniform sequences. Many programs give you the chance to choose the seed. Sometimes the first number is chosen by the CPU.<br /><br />
<br />
Moreover, not all random variables are uniform; some have a normal, exponential, or binomial distribution instead. <br /><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating distributions other than the uniform, such as the exponential and normal distributions. However, to use this method to generate pseudorandom numbers, the CDF must be continuous and strictly increasing so that F<sup>-1</sup> always exists.<br />
Exponential distribution has the property that generated numbers are frequently close to 0. Normal distribution has the property that generated numbers are frequently close to its mean.<br />
<br />
'''Theorem''': <br /><br />
Take U ~ U[0,1] and let <math> x=F^{-1}(U)</math><br />
Then x has distribution function F(•)<br /><br />
Where <math>F(x) = P(X</math> ≤ <math>x)</math> is the CDF and;<br/> <br />
<math> F^{-1}(U) </math>denotes the inverse function of F(•) <br />
Or that <math>F(x)=U</math> <math>\Rightarrow</math> <math>x=F^{-1}(U)</math><br /><br />
<br />
'''Proof of the theorem:'''<br /><br />
<math>F(x) = P(X</math> ≤ <math> x)</math><br /><br />
<math>F(x) = P(F^{-1}(U)</math> ≤ <math> x)</math> <br /><br />
<math>F(x) = P(F(F^{-1}(U))</math> ≤ <math> F(x))</math> ::"Applying F, which is monotonic, to both sides" <br/><br />
<math>F(x) = P(U</math> ≤ <math> F(x))</math><br /> <br />
<math>F(x) = F(x) </math> ::"Because <math>Pr(U </math> ≤ <math>y)=y</math>,since U is uniform on the unit interval"<br /><br />
<br />
F(<sup>.</sup>) is a monotonic function, i.e. a function that is strictly increasing or decreasing.<br /><br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(x) dx</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda x}\ dx</math><br /><br />
<math> = -e^{-\lambda x}\, \Big|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-ln(1-y)/{\lambda}</math><br /><br />
Swapping x and y: <math>y=-ln(1-x)/{\lambda}</math><br /><br />
<math>F^{-1}(x)=-ln(1-x)/{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
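The two steps above can be sketched as follows (Python, for illustration only; the course uses Matlab, and λ=2 is an arbitrary choice). The sample mean of the generated values should be close to 1/λ = 0.5:

```python
import math
import random

random.seed(0)
lam = 2.0
# Step 1: draw U ~ U[0,1].  Step 2: x = -ln(1 - U)/lambda.
xs = [-math.log(1 - random.random()) / lam for _ in range(100_000)]
print(sum(xs) / len(xs))  # close to 1/lam = 0.5
```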
<br />
'''Example 2''': Given the CDF of X: F(x) = x<sup>5</sup> on [0,1], transform U~U[0,1].<br />
Sol: <br />
Let y=x<sup>5</sup> and solve for x => x=y<sup>(1/5)</sup> => F<sup>-1</sup>(x) = x<sup>(1/5)</sup><br />
Hence, to obtain a value of x distributed according to F(x), we first draw u from a uniform distribution, then apply the inverse function of F, and set<br />
x= u<sup>(1/5)</sup><br />
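A sketch of this example (Python, for illustration; the course uses Matlab): drawing u ~ U[0,1] and setting x = u<sup>1/5</sup> produces samples whose empirical CDF matches F(x) = x<sup>5</sup>.

```python
import random

random.seed(1)
xs = [random.random() ** (1 / 5) for _ in range(100_000)]
# Check the empirical CDF against F(x) = x^5 at the point x = 0.9:
p_hat = sum(x <= 0.9 for x in xs) / len(xs)
print(p_hat)  # close to 0.9**5 = 0.59049
```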
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre><br />
>>u=rand(1,1000)<br />
>>hist(u) %will generate a fairly uniform histogram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=-log(1-u)/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<!-- Did class end before this was finished? --><br />
<br />
'''Limitations:'''<br /><br />
1. This method cannot always be used, since not every CDF is invertible or monotonic. <br /><br />
2. It may be impractical since some CDFs and/or their inverses are not easy to compute.<br /><br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre><br />
>>disttool %shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
[[File:Disttool.jpg|450px]]</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=16299stat340s132013-05-13T21:47:24Z<p>Ysyap: /* Inverse Transform Method */</p>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
{{Cleanup|reason= use math environment and LaTex notations for formulas. For example instead of y=1-e^(-λx) write <math>y=1-e^{-\lambda x}</math><br />
}}<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
Lu Cheng Monday 3:30-5:30 pm M3 3108 space 2 <br /><br />
Han ShengSun Tuesday 4-6 pm M3 3108 space 2 <br /><br />
Yizhou Fang Wednesday 1-3 pm M3 3108 space 1 <br /><br />
Huan Cheng Thursday 3-5 pm M3 3111 space 1 <br /><br />
Wu Lin Friday 11am - 1pm M3 3108 space 1 <br /><br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case except y is discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning)<br /><br />
<br />
=== Prerequisite ===<br />
(One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
=== Antirequisite ===<br />
CM 361/STAT 341, CS 437, 457<br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularit y<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying an account in the wikicourse note, please use the quest account as your login name while the uwaterloo email as the registered email. This is important as the quest id will use to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, do wait for several hours before students can login into the account using the passwords stated in the email. During the first login, students will be ask to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
'''Wikicoursenote contribution form''' - [[https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform]]<br />
- you can submit your contributions in multiple times.<br />
- you will be able to edit the response right after submitting.<br />
- send email to make changes to an old response : uwstat340@gmail.com<br />
<br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
=== Comments ===<br />
To pass the course, you need to pass the final exam<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
There are camps debating about sampling. Some people believe that activities such as rolling a dice and flipping a coin are not truly random but are '''deterministic''' – the result can be reliably calculated using things such as physics and math.<br />
<br />
A computer cannot generate truly random numbers since computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers'''; numbers that seem random but are actually deterministic. Although the Pseudo Random Numbers are deterministic, these numbers have a sequence of value and all of them have the appearances of being independent uniform random variables.<br />
<br />
=== Mod ===<br />
Mod m: number b that minus number a can be divided by m.<br />
Which mean: a = b mod m <=> m / ( b – a ). <br />Ex, 1=10 mod 3.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform, pseudo random numbers. It is also referred to as the Linear or Mixed Congruential Methods. We define the Linear Congruential Method to be x<sub>k+1</sub>=(a*x<sub>k</sub> + b) mod m. Given a "seed" x<sub>0</sub>, we can obtain values for x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub> recursively. The Multiplicative Congruential Method may also refer to the special case where b=0.<br /><br />
<br />
Note that if you take a number x, then divided by m. The remainder (please see the graph below) is a number between 0 and m-1. <br /><br />
We use operator "mod" to represent the expression, which is x mod m. <br /><br />
[[File:1.png]]<br />
<br />
<br />
'''Algorithm 1'''<br /><br />
x<sub>k+1</sub> = x<sub>k</sub> mod m<br /><br />
<br />
'''Example'''<br /><br />
Let x<sub>0</sub> = 10, m = 3<br /><br />
Step 1: 1 = 10 mod 3<br /><br />
Step 2: 1 = 1 mod 3<br /><br />
Step 3: 1 = 1 mod 3<br /><br />
... <br />
This method generates a sequence of identical integers, hence we need a better algorithm.<br /><br />
<br />
'''Algorithm 2 (Multiplicative Congruential Algorithm)'''<br /><br />
x<sub>k+1</sub> = (a*x<sub>k</sub>+b) mod m<br /><br />
<br />
'''Example'''<br /><br />
Let a = 2, b = 1, m = 3, x<sub>0</sub> = 10<br /><br />
Step 1: 0 = (2*10+1) mod 3<br /><br />
Step 2: 1 = (2*0+1) mod 3<br /><br />
Step 3: 0 = (2*1+1) mod 3<br /><br />
This method generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
(if people choose numbers properly, they could get a random sequence of numbers. However, how do we find the value of a,b, and m? At the very least m should be a very '''large''' number. The larger m is, the more possibility people get a random sequence of numbers. This is easier to solve in Matlab.) <br /><br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start:<br /><br />
<pre><br />
>>clear all<br />
>>close all<br />
</pre><br />
<pre><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. There is a function called RAND to generate a number between 0 and 1. <br />
3. If we would like to generate 1000 and more numbers, we could use a '''for''' loop)<br /><br /><br />
<br />
<pre><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not show the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters a, b, and m and an initial value, x<sub>0</sub> called '''seed'''. A sequence of numbers is defined by x<sub>k+1</sub> = a*x<sub>k</sub> + b mod m. Mod m means take the remainder after division by m. <br /><br />
<br />
Note: For some bad a and b, the histogram may not be uniformly distributed.<br /><br />
<br />
<br />
'''Example''': a=13, b=0, m=31<br /><br />
The first 30 numbers in the sequence are a permutation of integers for 1 to 30 and then the sequence repeats itself so '''it is important to choose m large'''. Values are between 0 and m-1. If the values are normalized by dividing by m-1, then the result is numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In Matlab, you can use function "hist(x)" to see if it is uniformly distributed. <br /><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If x<sub>0</sub>=3 and x<sub>n</sub>=(5x<sub>n-1</sub>+7)mod 200, find x<sub>1</sub>,...,x<sub>10</sub>.<br /><br />
'''Solution:'''<br /><br />
x<sub>1</sub>= (15+7) mod 200= 22<br /> x<sub>2</sub>= 117 mod 200= 117 <br /> x<sub>3</sub>= 592 mod 200 = 192<br /><br />
x<sub>4</sub>= 2967 mod 200= 167 <br /> x<sub>5</sub>= 14842 mod 200= 42 <br /> x<sub>6</sub>= 74217 mod 200 = 17<br /><br />
x<sub>7</sub>= 371092 mod 200= 92 <br /> x<sub>8</sub>= 1855467 mod 200= 67 <br /> x<sub>9</sub>= 9277342 mod 200 = 142<br /><br />
x<sub>10</sub>= 46386717 mod 200 = 117<br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose m such that m is large, and m is prime. Careful selection of parameters 'a' and 'b' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a non prime number such as 40 for m, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and m-1. If the values are normalized by dividing by '''m-1''', their result is numbers uniformly distributed on the interval [0,1].<br /><br />
<br />
From the example shown above, we can see to create a good random number sequence generator, we need to select a large m. As the x<sub>n</sub> value is dependent on the (5x<sub>n-1</sub>+7)value, such that the value it can be is between 0 to m after that a value will be repeated; and once this happens the whole sequence will began to repeat. Thus, if we want to create large group of random number, it is better to have large m such that the random value generated will not be repeated after several recursion.<br /><br />
''Example:'' <br /><br />
For x<sub>n</sub> = (2x<sub>n-1</sub>+1) mod 3 where x<sub>0</sub>=2, x<sub>1</sub> = 5 mod 3 = 2<br /><br />
Notice that, with the small value m, the random number generated repeated itself is faster than when the value is large enough.<br /><br />
<br />
<br />
For many years the “rand” function in Matlab used this algorithm with these parameters<br />
A=7<sup>5</sup>=16807, b=0, m=2<sup>31</sup> -1=2147483647 – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (Important part is that m should be large)<br /><br />
<br />
There has been a research about how to choose uniform sequence. Many programs give you the chance to choose the seed. Sometimes the first number is chosen by CPU.<br /><br />
<br />
Moreover, it is fact that not all variables are uniform. Some has normal distribution, or exponential distribution, or binomial distribution as well. <br /><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than uniform distribution, such as exponential distribution and normal distribution. However, to use this method in generating random numbers, the probability distribution consumed must be a cdf such that it is continuous for the F<sup>-1</sup> always exists.<br />
Exponential distribution has the property that generated numbers are frequently close to 0. Normal distribution has the property that generated numbers are frequently close to its mean.<br />
<br />
'''Theorem''': <br /><br />
Take U ~ U[0,1] and let <math> x=F^{-1}(U)</math><br />
Then x has distribution function F(•)<br /><br />
Where <math>F(x) = P(X</math> ≤ <math>x)</math> is the CDF and;<br/> <br />
<math> F^{-1}(U) </math>denotes the inverse function of F(•) <br />
Or that <math>F(x)=U</math> <math>\Rightarrow</math> <math>x=F^{-1}(U)</math><br /><br />
<br />
'''Proof of the theorem:'''<br /><br />
<math>F(x) = P(X</math> ≤ <math> x)</math><br /><br />
<math>F(x) = P(F^{-1}(U)</math> ≤ <math> x)</math> <br /><br />
<math>F(x) = P(F(F^{-1}(U))</math> ≤ <math> F(x))</math> ::"Applying F, which is monotonic, to both sides" <br/><br />
<math>F(x) = P(U</math> ≤ <math> F(x))</math><br /> <br />
<math>F(x) = F(x) </math> ::"Because <math>Pr(U </math> ≤ <math>y)=y</math>,since U is uniform on the unit interval"<br /><br />
<br />
F(<sup>.</sup>) is a monotonic function, which is a function that strictly increasing or decreasing.<br /><br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<!-- Cannot write integrals without imaging --><br />
<math> F(x)= \int_0^x f(x) dx</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda x}\ dx</math><br /><br />
<math> = \frac{\lambda}{-\lambda}\, e^{-\lambda x}\, | \underset{0}{x} </math><br /><br />
<math> = -e^{\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-ln(1-y)/{\lambda}</math><br /><br />
<math>y=-ln(1-x)/{\lambda}</math><br /><br />
<math>F^{-1}(x)=-ln(1-x)/{\lambda}</math><br /><br />
<br />
<!-- What are these for? --><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
<br />
'''Example 2''': Given a CDF of X: F(x) = x<sup>5</sup>, transform U~U[0,1].<br />
Sol: <br />
Let y=x<sup>5</sup>, solve for x => x=y<sup>(1/5)</sup> =>F<sup>-1</sup>(x) = x<sup>(1/5)</sup><br />
Hence, to obtain a value of x from F(x), we first set u as an uniform distribution, then obtain the inverse function of F(x), and set<br />
x= u<sup>(1/5)</sup><br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre><br />
>>u=rand(1,1000)<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method requires inverting the CDF, but not all CDFs are invertible or monotonic. <br /><br />
2. It may be impractical since some CDFs and/or integrals are not easy to compute.<br /><br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre><br />
>>disttool %shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
[[File:Disttool.jpg|450px]]</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=16295stat340s132013-05-13T21:35:20Z<p>Ysyap: /* Sampling (Generating random numbers), Class 2 - Thursday, May 9 */</p>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
{{Cleanup|reason= use math environment and LaTex notations for formulas. For example instead of y=1-e^(-λx) write <math>y=1-e^{-\lambda x}</math><br />
}}<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
Lu Cheng Monday 3:30-5:30 pm M3 3108 space 2 <br /><br />
Han ShengSun Tuesday 4-6 pm M3 3108 space 2 <br /><br />
Yizhou Fang Wednesday 1-3 pm M3 3108 space 1 <br /><br />
Huan Cheng Thursday 3-5 pm M3 3111 space 1 <br /><br />
Wu Lin Friday 11am - 1pm M3 3108 space 1 <br /><br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case, i.e. the output y is continuous rather than discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning)<br /><br />
<br />
=== Prerequisite ===<br />
(One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)<br />
<br />
=== Antirequisite ===<br />
CM 361/STAT 341, CS 437, 457<br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your quest account as the login name and your uwaterloo email as the registered email. This is important as the quest id will be used to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, please wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
=== Tentative Topics ===<br />
- Random variable and stochastic process generation<br /><br />
- Discrete-Event Systems<br /><br />
- Variance reduction<br /><br />
- Markov Chain Monte Carlo<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
=== Comments ===<br />
To pass the course, you need to pass the final exam<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
There are competing camps in the debate about randomness. Some people believe that activities such as rolling a die and flipping a coin are not truly random but '''deterministic''' – the result can, in principle, be calculated using physics and math.<br />
<br />
A computer cannot generate truly random numbers since computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are deterministic, the sequence of values they form has the appearance of independent uniform random variables.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is a simple algorithm used to generate uniform, pseudo random numbers. It is also referred to as the Linear or Mixed Congruential Methods. We define the Linear Congruential Method to be x<sub>k+1</sub>=(a*x<sub>k</sub> + b) mod m. Given a "seed" x<sub>0</sub>, we can obtain values for x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub> recursively. The Multiplicative Congruential Method may also refer to the special case where b=0.<br /><br />
<br />
Note that if you take a number x and divide it by m, the remainder (please see the graph below) is a number between 0 and m-1. <br /><br />
We use the operator "mod" to represent this remainder, written x mod m. <br /><br />
[[File:1.png]]<br />
<br />
<br />
'''Algorithm 1'''<br /><br />
x<sub>k+1</sub> = x<sub>k</sub> mod m<br /><br />
<br />
'''Example'''<br /><br />
Let x<sub>0</sub> = 10, m = 3<br /><br />
Step 1: 1 = 10 mod 3<br /><br />
Step 2: 1 = 1 mod 3<br /><br />
Step 3: 1 = 1 mod 3<br /><br />
... <br />
This method generates a sequence of identical integers, hence we need a better algorithm.<br /><br />
<br />
'''Algorithm 2 (Multiplicative Congruential Algorithm)'''<br /><br />
x<sub>k+1</sub> = (a*x<sub>k</sub>+b) mod m<br /><br />
<br />
'''Example'''<br /><br />
Let a = 2, b = 1, m = 3, x<sub>0</sub> = 10<br /><br />
Step 1: 0 = (2*10+1) mod 3<br /><br />
Step 2: 1 = (2*0+1) mod 3<br /><br />
Step 3: 0 = (2*1+1) mod 3<br /><br />
This method generates a sequence with a repeating cycle of two integers.<br /><br />
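Both algorithms can be written as one small helper (a Python sketch for illustration, since the course uses Matlab; lcg is our own name — Algorithm 1 is the special case a=1, b=0):<br />

```python
def lcg(a, b, m, seed, n):
    """Generate n values of x_{k+1} = (a*x_k + b) mod m."""
    values = []
    x = seed
    for _ in range(n):
        x = (a * x + b) % m
        values.append(x)
    return values
```

With a=2, b=1, m=3, and seed 10 this reproduces the repeating cycle 0, 1, 0, 1, ... from the example above.<br />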
<br />
(If the parameters are chosen properly, this produces a sequence that looks random. However, how do we find good values of a, b, and m? At the very least m should be a very '''large''' number: the larger m is, the longer the sequence can run before repeating. This is easy to experiment with in Matlab.) <br /><br />
<br />
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start:<br /><br />
<pre><br />
>>clear all<br />
>>close all<br />
</pre><br />
<pre><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: <br /><br />
1. Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer. <br /><br />
2. Matlab's built-in function rand generates a number between 0 and 1. <br />
3. If we would like to generate 1000 or more numbers, we could use a '''for''' loop.)<br /><br /><br />
<br />
<pre><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not show the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters a, b, and m and an initial value x<sub>0</sub> called the '''seed'''. A sequence of numbers is defined by x<sub>k+1</sub> = (a*x<sub>k</sub> + b) mod m; "mod m" means take the remainder after division by m. <br /><br />
<br />
Note: For some bad a and b, the histogram may not be uniformly distributed.<br /><br />
<br />
<br />
'''Example''': a=13, b=0, m=31<br /><br />
The first 30 numbers in the sequence are a permutation of the integers from 1 to 30, and then the sequence repeats itself, so '''it is important to choose m large'''. Values are between 0 and m-1. If the values are normalized by dividing by m-1, then the result is numbers uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In Matlab, you can use the function "hist(x)" to see if it is uniformly distributed. <br /><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If x<sub>0</sub>=3 and x<sub>n</sub>=(5x<sub>n-1</sub>+7)mod 200, find x<sub>1</sub>,...,x<sub>10</sub>.<br /><br />
'''Solution:'''<br /><br />
x<sub>1</sub>= (15+7) mod 200= 22<br /> x<sub>2</sub>= 117 mod 200= 117 <br /> x<sub>3</sub>= 592 mod 200 = 192<br /><br />
x<sub>4</sub>= 2967 mod 200= 167 <br /> x<sub>5</sub>= 14842 mod 200= 42 <br /> x<sub>6</sub>= 74217 mod 200 = 17<br /><br />
x<sub>7</sub>= 371092 mod 200= 92 <br /> x<sub>8</sub>= 1855467 mod 200= 67 <br /> x<sub>9</sub>= 9277342 mod 200 = 142<br /><br />
x<sub>10</sub>= 46386717 mod 200 = 117<br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose m such that m is large and m is prime. Careful selection of parameters 'a' and 'b' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a non-prime number such as 40 for m, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and m-1. If the values are normalized by dividing by '''m-1''', their result is numbers uniformly distributed on the interval [0,1].<br /><br />
<br />
From the example shown above, we can see that to create a good random number generator, we need to select a large m. Each x<sub>n</sub> depends only on the previous value through (5x<sub>n-1</sub>+7) mod 200, so there are at most m possible values; once a value recurs, the whole sequence begins to repeat. Thus, if we want to create a large group of random numbers, it is better to have a large m so that the generated values are not repeated after only a few iterations.<br /><br />
''Example:'' <br /><br />
For x<sub>n</sub> = (2x<sub>n-1</sub>+1) mod 3 where x<sub>0</sub>=2: x<sub>1</sub> = 5 mod 3 = 2, so the sequence repeats immediately.<br /><br />
Notice that with a small value of m, the generated numbers repeat much sooner than with a sufficiently large m.<br /><br />
<br />
<br />
For many years the “rand” function in Matlab used this algorithm with the parameters<br />
a=7<sup>5</sup>=16807, b=0, m=2<sup>31</sup>-1=2147483647 – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that m should be large)<br /><br />
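That generator can be sketched directly from the recursion above (a Python sketch for illustration, since the course uses Matlab; park_miller is our own name):<br />

```python
def park_miller(seed, n):
    """Multiplicative congruential generator with a = 7^5, b = 0, m = 2^31 - 1."""
    a, m = 16807, 2147483647
    x, out = seed, []
    for _ in range(n):
        x = (a * x) % m  # b = 0: purely multiplicative
        out.append(x)
    return out

# Normalizing by (m - 1) gives pseudo-uniform values on [0, 1]:
# u = [x / (m - 1) for x in park_miller(1, 1000)]
```

Because m is a large prime and a is chosen carefully, the period is m-1, so no value repeats until the sequence has run for over two billion steps.<br />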
<br />
There has been considerable research on how to choose parameters that yield a good uniform-looking sequence. Many programs give you the chance to choose the seed; sometimes the first number is chosen by the CPU.<br /><br />
<br />
Moreover, not all random variables are uniform; some follow a normal, exponential, or binomial distribution. <br /><br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than uniform distribution, such as exponential distribution and normal distribution.<br />
Exponential distribution has the property that generated numbers are frequently close to 0. Normal distribution has the property that generated numbers are frequently close to its mean.<br />
<br />
'''Theorem''': <br /><br />
Take U ~ U[0,1] and let <math> x=F^{-1}(U)</math><br />
Then x has distribution function F(•)<br /><br />
Where <math>F(x) = P(X \leq x)</math> is the CDF and<br/> <br />
<math> F^{-1}(U) </math> denotes the inverse function of F(•), <br />
i.e. <math>F(x)=U</math> <math>\Rightarrow</math> <math>x=F^{-1}(U)</math><br /><br />
*Since F is continuous and strictly monotonic, we know F<sup>-1</sup> exists.<br />
<br />
'''Proof of the theorem:'''<br /><br />
<math>F(x) = P(X \leq x)</math><br /><br />
<math>F(x) = P(F^{-1}(U) \leq x)</math> <br /><br />
<math>F(x) = P(F(F^{-1}(U)) \leq F(x))</math> ::"Applying F, which is monotonic, to both sides" <br/><br />
<math>F(x) = P(U \leq F(x))</math><br /> <br />
<math>F(x) = F(x) </math> ::"Because <math>P(U \leq y)=y</math>, since U is uniform on the unit interval"<br /><br />
<br />
<math>F(\cdot)</math> is a monotonic function, i.e. a function that is strictly increasing or decreasing.<br /><br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e^{-\lambda t}\, dt</math><br /><br />
<math> = -e^{-\lambda t}\, \Big|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\ln(1-y)/{\lambda}</math><br /><br />
<math>y=-\ln(1-x)/{\lambda}</math> ::"Interchanging x and y to express the inverse" <br /><br />
<math>F^{-1}(x)=-\ln(1-x)/{\lambda}</math><br /><br />
<br />
To generate an exponential random variable:<br /><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
<br />
'''Example 2''': Given a CDF of X: F(x) = x<sup>5</sup> on [0,1], transform U~U[0,1].<br />
Solution: <br />
Let y=x<sup>5</sup>, solve for x: x=y<sup>(1/5)</sup>, so F<sup>-1</sup>(x) = x<sup>(1/5)</sup><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x), and set<br />
x= u<sup>(1/5)</sup><br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre><br />
>>u=rand(1,1000)<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method requires inverting the CDF, but not all CDFs are invertible or monotonic. <br /><br />
2. It may be impractical since some CDFs and/or integrals are not easy to compute.<br /><br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre><br />
>>disttool %shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
[[File:Disttool.jpg|450px]]</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=16281stat340s132013-05-13T20:27:55Z<p>Ysyap: /* Multiplicative Congruential Algorithm */</p>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
{{Cleanup|reason= use math environment and LaTex notations for formulas. For example instead of y=1-e^(-λx) write <math>y=1-e^{-\lambda x}</math><br />
}}<br />
<br />
=== Course Instructor: Ali Ghodsi ===<br />
<!-- br tag for spacing--><br />
Lecture: <br /><br />
001: TTh 8:30-9:50 MC1085 <br /><br />
002: TTh 1:00-2:20 DC1351 <br /><br />
Tutorial: <br /><br />
2:30-3:20 Mon M3 1006 <br /><br />
<br />
=== TA(s): ===<br />
<!-- br tag for spacing--><br />
Lu Cheng Monday 3:30-5:30 pm M3 3108 space 2 <br /><br />
Han ShengSun Tuesday 4-6 pm M3 3108 space 2 <br /><br />
Yizhou Fang Wednesday 1-3 pm M3 3108 space 1 <br /><br />
Huan Cheng Thursday 3-5 pm M3 3111 space 1 <br /><br />
Wu Lin Friday 11am - 1pm M3 3108 space 1 <br /><br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /><br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.) <br /><br />
2. Regression: Same as classification but in the continuous case, i.e. the output y is continuous rather than discrete <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown) <br /><br />
4. Dimensionality Reduction (aka Feature extraction, Manifold learning)<br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples: <br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning <br /><br />
*Search and recommendation (eg. Google, Amazon) <br /><br />
*Automatic speech recognition, speaker verification <br /><br />
*Text parsing <br /><br />
*Face identification <br /><br />
*Tracking objects in video<br /><br />
*Financial prediction(e.g. credit cards) <br /><br />
*Fraud detection <br /><br />
*Medical diagnosis <br /><br />
<br />
=== Course Information ===<br />
<br />
'''General Information'''<br />
*No required textbook<br />
*Recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*First midterm will be held on Monday, June 17 from 2:30 to 3:30<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
'''Wikicourse note (10% of final mark):'''<br />
When applying for an account on the wikicourse note, please use your quest account as the login name and your uwaterloo email as the registered email. This is important as the quest id will be used to identify the students who make the contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, please wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard.<br />
<br />
<s>Must do both</s> ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
=== Comments ===<br />
To pass the course, you need to pass the final exam<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that activities such as rolling a die and flipping a coin are not truly random but are '''deterministic''' – the result can, in principle, be calculated using physics and math.<br />
<br />
A computer cannot generate truly random numbers since computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': numbers that seem random but are actually deterministic. Although pseudo random numbers are deterministic, the sequence of values they form has the appearance of independent uniform random variables.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is an algorithm used to generate uniform, pseudo random numbers. It is also referred to as the Linear or Mixed Congruential Methods. We define the Linear Congruential Method to be x<sub>k+1</sub>=(a*x<sub>k</sub> + b) mod m. Given a "seed" x<sub>0</sub>, we can obtain values for x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub> recursively. The Multiplicative Congruential Method may also refer to the special case where b=0.<br /><br />
<br />
'''Algorithm 1'''<br /><br />
x<sub>k+1</sub> = x<sub>k</sub> mod m<br /><br />
<br />
'''Example'''<br /><br />
Let x<sub>0</sub> = 10, m = 3<br /><br />
Step 1: 1 = 10 mod 3<br /><br />
Step 2: 1 = 1 mod 3<br /><br />
Step 3: 1 = 1 mod 3<br /><br />
This method generates a sequence of identical integers, hence we need a better algorithm.<br /><br />
<br />
'''Algorithm 2 (Multiplicative Congruential Algorithm)'''<br /><br />
x<sub>k+1</sub> = (a*x<sub>k</sub>+b) mod m<br /><br />
<br />
'''Example'''<br /><br />
Let a = 2, b = 1, m = 3, x<sub>0</sub> = 10<br /><br />
Step 1: 0 = (2*10+1) mod 3<br /><br />
Step 2: 1 = (2*0+1) mod 3<br /><br />
Step 3: 0 = (2*1+1) mod 3<br /><br />
This method generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
'''MatLab for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start:<br /><br />
<pre><br />
>>clear all<br />
>>close all<br />
</pre><br />
<pre><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer.)<br /><br /><br />
<br />
<pre><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not show the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters a, b, and m and an initial value x<sub>0</sub> called the '''seed'''. A sequence of numbers is defined by x<sub>k+1</sub> = (a*x<sub>k</sub> + b) mod m; "mod m" means take the remainder after division by m. <br /><br />
<br />
Note: For some bad a and b, the histogram may not be uniformly distributed.<br /><br />
<br />
<br />
'''Example''': a=13, b=0, m=31<br /><br />
The first 30 numbers in the sequence are a permutation of the integers from 1 to 30, and then the sequence repeats itself. Values are between 0 and m-1. If the values are normalized by dividing by m-1, then the result is numbers uniformly distributed in the interval [0,1]. There are only a finite number of values (30 in this case). In Matlab, you can use the function "hist(x)" to see if it is uniformly distributed. <br /><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If x<sub>0</sub>=3 and x<sub>n</sub>=(5x<sub>n-1</sub>+7)mod 200, find x<sub>1</sub>,...,x<sub>10</sub>.<br /><br />
'''Solution:'''<br /><br />
x<sub>1</sub>= (15+7) mod 200= 22<br /> x<sub>2</sub>= 117 mod 200= 117 <br /> x<sub>3</sub>= 592 mod 200 = 192<br /><br />
x<sub>4</sub>= 2967 mod 200= 167 <br /> x<sub>5</sub>= 14842 mod 200= 42 <br /> x<sub>6</sub>= 74217 mod 200 = 17<br /><br />
x<sub>7</sub>= 371092 mod 200= 92 <br /> x<sub>8</sub>= 1855467 mod 200= 67 <br /> x<sub>9</sub>= 9277342 mod 200 = 142<br /><br />
x<sub>10</sub>= 46386717 mod 200 = 117<br /><br />
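The recursion above is easy to check with a short script (a Python sketch for illustration, since the course uses Matlab; lcg_sequence is our own name):<br />

```python
def lcg_sequence(a, b, m, seed, n):
    """Return x_1, ..., x_n for x_k = (a*x_{k-1} + b) mod m."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + b) % m
        out.append(x)
    return out

# x_0 = 3, x_n = (5*x_{n-1} + 7) mod 200
print(lcg_sequence(5, 7, 200, 3, 10))
# [22, 117, 192, 167, 42, 17, 92, 67, 142, 117]
```

The output matches the ten values computed by hand above.<br />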
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose m such that m is large and m is prime. Careful selection of parameters 'a' and 'b' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a non-prime number such as 40 for m, our results were not satisfactory in producing an output resembling a uniform distribution.<br /><br />
<br />
The computed values are between 0 and m-1. If the values are normalized by dividing by '''m-1''', their result is numbers uniformly distributed on the interval [0,1].<br /><br />
<br />
From the example shown above, we can see that to create a good random number generator, we need to select a large m. Each x<sub>n</sub> depends only on the previous value through (5x<sub>n-1</sub>+7) mod 200, so there are at most m possible values; once a value recurs, the whole sequence begins to repeat. Thus, if we want to create a large group of random numbers, it is better to have a large m so that the generated values are not repeated after only a few iterations.<br /><br />
''Example:'' <br /><br />
For x<sub>n</sub> = (2x<sub>n-1</sub>+1) mod 3 where x<sub>0</sub>=2: x<sub>1</sub> = 5 mod 3 = 2, so the sequence repeats immediately.<br /><br />
Notice that with a small value of m, the generated numbers repeat much sooner than with a sufficiently large m.<br /><br />
<br />
<br />
For many years the “rand” function in Matlab used this algorithm with the parameters<br />
a=7<sup>5</sup>=16807, b=0, m=2<sup>31</sup>-1=2147483647 – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that m should be large)<br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than uniform distribution, such as exponential distribution and normal distribution.<br />
Exponential distribution has the property that generated numbers are frequently close to 0. Normal distribution has the property that generated numbers are frequently close to its mean.<br />
<br />
'''Theorem''': <br /><br />
Take U ~ U[0,1] and let <math>x=F^{-1}(U)</math><br /><br />
Then x has distribution function <math>F(\cdot)</math><br /><br />
Where <math>F(x) = P(X \leq x)</math> is the CDF; <math>F^{-1}(U)</math> denotes the inverse function of <math>F(\cdot)</math>. Or that <math>F(x)=U \Rightarrow x=F^{-1}(U)</math><br /><br /><br />
<br />
'''Proof of the theorem:'''<br /><br />
<math>F(x) = P(X\leq x)</math><br /><br />
<math> =P(F^{-1}(U)\leq x)</math> <br /><br />
<math>=P(F(F^{-1}(U))\leq F(x))</math> ::"Applying F, which is monotonic, to both sides"<br /><br />
<math>=P(U\leq F(x))</math><br /> <br />
<math>=F(x)</math> ::"Because <math>Pr(U\leq y)=y</math>, since U is uniform on the unit interval"<br /><br />
<br />
<math>F(\cdot)</math> is a monotonic function, i.e. a function that is strictly increasing or decreasing.<br /><br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e^{-\lambda t}\, dt</math><br /><br />
<math> = -e^{-\lambda t}\, \Big|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\ln(1-y)/{\lambda}</math><br /><br />
<math>y=-\ln(1-x)/{\lambda}</math> ::"Interchanging x and y to express the inverse" <br /><br />
<math>F^{-1}(x)=-\ln(1-x)/{\lambda}</math><br /><br />
<br />
To generate an exponential random variable:<br /><br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: <math> x=\frac{-ln(1-U)}{\lambda} </math> <br /><br />
<br />
'''Example 2''': Given a CDF of X: F(x) = x<sup>5</sup> on [0,1], transform U~U[0,1].<br />
Solution: <br />
Let y=x<sup>5</sup>, solve for x: x=y<sup>(1/5)</sup>, so F<sup>-1</sup>(x) = x<sup>(1/5)</sup><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse function of F(x), and set<br />
x= u<sup>(1/5)</sup><br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre><br />
>>u=rand(1,1000)<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method requires F to be invertible: not every CDF is strictly increasing, so F<sup>-1</sup> may not be well defined. <br /><br />
2. It may be impractical since some CDFs and/or the integrals defining them are not easy to compute.<br /><br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre><br />
>>disttool %shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
[[File:Disttool.jpg|450px]]</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=16269stat340s132013-05-13T17:53:33Z<p>Ysyap: /* Course Information */</p>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
{{Cleanup|reason= use math environment and LaTex notations for formulas. For example instead of y=1-e^(-λx) write <math>y=1-e^{-\lambda x}</math><br />
}}<br />
<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case)<br /><br />
(For example, an image of a fruit can be classified, through some sort of algorithm to be a picture of either an apple or an orange.)<br /><br />
2. Regression: Same as classification but in the continuous case (y is continuous) <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, y is unknown)<br /><br />
4. Dimensionality Reduction<br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples:<br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning<br /><br />
*Search and recommendation (eg. Google)<br /><br />
*Automatic speech recognition, speaker verification<br /><br />
*Text parsing<br /><br />
*Face identification<br /><br />
*Tracking objects in video<br /><br />
*Financial prediction, fraud detection<br /><br />
*Medical diagnosis<br /><br />
<br />
=== Course Information ===<br />
<br />
'''General Information'''<br />
*No required textbook, recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
'''Wikicourse note:'''<br />
When applying for an account on the wikicourse note, please use your Quest account as the login name and your uwaterloo email as the registered email. This is important because the Quest id will be used to identify the students who make contributions.<br />
Example:<br/><br />
User: questid<br/><br />
Email: questid@uwaterloo.ca<br/><br />
After the student has made the account request, please wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account. <br />
<br />
<s>'''Primary contributor''': Put a summary of the lecture up within 48 hours.</s><br />
<br />
'''General contributor''': Elaborate on concepts, add example, add code, add pictures, reference, corrections etc… <s>withing 2 weeks</s> within 1 week.<br />
<br />
<s>Must do both</s> ''All primary contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
=== Comments ===<br />
In reality, it is often complicated to identify the distribution.<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that activities such as rolling a die and flipping a coin are not truly random but are '''deterministic''' – the result could, in principle, be calculated reliably using physics and math.<br />
<br />
A computer cannot generate truly random numbers since computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': deterministic sequences of values that nevertheless have the appearance of independent uniform random variables.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is an algorithm used to generate uniform, pseudo random numbers. It is also referred to as the Linear or Mixed Congruential Methods. We define the Linear Congruential Method to be x<sub>k+1</sub>=(a*x<sub>k</sub> + b) mod m. Given a "seed" x<sub>0</sub>, we can obtain values for x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub> recursively. The Multiplicative Congruential Method may also refer to the special case where b=0.<br /><br />
<br />
'''Algorithm 1'''<br /><br />
x<sub>k+1</sub> = x<sub>k</sub> mod m<br /><br />
<br />
'''Example'''<br /><br />
Let x<sub>0</sub> = 10, m = 3<br /><br />
Step 1: 1 = 10 mod 3<br /><br />
Step 2: 1 = 1 mod 3<br /><br />
Step 3: 1 = 1 mod 3<br /><br />
This method generates a sequence of identical integers, hence we need a better algorithm.<br /><br />
<br />
'''Algorithm 2 (Multiplicative Congruential Algorithm)'''<br /><br />
x<sub>k+1</sub> = (a*x<sub>k</sub>+b) mod m<br /><br />
<br />
'''Example'''<br /><br />
Let a = 2, b = 1, m = 3, x<sub>0</sub> = 10<br /><br />
Step 1: 0 = (2*10+1) mod 3<br /><br />
Step 2: 1 = (2*0+1) mod 3<br /><br />
Step 3: 0 = (2*1+1) mod 3<br /><br />
This method generates a sequence with a repeating cycle of two integers.<br /><br />
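Both algorithms above are easy to reproduce in code. The course uses MATLAB, but a small Python sketch of the recursion x<sub>k+1</sub> = (a*x<sub>k</sub> + b) mod m, run with the parameters of the two examples, exhibits the same behaviour: Algorithm 1 gets stuck on a single value, while Algorithm 2 with these parameters cycles between 0 and 1.

```python
def lcg(a, b, m, seed, n):
    """Linear congruential generator: x_{k+1} = (a*x_k + b) mod m."""
    values, x = [], seed
    for _ in range(n):
        x = (a * x + b) % m
        values.append(x)
    return values

# Algorithm 2 example: a=2, b=1, m=3, x0=10 cycles between 0 and 1
cycle = lcg(2, 1, 3, 10, 6)   # -> [0, 1, 0, 1, 0, 1]

# Algorithm 1 example (equivalent to a=1, b=0): x0=10, m=3 gets stuck at 1
stuck = lcg(1, 0, 3, 10, 3)   # -> [1, 1, 1]
```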
<br />
'''MatLab for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start:<br /><br />
<pre><br />
>>clear all<br />
>>close all<br />
</pre><br />
<pre><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer.)''<br /><br /><br />
<br />
<pre><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not show the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters a, b, and m and an initial value, x<sub>0</sub> called seed. A sequence of numbers is defined by x(k+1) = a*x(k) + b mod m. Mod m means take the remainder after division by m. <br /><br />
<br />
Note: For some bad a and b, the histogram may not be uniformly distributed.<br /><br />
<br />
<br />
'''Example''': a=13, b=0, m=31<br /><br />
The first 30 numbers in the sequence are a permutation of the integers from 1 to 30, and then the sequence repeats itself. Values are between 0 and m-1. If the values are normalized by dividing by m-1, then the result is numbers uniformly distributed on the interval [0,1]. There are only a finite number of values (30 in this case). In Matlab, you can use the function "hist(x)" to check whether it is uniformly distributed. <br /><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If x<sub>0</sub>=3 and x<sub>n</sub>=(5x<sub>n-1</sub>+7)mod 200, find x<sub>1</sub>,...,x<sub>10</sub>.<br /><br />
'''Solution:'''<br /><br />
x<sub>1</sub>= (15+7) mod 200= 22<br /> x<sub>2</sub>= 117 mod 200= 117 <br /> x<sub>3</sub>= 592 mod 200 = 192<br /><br />
x<sub>4</sub>= 2967 mod 200= 167 <br /> x<sub>5</sub>= 14842 mod 200= 42 <br /> x<sub>6</sub>= 74217 mod 200 = 17<br /><br />
x<sub>7</sub>= 371092 mod 200= 92 <br /> x<sub>8</sub>= 1855467 mod 200= 67 <br /> x<sub>9</sub>= 9277342 mod 200 = 142<br /><br />
x<sub>10</sub>= 46386717 mod 200 = 117<br /><br />
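The textbook example can be checked mechanically (a Python sketch for illustration; the lecture carries the unreduced products such as 2967 and 46386717, but reducing mod 200 at every step gives the same sequence, since 5(x mod 200)+7 ≡ 5x+7 mod 200):

```python
def lcg_sequence(a, b, m, seed, n):
    """First n values of x_k = (a*x_{k-1} + b) mod m."""
    values, x = [], seed
    for _ in range(n):
        x = (a * x + b) % m
        values.append(x)
    return values

# x0 = 3, x_n = (5*x_{n-1} + 7) mod 200
seq = lcg_sequence(5, 7, 200, 3, 10)
# -> [22, 117, 192, 167, 42, 17, 92, 67, 142, 117]
```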
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose m large and prime. Careful selection of the parameters 'a' and 'b' also helps generate relatively "random" output values, in which it is harder to identify patterns. For example, when we used a non-prime number such as 40 for m, the output did not satisfactorily resemble a uniform distribution.<br /><br />
<br />
The computed values are between 0 and m-1. If the values are normalized by dividing by '''m-1''', the result is numbers uniformly distributed on the interval [0,1].<br /><br />
<br />
From the example shown above, we can see that to create a good random number sequence we need a large m. Since x<sub>n</sub> is determined by (5x<sub>n-1</sub>+7) mod m, its value lies between 0 and m-1. Thus, if we want to generate a large group of random numbers, it is better to have a large m so that the generated values do not repeat too soon.<br /><br />
''Example:'' <br /><br />
For x<sub>n</sub> = (2x<sub>n-1</sub>+1) mod 3 where x<sub>0</sub>=2, x<sub>1</sub> = 5 mod 3 = 2<br /><br />
Notice that with a small value of m, the generated sequence repeats itself much sooner than when m is large enough.<br /><br />
<br />
<br />
For many years the “rand” function in Matlab used this algorithm with these parameters<br />
a=7<sup>5</sup>=16807, b=0, m=2<sup>31</sup>-1=2147483647 – recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller (the important part is that m should be large)<br />
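These parameters can be sketched as a generator in a few lines (a Python illustration, not Matlab's actual implementation). Park and Miller's paper gives a standard correctness check for this generator: starting from seed 1, the 10,000th raw value should be 1043618065.

```python
M = 2**31 - 1   # 2147483647, a Mersenne prime
A = 16807       # 7**5; b = 0 makes this a purely multiplicative generator

def park_miller(seed, n):
    """Return the first n raw values of x_{k+1} = A * x_k mod M."""
    values, x = [], seed
    for _ in range(n):
        x = (A * x) % M
        values.append(x)
    return values

xs = park_miller(1, 10_000)
check = xs[-1]              # Park & Miller's published check value: 1043618065

# Dividing by M normalizes the values to (0,1), playing the role of rand's output
u = [x / M for x in xs]
```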
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than uniform distribution, such as exponential distribution and normal distribution.<br />
Exponential distribution has the property that generated numbers are frequently close to 0. Normal distribution has the property that generated numbers are frequently close to its mean.<br />
<br />
'''Theorem''': <br /><br />
Take U ~ U[0,1] and let x=F<sup>-1</sup>(U)<br /><br />
Then x has distribution function F(<sup>.</sup>)<br /><br />
Where F(x) = P(X<=x) cdf; F<sup>-1</sup>(U) denotes the inverse function of F(<sup>.</sup>) Or that F(x)=U -> x=F<sup>-1</sup>(U)<br /><br /><br />
<br />
'''Proof of the theorem:'''<br /><br />
<math>F(x) = P(X<= x)</math><br /><br />
<math> =P(F^{-1}(U)<=x)</math> <br /><br />
<math>=P(F(F^{-1}(U))<=F(x))</math> #Applying F, which is monotonic, to both sides<br /><br />
<math>=P(U<=F(x))</math><br /> <br />
<math>=F(x)</math> #Because Pr(U<=y)= y,since U is uniform on the unit interval<br /><br />
<br />
F(<sup>.</sup>) is a monotonic function, that is, a function that is strictly increasing or decreasing.<br /><br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\, dt</math><br /><br />
<math> = \left. \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, \right|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<math> y=1-e^{- \lambda x} </math><br /><br />
<math> 1-y=e^{- \lambda x}</math><br /><br />
<math>x=-\ln(1-y)/{\lambda}</math><br /><br />
<math>y=-\ln(1-x)/{\lambda}</math><br /><br />
<math>F^{-1}(x)=-\ln(1-x)/{\lambda}</math><br /><br />
<br />
To generate a sample from the exponential distribution:<br />
Step 1: Draw U ~ U[0,1];<br /><br />
Step 2: x = -ln(1-U)/λ;<br /><br />
<br />
'''Example 2''': Given the CDF of X: F(x) = x<sup>5</sup> on [0,1], transform a draw of U~U[0,1] into a draw of X.<br />
Sol: <br />
Let y=x<sup>5</sup>, solve for x => x=y<sup>(1/5)</sup> =>F<sup>-1</sup>(x) = x<sup>(1/5)</sup><br />
Hence, to obtain a value of x from F(x), we first draw u from the uniform distribution, then apply the inverse function of F(x), setting<br />
x= u<sup>(1/5)</sup><br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre><br />
>>u=rand(1,1000)<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method requires F to be invertible: not every CDF is strictly increasing, so F<sup>-1</sup> may not be well defined. <br /><br />
2. It may be impractical since some CDFs and/or the integrals defining them are not easy to compute.<br /><br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre><br />
>>disttool %shows different distributions<br />
</pre> <br />
<br />
This command allows users to explore the effect of changes of parameters on the plot of either a CDF or PDF. <br />
[[File:Disttool.jpg|450px]]</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=16178stat340s132013-05-13T07:45:32Z<p>Ysyap: /* Multiplicative Congruential Algorithm */</p>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
{{Cleanup|reason= use math environment and LaTex notations for formulas. For example instead of y=1-e^(-λx) write <math>y=1-e^{-\lambda x}</math><br />
}}<br />
<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case)<br /><br />
2. Regression: Same as classification but in the continuous case <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, y is unknown)<br /><br />
4. Dimensionality Reduction<br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples:<br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning<br /><br />
*Search and recommendation (eg. Google)<br /><br />
*Automatic speech recognition, speaker verification<br /><br />
*Text parsing<br /><br />
*Face identification<br /><br />
*Tracking objects in video<br /><br />
*Financial prediction, fraud detection<br /><br />
<br />
=== Course Information ===<br />
<br />
'''General Information'''<br />
*No required textbook, recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
<s>'''Primary contributor''': Put a summary of the lecture up within 48 hours.</s><br />
<br />
'''General contributor''': Elaborate on concepts, add example, add code, add pictures, reference, corrections etc… <s>withing 2 weeks</s> within 1 week.<br />
<br />
<s>Must do both</s> ''All primary contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
=== Comments ===<br />
In reality, it is often complicated to identify the distribution.<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that activities such as rolling a die and flipping a coin are not truly random but are '''deterministic''' – the result could, in principle, be calculated reliably using physics and math.<br />
<br />
A computer cannot generate truly random numbers since computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': deterministic sequences of values that nevertheless have the appearance of independent uniform random variables.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is an algorithm used to generate uniform, pseudo random numbers. It is also referred to as the Linear or Mixed Congruential Methods. We define the Linear Congruential Method to be x<sub>k+1</sub>=(a*x<sub>k</sub> + b) mod m. Given a "seed" x<sub>0</sub>, we can obtain values for x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub> recursively. The Multiplicative Congruential Method may also refer to the special case where b=0.<br /><br />
<br />
'''Algorithm 1'''<br /><br />
x<sub>k+1</sub> = x<sub>k</sub> mod m<br /><br />
<br />
'''Example'''<br /><br />
Let x<sub>0</sub> = 10, m = 3<br /><br />
Step 1: 1 = 10 mod 3<br /><br />
Step 2: 1 = 1 mod 3<br /><br />
Step 3: 1 = 1 mod 3<br /><br />
This method generates a sequence of identical integers, hence we need a better algorithm.<br /><br />
<br />
<br />
'''Algorithm 2 (Multiplicative Congruential Algorithm)'''<br /><br />
x<sub>k+1</sub> = (a*x<sub>k</sub>+b) mod m<br /><br />
<br />
'''Example'''<br /><br />
Let a = 2, b = 1, m = 3, x<sub>0</sub> = 10<br /><br />
Step 1: 0 = (2*10+1) mod 3<br /><br />
Step 2: 1 = (2*0+1) mod 3<br /><br />
Step 3: 0 = (2*1+1) mod 3<br /><br />
This method generates a sequence with a repeating cycle of two integers.<br /><br />
<br />
<br />
<br />
<br />
'''MatLab for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start:<br /><br />
<pre><br />
>>clear all<br />
>>close all<br />
</pre><br />
<pre><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: Keep repeating this command over and over again and you will seem to get random numbers – this is how the command rand works in a computer.)''<br /><br /><br />
<br />
<pre><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not show the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.) ''<br />
<br />
<br />
This algorithm involves three integer parameters a, b, and m and an initial value, x<sub>0</sub> called seed. A sequence of numbers is defined by x(k+1) = a*x(k) + b mod m. Mod m means take the remainder after division by m. <br /><br />
<br />
Note: For some bad a and b, the histogram may not be uniformly distributed.<br /><br />
<br />
<br />
'''Example''': a=13, b=0, m=31<br /><br />
The first 30 numbers in the sequence are a permutation of the integers from 1 to 30, and then the sequence repeats itself. Values are between 0 and m-1. If the values are normalized by dividing by m-1, then the result is numbers uniformly distributed on the interval [0,1]. There are only a finite number of values (30 in this case). In Matlab, you can use the function "hist(x)" to check whether it is uniformly distributed. <br /><br />
<br />
'''Examples:[From Textbook]'''<br /><br />
If x<sub>0</sub>=3 and x<sub>n</sub>=(5x<sub>n-1</sub>+7)mod 200, find x<sub>1</sub>,...,x<sub>10</sub>.<br /><br />
'''Solution:'''<br /><br />
x<sub>1</sub>= (15+7) mod 200= 22<br /> x<sub>2</sub>= 117 mod 200= 117 <br /> x<sub>3</sub>= 592 mod 200 = 192<br /><br />
x<sub>4</sub>= 2967 mod 200= 167 <br /> x<sub>5</sub>= 14842 mod 200= 42 <br /> x<sub>6</sub>= 74217 mod 200 = 17<br /><br />
x<sub>7</sub>= 371092 mod 200= 92 <br /> x<sub>8</sub>= 1855467 mod 200= 67 <br /> x<sub>9</sub>= 9277342 mod 200 = 142<br /><br />
x<sub>10</sub>= 46386717 mod 200 = 117<br /><br />
<br />
'''Comments:'''<br /><br />
Typically, it is good to choose m such that m is large, and m is prime. Careful selection of parameters helps generate relatively "random" output values, where it is harder to identify patterns. <br /><br />
<br /><br />
From the example shown above, we can see why to create a good random number sequence we need a large m. Since x<sub>n</sub> is determined by (5x<sub>n-1</sub>+7) mod m, its value lies between 0 and m-1. Thus, if we want to generate a large group of random numbers, it is better to have a large m so that the generated values do not repeat too soon.<br /><br />
''Example:'' <br /><br />
For x<sub>n</sub> = (2x<sub>n-1</sub>+1) mod 3 where x<sub>0</sub>=2, x<sub>1</sub> = 5 mod 3 = 2<br /><br />
Notice that with a small value of m, the generated sequence repeats itself much sooner than when m is large enough.<br /><br />
<br />
<br />
For many years the “rand” function in Matlab used this algorithm with these parameters<br />
a=7^5=16807, b=0, m=2^31-1=2147483647 – recommended in a 1988 paper by Park and Miller (the important part is that m should be large)<br />
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating types of distribution other than uniform distribution, such as exponential distribution and normal distribution.<br />
Exponential distribution has the property that generated numbers are frequently close to 0. Normal distribution has the property that generated numbers are frequently close to its mean.<br />
<br />
'''Theorem''': <br /><br />
Take U ~ U[0,1] and let x=F<sup>-1</sup>(U)<br /><br />
Then x has distribution function F(<sup>.</sup>)<br /><br />
Where F(x) = P(X<=x) cdf; F<sup>-1</sup>(U) denotes the inverse function of F(<sup>.</sup>) Or that F(x)=U -> x=F<sup>-1</sup>(U)<br /><br /><br />
<br />
'''Proof of the theorem:'''<br /><br />
F(x)=P(X<=x)<br /><br />
=P(F<sup>-1</sup>(U)<=x)<br /><br />
=P(F(F<sup>-1</sup>(U))<=F(x)) #Applying F, which is monotonic, to both sides<br /><br />
=P(U<=F(x)) <br /> <br />
=F(x) #Because Pr(U<=y)= y,since U is uniform on the unit interval<br /><br />
<br />
F(<sup>.</sup>) is a monotonic function, that is, a function that is strictly increasing or decreasing.<br /><br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\, dt</math><br /><br />
<math> = \left. \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, \right|_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^0 </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<br />
<br />
y=1-e<sup>-λx</sup>;<br /><br />
1-y=e<sup>-λx</sup>;<br /><br />
x=-ln(1-y)/λ;<br /><br />
y=-ln(1-x)/λ;<br /><br />
F<sup>-1</sup>(x)=-ln(1-x)/λ;<br /><br />
<br />
To generate a sample from the exponential distribution:<br />
Step 1: Draw U ~ U[0,1];<br /><br />
Step 2: x = -ln(1-U)/λ;<br /><br />
<br />
'''Example 2''': Given the CDF of X: F(x) = x<sup>5</sup> on [0,1], transform a draw of U~U[0,1] into a draw of X.<br />
Sol: <br />
Let y=x<sup>5</sup>, solve for x => x=y<sup>(1/5)</sup> =>F<sup>-1</sup>(x) = x<sup>(1/5)</sup><br />
Hence, to obtain a value of x from F(x), we first draw u from the uniform distribution, then apply the inverse function of F(x), setting<br />
x= u<sup>(1/5)</sup><br />
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre><br />
>>u=rand(1,1000)<br />
>>hist(u) %will generate a fairly uniform diagram<br />
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre><br />
%let λ=2 in this example; however, you can choose another value for λ<br />
>>x=(-log(1-u))/2;<br />
>>size(x) %1000 in size <br />
>>figure<br />
>>hist(x) %exponential <br />
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method is flawed since not all functions are invertible nor monotonic. <br /><br />
2. It may be impractical since some CDF's and/or integrals are not easy to compute.<br /><br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre><br />
>>disttool<br />
</pre><br />
[[File:Disttool.jpg|450px]]</div>Ysyaphttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat340s13&diff=16177stat340s132013-05-13T07:11:08Z<p>Ysyap: /* Introduction */</p>
<hr />
<div>== '''Computer Simulation of Complex Systems (Stat 340 - Spring 2013)''' ==<br />
<br />
<br />
== Introduction, Class 1 - Tuesday, May 7 ==<br />
<br />
{{Cleanup|reason= use math environment and LaTex notations for formulas. For example instead of y=1-e^(-λx) write <math>y=1-e^{-\lambda x}</math><br />
}}<br />
<br />
<br />
=== Four Fundamental Problems ===<br />
<!-- br tag for spacing--><br />
1. Classification: Given an input object X, we have a function which will take in this input X and identify which 'class (Y)' it belongs to (Discrete Case)<br /><br />
2. Regression: Same as classification but in the continuous case <br /><br />
3. Clustering: Use common features of objects in same class or group to form clusters.(in this case, y is unknown)<br /><br />
4. Dimensionality Reduction<br /><br />
<br />
=== Applications ===<br />
<br />
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity<br /><br />
Examples:<br /><br />
*Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning<br /><br />
*Search and recommendation (eg. Google)<br /><br />
*Automatic speech recognition, speaker verification<br /><br />
*Text parsing<br /><br />
*Face identification<br /><br />
*Tracking objects in video<br /><br />
*Financial prediction, fraud detection<br /><br />
<br />
=== Course Information ===<br />
<br />
'''General Information'''<br />
*No required textbook, recommended: "Simulation" by Sheldon M. Ross<br />
*Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)<br />
*Announcements and assignments will be posted on Learn.<br />
*Other course material on: http://wikicoursenote.com/wiki/<br />
*Log on to both Learn and wikicoursenote frequently.<br />
*Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address!<br />
<br />
<s>'''Primary contributor''': Put a summary of the lecture up within 48 hours.</s><br />
<br />
'''General contributor''': Elaborate on concepts, add example, add code, add pictures, reference, corrections etc… <s>withing 2 weeks</s> within 1 week.<br />
<br />
<s>Must do both</s> ''All primary contributions are now considered general contributions; you must contribute to 50% of lectures for full marks''<br />
<br />
*A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks<br />
<br />
Do not submit copyrighted work without permission, cite original sources.<br />
Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but didn't, you will ''lose'' marks.<br />
<br />
=== Tentative Marking Scheme ===<br />
{| class="wikitable"<br />
|-<br />
! Item<br />
! Value<br />
|-<br />
| Assignments (~6)<br />
| 30%<br />
|-<br />
| WikiCourseNote<br />
| 10%<br />
|-<br />
| Midterm<br />
| 20%<br />
|-<br />
| Final<br />
| 40%<br />
|}<br />
<br />
=== Comments ===<br />
In reality, it is often complicated to identify the distribution.<br />
<br />
==Sampling (Generating random numbers), Class 2 - Thursday, May 9==<br />
<br />
<br />
=== Introduction === <br />
Some people believe that activities such as rolling a die and flipping a coin are not truly random but are '''deterministic''' – the result could, in principle, be calculated reliably using physics and math.<br />
<br />
A computer cannot generate truly random numbers since computers can only run algorithms, which are deterministic in nature. They can, however, generate '''Pseudo Random Numbers''': deterministic sequences of values that nevertheless have the appearance of independent uniform random variables.<br />
<br />
=== Multiplicative Congruential Algorithm ===<br />
This is an algorithm used to generate uniform, pseudo random numbers. It is also referred to as the Linear or Mixed Congruential Methods. We define the Linear Congruential Method to be x<sub>k+1</sub>=(a*x<sub>k</sub> + b) mod m. Given a "seed" x<sub>0</sub>, we can obtain values for x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub> recursively. The Multiplicative Congruential Method may also refer to the special case where b=0.<br /><br />
<br />
'''Algorithm 1'''<br /><br />
x<sub>k+1</sub> = x<sub>k</sub> mod m<br /><br />
<br />
'''Example'''<br /><br />
Let x<sub>0</sub> = 10, m = 3<br /><br />
Step 1: 1 = 10 mod 3<br /><br />
Step 2: 1 = 1 mod 3<br /><br />
Step 3: 1 = 1 mod 3<br /><br />
This method generates a sequence of identical integers, hence we need a better algorithm.<br /><br />
<br />
<br />
'''Algorithm 2 (Multiplicative Congruential Algorithm)'''<br /><br />
x<sub>k+1</sub> = (a*x<sub>k</sub>+b) mod m<br /><br />
<br />
'''Example'''<br /><br />
Let a = 2, b = 1, m = 3, x<sub>0</sub> = 10<br /><br />
Step 1: 0 = (2*10+1) mod 3<br /><br />
Step 2: 1 = (2*0+1) mod 3<br /><br />
Step 3: 0 = (2*1+1) mod 3<br /><br />
This method generates a sequence with a repeating cycle of two integers.<br /><br />
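The recurrence above is easy to cross-check outside of MATLAB. The following is a minimal Python sketch (the function name <code>lcg</code> is ours, not from the course) that reproduces the two-integer cycle of this example, and also verifies the later claim that a=13, b=0, m=31 produces a permutation of the integers 1 to 30 before repeating.<br />

```python
def lcg(a, b, m, seed, n):
    """Return the first n values of x_{k+1} = (a*x_k + b) mod m."""
    x = seed
    values = []
    for _ in range(n):
        x = (a * x + b) % m
        values.append(x)
    return values

# Example above: a=2, b=1, m=3, x0=10 -> repeating cycle 0, 1, 0, 1, ...
print(lcg(2, 1, 3, 10, 5))  # [0, 1, 0, 1, 0]

# a=13, b=0, m=31, x0=1: the first 30 values are a permutation of 1..30
print(sorted(lcg(13, 0, 31, 1, 30)) == list(range(1, 31)))  # True
```
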
<br />
<br />
<br />
<br />
'''MatLab for Multiplicative Congruential Algorithm:'''<br /><br />
Before you start:<br /><br />
<pre><br />
>>clear all<br />
>>close all<br />
</pre><br />
<pre><br />
>>a=17<br />
>>b=3<br />
>>m=31<br />
>>x=5<br />
>>mod(a*x+b,m)<br />
ans=26<br />
>>x=mod(a*x+b,m)<br />
</pre><br />
<br />
''(Note: Keep repeating this command over and over again and you will appear to get random numbers – this is essentially how the command rand works in a computer.)''<br /><br /><br />
<br />
<pre><br />
>>a=13<br />
>>b=0<br />
>>m=31<br />
>>x(1)=1<br />
>>for ii=2:1000<br />
x(ii)=mod(a*x(ii-1)+b,m);<br />
end<br />
>>size(x)<br />
ans=1 1000<br />
>>hist(x)<br />
</pre><br />
[[File:MCA_Example.jpg|300px]]<br />
<br />
''(Note: The semicolon after x(ii)=mod(a*x(ii-1)+b,m) prevents Matlab from printing the entire vector x at every iteration; the values are still computed and stored. Suppressing this output makes the loop run significantly faster.)''<br />
<br />
<br />
This algorithm involves three integer parameters a, b, and m, and an initial value x<sub>0</sub> called the seed. A sequence of numbers is defined by x<sub>k+1</sub> = (a*x<sub>k</sub> + b) mod m, where "mod m" means taking the remainder after division by m. <br />
<br />
Note: For some bad a and b, the histogram may not be uniformly distributed.<br /><br />
<br />
<br />
'''Example''': a=13, b=0, m=31<br /><br />
The first 30 numbers in the sequence are a permutation of the integers from 1 to 30, and then the sequence repeats itself. Values are between 0 and m-1. If the values are normalized by dividing by m-1, then the result is a set of numbers uniformly distributed on the interval [0,1]. There are only a finite number of distinct values (30 in this case). In Matlab, you can use the function "hist(x)" to see whether the output looks uniformly distributed. <br />
<br />
Typically, it is good to choose m such that m is large, and m is prime. Careful selection of parameters helps generate relatively "random" output values, where it is harder to identify patterns. <br /><br />
<br />
For many years the "rand" function in Matlab used this algorithm with the parameters<br />
a=7^5=16807, b=0, m=2^31 -1=2147483647, recommended in a 1988 paper by Park and Miller (the important part is that m is large and prime).
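As a sanity check, the Park–Miller "minimal standard" generator can be sketched in a few lines of Python (the function name <code>park_miller</code> is ours). It uses exactly the parameters above; here we normalize by dividing by m rather than m-1, so the outputs land strictly inside (0,1), since b=0 with m prime keeps x in 1..m-1.<br />

```python
def park_miller(seed, n):
    """Lehmer generator with a = 7**5 = 16807, b = 0, m = 2**31 - 1,
    the parameters recommended by Park and Miller (1988)."""
    a, m = 16807, 2**31 - 1
    x = seed
    samples = []
    for _ in range(n):
        x = (a * x) % m        # multiplicative congruential step (b = 0)
        samples.append(x / m)  # normalize into (0, 1)
    return samples

u = park_miller(seed=1, n=5)
print(u[0])  # 16807 / 2147483647, i.e. roughly 7.8e-06
```
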
<br />
=== Inverse Transform Method ===<br />
This method is useful for generating distributions other than the uniform, such as the exponential and normal distributions.
The exponential distribution has the property that generated numbers are frequently close to 0; the normal distribution has the property that generated numbers are frequently close to its mean.
<br />
'''Theorem''': <br /><br />
Take U ~ U[0,1] and let X = F<sup>-1</sup>(U).<br />
Then X has distribution function F(<sup>.</sup>),<br />
where F(x) = P(X<=x) is the cdf and F<sup>-1</sup>(<sup>.</sup>) denotes its inverse; that is, F(x)=u <-> x=F<sup>-1</sup>(u).<br /><br /><br />
<br />
'''Proof of the theorem:'''<br /><br />
F(x) = P(X<=x)<br />
= P(F<sup>-1</sup>(U)<=x)<br />
= P(F(F<sup>-1</sup>(U))<=F(x)) (applying F, which is monotonic, to both sides)<br />
= P(U<=F(x)) <br />
= F(x) (because P(U<=y) = y, since U is uniform on the unit interval)<br />
<br />
F(<sup>.</sup>) is assumed to be a monotonic function, i.e. a function that is strictly increasing or strictly decreasing.<br />
<br />
'''Example''': <math> f(x) = \lambda e^{-\lambda x}</math><br/><br />
<math> F(x)= \int_0^x f(t)\, dt</math><br/><br />
<math> = \int_0^x \lambda e ^{-\lambda t}\, dt</math><br /><br />
<math> = \left[\, -e^{-\lambda t}\, \right]_0^x </math><br /><br />
<math> = -e^{-\lambda x} + e^{0} </math> <br><br />
<math> =1 - e^{- \lambda x} </math><br /><br />
<br />
<br />
y = 1 - e<sup>-λx</sup><br />
1 - y = e<sup>-λx</sup><br />
x = -ln(1-y)/λ<br />
Swapping the roles of x and y gives y = -ln(1-x)/λ, so<br />
F<sup>-1</sup>(x) = -ln(1-x)/λ<br /><br />
<br />
Step 1: Draw U ~U[0,1];<br /><br />
Step 2: x=-ln(1-U)/ λ;<br /><br />
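These two steps can be checked numerically. Below is a hedged Python sketch (variable names are ours) of the same transform x = -ln(1-U)/λ with λ = 2; the sample mean should come out close to 1/λ = 0.5.<br />

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible
lam = 2.0
# Step 1: draw U ~ U[0,1]; Step 2: x = -ln(1-U)/lam
samples = [-math.log(1.0 - random.random()) / lam for _ in range(100000)]

mean = sum(samples) / len(samples)
print(mean)  # close to 1/lam = 0.5
```
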
<br />
'''Example 2''': Given the CDF of X: F(x) = x<sup>5</sup> on [0,1], transform U~U[0,1].<br />
Solution: <br />
Let y=x<sup>5</sup> and solve for x => x=y<sup>(1/5)</sup> => F<sup>-1</sup>(x) = x<sup>(1/5)</sup><br />
Hence, to obtain a value of x from F(x), we first draw u from a uniform distribution, then apply the inverse of F(x) and set<br />
x= u<sup>(1/5)</sup><br />
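The same kind of numerical check works for this example: sampling x = u<sup>(1/5)</sup> should reproduce the CDF F(x) = x<sup>5</sup>. A small Python sketch (variable names are ours):<br />

```python
import random

random.seed(1)
n = 200000
# Inverse transform for F(x) = x^5 on [0,1]: F^{-1}(u) = u^{1/5}
xs = [random.random() ** (1 / 5) for _ in range(n)]

# Empirical check against the CDF: P(X <= 0.5) = 0.5**5 = 0.03125
emp = sum(x <= 0.5 for x in xs) / n
print(emp)  # close to 0.03125
```
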
<br />
<br />
In Matlab, you can use functions:<br />
"who" to see what variables you have defined<br />
"clear all" to clear all variables you have defined<br />
"close all" to close all figures<br />
<br />
'''MatLab for Inverse Transform Method''':<br /><br />
<br />
<pre><br />
>>u=rand(1,1000)<br />
>>hist(u) %will generate a fairly uniform histogram
</pre><br />
[[File:ITM_example_hist(u).jpg|300px]]<br />
<pre><br />
%let λ=2 in this example; however, you can choose another value for λ
>>x=(-log(1-u))/2;
>>size(x) %1000 values
>>figure
>>hist(x) %histogram has an exponential shape
</pre><br />
[[File:ITM_example_hist(x).jpg|300px]]<br />
<br />
<br />
'''Limitations:'''<br /><br />
1. This method requires F to be invertible, but not all CDFs are invertible or strictly monotonic. <br />
2. It may be impractical, since some CDFs (and the integrals defining them) have no easy closed form.<br />
<br />
=== Probability Distribution Function Tool in MATLAB ===<br />
<pre><br />
>>disttool<br />
</pre><br />
[[File:Disttool.jpg|450px]]</div>Ysyap