Difference between revisions of "stat340s13"
(→The Geometric Distribution) |
m (Conversion script moved page Stat340s13 to stat340s13: Converting page titles to lowercase) |
||
Line 1: | Line 1: | ||
+ | <div style = "align:left; background:#00ffff; font-size: 150%"> | ||
+ | If you | ||
+ | use ideas, plots, text, code and other intellectual property developed by someone else | ||
+ | in your `wikicoursenote' contribution, you have to cite the | ||
+ | original source. If you copy a sentence or a paragraph from work done by someone | ||
+ | else, in addition to citing the original source you have to use quotation marks to | ||
+ | identify the scope of the copied material. Evidence of copying or plagiarism will | ||
+ | cause a failing mark in the course. | ||
+ | |||
+ | Example of citing the original source | ||
+ | |||
+ | Assumptions Underlying Principal Component Analysis can be found here<ref>http://support.sas.com/publishing/pubcat/chaps/55129.pdf</ref> | ||
+ | |||
+ | </div> | ||
+ | |||
+ | ==Important Notes== | ||
+ | <span style="color:#ff0000;font-size: 200%"> To distinguish between the material covered in class and additional material that you have added to the course, use the following convention. For anything that is not covered in the lecture write:</span> | ||
+ | |||
+ | <div style = "align:left; background:#F5F5DC; font-size: 120%"> | ||
+ | In the news recently was a story that captures some of the ideas behind PCA. Over the past two years, Scott Golder and Michael Macy, researchers from Cornell University, collected 509 million Twitter messages from 2.4 million users in 84 different countries. The data they used were words collected at various times of day and they classified the data into two different categories: positive emotion words and negative emotion words. Then, they were able to study this new data to evaluate subjects' moods at different times of day, while the subjects were in different parts of the world. They found that the subjects generally exhibited positive emotions in the mornings and late evenings, and negative emotions mid-day. They were able to "project their data onto a smaller dimensional space" using PCA. Their paper, "Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures," is available in the journal Science.<ref>http://www.pcworld.com/article/240831/twitter_analysis_reveals_global_human_moodiness.html</ref> | ||
+ | |||
+ | Assumptions Underlying Principal Component Analysis can be found here<ref>http://support.sas.com/publishing/pubcat/chaps/55129.pdf</ref> | ||
+ | |||
+ | </div> | ||
+ | |||
== Introduction, Class 1 - Tuesday, May 7 == | == Introduction, Class 1 - Tuesday, May 7 == | ||
Line 6: | Line 31: | ||
<!-- br tag for spacing--> | <!-- br tag for spacing--> | ||
Lecture: <br /> | Lecture: <br /> | ||
− | 001: | + | 001: T/Th 8:30-9:50am MC1085 <br /> |
− | 002: | + | 002: T/Th 1:00-2:20pm DC1351 <br /> |
Tutorial: <br /> | Tutorial: <br /> | ||
− | 2:30-3: | + | 2:30-3:20pm Mon M3 1006 <br /> |
+ | Office Hours: <br /> | ||
+ | Friday at 10am, M3 4208 <br /> | ||
=== Midterm === | === Midterm === | ||
− | Monday June 17 2013 from 2: | + | Monday June 17, 2013 from 2:30pm-3:20pm |
+ | | ||
+ | === Final === | ||
+ | Saturday August 10, 2013 from 7:30pm-10:00pm | ||
=== TA(s): === | === TA(s): === | ||
Line 51: | Line 81: | ||
=== Four Fundamental Problems === | === Four Fundamental Problems === | ||
<!-- br tag for spacing--> | <!-- br tag for spacing--> | ||
− | 1 | + | 1 Classification: Given input object X, we have a function which will take this input X and identify which 'class (Y)' it belongs to (Discrete Case) <br /> |
− | i.e taking value from x, we could predict y. | + | <font size="3">i.e., taking a value of x, we can predict y.</font> |
− | (For example, | + | (For example, if you have 40 images of oranges and 60 images of apples (represented by x), you can estimate a function that takes the images and states what type of fruit it is - note Y is discrete in this case.) <br /> |
− | 2 | + | 2 Regression: Same as classification, except that y is continuous rather than discrete. Results from regression are often used for prediction, forecasting, etc. (Examples: stock prices, height, weight.) <br /> |
− | 3 | + | (A simple exercise might be investigating the hypothesis that higher levels of education cause higher levels of income.) <br /> |
− | 4 | + | 3 Clustering: Use common features of objects in the same class or group to form clusters. (In this case, x is given and y is unknown; for example, clustering by province to measure the average height of Canadian men.) <br /> |
+ | 4 Dimensionality Reduction (also known as Feature extraction, Manifold learning): Used when we have a variable in a high-dimensional space and we want to reduce the dimension <br /> | ||
=== Applications === | === Applications === | ||
Line 86: | Line 117: | ||
*Other course material on: http://wikicoursenote.com/wiki/ | *Other course material on: http://wikicoursenote.com/wiki/ | ||
*Log on to both Learn and wikicoursenote frequently. | *Log on to both Learn and wikicoursenote frequently. | ||
− | *Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email instructor or TAs about the class directly to | + | *Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email instructor or TAs about the class directly to their personal accounts! |
− | '''Wikicourse note (10% of final mark):''' | + | '''Wikicourse note (complete at least 12 contributions to get 10% of final mark):''' |
− | When applying an account in the wikicourse note, please use the quest account as your login name while the uwaterloo email as the registered email. This is important as the quest id will | + | When applying for an account in the wikicourse note, please use the quest account as your login name and the uwaterloo email as the registered email. This is important as the quest id will be used to identify the students who make the contributions. |
Example:<br/> | Example:<br/> | ||
User: questid<br/> | User: questid<br/> | ||
Line 97: | Line 128: | ||
'''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard. | '''As a technical/editorial contributor''': Make contributions within 1 week and do not copy the notes on the blackboard. | ||
− | + | ''All contributions are now considered general contributions; you must contribute to 50% of lectures for full marks'' |
− | *A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc) but at least half of your contributions should be technical for full marks | + | *A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc.) but at least half of your contributions should be technical for full marks. |
Do not submit copyrighted work without permission, cite original sources. | Do not submit copyrighted work without permission, cite original sources. | ||
− | Each time you make a contribution, check mark the table. Marks are calculated on honour system, although there will be random verifications. If you are caught claiming to contribute but | + | Each time you make a contribution, check mark the table. Marks are calculated on an honour system, although there will be random verifications. If you are caught claiming to contribute but have not, you will not be credited. |
− | '''Wikicoursenote contribution form''' : | + | '''Wikicoursenote contribution form''' : https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform |
− | - you can submit your contributions | + | - you can submit your contributions multiple times.<br /> |
- you will be able to edit the response right after submitting<br /> | - you will be able to edit the response right after submitting<br /> | ||
- send email to make changes to an old response : uwstat340@gmail.com<br /> | - send email to make changes to an old response : uwstat340@gmail.com<br /> | ||
Line 116: | Line 147: | ||
- Markov Chain Monte Carlo | - Markov Chain Monte Carlo | ||
− | === | + | ==Class 2 - Thursday, May 9== |
− | + | ===Generating Random Numbers=== | |
− | + | ==== Introduction ==== | |
− | + | Simulation is the imitation of a process or system over time. Increased computational power has made it possible to use simulation studies to analyze the models used to describe a situation. |
+ | In order to perform a simulation study, we should: | ||
+ | <br /> 1 Use a computer to generate (pseudo*) random numbers (rand in MATLAB).<br> | ||
+ | 2 Use these numbers to generate values of random variables from distributions: for example, set a variable in terms of uniform u ~ U(0,1).<br> | ||
+ | 3 Using the concept of discrete events, we show how the random variables can be used to generate the behavior of a stochastic model over time. (Note: A stochastic model is the opposite of a deterministic model: at each step there are several directions in which the process can evolve.)<br> | ||
+ | 4 After continually generating the behavior of the system, we can obtain estimators and other quantities of interest.<br> | ||
− | + | The building block of a simulation study is the ability to generate a random number. This random number is a value from a random variable distributed uniformly on (0,1). There are many different methods of generating a random number: <br> | |
− | + | <br><font size="3">Physical Method: Roulette wheel, lottery balls, dice rolling, card shuffling etc. <br> | |
+ | <br>Numerically/Arithmetically: Use of a computer to successively generate pseudorandom numbers. The sequence of numbers can appear to be random; however, the numbers are calculated deterministically from an equation, which is what makes them pseudorandom. <br></font> | ||
− | + | (Source: Ross, Sheldon M. Simulation. San Diego: Academic, 1997. Print.) |
+ | *We use the prefix pseudo because a computer generates random numbers using algorithms, so the generated numbers are not truly random. Hence the term pseudo-random numbers. | ||
− | + | In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a '''stochastic''' model which encapsulates randomness and probabilistic events. | |
− | + | [[File:Det_vs_sto.jpg]] | |
+ | <br>A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate Pseudo Random Numbers.<br> | ||
− | + | '''Pseudo Random Numbers''' are numbers that seem random but are actually determined by a set of initial values. The sequence is a chain of numbers fixed in advance by a formula or an algorithm, and the values jump from one to the next in a way that makes them look like a series of independent random events. The flaw of this method is that the chain eventually returns to its initial position and the pattern starts to repeat, but if we make the set of possible values large enough we can prevent the numbers from repeating too early. Although pseudo random numbers are deterministic, they have the appearance of being independent uniform random variables. Being deterministic, pseudo random numbers are also easy to generate, reproduce, and manipulate. |
− | When people | + | When an experiment is repeated many times, the results will be close to the expected values, which makes the trials look deterministic. However, for each individual trial, the result is random. So, it looks like pseudo random numbers. |
− | So, it looks like pseudo random numbers. | ||
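The deterministic nature of pseudo random numbers can be seen by running a generator twice from the same initial value. The following is a minimal sketch in Python (an assumption; the course itself uses MATLAB), using Python's built-in generator rather than any algorithm from these notes. The helper name `first_values` and the seed values are chosen only for illustration.

```python
import random

def first_values(seed, n=5):
    """Return the first n pseudo-random numbers produced from a given seed."""
    rng = random.Random(seed)          # independent generator with a fixed initial state
    return [rng.random() for _ in range(n)]

run_a = first_values(seed=340)
run_b = first_values(seed=340)   # same seed: the "random" sequence is reproduced exactly
run_c = first_values(seed=341)   # different seed: a different sequence is expected

print(run_a == run_b)   # True
print(run_a)
print(run_c)
```

Because the whole sequence is fixed once the initial value is chosen, a simulation can be rerun and checked by reusing the same seed.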
− | === Mod === | + | ==== Mod ==== |
Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, | Let <math>n \in \N</math> and <math>m \in \N^+</math>, then by Division Algorithm, | ||
<math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, | <math>\exists q, \, r \in \N \;\text{with}\; 0\leq r < m, \; \text{s.t.}\; n = mq+r</math>, | ||
where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function | where <math>q</math> is called the quotient and <math>r</math> the remainder. Hence we can define a binary function | ||
− | <math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which | + | <math>\mod : \N \times \N^+ \rightarrow \N </math> given by <math>r:=n \mod m</math> which returns the remainder after division by m. |
<br /> | <br /> | ||
+ | Generally, mod means taking the remainder after division by m. | ||
<br /> | <br /> | ||
− | We say that n is congruent to r mod m if n = mq + r, where m is an integer. | + | We say that n is congruent to r mod m if n = mq + r, where q is an integer. |
+ | The values of r are between 0 and m-1. <br /> | ||
if y = ax + b, then <math>b:=y \mod a</math>. <br /> | if y = ax + b, then <math>b:=y \mod a</math>. <br /> | ||
− | + | ||
− | + | '''Example 1:'''<br /> | |
+ | |||
+ | <math>30 = 4 \cdot 7 + 2</math><br /> | ||
+ | |||
+ | <math>2 := 30\mod 7</math><br /> | ||
<br /> | <br /> | ||
− | + | <math>25 = 8 \cdot 3 + 1</math><br /> | |
− | |||
− | |||
− | 25 = 8 | ||
− | |||
+ | <math>1: = 25\mod 3</math><br /> | ||
+ | <br /> | ||
+ | <math>-3=5\cdot (-1)+2</math><br /> | ||
− | + | <math>2:=-3\mod 5</math><br /> | |
− | + | <br /> | |
+ | '''Example 2:'''<br /> | ||
− | + | If <math>23 = 3 \cdot 6 + 5</math> <br /> | |
− | |||
− | + | Then equivalently, <math>5 := 23\mod 6</math><br /> | |
+ | <br /> | ||
+ | If <math>31 = 31 \cdot 1</math> <br /> | ||
+ | Then equivalently, <math>0 := 31\mod 31</math><br /> | ||
+ | <br /> | ||
+ | If <math>-37 = 40\cdot (-1)+ 3</math> <br /> | ||
+ | Then equivalently, <math>3 := -37\mod 40</math><br /> | ||
− | ''' | + | '''Example 3:'''<br /> |
− | <math> | + | <math>77 = 3 \cdot 25 + 2</math><br /> |
+ | <math>2 := 77\mod 3</math><br /> | ||
+ | <br /> | ||
+ | <math>25 = 25 \cdot 1 + 0</math><br /> | ||
− | + | <math>0: = 25\mod 25</math><br /> | |
− | <math> | + | <br /> |
− | |||
− | |||
− | |||
− | + | '''Note:''' <math>\mod</math> here is different from the modulo congruence relation in <math>\Z_m</math>, which is an equivalence relation instead of a function. | |
− | |||
− | <math>\ | ||
− | + | The modulo operation is useful for determining whether one integer divided by another leaves a non-zero remainder. Both integers should satisfy <math>n = mq + r</math>, where <math>m</math>, <math>r</math>, <math>q</math>, and <math>n</math> are all integers and <math>0 \leq r < m</math>. The same relation holds when <math>n</math> or <math>q</math> is a negative integer; see the examples above with negative <math>n</math>. |
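The worked examples above can be checked numerically. The sketch below uses Python (an assumption; the notes use MATLAB) and a hypothetical helper `division_algorithm`. Python's `divmod` floors the quotient, so for a positive modulus the remainder is always in the range 0 to m-1, which matches the convention used in these notes, including the cases with negative n.

```python
def division_algorithm(n, m):
    """Return (q, r) with n == m*q + r and 0 <= r < m, for an integer n and m > 0."""
    q, r = divmod(n, m)   # floored division: r is never negative when m > 0
    return q, r

# Pairs (n, m) taken from Examples 1 and 2 above.
for n, m in [(30, 7), (25, 3), (-3, 5), (23, 6), (31, 31), (-37, 40)]:
    q, r = division_algorithm(n, m)
    print(f"{n} = {m}*({q}) + {r}, so {n} mod {m} = {r}")
```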
+ | ==== Mixed Congruential Algorithm ==== | ||
+ | We define the Linear Congruential Method to be <math>x_{k+1}=(ax_k + b) \mod m</math>, where <math>x_k, a, b, m \in \N, \;\text{with}\; a, m \neq 0</math>. Given a '''seed''' (i.e. an initial value <math>x_0 \in \N</math>), we can obtain values for <math>x_1, \, x_2, \, \cdots, x_n</math> inductively. The Multiplicative Congruential Method, invented by Berkeley professor D. H. Lehmer, is the special case where <math>b=0</math>, and the Mixed Congruential Method is the case where <math>b \neq 0</math>. <br /> The name "mixed" arises from the fact that the method has both a multiplicative and an additive term. | ||
− | ''' | + | An interesting fact about the '''Linear Congruential Method''' is that it is one of the oldest and best-known pseudo random number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications that require high-quality randomness, such as Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)<br /> |
− | <math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math>(a little tip: (a | + | |
+ | [[File:Linear_Congruential_Statment.png|600px]] "Source: STAT 340 Spring 2010 Course Notes" | ||
+ | |||
+ | '''First consider the following algorithm'''<br /> | ||
+ | <math>x_{k+1}=x_{k} \mod m</math> <br /> | ||
+ | |||
+ | For example: if <math>x_{0}=5</math> and <math>x_{n}=3x_{n-1} \mod 150</math>, find <math>x_{1},x_{8},x_{9}</math>. <br /> | ||
+ | <math>x_{n}=(3^n \cdot 5) \mod 150</math> <br /> | ||
+ | <math>x_{1}=15,\,x_{8}=105,\,x_{9}=15</math> <br /> | ||
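The recursion in this exercise can be iterated directly. The following Python sketch (Python is an assumption; the notes use MATLAB) defines a hypothetical helper `lcg_sequence` for the general recursion x_k = (a x_{k-1} + b) mod m and applies it with a = 3, b = 0, m = 150 and x_0 = 5.

```python
def lcg_sequence(a, b, m, x0, n):
    """Return [x_1, ..., x_n] for the recursion x_k = (a*x_{k-1} + b) mod m."""
    xs = []
    x = x0
    for _ in range(n):
        x = (a * x + b) % m
        xs.append(x)
    return xs

xs = lcg_sequence(a=3, b=0, m=150, x0=5, n=9)
print(xs)                       # [15, 45, 135, 105, 15, 45, 135, 105, 15]
print(xs[0], xs[7], xs[8])      # x_1, x_8, x_9 -> 15 105 15
```

The output repeats after four values, so this choice of parameters has a very short period.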
+ | |||
+ | |||
+ | |||
+ | '''Example'''<br /> | ||
+ | <math>\text{Let }x_{0}=10,\,m=3</math><br /> | ||
+ | |||
+ | :<math>\begin{align} | ||
+ | |||
+ | x_{1} &{}= 10 &{}\mod{3} = 1 \\ | ||
+ | |||
+ | x_{2} &{}= 1 &{}\mod{3} = 1 \\ | ||
+ | |||
+ | x_{3} &{}= 1 &{}\mod{3} =1 \\ | ||
+ | \end{align}</math> | ||
+ | <math>\ldots</math><br /> | ||
+ | |||
+ | Excluding <math>x_{0}</math>, this example generates a series of ones. In general, excluding <math>x_{0}</math>, the algorithm above will always repeat a single number less than <math>m</math>. Hence, it has a period of 1. The '''period''' can be described as the length of a sequence before it repeats. We want a large period with a sequence that is random looking. We can modify this algorithm to form the Multiplicative Congruential Algorithm. <br /> | ||
+ | |||
+ | |||
+ | |||
− | <math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math>(a little tip: (a | + | <math>x_{k+1}=(a \cdot x_{k} + b) \mod m </math> (a little tip: <math>(a \cdot b)\mod c = ((a\mod c)\cdot(b\mod c))\mod c</math>)<br/> |
'''Example'''<br /> | '''Example'''<br /> | ||
Line 214: | Line 278: | ||
This example generates a sequence with a repeating cycle of two integers.<br /> | This example generates a sequence with a repeating cycle of two integers.<br /> | ||
− | (If we choose the numbers properly, we could get a sequence of "random" numbers. | + | (If we choose the numbers properly, we could get a sequence of "random" numbers. How do we find the values of <math>a,b,</math> and <math>m</math>? At the very least <math>m</math> should be a very '''large''', preferably prime number. The larger <math>m</math> is, the longer the sequence can run before it repeats, and the more "random" it can look. This is easy to experiment with in Matlab, where the command rand() generates random numbers which are uniformly distributed on the interval (0,1). Matlab uses <math>a=7^5, b=0, m=2^{31}-1</math>, as recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller. The important part is that <math>m</math> should be '''large and prime'''.)<br /> |
+ | |||
+ | Note: <math>\frac {x_{n+1}}{m-1}</math> is an approximation to the value of a U(0,1) random variable.<br /> | ||
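As an illustration of the parameters quoted above, the following Python sketch (Python is an assumption; the notes use MATLAB) runs the multiplicative congruential recursion with a = 7^5, b = 0 and m = 2^31 - 1, then normalizes by m - 1 as in the note. The function name `multiplicative_lcg` and the seed value 1 are chosen only for illustration. This is a sketch of the recursion, not a reproduction of any library's internal generator.

```python
A = 7 ** 5          # multiplier, 16807
M = 2 ** 31 - 1     # modulus, the prime 2147483647

def multiplicative_lcg(seed, n):
    """Yield n values of x_{k+1} = (A * x_k) mod M, starting after the seed."""
    x = seed
    for _ in range(n):
        x = (A * x) % M
        yield x

raw = list(multiplicative_lcg(seed=1, n=1000))
uniforms = [x / (M - 1) for x in raw]   # approximate U(0,1) values

print(raw[:3])                          # [16807, 282475249, 1622650073]
print(min(uniforms), max(uniforms))     # both lie between 0 and 1
```

With b = 0 the seed must be non-zero, otherwise every later value is 0.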
+ | |||
+ | |||
'''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /> | '''MatLab Instruction for Multiplicative Congruential Algorithm:'''<br /> | ||
Line 234: | Line 302: | ||
''(Note: <br /> | ''(Note: <br /> | ||
− | 1. Keep repeating this command over and over again and you will | + | 1. Keep repeating this command over and over again and you will get random numbers – this is how the command rand works in a computer. <br /> |
− | 2. There is a function in MATLAB called '''RAND''' to generate a number between 0 and 1. <br /> | + | 2. There is a function in MATLAB called '''RAND''' to generate a random number between 0 and 1. <br /> |
− | 3. If we would like to generate 1000 | + | For example, in MATLAB, we can use '''rand(1,1000)''' to generate 1000 numbers between 0 and 1. This is essentially a vector with 1 row and 1000 columns, with each entry a random number between 0 and 1.<br /> |
+ | 3. If we would like to generate 1000 or more numbers, we could use a '''for''' loop<br /><br /> | ||
''(Note on MATLAB commands: <br /> | ''(Note on MATLAB commands: <br /> | ||
Line 242: | Line 311: | ||
2. close all: closes all figures.<br /> | 2. close all: closes all figures.<br /> | ||
3. who: displays all defined variables.<br /> | 3. who: displays all defined variables.<br /> | ||
− | 4. clc: clears screen. | + | 4. clc: clears screen.<br /> |
+ | 5. ; : prevents the results from printing.<br /> | ||
+ | 6. disttool: displays a graphing tool for probability distributions.<br /><br /> | ||
<pre style="font-size:16px"> | <pre style="font-size:16px"> | ||
Line 261: | Line 332: | ||
− | This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. | + | This algorithm involves three integer parameters <math>a, b,</math> and <math>m</math> and an initial value, <math>x_0</math> called the '''seed'''. A sequence of numbers is defined by <math>x_{k+1} = ax_k+ b \mod m</math>. <br /> |
Note: For some bad <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /> | Note: For some bad <math>a</math> and <math>b</math>, the histogram may not look uniformly distributed.<br /> | ||
Line 297: | Line 368: | ||
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\ | x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\ | ||
\end{align}</math><br /> | \end{align}</math><br /> | ||
− | + | Another example: <math>a=3,\ b=2,\ m=5,\ x_0=1</math> |
etc. | etc. | ||
<hr/> | <hr/> | ||
<p style="color:red;font-size:16px;">FAQ:</P> | <p style="color:red;font-size:16px;">FAQ:</P> | ||
− | 1.Why | + | 1.Why is it 1 to 30 instead of 0 to 30 in the example above?<br> |
''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br> | ''<math>b = 0</math> so in order to have <math>x_k</math> equal to 0, <math>x_{k-1}</math> must be 0 (since <math>a=13</math> is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.''<br> | ||
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br> | Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group <math>\Z_{31}</math>.<br> | ||
Line 309: | Line 380: | ||
'''Examples:[From Textbook]'''<br /> | '''Examples:[From Textbook]'''<br /> | ||
− | + | <math>\text{If }x_0=3 \text{ and } x_n=(5x_{n-1}+7)\mod 200</math>, <math>\text{find }x_1,\cdots,x_{10}</math>.<br /> | |
'''Solution:'''<br /> | '''Solution:'''<br /> | ||
<math>\begin{align} | <math>\begin{align} | ||
Line 325: | Line 396: | ||
'''Comments:'''<br /> | '''Comments:'''<br /> | ||
+ | |||
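+ | Matlab code: | ||
+ | <pre style="font-size:16px"> | ||
+ | a=5;          % multiplier | ||
+ | b=7;          % increment | ||
+ | m=200;        % modulus | ||
+ | x(1)=3;       % seed x_0 | ||
+ | for ii=2:1000 | ||
+ | x(ii)=mod(a*x(ii-1)+b,m);   % x_k = (a*x_{k-1} + b) mod m | ||
+ | end | ||
+ | size(x);      % 1-by-1000 (output suppressed by the semicolon) | ||
+ | hist(x)       % histogram of the generated values | ||
+ | </pre> | ||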
+ | |||
+ | |||
+ | |||
Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /> | Typically, it is good to choose <math>m</math> such that <math>m</math> is large, and <math>m</math> is prime. Careful selection of parameters '<math>a</math>' and '<math>b</math>' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for <math>m</math>, our results were not satisfactory in producing an output resembling a uniform distribution.<br /> | ||
The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /> | The computed values are between 0 and <math>m-1</math>. If the values are normalized by dividing by '''<math>m-1</math>''', their result is numbers uniformly distributed on the interval <math>\left[0,1\right]</math> (similar to computing from uniform distribution).<br /> | ||
− | From the example shown above, if we want to create a large group of random numbers, it is better to have large <math>m</math> so that the random values | + | From the example shown above, if we want to create a large group of random numbers, it is better to have large, prime <math>m</math> so that the generated random values will not repeat after several iterations. Note: the period for this example is 8: from '<math>x_2</math>' to '<math>x_9</math>'.<br /> |
− | There has been a research | + | There has been research on how to choose the parameters so that the sequence looks uniform. Many programs give you the option to choose the seed. Sometimes the seed is chosen by the CPU.<br /> |
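The period quoted for the textbook example can be measured by iterating until a value repeats. The Python sketch below (Python is an assumption; the notes use MATLAB) uses a hypothetical helper `cycle_length`. It is applied to the textbook parameters a = 5, b = 7, m = 200, x_0 = 3 and to the multiplicative case a = 13, b = 0, m = 31, x_0 = 1 discussed in the FAQ.

```python
def cycle_length(a, b, m, x0):
    """Length of the cycle that x_{k+1} = (a*x_k + b) mod m eventually enters."""
    first_seen = {}          # value -> index at which it first appeared
    x, k = x0, 0
    while x not in first_seen:
        first_seen[x] = k
        x = (a * x + b) % m
        k += 1
    return k - first_seen[x]

print(cycle_length(a=5, b=7, m=200, x0=3))   # 8
print(cycle_length(a=13, b=0, m=31, x0=1))   # 30
```

The dictionary records where each value first appears, so the helper also works when the seed itself is not part of the repeating cycle.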
+ | <span style="background:#F5F5DC">Theorem (extra knowledge)</span><br /> | ||
+ | Let c be a non-zero constant (c plays the role of b above). Then, for any seed x0, the LCG <math>x_{n+1}=(ax_n+c) \mod m</math> has the maximal period m if and only if<br /> | ||
+ | (i) m and c are coprime;<br /> | ||
+ | (ii) (a-1) is divisible by every prime factor of m;<br /> | ||
+ | (iii) if m is divisible by 4, then a-1 is also divisible by 4.<br /> | ||
+ | We want our LCG to have a large cycle.<br /> | ||
+ | We call a cycle with m elements the maximal period.<br /> | ||
+ | We can make it bigger by making m big and prime.<br /> | ||
+ | Recall: any integer greater than 1 can be written as a product of prime factors.<br /> | ||
+ | Definition (coprime): two numbers X and Y are coprime if they do not share any prime factors.<br /> | ||
+ | Example:<br /> | ||
+ | <font size="3">Xn=(15Xn-1 + 4) mod 7</font><br /> | ||
+ | (i) m=7 c=4 -> coprime;<br /> | ||
+ | (ii) a-1=14 and a-1 is divisible by 7;<br /> | ||
+ | (iii) does not apply.<br /> | ||
+ | (The extra knowledge stops here) | ||
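The three conditions of the theorem can be checked mechanically. The following Python sketch (Python is an assumption; the notes use MATLAB) uses hypothetical helpers `prime_factors`, `has_full_period` and `period`. It tests the conditions for the example above and confirms the period by direct iteration.

```python
from math import gcd

def prime_factors(n):
    """Return the set of prime factors of an integer n >= 2."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def has_full_period(a, c, m):
    """Check conditions (i)-(iii) for x_{n+1} = (a*x_n + c) mod m with c != 0."""
    cond_i = gcd(c, m) == 1
    cond_ii = all((a - 1) % p == 0 for p in prime_factors(m))
    cond_iii = (m % 4 != 0) or ((a - 1) % 4 == 0)
    return cond_i and cond_ii and cond_iii

def period(a, c, m, x0=0):
    """Count steps until the sequence returns to x0.

    Assumes x0 lies on a cycle, which holds when a and m are coprime."""
    x, steps = (a * x0 + c) % m, 1
    while x != x0:
        x = (a * x + c) % m
        steps += 1
    return steps

print(has_full_period(15, 4, 7), period(15, 4, 7))   # True 7
print(has_full_period(5, 7, 200))                    # False
```

The second call uses the textbook example with m = 200, where condition (ii) fails because a - 1 = 4 is not divisible by the prime factor 5.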
− | this part | + | |
+ | |||
+ | In this part, I learned how to use R code to figure out the relationship between two integers | ||
division, and their remainder. And when we use R to calculate R with random variables for a range such as(1:1000),the graph of distribution is like uniform distribution. | division, and their remainder. And when we use R to calculate R with random variables for a range such as(1:1000),the graph of distribution is like uniform distribution. | ||
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"> | <div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"> | ||
− | < | + | <h4 style="text-align:center;">Summary of Multiplicative Congruential Algorithm</h4> |
<p><b>Problem:</b> generate Pseudo Random Numbers.</p> | <p><b>Problem:</b> generate Pseudo Random Numbers.</p> | ||
<b>Plan:</b> | <b>Plan:</b> | ||
<ol> | <ol> | ||
− | <li>find integer: <i>a b m</i>(large prime) < | + | <li>find integers: <i>a, b, m</i> (large prime) and <i>x<sub>0</sub></i> (the seed).</li> |
− | <li><math>x_{ | + | <li><math>x_{k+1}=(ax_{k}+b)</math>mod m</li> |
</ol> | </ol> | ||
<b>Matlab Instruction:</b> | <b>Matlab Instruction:</b> | ||
Line 358: | Line 461: | ||
</pre> | </pre> | ||
</div> | </div> | ||
+ | Another algorithm for generating pseudo random numbers is the multiply with carry (MWC) method. Its simplest form is similar to the linear congruential generator. They differ in that the parameter b changes at each step in the MWC algorithm. It is as follows: <br> | ||
+ | | ||
+ | 1.) x<sub>k+1</sub> = (ax<sub>k</sub> + b<sub>k</sub>) mod m <br> | ||
+ | 2.) b<sub>k+1</sub> = floor((ax<sub>k</sub> + b<sub>k</sub>)/m) <br> | ||
+ | 3.) set k to k + 1 and go to step 1 | ||
+ | [http://www.javamex.com/tutorials/random_numbers/multiply_with_carry.shtml Source] | ||
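The multiply with carry steps listed above can be sketched as follows. The code is in Python (an assumption; the notes use MATLAB), the function name `multiply_with_carry` is hypothetical, and the small parameters are chosen only so that the arithmetic can be followed by hand.

```python
def multiply_with_carry(a, m, x0, b0, n):
    """Return n outputs of the multiply-with-carry recursion described above."""
    x, b = x0, b0
    out = []
    for _ in range(n):
        t = a * x + b
        x = t % m        # step 1: next output x_{k+1}
        b = t // m       # step 2: next carry b_{k+1}
        out.append(x)
    return out

print(multiply_with_carry(a=7, m=10, x0=1, b0=3, n=5))   # [0, 1, 7, 9, 7]
```

Both updates use the same intermediate value a*x_k + b_k, so it is computed once per step before x and b are overwritten.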
=== Inverse Transform Method === | === Inverse Transform Method === | ||
− | + | Now that we know how to generate random numbers, we use these values to sample from distributions such as the exponential. However, to use this method easily, the probability distribution being sampled must have a cumulative distribution function (cdf) <math>F</math> with a tractable (that is, easily found) inverse <math>F^{-1}</math>.<br /> |
'''Theorem''': <br /> | '''Theorem''': <br /> | ||
Line 367: | Line 476: | ||
follows the distribution function <math>F\left(\cdot\right)</math>, | follows the distribution function <math>F\left(\cdot\right)</math>, | ||
where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /> | where <math>F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}</math> is the generalized inverse.<br /> | ||
− | '''Note''': <math>F</math> need not be invertible, but if it is, then the generalized inverse is the same as the inverse in the usual case. | + | '''Note''': <math>F</math> need not be invertible everywhere on the real line, but if it is, then the generalized inverse is the same as the inverse in the usual case. We only need it to be invertible on the range of F(x), [0,1]. |
'''Proof of the theorem:'''<br /> | '''Proof of the theorem:'''<br /> | ||
The generalized inverse satisfies the following: <br /> | The generalized inverse satisfies the following: <br /> | ||
− | <math>\ | + | |
− | + | :<math>P(X\leq x)</math> <br /> | |
− | + | <math>= P(F^{-1}(U)\leq x)</math> (since <math>X= F^{-1}(U)</math> by the inverse method)<br /> | |
− | + | <math>= P(F(F^{-1}(U))\leq F(x))</math> (since <math>F </math> is monotonically increasing) <br /> |
− | + | <math>= P(U\leq F(x)) </math> (since <math>F(F^{-1}(U)) = U</math> when <math>F</math> is continuous)<br /> |
− | + | <math>= F(x) </math> (since <math> P(U\leq a)= a</math> for <math>U \sim U(0,1), a \in [0,1]</math>, and <math>0 \leq F(x) \leq 1 </math>) <br /> |
− | + | ||
− | + | This is the c.d.f. of X. <br /> | |
− | + | <br /> | |
− | |||
− | |||
That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /> | That is <math>F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)</math><br /> | ||
Line 391: | Line 498: | ||
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then apply the transformation X=<math> F^{-1}(U) </math> <br /> | Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then apply the transformation X=<math> F^{-1}(U) </math> <br /> | ||
− | Note that we can apply the inverse on both sides in the proof of the inverse transform only if the pdf of X is monotonic. A monotonic function is one that is either increasing for all x, or decreasing for all x. | + | Note that we can apply <math>F</math> to both sides of the inequality in the proof of the inverse transform only because the cdf of X is monotonic. A monotonic function is one that is either non-decreasing for all x, or non-increasing for all x. This holds for every cdf, since cdfs are non-decreasing by definition. <br /> |
− | + | In short, what the theorem tells us is that we can use a random number <math>U</math> from <math>U(0,1)</math> to randomly sample a point on the CDF of X, then apply the inverse of the CDF to map the given probability back to its domain, which gives us the random variable X.<br/> |
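The generalized inverse in the theorem also covers distributions whose cdf is a step function. The Python sketch below (Python is an assumption; the notes use MATLAB) builds the generalized inverse, the smallest x with F(x) >= u, for a small discrete distribution. The helper name `make_discrete_sampler` and the values and probabilities are hypothetical and chosen only for illustration.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def make_discrete_sampler(values, probs):
    """Build F^{-1}(u) = min{x : F(x) >= u} for a discrete distribution."""
    cdf = list(accumulate(probs))       # F evaluated at each value, in increasing order
    def inverse_cdf(u):
        i = bisect_left(cdf, u)         # first index with cdf[i] >= u
        return values[min(i, len(values) - 1)]   # guard against rounding in cdf[-1]
    return inverse_cdf

inverse_cdf = make_discrete_sampler(values=[0, 1, 2], probs=[0.2, 0.5, 0.3])
print(inverse_cdf(0.10), inverse_cdf(0.20), inverse_cdf(0.65), inverse_cdf(0.95))  # 0 0 1 2

rng = random.Random(0)
sample = [inverse_cdf(rng.random()) for _ in range(10000)]
print(sample.count(1) / len(sample))    # should be close to 0.5
```

Each uniform draw u is mapped to the first value whose cumulative probability reaches u, so a value with probability p is returned for a set of u of length p.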
− | < | + | '''Example 1 - Exponential''': <math> f(x) = \lambda e^{-\lambda x}</math><br/> |
+ | Calculate the CDF:<br /> | ||
+ | <math> F(x)= \int_0^x f(t) dt = \int_0^x \lambda e ^{-\lambda t}\ dt</math> | ||
+ | <math> = \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, \Big|_{0}^{x} </math> | |
+ | <math> = -e^{-\lambda x} + e^0 =1 - e^{- \lambda x} </math><br /> | ||
+ | Solve the inverse:<br /> | ||
+ | <math> y=1-e^{- \lambda x} \Rightarrow 1-y=e^{- \lambda x} \Rightarrow x=-\frac {ln(1-y)}{\lambda}</math><br /> | ||
+ | <math> y=-\frac {ln(1-x)}{\lambda} \Rightarrow F^{-1}(x)=-\frac {ln(1-x)}{\lambda}</math><br /> | ||
+ | Note that 1 − U is also uniform on (0, 1) and thus −log(1 − U) has the same distribution as −logU. <br /> | ||
+ | Steps: <br /> | ||
Step 1: Draw U ~U[0,1];<br /> | Step 1: Draw U ~U[0,1];<br /> | ||
− | Step 2: <math> x=\frac{-ln( | + | Step 2: <math> x=\frac{-ln(U)}{\lambda} </math> <br /><br /> |
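The two steps above can be sketched in MATLAB as follows. This is a minimal illustration rather than the code shown in class; the rate lambda = 2 and the sample size n = 1000 are assumed values chosen only for this example.<br />
<pre style="font-size:16px">
lambda = 2;                 % rate parameter (assumed value)
n = 1000;                   % number of samples (assumed value)
u = rand(1, n);             % Step 1: draw U ~ U(0,1)
x = -log(u) ./ lambda;      % Step 2: X = -ln(U)/lambda
hist(x, 30)                 % histogram should decay like lambda*exp(-lambda*x)
mean(x)                     % sample mean, expected to be near 1/lambda
</pre>
Since the mean of an exponential distribution is 1/lambda, comparing mean(x) with 1/lambda gives a quick sanity check.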
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
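− | + | '''Example 2 - Normal distribution''' (cdf of <math>Y=Z^2</math>, where <math>Z \sim N(0,1)</math>):<br /> | |
− | + | <math>G(y)=P(Y \leq y)</math><br /> | |
− | + | <math>=P(-\sqrt{y} < Z < \sqrt{y})</math><br /> | |
− | + | <math>=\int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz</math><br /> | |
− | + | <math>= 2\int_{0}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz</math><br /> | |
− | + | This is the cdf of <math>Y=Z^2</math>.<br /> | |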
− | |||
− | ' | + | The pdf is <math>g(y)= G'(y)</math>, | |
+ | which is the pdf of <math>\chi^2(1)</math>.<br /> | |
− | + | '''MatLab Code''':<br /> | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | '''MatLab | ||
<pre style="font-size:16px"> | <pre style="font-size:16px"> | ||
>>u=rand(1,1000); | >>u=rand(1,1000); | ||
− | >>hist(u) #will generate a fairly uniform diagram | + | >>hist(u) % this will generate a fairly uniform histogram | |
</pre> | </pre> | ||
[[File:ITM_example_hist(u).jpg|300px]] | [[File:ITM_example_hist(u).jpg|300px]] | ||
Line 518: | Line 543: | ||
[[File:ITM_example_hist(x).jpg|300px]] | [[File:ITM_example_hist(x).jpg|300px]] | ||
− | < | + | '''Example 2 - Continuous Distribution''':<br /> |
+ | |||
+ | <math> f(x) = \dfrac {\lambda } {2}e^{-\lambda \left| x-\theta \right| } \text{ for } -\infty < x < \infty , \lambda >0 </math><br/> | |
+ | |||
+ | Calculate the CDF:<br /> | ||
+ | |||
+ | <math> F(x)= \frac{1}{2} e^{-\lambda (\theta - x)} , \text{ for } x \le \theta </math><br/> | |
+ | <math> F(x) = 1 - \frac{1}{2} e^{-\lambda (x - \theta)}, \text{ for } x > \theta </math><br/> | |
+ | |||
+ | Solve for the inverse:<br /> | ||
+ | |||
+ | <math>F^{-1}(y)= \theta + \ln(2y)/\lambda, \text{ for } 0 < y \le 0.5</math><br/> | |
+ | <math>F^{-1}(y)= \theta - \ln(2(1-y))/\lambda, \text{ for } 0.5 < y < 1</math><br/> | |
+ | |||
+ | Algorithm:<br /> | ||
+ | Steps: <br /> | ||
+ | Step 1: Draw U ~ U[0, 1];<br /> | ||
+ | Step 2: Compute <math>X = F^{-1}(U)</math>, i.e. <math>X = \theta + \frac {1}{\lambda} \ln(2U)</math> for <math>U \le 0.5</math>, else <math>X = \theta -\frac {1}{\lambda} \ln(2(1-U))</math><br /> | |
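A possible MATLAB sketch of this two-branch algorithm is given below. The parameter values lambda = 1 and theta = 0, the sample size, and the helper variable name left are assumptions made for illustration only.<br />
<pre style="font-size:16px">
lambda = 1; theta = 0;      % assumed parameter values
n = 1000;                   % number of samples (assumed value)
u = rand(1, n);             % Step 1: draw U ~ U(0,1)
x = zeros(1, n);
left = (u <= 0.5);          % draws that fall on the branch x <= theta
x(left)  = theta + log(2*u(left)) ./ lambda;
x(~left) = theta - log(2*(1 - u(~left))) ./ lambda;
hist(x, 50)                 % histogram should be symmetric about theta
</pre>
Using a logical mask applies each branch of the inverse to the right draws without a loop.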
+ | |||
+ | |||
+ | '''Example 3 - <math>F(x) = x^5</math>''':<br/> | ||
+ | Given the CDF of X: <math>F(x) = x^5</math> for <math>0 \leq x \leq 1</math>, generate X from U~U[0,1]. <br /> | |
+ | Sol: | ||
+ | Let <math>y=x^5</math>, solve for x: <math>x=y^\frac {1}{5}</math>. Therefore, <math>F^{-1} (x) = x^\frac {1}{5}</math><br /> | ||
+ | Hence, to obtain a value of x from F(x), we first draw u from the uniform distribution on [0,1], then apply the inverse function of F(x), and set | |
+ | <math>x= u^\frac{1}{5}</math><br /><br /> | ||
+ | |||
+ | Algorithm:<br /> | ||
+ | Steps: <br /> | ||
+ | Step 1: Draw U ~ U[0, 1];<br /> | |
+ | Step 2: X=U^(1/5);<br /> | ||
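These two steps could be written in MATLAB roughly as follows; the sample size n = 1000 is an assumed value. Note the element-wise operator .^, which is needed because u is a vector.<br />
<pre style="font-size:16px">
n = 1000;                   % number of samples (assumed value)
u = rand(1, n);             % Step 1: draw U ~ U(0,1)
x = u.^(1/5);               % Step 2: X = U^(1/5), element-wise
hist(x, 20)                 % mass concentrates near 1
mean(x)                     % theoretical mean is 5/6
</pre>
The theoretical mean follows from the density <math>f(x)=5x^4</math> on [0,1], which gives <math>E(X)=\int_0^1 5x^5\,dx = 5/6</math>.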
+ | |||
+ | '''Example 4 - BETA(1,β)''':<br/> | ||
+ | Given u~U[0,1], generate x from BETA(1,β)<br /> | ||
+ | Solution: | ||
+ | <math>F(x)= 1-(1-x)^\beta</math>, | ||
+ | <math>u= 1-(1-x)^\beta</math><br /> | ||
+ | Solve for x: | ||
+ | <math>(1-x)^\beta = 1-u</math>, | ||
+ | <math>1-x = (1-u)^\frac {1}{\beta}</math>, | ||
+ | <math>x = 1-(1-u)^\frac {1}{\beta}</math><br /> | ||
+ | let β=3, use Matlab to construct N=1000 observations from Beta(1,3)<br /> | ||
+ | '''MatLab Code''':<br /> | ||
+ | |||
+ | <pre style="font-size:16px"> | ||
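+ | >> u = rand(1,1000); | |
+ | >> x = 1-(1-u).^(1/3);   % element-wise power; ^ alone is a matrix power and fails on a vector | |
+ | >> hist(x,50) | |
+ | >> mean(x)               % should be close to 1/(1+3) = 0.25 | |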
+ | </pre> | ||
+ | |||
+ | '''Example 5 - Estimating <math>\pi</math>''':<br/> | ||
+ | Let's use rand() and Monte Carlo Method to estimate <math>\pi</math> <br /> | ||
+ | N= total number of points <br /> | ||
+ | N<sub>c</sub> = total number of points inside the circle<br /> | ||
+ | Prob[(x,y) lies in the circle] = <math>\frac {Area(circle)}{Area(square)}</math><br /> | |
+ | If we take a square of side 2, the inscribed circle has area <math>\pi (\frac {2}{2})^2 =\pi</math> and the square has area 4.<br /> | |
+ | Thus <math>\pi \approx 4(\frac {N_c}{N})</math><br /> | |
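The estimate above can be sketched in MATLAB as below. The number of points N = 100000 and the variable name pi_hat are assumptions for illustration; the square is taken as [-1,1] x [-1,1] so that the inscribed circle has radius 1.<br />
<pre style="font-size:16px">
N = 100000;                       % total number of points (assumed value)
x = 2*rand(1, N) - 1;             % x-coordinates, uniform on (-1,1)
y = 2*rand(1, N) - 1;             % y-coordinates, uniform on (-1,1)
Nc = sum(x.^2 + y.^2 <= 1);       % number of points inside the unit circle
pi_hat = 4 * Nc / N               % Monte Carlo estimate of pi
</pre>
The accuracy improves slowly with N, since the standard error of the estimate shrinks like <math>1/\sqrt{N}</math>.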
+ | |||
+ | <font size="3">For example, '''UNIF(a,b)'''<br /> | ||
+ | <math>y = F(x) = (x - a)/ (b - a) </math> | ||
+ | <math>x = (b - a ) * y + a</math> | ||
+ | <math>X = a + ( b - a) * U</math><br /> | ||
+ | where U is UNIF(0,1)</font> | ||
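As a small illustration of the UNIF(a,b) transformation, the following MATLAB sketch uses assumed endpoints a = 2 and b = 5 and an assumed sample size.<br />
<pre style="font-size:16px">
a = 2; b = 5;               % assumed endpoints of the interval
n = 1000;                   % number of samples (assumed value)
u = rand(1, n);             % U ~ UNIF(0,1)
x = a + (b - a) .* u;       % X = a + (b - a)*U ~ UNIF(a,b)
hist(x)                     % should look roughly flat on (a,b)
</pre>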
'''Limitations:'''<br /> | '''Limitations:'''<br /> | ||
Line 524: | Line 612: | ||
2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /> | 2. It may be impractical since some CDF's and/or integrals are not easy to compute such as Gaussian distribution.<br /> | ||
− | We learned how to prove the cdf | + | We learned how to prove the inverse transform result, and how to use the uniform distribution to obtain a value of x from F(x). | |
− | We also | + | We can also use the uniform distribution with the inverse method to sample from other distributions. | |
− | The probability of getting a point for a circle over the triangle is a closed uniform distribution, each point in the circle and over the triangle is almost the same. | + | The probability of getting a point for a circle over the triangle is a closed uniform distribution, each point in the circle and over the triangle is almost the same. Then, we can look at the graph to determine what kind of distribution the graph resembles. |
− | |||
− | === Probability Distribution Function Tool in MATLAB === | + | ==== Probability Distribution Function Tool in MATLAB ==== |
<pre style="font-size:16px"> | <pre style="font-size:16px"> | ||
disttool %shows different distributions | disttool %shows different distributions | ||
</pre> | </pre> | ||
− | This command allows users to explore the | + | This command allows users to explore different types of distributions and see how changes to the parameters affect the plot of either a CDF or PDF. | |
+ | |||
[[File:Disttool.jpg|450px]] | [[File:Disttool.jpg|450px]] | ||
Changing the values of mu and sigma changes the location and spread of the plotted curve. | Changing the values of mu and sigma changes the location and spread of the plotted curve. | ||
− | == | + | == Class 3 - Tuesday, May 14 == |
=== Recall the Inverse Transform Method === | === Recall the Inverse Transform Method === | ||
− | + | Let U~Unif(0,1),then the random variable X = F<sup>-1</sup>(u) has distribution F. <br /> | |
− | '''2 | + | To sample X with CDF F(x), <br /> |
+ | |||
+ | 1) Draw <math>U \sim Unif[0,1]</math><br /> | |
+ | '''2) X = F<sup>-1</sup>(u) '''<br /> | ||
+ | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
<br /> | <br /> | ||
− | '''Note''': | + | '''Note''': CDF of a U(a,b) random variable is: |
:<math> | :<math> | ||
F(x)= \begin{cases} | F(x)= \begin{cases} | ||
Line 579: | Line 662: | ||
[[File:2.jpg]] <math>P(U\leq a)=a</math> | [[File:2.jpg]] <math>P(U\leq a)=a</math> | ||
− | Note that on a single point there is no mass probability (i.e. | + | Note that a single point carries no probability mass (i.e. <math>u \leq 0.5</math> has the same probability as <math>u < 0.5</math>). | |
+ | More formally, this is saying that <math> P(X = x) = F(x)- \lim_{s \to x^-}F(s)</math>, which equals zero for any continuous random variable. | |
− | + | ====Limitations of the Inverse Transform Method==== | |
− | Though this method is very easy to use and apply, it does have | + | Though this method is very easy to use and apply, it does have a major disadvantage/limitation: |
− | + | * We need to find the inverse cdf <math> F^{-1}(\cdot) </math>. In some cases the inverse function does not exist, or is difficult to find because it requires a closed form expression for F(x). | |
− | + | For example, it is too difficult to find the inverse cdf of the Gaussian distribution, so we must find another method to sample from the Gaussian distribution. | |
+ | |||
+ | In conclusion, we need to find another way of sampling from more complicated distributions | ||
=== Discrete Case === | === Discrete Case === | ||
The same technique can be used for discrete case. We want to generate a discrete random variable x, that has probability mass function: <br/> | The same technique can be used for discrete case. We want to generate a discrete random variable x, that has probability mass function: <br/> | ||
− | |||
:<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math> | :<math>\begin{align}P(X = x_i) &{}= p_i \end{align}</math> | ||
Line 604: | Line 689: | ||
5. Repeat the process until we reach <math>U\leq p_{0} + p_{1} + ......+ p_{k}</math>, then deliver <math>X = x_{k}</math><br> | 5. Repeat the process until we reach <math>U\leq p_{0} + p_{1} + ......+ p_{k}</math>, then deliver <math>X = x_{k}</math><br> | ||
− | + | Note that after generating a random U, the value of X can be determined by finding the interval <math>[F(x_{j-1}),F(x_{j})]</math> in which U lies. <br /> | |
− | |||
− | + | In summary: | |
+ | Generate a discrete r.v. X that has pmf:<br /> | |
+ | <math>P(X=x_i)=p_i</math>, <math>x_0<x_1<x_2<\dots</math> <br /> | |
+ | 1. Draw U~U(0,1);<br /> | |
+ | 2. If <math>F(x_{i-1})<U\leq F(x_i)</math>, set <math>X=x_i</math>.<br /> | |
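The summary above can be turned into a general-purpose MATLAB sketch. The support xvals, the pmf p and the sample size are assumed example values; the cumulative sums play the role of <math>F(x_i)</math>.<br />
<pre style="font-size:16px">
xvals = [0 1 2];            % support x_0 < x_1 < x_2 (assumed example values)
p     = [0.3 0.2 0.5];      % pmf p_i, assumed to sum to 1
F     = cumsum(p);          % F(x_i) = p_0 + ... + p_i
F(end) = 1;                 % guard against round-off in the last cumulative sum
n = 1000;                   % number of samples (assumed value)
x = zeros(1, n);
for ii = 1:n
    u = rand;                          % Step 1: draw U ~ U(0,1)
    idx = find(u <= F, 1, 'first');    % Step 2: smallest i with U <= F(x_i)
    x(ii) = xvals(idx);
end
hist(x)
</pre>
Searching the cumulative sums from the left is equivalent to the chain of if/elseif comparisons, and works for any finite pmf without rewriting the code.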
− | |||
− | and if 0.5 < U < | + | |
+ | '''Example 3.0:''' <br /> | ||
+ | Generate a random variable from the following probability function:<br /> | ||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | |- | ||
+ | | x | ||
+ | | -2 | ||
+ | | -1 | ||
+ | | 0 | ||
+ | | 1 | ||
+ | | 2 | ||
+ | |- | ||
+ | | f(x) | ||
+ | | 0.1 | ||
+ | | 0.5 | ||
+ | | 0.07 | ||
+ | | 0.03 | ||
+ | | 0.3 | ||
+ | |} | ||
+ | |||
+ | Answer:<br /> | ||
+ | 1. Gen U~U(0,1)<br /> | ||
+ | 2. If U < 0.5 then output -1<br /> | ||
+ | else if U < 0.8 then output 2<br /> | ||
+ | else if U < 0.9 then output -2<br /> | ||
+ | else if U < 0.97 then output 0 else output 1<br /> | ||
+ | |||
+ | '''Example 3.1 (from class):''' (Coin Flipping Example)<br /> | ||
+ | We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1. | ||
+ | |||
+ | We can define the U function so that: | ||
+ | |||
+ | If <math>U\leq 0.5</math>, then X = 0 | ||
+ | |||
+ | and if <math>0.5 < U\leq 1</math>, then X =1. | ||
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip. | This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip. | ||
Line 644: | Line 766: | ||
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa. | Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa. | ||
− | '''Example | + | '''Example 3.2 (From class):''' |
Suppose we have the following discrete distribution: | Suppose we have the following discrete distribution: | ||
Line 680: | Line 802: | ||
4. else 0.5<U<=1 deliver x=2 | 4. else 0.5<U<=1 deliver x=2 | ||
+ | Can you find a faster way to run this algorithm? Consider: | ||
+ | |||
+ | :<math> | ||
+ | x = \begin{cases} | ||
+ | 2, & \text{if } U\leq 0.5 \\ | ||
+ | 1, & \text{if } 0.5 < U \leq 0.7 \\ | ||
+ | 0, & \text{if } 0.7 <U\leq 1 | ||
+ | \end{cases}</math> | ||
+ | |||
+ | The logic for this is that U is most likely to fall into the largest range. Thus by testing the largest range first (in this case <math>U \leq 0.5</math>, which has probability 0.5) we can reduce the expected number of comparisons and improve the run time of this algorithm. Could this algorithm be improved further using the same logic? | |
* '''Code''' (as shown in class)<br /> | * '''Code''' (as shown in class)<br /> | ||
Line 690: | Line 822: | ||
if u<=0.3 | if u<=0.3 | ||
x(ii)=0; | x(ii)=0; | ||
− | elseif u<0.5 | + | elseif u<=0.5 |
x(ii)=1; | x(ii)=1; | ||
else | else | ||
Line 701: | Line 833: | ||
[[File:Discrete_example.jpg|300px]] | [[File:Discrete_example.jpg|300px]] | ||
− | '''Example''': Generating a random variable from pdf <br> | + | The algorithm above generates a 1×1000 vector containing 0's, 1's and 2's in differing proportions. Because of the criteria for assigning 0, 1 or 2, the proportions of 0, 1 and 2 approximate their respective probabilities. So plotting the histogram (frequency of 0, 1 and 2) doesn't give us the pmf itself but a frequency histogram showing the proportions of each, whose shape closely resembles the pmf. | |
+ | |||
+ | '''Example 3.3''': Generating a random variable from pdf <br> | ||
:<math> | :<math> | ||
f_{x}(x) = \begin{cases} | f_{x}(x) = \begin{cases} | ||
Line 717: | Line 851: | ||
:<math>\begin{align} U = x^{2}, X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math> | :<math>\begin{align} U = x^{2}, X = F_X^{-1}(U)= U^{\frac{1}{2}}\end{align}</math> | ||
− | '''Example''': Generating a Bernoulli random variable <br> | + | '''Example 3.4''': Generating a Bernoulli random variable <br> |
:<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math> | :<math>\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}</math> | ||
:<math> | :<math> | ||
Line 727: | Line 861: | ||
2. <math> | 2. <math> | ||
X = \begin{cases} | X = \begin{cases} | ||
− | + | 0, & \text{if } 0 < U < 1-p \\ | |
− | + | 1, & \text{if } 1-p \le U < 1 | |
\end{cases}</math> | \end{cases}</math> | ||
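A vectorized MATLAB sketch of this Bernoulli generator is shown below; p = 0.3 and the sample size are assumed values for illustration.<br />
<pre style="font-size:16px">
p = 0.3;                    % success probability (assumed value)
n = 1000;                   % number of samples (assumed value)
u = rand(1, n);             % draw U ~ U(0,1)
x = double(u >= 1 - p);     % X = 1 when 1-p <= U < 1, otherwise X = 0
mean(x)                     % proportion of ones, expected to be near p
</pre>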
− | '''Example''': Generating a Poisson random variable <br> | + | '''Example 3.5''': Generating Binomial(n,p) Random Variable<br> |
+ | Use the recursion <math> p\left( x=i+1\right) =\dfrac {n-i} {i+1}\dfrac {p} {1-p}p\left( x=i\right) </math> | |
+ | |||
+ | Step 1: Generate a random number <math>U</math>.<br> | ||
+ | Step 2: <math>c = \frac {p}{(1-p)}</math>, <math>i = 0</math>, <math>pr = (1-p)^n</math>, <math>F = pr</math><br> | ||
+ | Step 3: If U<F, set X = i and stop,<br> | ||
+ | Step 4: <math> pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1</math><br> | ||
+ | Step 5: Go to step 3<br> | ||
+ | *Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross. | ||
+ | *Note: Another method views the Binomial as a sum of n independent Bernoulli random variables: generate uniforms U1, ..., Un and set X equal to the number of Ui that are less than or equal to p. This method needs n random numbers and n comparisons. The inverse transformation method, on the other hand, needs only one random number and makes about 1 + np comparisons on average.<br> | |
+ | Step 1: Generate n uniform numbers U1 ... Un.<br> | |
+ | Step 2: <math>X = \sum_{i=1}^{n} \mathbf{1}(U_i \leq p)</math>, where p is the probability of success. | |
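Steps 1 to 5 of the inverse transform algorithm above might look as follows in MATLAB. The values n = 10 and p = 0.3 are assumed, and the extra condition i < n is an addition of this sketch (not part of the quoted steps) to stop the loop if round-off keeps F slightly below U.<br />
<pre style="font-size:16px">
n = 10; p = 0.3;            % assumed parameter values
U  = rand;                  % Step 1: generate a random number U
c  = p / (1 - p);           % Step 2: initialise c, i, pr and F
i  = 0;
pr = (1 - p)^n;             % pr = P(X = 0)
F  = pr;
while U >= F && i < n       % Step 3: stop as soon as U < F
    pr = c * (n - i) / (i + 1) * pr;   % Step 4: pr = P(X = i+1)
    F  = F + pr;
    i  = i + 1;
end                         % Step 5: the loop returns to Step 3
X = i
</pre>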
+ | |||
+ | '''Example 3.6''': Generating a Poisson random variable <br> | ||
− | Let X ~ Poi(u). Write an algorithm to generate X. | + | "Let X ~ Poi(u). Write an algorithm to generate X. |
The PDF of a poisson is: | The PDF of a poisson is: | ||
:<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math> | :<math>\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}</math> | ||
Line 747: | Line 894: | ||
<math>\begin{align} F = P(X = 0) = e^{-u}*u^0/{0!} = e^{-u} = p \end{align}</math> | <math>\begin{align} F = P(X = 0) = e^{-u}*u^0/{0!} = e^{-u} = p \end{align}</math> | ||
3) If U<F, output x <br> | 3) If U<F, output x <br> | ||
− | Else, <math>\begin{align} p = (u/(x+1))^p \end{align}</math> <br> | + | <font size="3">Else,</font> <math>\begin{align} p = (u/(x+1)) \cdot p \end{align}</math> <br> | |
<math>\begin{align} F = F + p \end{align}</math> <br> | <math>\begin{align} F = F + p \end{align}</math> <br> | ||
<math>\begin{align} x = x + 1 \end{align}</math> <br> | <math>\begin{align} x = x + 1 \end{align}</math> <br> | ||
− | 4) Go to | + | 4) Go to 3" <br> | |
− | Acknowledgements: This is from Stat 340 Winter 2013 | + | Acknowledgements: This is an example from Stat 340 Winter 2013 |
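A MATLAB sketch of this Poisson generator is given below. It assumes the recursion <math>P(X=x+1) = \frac{u}{x+1}P(X=x)</math>, and it loops back to the comparison step rather than drawing a new uniform. The Poisson mean is named mu here (the text calls it u) to avoid confusion with the uniform draw; mu = 4 and the guard p > 0 are assumptions of this sketch.<br />
<pre style="font-size:16px">
mu = 4;                     % Poisson mean (assumed value; called u in the text)
U = rand;                   % draw U ~ U(0,1) once
x = 0;
p = exp(-mu);               % p = P(X = 0)
F = p;                      % running value of the cdf
while U >= F && p > 0       % continue until U < F (p > 0 guards against underflow)
    p = mu / (x + 1) * p;   % p = P(X = x+1)
    F = F + p;
    x = x + 1;
end
x
</pre>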
− | '''Example''': Generating Geometric Distribution: | + | '''Example 3.7''': Generating Geometric Distribution: |
− | Consider Geo(p) where p is the probability of success, and define random variable X such that X is the number of | + | Consider Geo(p) where p is the probability of success, and define random variable X such that X is the total number of trials required to achieve the first success. x=1,2,3..... We have pmf: |
− | <math>P(X=x_i) = \, p (1-p)^{x_{i-1 | + | <math>P(X=x_i) = \, p (1-p)^{x_{i}-1}</math> |
We have CDF: | We have CDF: | ||
− | <math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, P(X>x) means we get at least x failures before observe the first success. | + | <math>F(x)=P(X \leq x)=1-P(X>x) = 1-(1-p)^x</math>, P(X>x) means we get at least x failures before we observe the first success. |
Now consider the inverse transform: | Now consider the inverse transform: | ||
:<math> | :<math> | ||
Line 783: | Line 930: | ||
4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /> | 4. Else if <math>U \leq P_{0} + P_{1} + P_{2} </math> deliver <math>x = x_{2}</math><br /> | ||
... | ... | ||
− | Else if <math>U \leq P_{0} + ... + P_{k} </math> deliver <math>x = x_{k}</math><br /> | + | <font size="3">Else if</font> <math>U \leq P_{0} + ... + P_{k} </math> <font size="3">deliver</font> <math>x = x_{k}</math><br /> |
+ | |||
+ | ===Inverse Transform Algorithm for Generating a Binomial(n,p) Random Variable (from textbook)=== | |
+ | <br />step 1: Generate a random number U | ||
+ | <br />step 2: c=p/(1-p),i=0, pr=(1-p)<sup>n</sup>, F=pr. | ||
+ | <br />step 3: If U<F, set X=i and stop. | ||
+ | <br />step 4: pr =[c(n-i)/(i+1)]pr, F=F+pr, i=i+1. | ||
+ | <br />step 5: Go to step 3. | ||
+ | |||
'''Problems'''<br /> | '''Problems'''<br /> | ||
− | + | Though this method is very easy to use and apply, it does have a major disadvantage/limitation: | |
− | + | We need to find the inverse cdf <math> F^{-1}(\cdot) </math>. In some cases the inverse function does not exist, or is difficult to find because it requires a closed form expression for F(x). | |
− | + | For example, it is too difficult to find the inverse cdf of the Gaussian distribution, so we must find another method to sample from the Gaussian distribution. | |
− | + | In conclusion, we need to find another way of sampling from more complicated distributions | |
− | + | Flipping a coin is a discrete case of uniform distribution, and the code below shows an example of flipping a coin 1000 times; the result is close to the expected value 0.5.<br> | |
− | Example 3 | + | Example 2, as another discrete distribution, shows that we can sample from parts like 0,1 and 2, and the probability of each part or each trial is the same.<br> |
+ | Example 3 uses the inverse method to determine the probability range assigned to each value of the random variable. | |
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"> | <div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"> | ||
<h2 style="text-align:center;">Summary of Inverse Transform Method</h2> | <h2 style="text-align:center;">Summary of Inverse Transform Method</h2> | ||
Line 831: | Line 987: | ||
</div> | </div> | ||
− | === | + | === Generalized Inverse-Transform Method === |
+ | |||
+ | Valid for any CDF F(x): return <math>X=\min\{x:F(x)\geq U\}</math>, where U~U(0,1) | |
− | + | 1. Continuous, possibly with flat spots (i.e. not strictly increasing) | |
− | |||
− | |||
− | + | 2. Discrete | |
− | + | 3. Mixed continuous-discrete | |
− | |||
+ | '''Advantages of Inverse-Transform Method''' | ||
− | + | Inverse transform method preserves monotonicity and correlation | |
− | |||
− | + | which helps in | |
− | |||
− | |||
− | |||
− | + | 1. Variance reduction methods ... | |
− | Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br> | + | 2. Generating truncated distributions ... |
+ | |||
+ | 3. Order statistics ... | ||
+ | |||
+ | ===Acceptance-Rejection Method=== | ||
+ | |||
+ | Although the inverse transformation method lets us transform uniform samples into samples from other distributions, it has two limitations: | |
+ | # Not all functions have inverse functions (i.e. a function that is not one-to-one on its domain has no inverse) | |
+ | # For some distributions, such as Gaussian, it is too difficult to find the inverse | ||
+ | |||
+ | To generate random samples for these functions, we will use different methods, such as the '''Acceptance-Rejection Method'''. This method can be used when the inverse transform method is impractical. The basic idea is to find an alternative probability distribution with density function g(x) from which we already know how to sample. | |
+ | |||
+ | Suppose we want to draw random sample from a target density function ''f(x)'', ''x∈S<sub>x</sub>'', where ''S<sub>x</sub>'' is the support of ''f(x)''. If we can find some constant ''c''(≥1) (In practice, we prefer c as close to 1 as possible) and a density function ''g(x)'' having the same support ''S<sub>x</sub>'' so that ''f(x)≤cg(x), ∀x∈S<sub>x</sub>'', then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for ''g(x)''. | ||
+ | |||
+ | [[File:AR_Method.png]] | ||
+ | |||
+ | |||
+ | The main logic behind the Acceptance-Rejection Method is that:<br> | ||
+ | 1. We want to generate sample points from a target distribution, say f(x), that we cannot sample from directly.<br> | |
+ | 2. We use <math>\,cg(x)</math> to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)<br> | ||
+ | 3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.<br> | ||
+ | |||
+ | Note: If the red line was only g(x) as opposed to <math>\,c g(x)</math> (i.e. c=1), then <math>g(x) \geq f(x)</math> for all values of x if and only if g and f are the same functions. This is because both g and f are pdfs and each integrates to 1, so <math>g(x) \geq f(x)</math> cannot hold for all x with strict inequality anywhere. <br> | |
+ | |||
+ | Also remember that <math>\,c g(x)</math> always generates higher probability than what we need. Thus we need an approach of getting the proper probabilities.<br><br> | ||
c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br> | c must be chosen so that <math>f(x)\leqslant c g(x)</math> for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:<br> | ||
Line 861: | Line 1,037: | ||
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br> | 2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.<br> | ||
3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br> | 3. Verify that <math>f(x)\leqslant c g(x)</math> at all the local maximums as well as the absolute maximums.<br> | ||
− | 4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0. | + | 4. Verify that <math>f(x)\leqslant c g(x)</math> at the tail ends by calculating <math>\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}</math> and <math>\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}</math> and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.<br> |
+ | 5. Efficiency: the number of times N that steps 1 and 2 need to be called (also the number of iterations needed to successfully generate X) is a random variable and has a geometric distribution with success probability <math>p=P(U \leq f(Y)/(cg(Y)))</math>, <math>P(N=n)=(1-p)^{n-1}p, n \geq 1</math>. Thus on average the number of iterations required is given by <math> E(N)=\frac{1}{p}</math>. | |
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance ratio <math>\frac{f(x)}{c g(x)}</math> will be close to zero). This will render our algorithm inefficient. | c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance ratio <math>\frac{f(x)}{c g(x)}</math> will be close to zero). This will render our algorithm inefficient. | ||
+ | The expected number of iterations the algorithm requires to generate one X is c. | |
<br> | <br> | ||
'''Note:''' <br> | '''Note:''' <br> | ||
Line 889: | Line 1,067: | ||
Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br> | Note: Recall <math>P(U\leq a)=a</math>. Thus by comparing u and <math>\frac{f(y)}{\, c g(y)}</math>, we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.<br> | ||
− | ie. At X<sub>1</sub>, low probability to accept the point since f(x) much smaller than cg(x).<br> | + | ie. At X<sub>1</sub>, low probability to accept the point since f(x) is much smaller than cg(x).<br> |
At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution. | At X<sub>2</sub>, high probability to accept the point. <math>P(U\leq a)=a</math> in Uniform Distribution. | ||
Line 898: | Line 1,076: | ||
and learn how to read the graph to decide whether to accept or reject a point in the region above the random variable x. | and learn how to read the graph to decide whether to accept or reject a point in the region above the random variable x. | ||
In the example, x1 is a poor point (likely to be rejected) and x2 is a good point (likely to be accepted). | In the example, x1 is a poor point (likely to be rejected) and x2 is a good point (likely to be accepted). | ||
+ | |||
+ | '''Some notes on the constant C'''<br> | ||
+ | 1. C is chosen such that <math> c g(y)\geq f(y)</math>, that is,<math> c g(y)</math> will always dominate <math>f(y)</math>. Because of this, | ||
+ | C will always be greater than or equal to one and will only equal to one if and only if the proposal distribution and the target distribution are the same. It is normally best to choose C such that the absolute maxima of both <math> c g(y)</math> and <math> f(y)</math> are the same.<br> | ||
+ | |||
+ | 2. <math> \frac {1}{C} </math> is the area under <math> f(y)</math> divided by the area under <math> c g(y)</math>, and is the acceptance rate of the points generated. For example, if <math> \frac {1}{C} = 0.7</math> then on average, 70 percent of all points generated are accepted.<br> | |
+ | |||
+ | 3. C is the average number of times Y is generated from g . | ||
=== Theorem === | === Theorem === | ||
Line 904: | Line 1,090: | ||
=== Proof === | === Proof === | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
Recall the conditional probability formulas:<br /> | Recall the conditional probability formulas:<br /> | ||
− | |||
<math>\begin{align} | <math>\begin{align} | ||
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf} | P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf} | ||
\end{align}</math><br /> | \end{align}</math><br /> | ||
+ | <math>P(y|accepted)=f(y)=\frac{P(accepted|y)P(y)}{P(accepted)}</math><br /> | ||
<br />based on the concept from '''procedure-step1''':<br /> | <br />based on the concept from '''procedure-step1''':<br /> | ||
<math>P(y)=g(y)</math><br /> | <math>P(y)=g(y)</math><br /> | ||
Line 943: | Line 1,122: | ||
'''Comments:''' | '''Comments:''' | ||
− | -Acceptance-Rejection Method is not good for all cases. One obvious | + | -Acceptance-Rejection Method is not good for all cases. The limitation with this method is that sometimes many points will be rejected. One obvious disadvantage is that it could be very hard to pick <math>g(y)</math> and the constant <math>c</math> in some cases. We have to pick the SMALLEST c such that <math>f(x) \leq cg(x)</math>, or else the algorithm will not be efficient. This is because with a larger c, <math>f(x)/(cg(x))</math> becomes smaller, the probability that <math>u \leq f(x)/(cg(x))</math> goes down, and many points are rejected. | |
− | < | + | |
+ | -'''Note:''' When <math>f(y)</math> is very different than <math>g(y)</math>, it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for <math>U</math> to be less than this small value. <br/>An example would be when the target function (<math>f</math>) has a spike or several spikes in its domain - this would force the known distribution (<math>g</math>) to have density at least as large as the spikes, making the value of <math>c</math> larger than desired. As a result, the algorithm would be highly inefficient. | ||
'''Acceptance-Rejection Method'''<br/> | '''Acceptance-Rejection Method'''<br/> | ||
Line 950: | Line 1,130: | ||
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/> | We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.<br/> | ||
We use a discrete distribution DU[0,2] to approximate this.<br/> | We use a discrete distribution DU[0,2] to approximate this.<br/> | ||
− | <math>f(x)=Pr(X=x)= | + | <math>f(x)=Pr(X=x)=\binom{2}{x}(0.5)^2\,</math><br/> | |
{| class=wikitable align=left | {| class=wikitable align=left | ||
− | |x||0||1||2 | + | |<math>x</math>||0||1||2 |
|- | |- | ||
− | |f(x)||1/4||1/2||1/4 | + | |<math>f(x)</math>||1/4||1/2||1/4 |
|- | |- | ||
− | |g(x)||1/3||1/3||1/3 | + | |<math>g(x)</math>||1/3||1/3||1/3 |
|- | |- | ||
− | |c=f(x)/g(x)||3/4||3/2||3/4 | + | |<math>f(x)/g(x)</math>||3/4||3/2||3/4 | |
|- | |- | ||
− | |f(x)/(cg(x))||1/2||1||1/2 | + | |<math>f(x)/(cg(x))</math>||1/2||1||1/2 |
|} | |} | ||
− | Since we need <math>c | + | Since we need <math>c \geq f(x)/g(x)</math><br/> |
We need <math>c=3/2</math><br/> | We need <math>c=3/2</math><br/> | ||
Line 971: | Line 1,151: | ||
1. Generate <math>u,v \sim U(0,1)</math><br/> | 1. Generate <math>u,v \sim U(0,1)</math><br/> | ||
2. Set <math>y= \lfloor 3*u \rfloor</math> (This is using uniform distribution to generate DU[0,2])<br/> | 2. Set <math>y= \lfloor 3*u \rfloor</math> (This is using uniform distribution to generate DU[0,2])<br/> | ||
− | 3. If <math>(y=0)</math> and <math>(v<1 | + | 3. If <math>(y=0)</math> and <math>(v<\tfrac{1}{2}), output=0</math> <br/> |
− | If <math>(y=2) </math> and <math>(v<1 | + | If <math>(y=2) </math> and <math>(v<\tfrac{1}{2}), output=2 </math><br/> |
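Else if <math>y=1, output=1</math><br/> Otherwise, reject the point and return to step 1.<br/> | Else if <math>y=1, output=1</math><br/> Otherwise, reject the point and return to step 1.<br/> | ||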
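The steps above can be sketched in MATLAB as follows, using the values of f, g and c from the table. The sample size and the counter trials (used to check that the average number of iterations per accepted value is close to c) are additions of this sketch.<br />
<pre style="font-size:16px">
f = [1/4 1/2 1/4];          % target pmf of Bi(2,0.5) at x = 0, 1, 2
g = [1/3 1/3 1/3];          % proposal pmf of DU[0,2]
c = max(f ./ g);            % smallest valid constant, here 3/2
n = 1000;                   % number of accepted samples wanted (assumed value)
x = zeros(1, n);
trials = 0;                 % total number of proposals generated
for ii = 1:n
    accepted = false;
    while ~accepted
        u = rand; v = rand;             % Step 1: two independent uniforms
        y = floor(3*u);                 % Step 2: proposal y from DU[0,2]
        trials = trials + 1;
        if v < f(y+1) / (c * g(y+1))    % Step 3: accept with probability f(y)/(c g(y))
            x(ii) = y;
            accepted = true;
        end
    end
end
mean(x == 1)                % should be close to 1/2
trials / n                  % average proposals per sample, should be close to c
</pre>
The index y+1 is needed because MATLAB arrays start at 1 while the support starts at 0.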
An elaboration of “c”<br/> | An elaboration of “c”<br/> | ||
− | c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < f(x) | + | c is the expected number of times the code runs to output 1 random variable. Remember that when <math>u < \tfrac{f(x)}{cg(x)}</math> is not satisfied, we need to go over the code again.<br/> |
Proof<br/> | Proof<br/> | ||
Line 986: | Line 1,166: | ||
Since we need to generate y from <math>g(x)</math>,<br/> | Since we need to generate y from <math>g(x)</math>,<br/> | ||
<math>Pr(select y)=g(y)</math><br/> | <math>Pr(select y)=g(y)</math><br/> | ||
− | <math>Pr(output y|selected y)=Pr(u<f(y)/(cg(y)))= (y)/(cg(y))</math> (Since u~Unif(0,1))<br/> | + | <math>Pr(output y|selected y)=Pr(u<f(y)/(cg(y)))= f(y)/(cg(y))</math> (Since u~Unif(0,1))<br/> |
<math>Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c</math> <br/> | <math>Pr(output y)=Pr(output y1|selected y1)Pr(select y1)+ Pr(output y2|selected y2)Pr(select y2)+…+ Pr(output yn|selected yn)Pr(select yn)=1/c</math> <br/> | ||
− | Consider that we are asking for expected time for the first success, it is a geometric distribution with probability of success=c<br/> | + | Consider that we are asking for expected time for the first success, it is a geometric distribution with probability of success=1/c<br/> |
Therefore, <math>E(X)=1/(1/c)=c</math> <br/> | Therefore, <math>E(X)=1/(1/c)=c</math> <br/> | ||
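Written out for a discrete proposal, the steps of the proof above combine into a single sum. This is a sketch in the notation of this section, where "accept" means that the generated y is output:<br/>
<math>\Pr(\text{accept}) = \sum_{y} \Pr(\text{select } y)\,\Pr(\text{output } y \mid \text{selected } y) = \sum_{y} g(y)\,\frac{f(y)}{c\,g(y)} = \frac{1}{c}\sum_{y} f(y) = \frac{1}{c}</math><br/>
For the Bi(2,0.5) example with <math>c=3/2</math>, this gives <math>\tfrac{1}{3}\cdot\tfrac{1}{2}+\tfrac{1}{3}\cdot 1+\tfrac{1}{3}\cdot\tfrac{1}{2}=\tfrac{2}{3}=\tfrac{1}{c}</math>.<br/>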
Line 994: | Line 1,174: | ||
Using conditional probability, one can prove that a point which is accepted has the target pdf, i.e. the accepted values are distributed according to <math>f</math>. | Using conditional probability, one can prove that a point which is accepted has the target pdf, i.e. the accepted values are distributed according to <math>f</math>. | ||
− | the example shows how to choose the c for the two function g(x) and f(x). | + | The example shows how to choose c for the two functions <math>g(x)</math> and <math>f(x)</math>. | ||
=== Example of Acceptance-Rejection Method=== | === Example of Acceptance-Rejection Method=== | ||
− | Generating a random variable having p.d.f. | + | Generating a random variable having p.d.f. <br /> |
− | + | <math>\displaystyle f(x) = 20x(1 - x)^3, 0< x <1 </math><br /> | |
− | Since this random variable (which is beta with parameters 2, 4) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with | + | Since this random variable (which is beta with parameters (2,4)) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with<br /> |
− | + | <math>\displaystyle g(x) = 1,0<x<1</math><br /> | |
− | To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of | + | To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of<br /> |
− | + | <math>\displaystyle f(x)/g(x) = 20x(1 - x)^3 </math><br /> | |
− | Differentiation of this quantity yields | + | Differentiation of this quantity yields <br /> |
− | + | <math>\displaystyle d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2]</math><br /> | |
Setting this equal to 0 shows that the maximal value is attained when x = 1/4, | Setting this equal to 0 shows that the maximal value is attained when x = 1/4, | ||
− | and thus, | + | and thus, <br /> |
− | + | <math>\displaystyle f(x)/g(x)<= 20*(1/4)*(3/4)^3=135/64=c </math><br /> | |
− | Hence, | + | Hence,<br /> |
− | + | <math>\displaystyle f(x)/cg(x)=(256/27)*(x*(1-x)^3)</math><br /> | |
and thus the simulation procedure is as follows: | and thus the simulation procedure is as follows: | ||
1) Generate two random numbers U1 and U2 . | 1) Generate two random numbers U1 and U2 . | ||
− | 2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub> | + | 2) If U<sub>2</sub><(256/27)*U<sub>1</sub>*(1-U<sub>1</sub>)<sup>3</sup>, set X=U<sub>1</sub>, and stop |
Otherwise return to step 1). | Otherwise return to step 1). | ||
The average number of times that step 1) will be performed is c = 135/64. | The average number of times that step 1) will be performed is c = 135/64. | ||
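The simulation procedure above could be coded along these lines. It is only a sketch: the sample size N and the counter trials are assumptions introduced here to check the claim about the average number of iterations.

```matlab
% Sketch: sample from f(x) = 20x(1-x)^3 on (0,1) with a U(0,1) proposal.
c = 135/64;               % maximum of f(x)/g(x), attained at x = 1/4
N = 1000;                 % number of samples wanted (assumed)
x = zeros(1, N);
trials = 0;               % how many times step 1) is performed
ii = 1;
while ii <= N
    u1 = rand;            % candidate value
    u2 = rand;            % uniform used for the accept/reject test
    trials = trials + 1;
    if u2 < (256/27) * u1 * (1 - u1)^3
        x(ii) = u1;
        ii = ii + 1;
    end
end
ratio = trials / N;       % should be near c = 135/64 (about 2.11)
```

Running hist(x) afterwards should give a shape close to the Beta(2,4) density.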
Line 1,025: | Line 1,205: | ||
and we can calculate the best constant c. | and we can calculate the best constant c. | ||
− | === | + | ===Another Example of Acceptance-Rejection Method=== |
− | + | Generate a random variable from:<br /> | |
+ | <math>\displaystyle f(x)=3*x^2, 0<x<1 </math><br /> | ||
+ | Assume g(x) to be uniform over interval (0,1), where 0< x <1<br /> | ||
+ | Therefore:<br /> | ||
+ | <math>\displaystyle c = max(f(x)/(g(x)))= 3</math><br /> | ||
− | | + | The best constant c is <math>\max(f(x)/g(x))</math>; this choice makes the area above f(x) and below <math>c \cdot g(x)</math> as small as possible. | ||
+ | Because g is uniform on (0,1), g(x) is 1, so max(g(x)) is 1.<br /> | ||
+ | <math>\displaystyle f(x)/(cg(x))= x^2</math><br /> | ||
+ | Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm | ||
− | + | == Class 4 - Thursday, May 16 == | |
− | + | '''Goals'''<br> | |
+ | *When we want to find target distribution <math>f(x)</math>, we need to first find a proposal distribution <math>g(x)</math> that is easy to sample from. <br> | ||
+ | *The relationship between the proposal distribution and the target distribution is <math> c \cdot g(x) \geq f(x) </math>, where c is a constant. This means that the curve f(x) lies under the curve <math> c \cdot g(x)</math>. <br> | ||
+ | *The chance of acceptance is lower when the gap between <math>f(x)</math> and <math> c \cdot g(x)</math> is large, and vice versa. We use <math> c </math> to keep <math> \frac {f(x)}{c \cdot g(x)} </math> at or below 1 (so <math>f(x) \leq c \cdot g(x)</math>). Therefore, we must find the constant <math> c </math> to achieve this.<br /> | ||
+ | *In other words, <math>c</math> is chosen to make sure <math> c \cdot g(x) \geq f(x) </math>. However, it does not make sense to choose <math>c</math> arbitrarily large. We need to choose <math>c</math> such that <math>c \cdot g(x)</math> fits <math>f(x)</math> as tightly as possible. This means that we must find the minimum c such that f(x) lies under c*g(x). <br /> | ||
+ | *The constant c cannot be negative. In fact <math>c \geq 1</math>, because f and g both integrate to 1.<br /> | ||
− | |||
− | < | + | '''How to find C''':<br /> |
− | + | <math>\begin{align} | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | <math>\begin{align} | ||
&c \cdot g(x) \geq f(x)\\ | &c \cdot g(x) \geq f(x)\\ | ||
&c\geq \frac{f(x)}{g(x)} \\ | &c\geq \frac{f(x)}{g(x)} \\ | ||
&c= \max \left(\frac{f(x)}{g(x)}\right) | &c= \max \left(\frac{f(x)}{g(x)}\right) | ||
\end{align}</math><br> | \end{align}</math><br> | ||
+ | |||
If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/> | If <math>f</math> and <math> g </math> are continuous, we can find the extremum by taking the derivative and solve for <math>x_0</math> such that:<br/> | ||
<math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/> | <math> 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}</math> <br/> | ||
+ | |||
Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/> | Thus <math> c = \frac{f(x_0)}{g(x_0)} </math><br/> | ||
− | + | Note: This procedure is called the Acceptance-Rejection Method.<br> | |
− | |||
− | |||
− | *For this method to be efficient, the constant c must be selected so that the rejection rate is low.(The efficiency for this method is<math>\left ( \frac{1}{c} \right )</math>)<br> | + | '''The Acceptance-Rejection method''' involves finding a distribution that we know how to sample from, g(x), and multiplying g(x) by a constant c so that <math>c \cdot g(x)</math> is always greater than or equal to f(x). Mathematically, we want <math> c \cdot g(x) \geq f(x) </math>. |
− | *It is easy to show that the expected number of trials for an acceptance is | + | This means c has to be greater than or equal to <math>\frac{f(x)}{g(x)}</math> for all x, so the smallest possible c that satisfies the condition is the maximum value of <math>\frac{f(x)}{g(x)}</math>.<br/> | ||
− | *recall the acceptance rate is 1/c.( | + | If c is too large, the chance of acceptance of generated values will be small, and the algorithm loses efficiency. Therefore, it is best to get the smallest possible c such that <math> c g(x) \geq f(x)</math>. <br> | ||
+ | |||
+ | '''Important points:'''<br> | ||
+ | |||
+ | *For this method to be efficient, the constant c must be selected so that the rejection rate is low. (The efficiency for this method is <math>\left ( \frac{1}{c} \right )</math>)<br> | ||
+ | *It is easy to show that the expected number of trials for an acceptance is <math>c</math>. Equivalently, the expected number of accepted points is <math> \frac{\text{Total Number of Trials}}{c} </math>. <br> | ||
+ | *Recall that the '''acceptance rate is 1/c'''. (Not the rejection rate) | ||
:Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br> | :Let <math>X</math> be the number of trials for an acceptance, <math> X \sim~ Geo(\frac{1}{c})</math><br> | ||
:<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math> | :<math>\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c </math> | ||
*The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br> | *The number of trials needed to generate a sample size of <math>N</math> follows a negative binomial distribution. The expected number of trials needed is then <math>cN</math>.<br> | ||
*So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br> | *So far, the only distribution we know how to sample from is the '''UNIFORM''' distribution. <br> | ||
+ | |||
'''Procedure''': <br> | '''Procedure''': <br> | ||
+ | |||
1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br> | 1. Choose <math>g(x)</math> (simple density function that we know how to sample, i.e. Uniform so far) <br> | ||
− | The easiest case is | + | The easiest case is <math>U~ \sim~ Unif [0,1] </math>. However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the <math>U~ \sim~ Unif [0,1] </math> variable. <br> |
2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1. | 2. Find a constant c such that :<math> c \cdot g(x) \geq f(x) </math>, otherwise return to step 1. | ||
Line 1,100: | Line 1,266: | ||
#Let <math>Y \sim~ g(y)</math> | #Let <math>Y \sim~ g(y)</math> | ||
#Let <math>U \sim~ Unif [0,1] </math> | #Let <math>U \sim~ Unif [0,1] </math> | ||
− | #If <math>U \leq \frac{f( | + | #If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math> then X=Y; else return to step 1 (This is not the way to find C. This is the general procedure.) |
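The general procedure above can be sketched with function handles. Everything named here is illustrative rather than taken from the lecture: the target f (a Beta(2,2) density), the sampler drawY, and the sample size N are assumptions chosen so that the sketch runs on its own.

```matlab
% Generic acceptance-rejection loop (sketch; all names are illustrative).
f     = @(x) 6 .* x .* (1 - x);  % assumed target pdf on [0,1] (Beta(2,2))
g     = @(x) ones(size(x));      % proposal pdf: U(0,1)
drawY = @() rand;                % sampler for the proposal g
c     = 1.5;                     % max of f(x)/g(x), attained at x = 0.5

N = 1000;                        % number of samples wanted (assumed)
x = zeros(1, N);
trials = 0;
ii = 1;
while ii <= N
    y = drawY();                 % step 1: Y ~ g
    u = rand;                    % step 2: U ~ Unif[0,1]
    trials = trials + 1;
    if u <= f(y) / (c * g(y))    % step 3: accept, or try again
        x(ii) = y;
        ii = ii + 1;
    end
end
acceptRate = N / trials;         % should be close to 1/c = 2/3
```

To use a different target, only f, g, drawY and c need to change; the loop itself stays the same.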
+ | |||
+ | <hr><b>Example: <br> | ||
− | + | Generate a random variable from the pdf</b><br> | |
<math> f(x) = | <math> f(x) = | ||
\begin{cases} | \begin{cases} | ||
Line 1,112: | Line 1,280: | ||
<math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br> | <math>beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}</math><br> | ||
− | Where Γ (n)=(n-1)! if n is positive integer | + | Where Γ (n) = (n - 1)! if n is positive integer |
− | <math>Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{t}dt</math> | + | <math>\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt</math> | ||
Aside: Beta function | Aside: Beta function | ||
Line 1,133: | Line 1,301: | ||
Taking x = 1 gives the highest possible c, which is c=2 | Taking x = 1 gives the highest possible c, which is c=2 | ||
<br />Note that c is a scalar greater than 1. | <br />Note that c is a scalar greater than 1. | ||
+ | <br />cg(x) is proposal dist, and f(x) is target dist. | ||
[[File:Beta(2,1)_example.jpg|750x750px]] | [[File:Beta(2,1)_example.jpg|750x750px]] | ||
− | Note: g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g | + | '''Note:''' g follows uniform distribution, it only covers half of the graph which runs from 0 to 1 on y-axis. Thus we need to multiply by c to ensure that <math>c\cdot g</math> can cover entire f(x) area. In this case, c=2, so that makes g run from 0 to 2 on y-axis which covers f(x). |
− | Comment: | + | '''Comment:'''<br> |
− | From the picture above, we could observe that the area under f(x)=2x is a half of the area under the pdf of UNIF(0,1). This is why in order to sample 1000 points of f(x) we need to sample approximately 2000 points in UNIF(0,1). | + | From the picture above, we can observe that the area under f(x)=2x is half of the area under <math>c \cdot g(x) = 2</math>, the scaled pdf of UNIF(0,1). This is why, in order to sample 1000 points of f(x), we need to sample approximately 2000 points in UNIF(0,1). | ||
And in general, if we want to sample n points from a distribution with pdf f(x), we need to draw approximately <math>n\cdot c</math> points from the proposal distribution (g(x)) in total. <br> | And in general, if we want to sample n points from a distribution with pdf f(x), we need to draw approximately <math>n\cdot c</math> points from the proposal distribution (g(x)) in total. <br> | ||
<b>Step</b> | <b>Step</b> | ||
<ol> | <ol> | ||
− | <li>Draw y~ | + | <li>Draw y~U(0,1)</li> |
− | <li>Draw u~ | + | <li>Draw u~U(0,1)</li> |
<li>if <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)}, u \leq y,</math> then <math> x=y</math></li> | <li>if <math>u \leq \frac{(2\cdot y)}{(2\cdot 1)}, u \leq y,</math> then <math> x=y</math></li> | ||
<li>Else go to Step 1</li> | <li>Else go to Step 1</li> | ||
</ol> | </ol> | ||
− | Note: In the above example, we sample 2 numbers. If second number is equal to first | + | '''Note:''' In the above example, we sample 2 numbers. If second number (u) is less than or equal to first number (y), then accept x=y, if not then start all over. |
<span style="font-weight:bold;color:green;">Matlab Code</span> | <span style="font-weight:bold;color:green;">Matlab Code</span> | ||
Line 1,166: | Line 1,335: | ||
end | end | ||
end | end | ||
− | >>hist(x) | + | >>hist(x) % It is a histogram | ||
>>jj | >>jj | ||
  jj = 2024 % should be around 2000 | jj = 2024 % should be around 2000 | ||
Line 1,172: | Line 1,341: | ||
[[File:ARM_Example.jpg|300px]] | [[File:ARM_Example.jpg|300px]] | ||
− | :'''*Note:''' The reason that a for loop is not used is that we need continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate. | + | :'''*Note:''' The reason that a for loop is not used is that we need to continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate. |
− | :'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm. | + | :'''*Note2:''' In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm. |
− | :'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples. | + | :'''*Note3:''' We use '''while''' instead of '''for''' when looping because we do not know how many iterations are required to generate 1000 successful samples. We can view this as a negative binomial distribution so while the expected number of iterations required is n * c, it will likely deviate from this amount. We expect 2000 in this case. |
− | :'''*Note4:''' If c=1, we will accept all points, which is the ideal situation. | + | :'''*Note4:''' If c=1, we will accept all points, which is the ideal situation. However, this is essentially impossible because if c = 1 then our distributions f(x) and g(x) must be identical, so we will have to be satisfied with as close to 1 as possible. |
− | ''' | + | '''Use Inverse Method for this Example'''<br> |
− | ''' | + | :<math>F(x)=\int_0^x \! 2s\,ds={x^2}-0={x^2}</math><br> |
+ | :<math>y=x^2</math><br> | ||
+ | :<math>x=\sqrt y</math> | ||
+ | :<math> F^{-1}\left (\, x \, \right) =\sqrt x</math> | ||
− | + | :*'''Procedure''' | |
+ | :1: Draw <math> U~ \sim~ Unif [0,1] </math><br> | ||
+ | :2: <math> x=F^{-1}\left (\, u\, \right) =\sqrt u</math> | ||
+ | <span style="font-weight:bold;color:green;">Matlab Code</span> | ||
+ | <pre style="font-size:16px"> | ||
+ | >>u=rand(1,1000); | ||
+ | >>x=u.^0.5; | ||
+ | >>hist(x) | ||
+ | </pre> | ||
+ | [[File:ARM(IFM)_Example.jpg|300px]] | ||
+ | |||
+ | <span style="font-weight:bold;color:green;">Matlab Tip:</span> | ||
+ | Periods, ".", meaning "element-wise", are used to describe an operation you want performed on each element of a vector or matrix. In the above example, to take the square root of every element in U, the notation U.^0.5 is used. If you instead want the matrix square root of a square matrix U, the period is omitted: with B=U^0.5, we have <math>B*B=U</math>. Similarly, for two 1 X 3 vectors a=[1 2 3] and b=[2 3 4], a.*b=[2 6 12] gives the element-wise product, but a*b gives an error because the inner matrix dimensions must agree. | ||
+ | |||
+ | '''Example for A-R method:''' | ||
+ | |||
+ | Given <math> f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 </math>, use A-R method to generate random number | ||
+ | |||
+ | |||
+ | '''Solution:''' | ||
− | |||
− | |||
Let g=U(-1,1) and g(x)=1/2 | Let g=U(-1,1) and g(x)=1/2 | ||
Line 1,193: | Line 1,383: | ||
<math> cg(x)\geq f(x), | <math> cg(x)\geq f(x), | ||
c\frac{1}{2} \geq \frac{3}{4} (1-x^2) /1, | c\frac{1}{2} \geq \frac{3}{4} (1-x^2) /1, | ||
− | c=max 2 | + | c=max 2\cdot\frac{3}{4} (1-x^2) = 3/2 </math> |
The process: | The process: | ||
:1: Draw U1 ~ U(0,1) <br> | :1: Draw U1 ~ U(0,1) <br> | ||
− | :2: Draw U2~U(0,1) <br> | + | :2: Draw U2 ~ U(0,1) <br> |
:3: let <math> y = U1*2 - 1 </math> | :3: let <math> y = U1*2 - 1 </math> | ||
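− | :4: if <math>U2 \leq \frac { \frac{3}{4} * (1-y^2)} { \frac{3}{ | + | :4: if <math>U2 \leq \frac { \frac{3}{4} * (1-y^2)} { \frac{3}{4}} = {1-y^2}</math>, then x=y. '''Note that''' <math>\frac{\frac{3}{4}(1-y^2)}{\frac{3}{4}}</math> comes from <math>\frac{f(y)}{c \cdot g(y)}</math>, with <math>c \cdot g(y) = \frac{3}{2} \cdot \frac{1}{2} = \frac{3}{4}</math>. | ||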
:5: else: return to '''step 1''' | :5: else: return to '''step 1''' | ||
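The process above might be coded as follows. This is a sketch only: the sample size N and the counter trials are assumptions added for illustration.

```matlab
% Sketch: sample from f(x) = (3/4)(1 - x^2) on [-1,1] with a U(-1,1) proposal.
N = 1000;                 % number of samples wanted (assumed)
x = zeros(1, N);
trials = 0;
ii = 1;
while ii <= N
    u1 = rand;
    u2 = rand;
    y = 2*u1 - 1;         % linear transformation: y ~ U(-1,1)
    trials = trials + 1;
    if u2 <= 1 - y^2      % f(y)/(c*g(y)) with c = 3/2 and g(y) = 1/2
        x(ii) = y;
        ii = ii + 1;
    end
end
```

Since c = 3/2, trials should be roughly 1.5 times N on average.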
---- | ---- | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | =====Example of Acceptance-Rejection Method===== | |
− | |||
− | = | + | <math>\begin{align} |
+ | & f(x) = 3x^2, 0<x<1 \\ | ||
+ | \end{align}</math><br /> | ||
− | <math> | + | <math>\begin{align} |
− | + | & g(x)=1, 0<x<1 \\ | |
+ | \end{align}</math><br /> | ||
<math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br> | <math>c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 </math><br> | ||
Line 1,234: | Line 1,410: | ||
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br> | 1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br> | ||
− | 2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>U_1</math> as the random variable with pdf <math>f</math>, if not return to Step 1 | + | 2. If <math>U_2 \leqslant {U_1}^2</math>, accept <math>\begin{align}U_1\end{align}</math> as the random variable with pdf <math>\begin{align}f\end{align}</math>, if not return to Step 1 |
− | We can also use <math>g(x)=2x</math> for a more efficient algorithm | + | We can also use <math>\begin{align}g(x)=2x\end{align}</math> for a more efficient algorithm |
<math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math>. | <math>c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2} </math>. | ||
− | Use the inverse method to sample from <math>g(x)</math> | + | Use the inverse method to sample from <math>\begin{align}g(x)\end{align}</math> |
− | <math>G(x)=x^2</math>. | + | <math>\begin{align}G(x)=x^2\end{align}</math>. |
− | Generate <math>U</math> from <math>U(0,1)</math> and set <math>x=sqrt(u)</math> | + | Generate <math>\begin{align}U\end{align}</math> from <math>\begin{align}U(0,1)\end{align}</math> and set <math>\begin{align}x=\sqrt{u}\end{align}</math> | ||
1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br> | 1. Generate two uniform numbers in the unit interval <math>U_1, U_2 \sim~ U(0,1)</math><br> | ||
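2. If <math>U_2 \leq \sqrt{U_1}</math>, accept <math>Y=\sqrt{U_1}</math> as the random variable with pdf <math>f</math>, if not return to Step 1 (here <math>\frac{f(y)}{c \cdot g(y)} = \frac{3y^2}{\frac{3}{2} \cdot 2y} = y</math> with <math>y=\sqrt{U_1}</math>) | 2. If <math>U_2 \leq \sqrt{U_1}</math>, accept <math>Y=\sqrt{U_1}</math> as the random variable with pdf <math>f</math>, if not return to Step 1 (here <math>\frac{f(y)}{c \cdot g(y)} = \frac{3y^2}{\frac{3}{2} \cdot 2y} = y</math> with <math>y=\sqrt{U_1}</math>) | ||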
+ | *Note: the function <math>\begin{align}q(x) = c * g(x)\end{align}</math> is called an envelope or majorizing function.<br> | ||
+ | To obtain a better proposal function <math>\begin{align}g(x)\end{align}</math>, we can first assume a new <math>\begin{align}q(x)\end{align}</math> and then solve for the normalizing constant by integrating.<br> | ||
+ | In the previous example, we first assume <math>\begin{align}q(x) = 3x\end{align}</math>. To find the normalizing constant, we need to solve <math>k \int_0^1 3x\,dx = 1</math>, which gives us k = 2/3. So, <math>\begin{align}g(x) = k*q(x) = 2x\end{align}</math>. | ||
+ | *Source: http://www.cs.bgu.ac.il/~mps042/acceptance.htm* | ||
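As an illustration of the more efficient proposal, the following sketch samples from f(x) = 3x^2 using g(x) = 2x and c = 3/2. With these choices the acceptance ratio f(y)/(c*g(y)) simplifies to y. The sample size N and the counter trials are assumptions added for the sketch.

```matlab
% Sketch: target f(x) = 3x^2 on (0,1), proposal g(x) = 2x, c = 3/2.
N = 1000;                 % number of samples wanted (assumed)
x = zeros(1, N);
trials = 0;
ii = 1;
while ii <= N
    y = sqrt(rand);       % inverse method for g: G(y) = y^2
    u = rand;
    trials = trials + 1;
    if u <= y             % f(y)/(c*g(y)) = 3y^2 / (1.5 * 2y) = y
        x(ii) = y;
        ii = ii + 1;
    end
end
ratio = trials / N;       % near c = 1.5, versus about 3 with g(x) = 1
```

The trade-off is one extra square root per proposal in exchange for roughly half as many rejected draws.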
'''Possible Limitations''' | '''Possible Limitations''' | ||
Line 1,285: | Line 1,465: | ||
-> Proposal distribution: UNIF(-R, R) | -> Proposal distribution: UNIF(-R, R) | ||
− | -> We know how to generate using <math> U \sim UNIF (0,1) </math> Let <math> Y= 2RU-R=R(2U-1)</math>, therefore Y follows <math>U( | + | -> We know how to generate using <math> U \sim UNIF (0,1) </math> Let <math> Y= 2RU-R=R(2U-1)</math>, therefore Y follows <math>U(-R,R)</math> |
-> In order to maximize the function we must maximize the top and minimize the bottom. | -> In order to maximize the function we must maximize the top and minimize the bottom. | ||
Line 1,294: | Line 1,474: | ||
Thus, we have to maximize R^2-x^2. | Thus, we have to maximize R^2-x^2. | ||
=> When x=0, it will be maximized. | => When x=0, it will be maximized. | ||
− | Therefore, c=4/pi. * Note: This also means that the probability of accepting a point is pi/4. | + | Therefore, c=4/pi. * Note: This also means that the probability of accepting a point is <math>\pi/4</math>. |
We will accept the points with probability f(x)/[cg(x)]. | We will accept the points with probability f(x)/[cg(x)]. | ||
Line 1,304: | Line 1,484: | ||
Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points | Thus, <math>\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}</math> * this also means the probability we can accept points | ||
+ | The algorithm to generate random variable x is then: | ||
1. Draw <Math>\ U</math> from <math>\ U(0,1)</math> | 1. Draw <Math>\ U</math> from <math>\ U(0,1)</math> | ||
Line 1,309: | Line 1,490: | ||
2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math> | 2. Draw <Math>\ U_{1}</math> from <math>\ U(0,1)</math> | ||
− | 3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}, x = | + | 3. If <math>U_{1} \leq \sqrt{1-(2U-1)^2}</math>, set <math>x = y = R(2U-1)</math> | ||
else return to step 1. | else return to step 1. | ||
Line 1,317: | Line 1,498: | ||
<Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br> | <Math> U_{1} \leq \sqrt{(1-(2U-1)^2)}</Math><br> | ||
<Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br> | <Math>\ U_{1}^2 \leq 1 - (2U -1)^2</Math><br> | ||
− | <Math>\ U_{1}^2 - 1 \leq (2U - 1)^2</Math><br> | + | <Math>\ U_{1}^2 - 1 \leq -(2U - 1)^2</Math><br> |
<Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math> | <Math>\ 1 - U_{1}^2 \geq (2U - 1)^2</Math> | ||
Line 1,325: | Line 1,506: | ||
'''One more example about AR method''' <br/> | '''One more example about AR method''' <br/> | ||
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value) | (In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value) | ||
− | Let <math>f(x)= | + | Let <math>f(x)=xe^{-x}, x > 0 </math> <br/> | ||
− | Use <math>g(x)= | + | Use <math>g(x)=ae^{-ax}</math> to generate random variable <br/> | ||
<br/> | <br/> | ||
Solution: First of all, we need to find c<br/> | Solution: First of all, we need to find c<br/> | ||
Line 1,338: | Line 1,519: | ||
<math>\frac {f(x)}{g(x)} = \frac {e^{-1}}{a*(1-a)} </math><br/> | <math>\frac {f(x)}{g(x)} = \frac {e^{-1}}{a*(1-a)} </math><br/> | ||
<math>\frac {f(0)}{g(0)} = 0</math><br/> | <math>\frac {f(0)}{g(0)} = 0</math><br/> | ||
− | <math>\frac {f( | + | <math>\frac {f(\infty)}{g(\infty)} = 0</math><br/> |
<br/> | <br/> | ||
therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/> | therefore, <b><math>c= \frac {e^{-1}}{a*(1-a)}</math></b><br/> | ||
Line 1,349: | Line 1,530: | ||
Procedure: <br/> | Procedure: <br/> | ||
1. Generate u, v ~ unif(0,1) <br/> | 1. Generate u, v ~ unif(0,1) <br/> | ||
− | 2. Generate y from g, since g is exponential with rate 2, let y=-ln(u) <br/> | + | 2. Generate y from g. Since <math>c= \frac {e^{-1}}{a*(1-a)}</math> requires 0 < a < 1 and is smallest at a=1/2, g is exponential with rate 1/2, so let y=-2*ln(u) <br/> | ||
3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/> | 3. If <math>v<\frac{f(y)}{c\cdot g(y)}</math>, output y<br/> | ||
Else, go to 1<br/> | Else, go to 1<br/> | ||
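A sketch of this procedure in Matlab is given below. It assumes a = 1/2, the value that minimizes <math>c= \frac {e^{-1}}{a*(1-a)}</math> on 0 < a < 1; the sample size N and the counter trials are likewise assumptions made for illustration.

```matlab
% Sketch: target f(x) = x*exp(-x), x > 0, with an exponential proposal.
a = 0.5;                          % assumed rate of the proposal g
c = exp(-1) / (a * (1 - a));      % c = 4/e, about 1.47
f = @(x) x .* exp(-x);            % target pdf
g = @(x) a .* exp(-a .* x);       % proposal pdf

N = 1000;                         % number of samples wanted (assumed)
x = zeros(1, N);
trials = 0;
ii = 1;
while ii <= N
    u = rand;
    v = rand;
    y = -log(u) / a;              % inverse method: y ~ exponential(rate a)
    trials = trials + 1;
    if v < f(y) / (c * g(y))      % accept with probability f(y)/(c*g(y))
        x(ii) = y;
        ii = ii + 1;
    end
end
```

With a = 1/2 the ratio f(y)/(c*g(y)) equals 1 at y = 2, the point where f/g is largest.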
Line 1,369: | Line 1,550: | ||
'''Summary of when to use the Accept Rejection Method''' <br/> | '''Summary of when to use the Accept Rejection Method''' <br/> | ||
− | 1) When the calculation of inverse cdf cannot to be computed or too difficult to compute. <br/> | + | 1) When the inverse cdf cannot be computed or is too difficult to compute. <br/> | ||
2) When f(x) can be evaluated at least up to a normalizing constant. <br/> | 2) When f(x) can be evaluated at least up to a normalizing constant. <br/> | ||
3) A constant c where <math>f(x)\leq c\cdot g(x)</math><br/> | 3) A constant c where <math>f(x)\leq c\cdot g(x)</math><br/> | ||
4) A uniform draw<br/> | 4) A uniform draw<br/> | ||
− | + | ==== Interpretation of 'C' ==== | |
+ | We can use the value of c to calculate the acceptance rate by <math>\tfrac{1}{c}</math>. | ||
+ | |||
+ | For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted (<math>\tfrac{1}{1.5} = 0.667</math>). We can also say that the efficiency of the method is 66.7%. | ||
− | | + | Likewise, if the minimum possible value of c is <math>\tfrac{4}{3}</math>, then <math>1/ \tfrac{4}{3} = \tfrac{3}{4}</math> of the generated random variables will be accepted. Thus the efficiency of the algorithm is 75%. | ||
− | |||
− | + | In order to ensure the algorithm is as efficient as possible, the 'C' value should be as close to one as possible, such that <math>\tfrac{1}{c}</math> approaches 1 => 100% acceptance rate. | |
− | |||
− | |||
− | * '''Code'''<br /> | + | >> close all | ||
+ | >> clear all | ||
+ | >> ii=1; | ||
+ | >> jj=0; | ||
+ | >> while ii<1000 | ||
+ | y=rand; | ||
+ | u=rand; | ||
+ | jj=jj+1; | ||
+ | if u<=y | ||
+ | x(ii)=y; | ||
+ | ii=ii+1; | ||
+ | end | ||
+ | end | ||
+ | |||
+ | == Class 5 - Tuesday, May 21 == | ||
+ | Recall the example in the last lecture. The following code will generate a random variable required by the question. | ||
+ | |||
+ | * '''Code'''<br /> | ||
<pre style="font-size:16px"> | <pre style="font-size:16px"> | ||
>>close all | >>close all | ||
Line 1,397: | Line 1,594: | ||
if (1-u1^2)>=(2*u2-1)^2 | if (1-u1^2)>=(2*u2-1)^2 | ||
x(ii) = y; | x(ii) = y; | ||
− | ii = ii + 1; %Note: for beginner | + | ii = ii + 1; % Note for beginner programmers: this step increases | ||
 % the ii value for the next pass through the while loop | % the ii value for the next pass through the while loop | ||
end | end | ||
end | end | ||
− | >>hist(x,20) | + | >>hist(x,20) % 20 is the number of bars | ||
+ | | ||
+ | >>hist(x,30) % 30 is the number of bars | ||
</pre> | </pre> | ||
+ | Derivation of the acceptance condition used in the code: | ||
+ | <math>u_{1} \leq \sqrt{1-(2u-1)^2} </math> <br> | ||
+ | <math>(u_{1})^2 \leq 1-(2u-1)^2 </math> <br> | ||
+ | <math>(u_{1})^2 -1 \leq -(2u-1)^2 </math> <br> | ||
+ | <math>1-(u_{1})^2 \geq (2u-1)^2 </math> <br> | ||
− | MATLAB tips: hist(x,y) where y is the number of bars in the graph. | + | MATLAB tips: hist(x,y) plots a histogram of variable x, where y is the number of bars in the graph. |
[[File:ARM_cont_example.jpg|300px]] | [[File:ARM_cont_example.jpg|300px]] | ||
− | |||
=== Discrete Examples === | === Discrete Examples === | ||
* '''Example 1''' <br> | * '''Example 1''' <br> | ||
Line 1,422: | Line 1,625: | ||
\end{align}</math><br/> | \end{align}</math><br/> | ||
− | The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose we | + | The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we use the discrete uniform distribution as our proposal distribution, then <math> g(x)= P(X=x) =0.2 </math> for all x. | ||
− | |||
− | Step 1 | + | The following algorithm then yields our X: |
− | Step 2 | + | |
− | Step 3 | + | Step 1 Draw <math>Y \sim~ g</math> from the discrete uniform distribution on 1, 2, 3, 4 and 5.<br/> | ||
+ | Step 2 Draw <math>U \sim~ U(0,1)</math>.<br/> | ||
+ | Step 3 If <math>U \leq \frac{f(Y)}{c \cdot g(Y)}</math>, then <b> X = Y </b>;<br/> | ||
Else return to Step 1.<br/> | Else return to Step 1.<br/> | ||
− | + | C can be found by maximizing the ratio :<math> \frac{f(x)}{g(x)} </math>. To do this, we want to maximize <math> f(x) </math> and minimize <math> g(x) </math>. <br> | |
− | :<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math> | + | :<math>c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 </math> <br/> |
+ | Note: In this case <math>f(x)=P(X=x)=0.3</math> (highest probability from the discrete probabilities in the question) | ||
:<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br> | :<math>\frac{p(x)}{cg(x)} = \frac{p(x)}{1.5*0.2} = \frac{p(x)}{0.3} </math><br> | ||
Note: The U is independent from y in Step 2 and 3 above. | Note: The U is independent from y in Step 2 and 3 above. | ||
− | ~The constant c is a indicator of rejection rate | + | ~The constant c is an indicator of the rejection rate, or efficiency, of the algorithm. It equals the average number of trials per accepted value. Thus, a higher c would mean that the algorithm is comparatively inefficient. | ||
− | the acceptance-rejection method of pmf, the uniform | + | In the acceptance-rejection method for this pmf, the uniform proposal gives the same probability to every value, and there are 5 possible values (1,2,3,4,5), so g(x) is 0.2 | ||
Remember that we always want to choose <math> cg </math> to be equal to or greater than <math> f </math>, but as close as possible. | Remember that we always want to choose <math> cg </math> to be equal to or greater than <math> f </math>, but as close as possible. | ||
+ | <br />Limitations: If the form of the proposal dist g is very different from target dist f, then c is very large and the algorithm is not computationally efficient. | ||
* '''Code for example 1'''<br /> | * '''Code for example 1'''<br /> | ||
Line 1,444: | Line 1,650: | ||
>>close all | >>close all | ||
>>clear all | >>clear all | ||
− | >>p=[.15 .25 .3 .1 .2]; | + | >>p=[.15 .25 .3 .1 .2]; %This is a vector holding the target probabilities | |
>>ii=1; | >>ii=1; | |
>>while ii <= 1000 | >>while ii <= 1000 | |
− | y=unidrnd(5); | + | y=unidrnd(5); %generates a random number from the discrete uniform distribution with maximum 5 | |
− | u=rand; | + | u=rand; | |
if u<= p(y)/0.3 | if u<= p(y)/0.3 | ||
x(ii)=y; | x(ii)=y; | ||
Line 1,462: | Line 1,668: | ||
The acceptance rate is <math>\frac {1}{c}</math>, so the lower the c, the more efficient the algorithm. Theoretically, c equals 1 is the best case because all samples would be accepted; however it would only be true when the proposal and target distributions are exactly the same, which would never happen in practice. | The acceptance rate is <math>\frac {1}{c}</math>, so the lower the c, the more efficient the algorithm. Theoretically, c equals 1 is the best case because all samples would be accepted; however it would only be true when the proposal and target distributions are exactly the same, which would never happen in practice. | ||
− | For example, if c = 1.5, the acceptance rate would be <math>\frac {1}{1.5}=\frac {2}{3}</math>. Thus, in order to generate 1000 random values, a total of 1500 iterations would be required. | + | For example, if c = 1.5, the acceptance rate would be <math>\frac {1}{1.5}=\frac {2}{3}</math>. Thus, in order to generate 1000 random values, on average, a total of 1500 iterations would be required. |
The histogram shows 1000 random values generated from f(x); with more random values, the relative frequencies get closer to the target probabilities. | The histogram shows 1000 random values generated from f(x); with more random values, the relative frequencies get closer to the target probabilities. | |
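The link between c and the number of iterations can be checked empirically. The following is a minimal sketch, not the code from the lecture: it reruns Example 1 with an added counter (the names <code>trials</code>, <code>accepted</code> and <code>acceptance_rate</code> are hypothetical) and uses <code>randi</code> in place of <code>unidrnd</code>, assuming both draw from the discrete uniform distribution on 1 to 5.

```matlab
% Sketch: count how many proposals are needed to accept n samples.
p = [.15 .25 .3 .1 .2];      % target pmf from Example 1
g = 0.2;                     % discrete uniform proposal on 1..5
c = max(p ./ g);             % c = 1.5
n = 1000;                    % number of accepted samples wanted
x = zeros(1, n);             % preallocate the output
accepted = 0;
trials = 0;                  % hypothetical counter, not in the original code
while accepted < n
    y = randi(5);            % Step 1: Y ~ g
    u = rand;                % Step 2: U ~ U(0,1)
    trials = trials + 1;
    if u <= p(y) / (c * g)   % Step 3: accept with probability f(Y)/(c g(Y))
        accepted = accepted + 1;
        x(accepted) = y;
    end
end
acceptance_rate = n / trials;   % should be close to 1/c = 2/3
```

On a typical run <code>trials</code> should land near 1500, although the exact count changes from run to run.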
Line 1,471: | Line 1,677: | ||
Let g be the uniform distribution of 1, 2, or 3<br /> | Let g be the uniform distribution of 1, 2, or 3<br /> | ||
g(x)= 1/3<br /> | g(x)= 1/3<br /> | ||
− | <math>c=max(p_{x} | + | <math>c=max(\tfrac{p_{x}}{g(x)})=0.6/(\tfrac{1}{3})=1.8</math><br /> |
+ | Hence <math>\tfrac{p(x)}{cg(x)} = p(x)/(1.8 (\tfrac{1}{3}))= \tfrac{p(x)}{0.6}</math> | ||
+ | |||
1,y~g<br /> | 1,y~g<br /> | ||
2,u~U(0,1)<br /> | 2,u~U(0,1)<br /> | ||
Line 1,480: | Line 1,688: | ||
>>close all | >>close all | ||
>>clear all | >>clear all | ||
− | >>p=[.1 .3 .6]; | + | >>p=[.1 .3 .6]; %This is a vector holding the target probabilities | |
>>ii=1; | >>ii=1; | |
>>while ii <= 1000 | >>while ii <= 1000 | |
− | y=unidrnd(3); | + | y=unidrnd(3); %generates random numbers for the discrete uniform distribution with maximum 3 |
− | u=rand; | + | u=rand; |
− | if u<= p(y)/ | + | if u<= p(y)/0.6 |
− | x(ii)=y; | + | x(ii)=y; |
− | ii=ii+1; | + | ii=ii+1; %else ii=ii+1 |
end | end | ||
end | end | ||
Line 1,495: | Line 1,703: | ||
* '''Example 3'''<br> | * '''Example 3'''<br> | ||
− | |||
− | |||
− | Use the geometric distribution for <math>g(x)</math>;<br> | + | Suppose <math>\begin{align}p_{x} = e^{-3}3^{x}/x! , x\geq 0\end{align}</math> (Poisson distribution) |
− | <math>g(x)=p(1-p)^{x}</math>, choose p=0.25<br> | + | |
− | Look at <math>p_{x}/g(x)</math> for the first few numbers: .199 .797 1.59 2.12 2.12 1.70 1.13 .647 .324 .144<br> | + | '''First:''' Try the first few <math>\begin{align}p_{x}'s\end{align}</math>: 0.0498, 0.149, 0.224, 0.224, 0.168, 0.101, 0.0504, 0.0216, 0.0081, 0.0027 for <math>\begin{align} x = 0,1,2,3,4,5,6,7,8,9 \end{align}</math><br> |
− | We want <math>c=max(p_{x}/g(x))</math> which is approximately 2.12<br> | + | |
+ | '''Proposed distribution:''' Use the geometric distribution for <math>\begin{align}g(x)\end{align}</math>;<br> | ||
+ | |||
+ | <math>\begin{align}g(x)=p(1-p)^{x}\end{align}</math>, choose <math>\begin{align}p=0.25\end{align}</math><br> | ||
+ | |||
+ | Look at <math>\begin{align}p_{x}/g(x)\end{align}</math> for the first few numbers: 0.199 0.797 1.59 2.12 2.12 1.70 1.13 0.647 0.324 0.144 for <math>\begin{align} x = 0,1,2,3,4,5,6,7,8,9 \end{align}</math><br> | ||
+ | |||
+ | We want <math>\begin{align}c=max(p_{x}/g(x))\end{align}</math> which is approximately 2.12<br> | ||
+ | |||
+ | '''The general procedure to sample from <math>\begin{align}p(x)\end{align}</math> is as follows:''' | |
+ | |||
+ | 1. Generate <math>\begin{align}U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)\end{align}</math><br> | ||
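− | + | 2. <math>\begin{align}j = \lfloor \frac{ln(U_{1})}{ln(.75)} \rfloor;\end{align}</math> (this gives <math>j \in \{0,1,2,\dots\}</math>, matching the support of <math>g(x)=p(1-p)^{x}</math>)<br> | |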
− | |||
− | |||
+ | 3. If <math>U_{2} < \frac{p_{j}}{cg(j)}</math>, set <math>\begin{align}X = j\end{align}</math>, else go to step 1. | |
+ | |||
+ | Note: In this case, <math>\begin{align}f(x)/g(x)\end{align}</math> is extremely difficult to differentiate so we were required to test points. If the function is very easy to differentiate, we can calculate the max as if it were a continuous function then check the two surrounding points for which is the highest discrete value. | ||
+ | |||
+ | * Source: http://www.math.wsu.edu/faculty/genz/416/lect/l04-46.pdf* | ||
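The procedure of Example 3 can be sketched in Matlab as follows. This is an illustration under stated assumptions rather than the code from the lecture: the geometric parameter is called <code>q</code> to avoid clashing with the target pmf, the proposal draw <code>j</code> is assumed to take values 0, 1, 2, ... so that it matches <math>g(x)=p(1-p)^{x}</math>, and c is found by testing the points 0 to 50, which is enough here because the ratio decreases after x = 4.

```matlab
% Sketch: acceptance-rejection for Poisson(3) with a geometric proposal.
lambda = 3;                  % Poisson mean from Example 3
q = 0.25;                    % geometric parameter (called p in the text)
k = 0:50;                    % test points; the ratio is decreasing beyond x = 4
px = exp(-lambda) .* lambda.^k ./ factorial(k);   % target pmf p_x
gx = q .* (1 - q).^k;                             % proposal pmf g(x) on 0,1,2,...
c = max(px ./ gx);           % about 2.12, attained at x = 3 and x = 4
n = 1000;
x = zeros(1, n);
ii = 1;
while ii <= n
    u1 = rand;
    u2 = rand;
    j = floor(log(u1) / log(1 - q));              % inverse-transform draw from g
    pj = exp(-lambda) * lambda^j / factorial(j);  % p_j
    gj = q * (1 - q)^j;                           % g(j)
    if u2 < pj / (c * gj)                         % accept with probability p_j/(c g(j))
        x(ii) = j;
        ii = ii + 1;
    end
end
```

Since the acceptance rate is about 1/2.12, roughly 47% of the proposals are kept.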
*'''Example 4''' (Hypergeometric & Binomial)<br> | *'''Example 4''' (Hypergeometric & Binomial)<br> | ||
Line 1,540: | Line 1,760: | ||
The higher the rejection rate, more points will be rejected.<br> | The higher the rejection rate, more points will be rejected.<br> | ||
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br> | More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In our last example, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.<br> | ||
− | <div style="margin-bottom:10px;border:10px solid red;background: yellow"> | + | <div style="margin-bottom:10px;border:10px solid red;background: yellow"> The example below gives a better understanding of the pros and cons of the AR method. The AR method is inefficient when the target distribution has a sharp peak relative to the proposal, since c will be large,<br> | |
− | which brings the acceptance rate low which leads to very time | + | which lowers the acceptance rate and makes sampling very time consuming. </div> | |
<div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"> | <div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"> | ||
<h2 style="text-align:center;">Acceptance-Rejection Method</h2> | <h2 style="text-align:center;">Acceptance-Rejection Method</h2> | ||
Line 1,572: | Line 1,792: | ||
Recall that, | Recall that, | ||
− | Suppose we have an efficient method for simulating a random variable having probability mass function {q(j),j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j),j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability | + | Suppose we have an efficient method for simulating a random variable having probability mass function {q(j),j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j),j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y). |
Specifically, let c be a constant such that | Specifically, let c be a constant such that | ||
p(j)/q(j)<=c for all j such that p(j)>0 | p(j)/q(j)<=c for all j such that p(j)>0 | ||
Line 1,583: | Line 1,803: | ||
* '''Gamma'''<br /> | * '''Gamma'''<br /> | ||
− | The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is: <br> | + | The CDF of the Gamma distribution <math>Gamma(t,\lambda)</math> is (t denotes the shape, <math>\lambda</math> denotes the rate): <br> | |
− | <math> F(x) = \int_0^{ | + | <math> F(x) = \int_0^{x} \frac{\lambda^{t}y^{t-1}e^{-\lambda y}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)</math>, where <math>t \in \N^+ \text{ and } \lambda \in (0,+\infty)</math>.<br> | |
+ | Note that the CDF of the Gamma distribution does not have a closed form that can be inverted easily. | |
− | Neither Inverse Transformation nor Acceptance | + | The gamma distribution is often used to model waiting times between a certain number of events. It can also be expressed as the sum of t independent and identically distributed exponential random variables. This distribution has two parameters: the number of exponential terms t, and the rate parameter <math>\lambda</math>. In this distribution there is the Gamma function, <math>\Gamma </math> which has some very useful properties. "Source: STAT 340 Spring 2010 Course Notes" <br/> | |
+ | |||
+ | Neither Inverse Transformation nor Acceptance-Rejection Method can be easily applied to Gamma distribution. | ||
However, we can use additive property of Gamma distribution to generate random variables. | However, we can use additive property of Gamma distribution to generate random variables. | ||
* '''Additive Property'''<br /> | * '''Additive Property'''<br /> | ||
− | If <math>X_1, \dots, X_t</math> are independent exponential distributions with hazard rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda) </math><math> Exp (\lambda)= Gamma (1, \lambda)), then \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math> | + | If <math>X_1, \dots, X_t</math> are independent exponential random variables with hazard rate <math> \lambda </math> (in other words, <math> X_i\sim~ Exp (\lambda) </math>, where <math> Exp (\lambda)= Gamma (1, \lambda)</math>), then <math>\sum_{i=1}^t X_i \sim~ Gamma (t, \lambda) </math> | |
− | Side notes: if <math> X_i\sim~ Gamma(a,\lambda)</math> and <math> Y_i\sim~ Gamma(B,\lambda)</math> are independent gamma distributions, then <math>\frac{X}{X+Y}</math> has a distribution of <math> Beta(a,B). | + | Side notes: if <math> X\sim~ Gamma(a,\lambda)</math> and <math> Y\sim~ Gamma(B,\lambda)</math> are independent gamma random variables, then <math>\frac{X}{X+Y}</math> has a distribution of <math> Beta(a,B). </math> | |
− | If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up. | + | If we want to sample from the Gamma distribution, we can consider sampling from <math>t</math> independent exponential distributions using the Inverse Method for each <math> X_i</math> and add them up. Note that this only works for the specific set of gamma distributions where t is a positive integer. | |
− | According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together. | + | According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of <math>Gamma(20,10)</math> random variables, so we need to obtain the value of each one by adding 20 values of <math>X_i \sim~ Exp(10)</math>. To achieve this, we generate a 20-by-1000 matrix whose entries follow <math>Exp(10)</math> and add the rows together.<br /> |
− | <math> x_1 | + | <math> x_1 \sim~Exp(\lambda)</math><br /> |
− | <math>x_2 | + | <math>x_2 \sim~Exp(\lambda)</math><br /> |
− | ... | + | ...<br /> |
− | <math>x_t | + | <math>x_t \sim~Exp(\lambda)</math><br /> |
− | <math>x_1+x_2+...+x_t</math> | + | <math>x_1+x_2+...+x_t \sim~ Gamma(t,\lambda)</math> | |
<pre style="font-size:16px"> | <pre style="font-size:16px"> | ||
Line 1,638: | Line 1,861: | ||
all the elements are generated by rand | all the elements are generated by rand | ||
>>x = (-1/lambda)*log(1-u); Note: log(1-u) is essentially the same as log(u) only if u~U(0,1) | >>x = (-1/lambda)*log(1-u); Note: log(1-u) is essentially the same as log(u) only if u~U(0,1) | ||
− | >>xx = sum(x) | + | >>xx = sum(x) Note: sum(x) will sum all elements in the same column. |
size(xx) can help you to verify | size(xx) can help you to verify | ||
>>size(sum(x)) Note: see the size of x if we forget it | >>size(sum(x)) Note: see the size of x if we forget it | ||
Line 1,650: | Line 1,873: | ||
size(x) and size(u) are both 20*1000 matrix. | size(x) and size(u) are both 20*1000 matrix. | ||
− | Since if u~unif(0, 1), u and 1 - u have the same distribution, we can | + | Since u and 1 - u have the same distribution when u~unif(0, 1), we can substitute 1-u with u to simplify the equation. | |
Alternatively, the following command will do the same thing as the previous commands. | Alternatively, the following command will do the same thing as the previous commands. | |
Line 1,663: | Line 1,886: | ||
</pre> | </pre> | ||
− | + | The call rand(20,1000) creates a matrix with 20 rows of 1000 numbers each. | |
The same code can be generalized to other cases, such as a sum of independent x<sub>i</sub> that are not identically distributed, by storing the draws in a matrix. Finally, the histogram confirms the result. | The same code can be generalized to other cases, such as a sum of independent x<sub>i</sub> that are not identically distributed, by storing the draws in a matrix. Finally, the histogram confirms the result. | |
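As a quick sanity check of the additive property, the sample mean and variance of the generated values can be compared with the theoretical values <math>t/\lambda</math> and <math>t/\lambda^2</math>. The following is a sketch with hypothetical variable names (<code>sample_mean</code>, <code>sample_var</code>), using the Gamma(20,10) example from above.

```matlab
% Sketch: Gamma(t, lambda) as a sum of t independent Exp(lambda) draws.
t = 20;
lambda = 10;
n = 1000;
u = rand(t, n);                 % t-by-n matrix of U(0,1) draws
x = (-1 / lambda) * log(u);     % each entry is Exp(lambda) by the inverse method
xx = sum(x, 1);                 % column sums: n draws from Gamma(t, lambda)
sample_mean = mean(xx);         % theory: t / lambda = 2
sample_var = var(xx);           % theory: t / lambda^2 = 0.2
```

Each column of <code>x</code> holds the 20 exponential terms for one gamma value, which is why the sum is taken down the columns.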
− | === Other Sampling Method: | + | === Other Sampling Method: Box Muller === |
[[File:Unnamed_QQ_Screenshot20130521203625.png]] | [[File:Unnamed_QQ_Screenshot20130521203625.png]] | ||
* From cartesian to polar coordinates <br /> | * From cartesian to polar coordinates <br /> | ||
<math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/sin(\theta)= x_{1}/cos(\theta)</math> <br /> | <math> R=\sqrt{x_{1}^2+x_{2}^2}= x_{2}/sin(\theta)= x_{1}/cos(\theta)</math> <br /> | ||
<math> tan(\theta)=x_{2}/x_{1} \rightarrow \theta=tan^{-1}(x_{2}/x_{1})</math> <br /> | <math> tan(\theta)=x_{2}/x_{1} \rightarrow \theta=tan^{-1}(x_{2}/x_{1})</math> <br /> | ||
− | + | ||
− | + | *Box-Muller Transformation:<br> | |
− | + | It is a transformation that consumes two continuous uniform random variables <math> X \sim U(0,1), Y \sim U(0,1) </math> and outputs a bivariate normal random variable with <math> Z_1\sim N(0,1), Z_2\sim N(0,1). </math> | |
=== '''Matlab''' === | === '''Matlab''' === | ||
− | If X is a matrix | + | If X is a matrix, |
− | + | * ''X(1,:)'' returns the first row | |
− | + | * ''X(:,1)'' returns the first column | |
− | + | * ''X(i,j)'' returns the (i,j)th entry | |
− | + | * ''sum(X,'''1''')'' or ''sum(X)'' is a summation of the '''rows''' of X. The output is a row vector of the sums of each column. | |
− | + | * ''sum(X,'''2''')'' is a summation of the '''columns''' of X, returning a vector. | |
− | + | * ''rand(r,c)'' will generate uniformly distributed random numbers in r rows and c columns. | |
− | + | * The dot operator (.), when placed before an operator such as ^, * or /, specifies that the operation is applied to every element of a vector or a matrix. For example, to square every element of a matrix A, use A.^2 as opposed to A^2, which is a matrix power. Adding a constant c to every element needs no dot: A+c is already elementwise. The dot operator is also not required for functions such as log, which already act on each element. | |
− | + | * Matlab processes loops slowly but is fast with matrices and vectors, so it is preferable to use the dot operator and matrices of random numbers rather than loops where possible. | |
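The commands in this list can be tried on a small matrix. The following is a minimal sketch; the matrix <code>X</code> and the variable names are chosen only for illustration.

```matlab
% Sketch: indexing, sums and elementwise operations on a small matrix.
X = [1 2 3; 4 5 6];        % 2-by-3 example matrix
first_row = X(1, :);       % [1 2 3]
first_col = X(:, 1);       % [1; 4]
entry = X(2, 3);           % 6
col_sums = sum(X, 1);      % [5 7 9], one sum per column
row_sums = sum(X, 2);      % [6; 15], one sum per row
squares = X .^ 2;          % elementwise square: [1 4 9; 16 25 36]
shifted = X + 10;          % adding a scalar is already elementwise
U = rand(2, 3);            % 2 rows and 3 columns of U(0,1) numbers
```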
== Class 6 - Thursday, May 23 == | == Class 6 - Thursday, May 23 == | ||
Line 1,696: | Line 1,919: | ||
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math> | :<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math> | ||
− | *Warning : the General Normal distribution is | + | *Warning : the General Normal distribution is: |
− | : | ||
<table> | <table> | ||
<tr> | <tr> | ||
Line 1,749: | Line 1,971: | ||
Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) | Let <math> \theta </math> and R denote the Polar coordinate of the vector (X, Y) | ||
+ | where <math> X = R \cdot \cos\theta </math> and <math> Y = R \cdot \sin \theta </math> | |
[[File:rtheta.jpg]] | [[File:rtheta.jpg]] | ||
Line 1,765: | Line 1,988: | ||
We know that | We know that | ||
− | + | <math>R^{2}= X^{2}+Y^{2}</math> and <math> \tan(\theta) = \frac{Y}{X} </math> where X and Y are two independent standard normal random variables | |
:<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math> | :<math>f(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}</math> | ||
:<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math> | :<math>f(y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}</math> | ||
− | :<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since for independent distributions, their joint probability function is the multiplication of two independent probability functions | + | :<math>f(x,y) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2} * \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} y^2}=\frac{1}{2\pi}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} (x^2+y^2)} </math><br /> - Since X and Y are independent, their joint density function is the product of the two marginal density functions. The joint distribution of R and θ can then be found using a 1-1 transformation:<br /> | |
− | It can also be shown using 1-1 transformation that the joint distribution of R and θ is given by, | + | |
− | 1-1 transformation:<br /> | + | |
− | Let <math>d=R^2</math><br /> | + | '''Let <math>d=R^2</math>'''<br /> |
+ | |||
<math>x= \sqrt {d}\cos \theta </math> | <math>x= \sqrt {d}\cos \theta </math> | ||
<math>y= \sqrt {d}\sin \theta </math> | <math>y= \sqrt {d}\sin \theta </math> | ||
then | then | ||
<math>\left| J\right| = \left| \dfrac {1} {2}d^{-\frac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math> | <math>\left| J\right| = \left| \dfrac {1} {2}d^{-\frac {1} {2}}\cos \theta d^{\frac{1}{2}}\cos \theta +\sqrt {d}\sin \theta \dfrac {1} {2}d^{-\frac{1}{2}}\sin \theta \right| = \dfrac {1} {2}</math> | ||
− | It can be shown that the | + | It can be shown that the joint density of <math> d = R^2</math> and <math> \theta </math> is: | |
:<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math> | :<math>\begin{matrix} f(d,\theta) = \frac{1}{2}e^{-\frac{d}{2}}*\frac{1}{2\pi},\quad d = R^2 \end{matrix},\quad for\quad 0\leq d<\infty\ and\quad 0\leq \theta\leq 2\pi </math> | ||
Line 1,783: | Line 2,007: | ||
Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into the product of two density functions, Exponential and Uniform, so d and <math>\theta</math> are independent | Note that <math> \begin{matrix}f(d,\theta)\end{matrix}</math> factors into the product of two density functions, Exponential and Uniform, so d and <math>\theta</math> are independent | |
<math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> | <math> \begin{matrix} \Rightarrow d \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> | ||
− | ::* <math> \begin{align} R^2 = x^2 + y^2 \end{align} </math> | + | ::* <math> \begin{align} R^2 = d = x^2 + y^2 \end{align} </math> |
::* <math> \tan(\theta) = \frac{y}{x} </math> | ::* <math> \tan(\theta) = \frac{y}{x} </math> | ||
<math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> | <math>\begin{align} f(d) = Exp(1/2)=\frac{1}{2}e^{-\frac{d}{2}}\ \end{align}</math> | ||
Line 1,789: | Line 2,013: | ||
<math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math> | <math>\begin{align} f(\theta) =\frac{1}{2\pi}\ \end{align}</math> | ||
<br> | <br> | ||
+ | |||
To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /> | To sample from the normal distribution, we can generate a pair of independent standard normal X and Y by:<br /> | ||
+ | |||
1) Generating their polar coordinates<br /> | 1) Generating their polar coordinates<br /> | ||
2) Transforming back to rectangular (Cartesian) coordinates.<br /> | 2) Transforming back to rectangular (Cartesian) coordinates.<br /> | ||
− | |||
− | |||
− | |||
− | :<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math> | + | |
− | :<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math> | + | '''Alternative Method of Generating Standard Normal Random Variables'''<br /> |
+ | |||
+ | Step 1: Generate <math>u_{1}</math> ~<math>Unif(0,1)</math><br /> | ||
+ | Step 2: Generate <math>Y_{1}</math> ~<math>Exp(1)</math>,<math>Y_{2}</math>~<math>Exp(1)</math><br /> | |
+ | Step 3: If <math>Y_{2} \geq(Y_{1}-1)^2/2</math>, set <math>V=Y_{1}</math>; otherwise, go to step 1<br /> | |
+ | Step 4: If <math>u_{1} \leq 1/2</math>, then <math>X=-V</math>; otherwise <math>X=V</math><br /> | |
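The steps above can be sketched in Matlab as follows. This is an illustration, not code from the lecture. It assumes that both <math>Y_{1}</math> and <math>Y_{2}</math> are exponential with rate 1, which is what the acceptance condition in Step 3 requires when the proposal for <math>|X|</math> is Exp(1), and that the sign is set to plus whenever <math>u_{1} > 1/2</math>.

```matlab
% Sketch: standard normal draws via acceptance-rejection with exponentials.
n = 1000;
x = zeros(1, n);
ii = 1;
while ii <= n
    u1 = rand;                     % Step 1: used for the random sign
    y1 = -log(rand);               % Step 2: Y1 ~ Exp(1) by the inverse method
    y2 = -log(rand);               %         Y2 ~ Exp(1), assumed rate 1
    if y2 >= (y1 - 1)^2 / 2        % Step 3: accept Y1 as the absolute value
        v = y1;
        if u1 <= 0.5               % Step 4: attach a random sign
            x(ii) = -v;
        else
            x(ii) = v;
        end
        ii = ii + 1;
    end
end
```

The condition in Step 3 is the usual test <math>U \leq e^{-(Y_{1}-1)^2/2}</math> rewritten with <math>Y_{2} = -\ln U</math>.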
+ | |||
+ | ===Expectation of a Standard Normal distribution===<br /> | ||
+ | |||
+ | The expectation of a standard normal distribution is 0<br /> | ||
+ | |||
+ | '''Proof:''' <br /> | ||
+ | |||
+ | :<math>\operatorname{E}[X]= \;\int_{-\infty}^{\infty} x \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx.</math> | ||
+ | :<math>\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{- \frac{\scriptscriptstyle 1}{\scriptscriptstyle 2} x^2}.</math> | ||
:<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx.</math> | :<math>=\;\int_{-\infty}^{\infty} x \phi(x)\, dx.</math> | |
:Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'') | :Since the first derivative ''ϕ''′(''x'') is −''xϕ''(''x'') | ||
Line 1,803: | Line 2,040: | ||
:<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math> | :<math>= - \left[\phi(x)\right]_{-\infty}^{\infty}</math> | ||
:<math>= 0</math><br /> | :<math>= 0</math><br /> | ||
− | |||
− | + | '''Note:''' more intuitively, the integrand <math>x\phi(x)</math> is an odd function (f(x)+f(-x)=0), because x is odd and <math>\phi(x)</math> is even (f(x)=f(-x)). This reflects the symmetry of the standard normal distribution. Since the support runs from negative infinity to infinity and the integral converges, the positive and negative parts cancel and the integral returns 0.<br /> | |
− | Pseudorandom approaches to generating normal random variables used to be limited. Inefficient methods such as inverse Gaussian function, sum of uniform random variables, and acceptance-rejection were used. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This new technique | + | |
+ | |||
+ | '''Procedure (Box-Muller Transformation Method):''' <br /> | ||
+ | |||
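− | Pseudorandom approaches to generating normal random variables used to be limited. Inefficient methods such as inverse Gaussian function, sum of uniform random variables, and acceptance-rejection were used. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This new technique | + | Pseudorandom approaches to generating normal random variables used to be limited. Inefficient methods such as inverse Gaussian function, sum of uniform random variables, and acceptance-rejection were used. In 1958, a new method was proposed by George Box and Mervin Muller of Princeton University. This new technique was easy to use and as accurate as the inverse transform sampling method, and it grew more valuable as computers became more powerful. <br> | |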
The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br> | The Box-Muller method takes a sample from a bivariate independent standard normal distribution, each component of which is thus a univariate standard normal. The algorithm is based on the following two properties of the bivariate independent standard normal distribution: <br> | ||
− | if Z = ( | + | if <math>Z = (Z_{1}, Z_{2})</math> has this distribution, then <br> | |
− | 1. | + | | |
− | P(R | + | 1.<math>R^2=Z_{1}^2+Z_{2}^2</math> is exponentially distributed with mean 2, i.e. <br> | |
− | 2. | + | <math>P(R^2 \leq x) = 1-e^{-x/2}</math>. <br> | |
+ | 2.Given <math>R^2</math>, the point <math>(Z_{1},Z_{2})</math> is uniformly distributed on the circle of radius R centered at the origin. <br> | |
We can use these properties to build the algorithm: <br> | We can use these properties to build the algorithm: <br> | ||
+ | |||
1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /> | 1) Generate random number <math> \begin{align} U_1,U_2 \sim~ \mathrm{Unif}(0, 1) \end{align} </math> <br /> | ||
Line 1,822: | Line 2,064: | ||
− | <math> \begin{matrix} \ R^2 \sim~ Exp(2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> <br /> | + | <math> \begin{matrix} \ R^2 \sim~ Exp(1/2), \theta \sim~ Unif[0,2\pi] \end{matrix} </math> <br /> |
Note: If U~unif(0,1), then ln(1-U) and ln(U) have the same distribution | Note: If U~unif(0,1), then ln(1-U) and ln(U) have the same distribution | |
Line 1,831: | Line 2,073: | ||
− | Note: In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /> | + | '''Note:''' In steps 2 and 3, we are using a similar technique as that used in the inverse transform method. <br /> |
The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /> | The Box-Muller Transformation Method generates a pair of independent Standard Normal distributions, X and Y (Using the transformation of polar coordinates). <br /> | ||
+ | If you want to generate a number of independent standard normal distributed numbers (more than two), you can run the Box-Muller method several times.<br/> | ||
+ | For example: <br /> | ||
+ | If you want 8 independent standard normal distributed numbers, then run the Box-Muller methods 4 times (8/2 times). <br /> | ||
+ | If you want 9 independent standard normal distributed numbers, then run the Box-Muller methods 5 times (10/2 times), and then delete one. <br /> | ||
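The case of an odd number of values can be sketched as follows. This is an illustration with hypothetical variable names (<code>m</code>, <code>z</code>), assuming the usual Box-Muller formulas with <math>d = R^2 = -2\ln U_1</math> and <math>\theta = 2\pi U_2</math>. It generates all the pairs at once with vectors instead of running the method several times in a loop.

```matlab
% Sketch: n standard normal values from ceil(n/2) Box-Muller pairs.
n = 9;                                 % number of standard normals wanted
m = ceil(n / 2);                       % number of Box-Muller pairs needed (5 here)
u1 = rand(1, m);
u2 = rand(1, m);
d = -2 * log(u1);                      % R^2 ~ Exp(1/2)
tet = 2 * pi * u2;                     % theta ~ Unif[0, 2*pi]
z = [sqrt(d) .* cos(tet), sqrt(d) .* sin(tet)];   % 2*m standard normal values
z = z(1:n);                            % drop the extra value when n is odd
```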
+ | |||
+ | |||
+ | '''Matlab Code'''<br /> | ||
− | |||
<pre style="font-size:16px"> | <pre style="font-size:16px"> | ||
>>close all | >>close all | ||
Line 1,850: | Line 2,098: | ||
>>hist(y) | >>hist(y) | ||
</pre> | </pre> | ||
+ | <br> | ||
+ | '''Remember''': For the above code to work, the "." needs to be after the d to ensure that each element of d is raised to the power of 0.5.<br /> Otherwise Matlab will raise the entire matrix to the power of 0.5.<br> | |
− | + | '''Note:'''<br>The first graph is hist(tet) and it is a uniform distribution.<br>The second one is hist(d) and it is an exponential distribution.<br>The third one is hist(x) and it is a normal distribution.<br>The last one is hist(y) and it is also a normal distribution. | |
− | |||
− | Note:<br>the first graph is hist(tet) and it is a uniform distribution.<br>The second one is hist(d) and it is a | ||
Attention:There is a "dot" between sqrt(d) and "*". It is because d and tet are vectors. <br> | Attention:There is a "dot" between sqrt(d) and "*". It is because d and tet are vectors. <br> | ||
Line 1,870: | Line 2,118: | ||
>>hist(x) | >>hist(x) | ||
>>hist(x+2) | >>hist(x+2) | ||
− | >>hist(x*2+2) | + | >>hist(x*2+2) | |
</pre> | </pre> | ||
− | + | <br> | |
− | Note: randn is random sample from a standard normal distribution.<br /> | + | '''Note:'''<br> |
− | + | 1. randn is random sample from a standard normal distribution.<br /> | |
− | + | 2. hist(x+2) will be centered at 2 instead of at 0. <br /> | |
+ | 3. hist(x*2+2) is also centered at 2. The mean is the same as for x+2, but the variance of x*2+2 becomes four times (2^2) the variance of x.<br /> | |
[[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]] | [[File:Normal_x.jpg|300x300px]][[File:Normal_x+2.jpg|300x300px]][[File:Normal(2x+2).jpg|300px]] | ||
<br /> | <br /> | ||
− | <b>Comment</b>: Box-Muller transformations are not computationally efficient. The reason for this is the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is by an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation which generates U and then computes the sine and cosine of 2πU. <br /> | + | <b>Comment</b>:<br /> |
+ | Box-Muller transformations are not computationally efficient. The reason for this is the need to compute sine and cosine functions. A way to get around this time-consuming difficulty is by an indirect computation of the sine and cosine of a random angle (as opposed to a direct computation which generates U and then computes the sine and cosine of 2πU). <br /> | |
+ | |||
+ | |||
'''Alternative Methods of generating normal distribution'''<br /> | '''Alternative Methods of generating normal distribution'''<br /> | ||
+ | |||
1. Even though we cannot use the inverse transform method directly, we can approximate the inverse using different functions. One method would be '''rational approximation'''.<br /> | 1. Even though we cannot use the inverse transform method directly, we can approximate the inverse using different functions. One method would be '''rational approximation'''.<br /> | |
2. '''Central limit theorem''': If we sum 12 independent U(0,1) random variables and subtract 6 (which is E(ui)*12), we will approximately get a standard normal distribution.<br /> | 2. '''Central limit theorem''': If we sum 12 independent U(0,1) random variables and subtract 6 (which is E(ui)*12), we will approximately get a standard normal distribution.<br /> | |
Line 1,895: | Line 2,148: | ||
=== Proof of Box Muller Transformation === | === Proof of Box Muller Transformation === | ||
− | Definition: | + | '''Definition:'''<br /> |
A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution). | A transformation which transforms from a '''two-dimensional continuous uniform''' distribution to a '''two-dimensional bivariate normal''' distribution (or complex normal distribution). | ||
− | Let U<sub>1</sub> and U<sub>2</sub> be independent uniform (0, | + | Let U<sub>1</sub> and U<sub>2</sub> be independent uniform (0,1) random variables. Then |
− | <math>X_{1} = -2lnU_{1}*cos(2\pi U_{2})</math> | + | <math>X_{1} = \sqrt{-2\ln U_{1}}\,\cos(2\pi U_{2})</math> | |
− | <math>X_{ | + | <math>X_{2} = \sqrt{-2\ln U_{1}}\,\sin(2\pi U_{2})</math> | |
are '''independent''' N(0,1) random variables. | are '''independent''' N(0,1) random variables. | ||
Line 1,915: | Line 2,168: | ||
u<sub>2</sub> = g<sub>2</sub> ^-1(x1,x2) | u<sub>2</sub> = g<sub>2</sub> ^-1(x1,x2) | ||
− | Inverting the above | + | Inverting the above transformation, we have |
u1 = exp^{-(x<sub>1</sub> ^2+ x<sub>2</sub> ^2)/2} | u1 = exp^{-(x<sub>1</sub> ^2+ x<sub>2</sub> ^2)/2} | ||
u2 = (1/2pi)*tan^-1 (x<sub>2</sub>/x<sub>1</sub>) | u2 = (1/2pi)*tan^-1 (x<sub>2</sub>/x<sub>1</sub>) | ||
Line 1,922: | Line 2,175: | ||
<math>f(x_1,x_2) = \frac{1}{2\pi}e^{-(x_1^2+x_2^2)/2}</math> | <math>f(x_1,x_2) = \frac{1}{2\pi}e^{-(x_1^2+x_2^2)/2}</math> | ||
which factors into two standard normal pdfs. | which factors into two standard normal pdfs. | ||
+ | |||
+ | |||
+ | (The quote is from http://mathworld.wolfram.com/Box-MullerTransformation.html) | ||
+ | (The proof is from http://www.math.nyu.edu/faculty/goodman/teaching/MonteCarlo2005/notes/GaussianSampling.pdf) | ||
=== General Normal distributions === | === General Normal distributions === | ||
− | General normal distribution is a special version of normal distribution. The domain of the general normal distribution is affected by the standard deviation and translated by the mean value. | + | The general normal distribution is a generalization of the standard normal distribution: the density is scaled by the standard deviation and translated by the mean value. |
*The pdf of the general normal distribution is | *The pdf of the general normal distribution is | ||
: | : | ||
Line 1,937: | Line 2,194: | ||
</td> | </td> | ||
<td> | <td> | ||
− | <div id="woyun" style=" | + | <div id="woyun" style="visibility:hidden">which is almost useless in this course</div> |
+ | </td> | ||
+ | </tr> | ||
+ | </table> | ||
+ | where <math> \mu </math> is the mean or expectation of the distribution and <math> \sigma </math> is the standard deviation. <br /> | ||
− | + | The probability density must be scaled by <math>1/\sigma</math> so that it still integrates to 1. (Acknowledge: https://en.wikipedia.org/wiki/Normal_distribution) |
− | + | The standard normal distribution is the special case with mean 0 and variance 1. If X is a general normal deviate, then <math> Z=\dfrac{X - (\mu)}{\sigma} </math> will have a standard normal distribution. |
+ | If Z ~ N(0,1) and we want <math>X </math>~<math> N(\mu, \sigma^2)</math>, then we set <math>X = \mu + \sigma Z</math>, since <math>E(X) = \mu +\sigma \cdot 0 = \mu </math> and <math>Var(X) = 0 +\sigma^2 \cdot 1 = \sigma^2</math>. | ||
− | + | If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math> | |
− | If <math>Z_1,...Z_d</math> ~ N(0,1) and are independent then <math>Z = (Z_1,..Z_d)^{T} </math>~ <math>N(0,I_d)</math> | ||
ie. | ie. | ||
* '''Code'''<br /> | * '''Code'''<br /> | ||
Line 2,002: | Line 2,231: | ||
The values for v generated in this way are equivalent to a sample from a <math>\displaystyle N(a, b^2)</math> distribution. We can modify the MatLab code used in the last section to demonstrate this. We just need to add one line before we generate the histogram: | The values for v generated in this way are equivalent to a sample from a <math>\displaystyle N(a, b^2)</math> distribution. We can modify the MatLab code used in the last section to demonstrate this. We just need to add one line before we generate the histogram: | ||
− | <pre> | + | <pre style='font-size:16px'> |
v = a + b * x; | v = a + b * x; | ||
</pre> | </pre> | ||
Line 2,032: | Line 2,261: | ||
The following MatLab code provides an example, where a scatter plot of 10000 random points is generated. In this case x and y have a co-variance of 0.9 - a very strong positive correlation. | The following MatLab code provides an example, where a scatter plot of 10000 random points is generated. In this case x and y have a co-variance of 0.9 - a very strong positive correlation. | ||
− | <pre> | + | <pre style='font-size:16px'> |
x = zeros(10000, 1); | x = zeros(10000, 1); | ||
y = zeros(10000, 1); | y = zeros(10000, 1); | ||
Line 2,046: | Line 2,275: | ||
E = [1, 0.9; 0.9, 1]; | E = [1, 0.9; 0.9, 1]; | ||
[u s v] = svd(E); | [u s v] = svd(E); | ||
− | root_E = u * (s ^ (1 / 2)); | + | root_E = u * (s ^ (1 / 2)) * u'; |
z = (root_E * [x y]'); | z = (root_E * [x y]'); | ||
Line 2,066: | Line 2,295: | ||
Here is an example: | Here is an example: | ||
− | <pre> | + | <pre style='font-size:16px'> |
E = [1, 0.9; 0.9, 1]; | E = [1, 0.9; 0.9, 1]; | ||
r1 = sqrtm(E); | r1 = sqrtm(E); | ||
Line 2,074: | Line 2,303: | ||
R code for a multivariate normal distribution: | R code for a multivariate normal distribution: | ||
− | <pre> | + | <pre style='font-size:16px'> |
n=10000; | n=10000; | ||
r2<--2*log(runif(n)); | r2<--2*log(runif(n)); | ||
Line 2,095: | Line 2,324: | ||
=== Bernoulli Distribution === | === Bernoulli Distribution === | ||
− | The Bernoulli distribution is a discrete probability distribution, which usually | + | The Bernoulli distribution is a discrete probability distribution that describes an event with only two possible results, i.e. success or failure (x = 1 or 0). The variable takes the value 1 with success probability p and the value 0 with failure probability q = 1 - p. |
P ( x = 0) = q = 1 - p <br /> | P ( x = 0) = q = 1 - p <br /> | ||
Line 2,104: | Line 2,333: | ||
<br> P is the success probability. | <br> P is the success probability. | ||
− | The Bernoulli distribution is a special case of binomial distribution, | + | The Bernoulli distribution is a special case of the binomial distribution with n = 1, where the variate x has only two outcomes; the Bernoulli can therefore use the probability mass function of the binomial distribution with the variate x taking values 0 and 1. |
− | Let x1, | + | The most famous example of the Bernoulli distribution is the coin flip, which has only two possible outcomes (success or failure); for a fair coin each has probability 0.5. |
+ | |||
+ | Let x1,x2 denote the lifetime of 2 independent particles, x1~exp(<math>\lambda</math>), x2~exp(<math>\lambda</math>) | ||
we are interested in y=min(x1,x2) | we are interested in y=min(x1,x2) | ||
Line 2,153: | Line 2,384: | ||
<math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /> | <math>U = \sum_{i=1}^{n} X_i \sim Binomial(n,p)</math><br /> | ||
So we can sample from binomial distribution using this property. | So we can sample from binomial distribution using this property. | ||
− | Note: | + | Note: We can consider a Binomial(n,p) random variable as the sum of n ''independent'' Bernoulli(p) random variables. |
− | + | <div style="background:#CCFF33;border-radius:5px;box-shadow: 10px 10px 5px #888888;padding:30px;"> | |
− | + | * '''Code to Generate Binomial(n = 20,p = 0.7)'''<br /> | |
− | * '''Code to Generate Binomial(n = | ||
<pre style="font-size:16px"> | <pre style="font-size:16px"> | ||
− | p = 0. | + | p = 0.7; |
− | n = | + | n = 20; |
for k=1:5000 | for k=1:5000 | ||
i = 1; | i = 1; | ||
− | + | for i=1:n | |
u=rand(); | u=rand(); | ||
if (u <= p) | if (u <= p) | ||
Line 2,170: | Line 2,400: | ||
y(i) = 0; | y(i) = 0; | ||
end | end | ||
− | |||
end | end | ||
Line 2,179: | Line 2,408: | ||
</pre> | </pre> | ||
− | |||
− | Comments on Matlab: | + | |
+ | |||
+ | |||
+ | </div> | ||
+ | Note: We can also regard the Bernoulli Distribution as either a conditional distribution or <math>f(x)= p^{x}(1-p)^{(1-x)}</math>, x=0,1. | ||
+ | |||
+ | Comments on Matlab: | ||
When doing operations on vectors, always put a dot before the operator if you want the operation to be applied to every element of the vector. | When doing operations on vectors, always put a dot before the operator if you want the operation to be applied to every element of the vector. | ||
Example: let V be a matrix of dimension 2*4 and suppose you want to multiply each element by 3. | Example: let V be a matrix of dimension 2*4 and suppose you want to multiply each element by 3. | ||
Line 2,190: | Line 2,424: | ||
== Class 7 - Tuesday, May 28 == | == Class 7 - Tuesday, May 28 == | ||
− | |||
+ | [[Note that the material in this lecture will not be on the exam; it was only to supplement what we have learned.]] | ||
===Universality of the Uniform Distribution/Inverse Method=== | ===Universality of the Uniform Distribution/Inverse Method=== | ||
Line 2,198: | Line 2,432: | ||
Procedure: | Procedure: | ||
− | 1 | + | 1) Generate U~Unif (0, 1)<br> |
− | 2 | + | 2) Set <math>x=F^{-1}(u)</math><br> |
− | 3 | + | 3) X~f(x)<br> |
+ | |||
+ | '''Remark'''<br> | ||
+ | 1) The preceding can be written algorithmically for discrete random variables as <br> | ||
+ | Generate a random number U ~ U(0,1] <br> | ||
+ | If U < p<sub>0</sub> set X = x<sub>0</sub> and stop <br> | ||
+ | If U < p<sub>0</sub> + p<sub>1</sub> set X = x<sub>1</sub> and stop <br> | ||
+ | ... <br> | ||
+ | 2) If the x<sub>i</sub>, i>=0, are ordered so that x<sub>0</sub> < x<sub>1</sub> < x<sub>2</sub> <... and if we let F denote the distribution function of X, then X will equal x<sub>j</sub> if F(x<sub>j-1</sub>) <= U < F(x<sub>j</sub>) | ||
'''Example 1'''<br> | '''Example 1'''<br> | ||
Line 2,213: | Line 2,455: | ||
'''Solution:'''<br> | '''Solution:'''<br> | ||
− | x~exp(<math>\ | + | x<sub>1</sub>~exp(<math>\lambda_1</math>)<br> |
− | + | x<sub>2</sub>~exp(<math>\lambda_2</math>)<br> | |
− | <math>f_{x | + | <math>f_{X}(x)=\lambda e^{-\lambda x},x\geq0 </math> <br> |
+ | <math>F_X(x)=1-e^{-\lambda x}, x\geq 0</math><br> | ||
<math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P((X_1)>y) P((X_2)>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br> | <math>1-F_Y(y) = P(Y>y)</math> = P(min(X<sub>1</sub>,X<sub>2</sub>) > y) = <math>\, P((X_1)>y) P((X_2)>y) = e^{\, -(\lambda_1 + \lambda_2) y}</math><br> | ||
Line 2,226: | Line 2,469: | ||
Step1: Generate U~ U(0, 1)<br> | Step1: Generate U~ U(0, 1)<br> | ||
− | Step2: set <math> | + | |
+ | Step2: set <math>y=\, {-\frac {1}{{\lambda_1 +\lambda_2}}} ln(1-u)</math><br> | ||
+ | |||
+ | or set <math>y=\, {-\frac {1} {{\lambda_1 +\lambda_2}}} ln(u)</math><br> | ||
+ | Since U ~ Unif(0,1) implies that 1-U ~ Unif(0,1) as well, the two formulas produce samples with the same distribution. | ||
+ | |||
+ | |||
+ | * '''Matlab Code'''<br /> | ||
+ | <pre style="font-size:16px"> | ||
+ | >> lambda1 = 1; | ||
+ | >> lambda2 = 2; | ||
+ | >> u = rand; | ||
+ | >> y = -log(u)/(lambda1 + lambda2) | ||
+ | </pre> | ||
If we generalize this example from two independent particles to n independent particles we will have:<br> | If we generalize this example from two independent particles to n independent particles we will have:<br> | ||
Line 2,262: | Line 2,518: | ||
'''Solution:'''<br> | '''Solution:'''<br> | ||
<br> | <br> | ||
− | 1. | + | 1. Generate <math>U ~\sim~ Unif[0, 1)</math><br> |
− | 2. Set | + | 2. Set <math>X = U^{1/n}</math><br> |
<br> | <br> | ||
− | For example, when n = 20,<br> | + | For example, when <math>n = 20</math>,<br> |
− | + | <math>U = 0.6</math> => <math>X = U^{1/20} = 0.975</math><br> |
− | + | <math>U = 0.5</math> => <math>X = U^{1/20} = 0.966</math><br> |
− | + | <math>U = 0.2</math> => <math>X = U^{1/20} = 0.923</math><br> |
<br> | <br> | ||
− | Observe from above that the values of X for n = 20 are close to 1, this is because we can view | + | Observe from above that the values of X for n = 20 are close to 1. This is because X has the same distribution as the maximum of n independent <math>Unif(0,1)</math> random variables, which is increasingly likely to be close to 1 as n increases; equivalently, when n is large the exponent 1/n tends towards 0. This observation is the motivation for method 2 below.<br> |
Recall that | Recall that | ||
Line 2,277: | Line 2,533: | ||
Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> | Similarly if <math> Y = min(X_1,\ldots,X_n)</math> then the cdf of <math>Y</math> is <math>F_Y = 1- </math><math>\prod</math><math>(1- F_{X_i})</math><br> | ||
<br> | <br> | ||
− | Method 1: Following the above result we can see that in this example, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = | + | '''Method 1:''' Following the above result we can see that in this example, F<sub>X</sub> = x<sup>n</sup> is the cumulative distribution function of the max of n uniform random variables between 0 and 1 (since for U~Unif(0, 1), F<sub>U</sub>(x) = x for 0 ≤ x ≤ 1).<br> |
− | Method 2: generate X by having a sample of n independent U~Unif(0, 1) and take the max of the n samples to be x. However, the solution given above using inverse-transform method only requires generating one uniform random number instead of n of them, so it is a more efficient method. | + | '''Method 2:''' generate X by having a sample of n independent U~Unif(0, 1) and take the max of the n samples to be x. However, the solution given above using inverse-transform method only requires generating one uniform random number instead of n of them, so it is a more efficient method. |
<br> | <br> | ||
− | + | The same results give the cdf and pdf of Y = max (X1, X2, ... , Xn) and Y = min (X1, X2, ... , Xn) whenever the Xi are independent (Xi and Xj independent for i ≠ j). |
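The two methods can be compared with a short Matlab sketch. The values n = 20 and N = 10000 and the variable names are illustrative assumptions. Both samples should have mean close to n/(n+1), the mean of a distribution with cdf x<sup>n</sup> on [0,1].

<pre style="font-size:16px">
n = 20;                             % number of uniforms in the maximum
N = 10000;                          % number of samples (illustrative choice)
x_inv = rand(1, N) .^ (1 / n);      % inverse transform: one uniform per sample
x_max = max(rand(n, N), [], 1);     % maximum of n uniforms per sample
[mean(x_inv), mean(x_max), n / (n + 1)]
</pre>

The inverse transform version uses N uniforms in total, while the maximum version uses n*N.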
'''Example 4 (New)'''<br> | '''Example 4 (New)'''<br> | ||
Line 2,321: | Line 2,577: | ||
The general algorithm to generate random variables from a composition CDF is: | The general algorithm to generate random variables from a composition CDF is: | ||
− | 1) Generate U, V ~ <math> | + | 1) Generate U,V ~ <math> Unif(0,1)</math> independently |
− | 2) If | + | 2) If U < p<sub>1</sub>, set X = <math>F_{X_{1}}^{-1}(V)</math> |
− | 3) Else if | + | 3) Else if U < p<sub>1</sub> + p<sub>2</sub>, set X = <math>F_{X_{2}}^{-1}(V)</math>, and so on for the remaining components |
− | 4) | + | 4) Repeat from Step 1 (if N randomly generated variables are needed, repeat N times) |
<b>Explanation</b><br> | <b>Explanation</b><br> | ||
− | Each random variable that is a part of X contributes <math>p_{i} | + | Each random variable that is a part of X contributes <math>p_{i} F_{X_{i}}(x)</math> to <math>F_{X}(x)</math> every time. |
From a sampling point of view, that is equivalent to contributing <math>F_{X_{i}}(x)</math> <math>p_{i}</math> of the time. The logic of this is similar to that of the Accept-Reject Method, but instead of rejecting a value depending on the value u takes, we instead decide which distribution to sample it from. | From a sampling point of view, that is equivalent to contributing <math>F_{X_{i}}(x)</math> <math>p_{i}</math> of the time. The logic of this is similar to that of the Accept-Reject Method, but instead of rejecting a value depending on the value u takes, we instead decide which distribution to sample it from. | ||
+ | |||
+ | |||
+ | <b> Simplified Version </b><br> | ||
+ | 1) Generate <math>u \sim Unif(0,1)</math> <br> | ||
+ | 2) Set <math> X=0, s=P_0</math><br> | ||
+ | 3) While <math> u > s, </math><br> | ||
+ | set <math> X = X+1</math> and <math> s=s+P_x </math> <br> | ||
+ | 4) Return <math> X </math> | ||
=== Examples of Decomposition Method === | === Examples of Decomposition Method === | ||
<b>Example 1</b> <br> | <b>Example 1</b> <br> | ||
− | f(x) = 5 | + | <math>f(x) = \frac{5}{12}(1+(x-1)^4), \quad 0\leq x\leq 2</math> <br> |
− | f(x) = 5 | + | <math>f(x) = \frac{5}{12}+\frac{5}{12}(x-1)^4 = \frac{5}{6} \left(\frac{1}{2}\right)+\frac {1}{6}\left(\frac{5}{2}\right)(x-1)^4</math> <br> |
− | Let | + | Let <math>f_{x_1}= \frac{1}{2}</math> and <math>f_{x_2} = \frac {5}{2}(x-1)^4</math> <br> |
Algorithm: | Algorithm: | ||
Generate U~Unif(0,1) <br> | Generate U~Unif(0,1) <br> | ||
− | If 0<u<5/ | + | If <math>0<u<\frac {5}{6}</math>, then we sample from f<sub>x1</sub> <br> |
− | Else if 5 | + | Else if <math>\frac{5}{6}<u<1</math>, we sample from f<sub>x2</sub> <br> |
We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br> | We can find the inverse CDF of f<sub>x2</sub> and utilize the Inverse Transform Method in order to sample from f<sub>x2</sub> <br> | ||
Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br> | Sampling from f<sub>x1</sub> is more straightforward since it is uniform over the interval (0,2) <br> | ||
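A possible Matlab sketch of this example. It assumes that the CDF of f<sub>x2</sub> on [0,2] is F(x) = ((x-1)<sup>5</sup> + 1)/2, obtained by integrating (5/2)(t-1)<sup>4</sup> from 0 to x, so that F<sup>-1</sup>(v) = 1 + (2v-1)<sup>1/5</sup>. The sample size and variable names are illustrative.

<pre style="font-size:16px">
N = 10000;                               % number of samples (illustrative choice)
x = zeros(1, N);
for k = 1:N
    u = rand;                            % chooses the component
    v = rand;                            % used to sample within the component
    if u < 5/6
        x(k) = 2 * v;                    % f_x1: uniform on (0,2)
    else
        x(k) = 1 + nthroot(2*v - 1, 5);  % f_x2: real fifth root inverts the CDF
    end
end
hist(x, 40)
</pre>

nthroot is used instead of the power operator because 2v-1 can be negative, and the power operator would then return a complex number.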
Line 2,349: | Line 2,613: | ||
<b>Example 2</b> <br> | <b>Example 2</b> <br> | ||
− | <math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12} | + | <math>f(x)=\frac{1}{4}e^{-x}+2x+\frac{1}{12}, \quad 0\leq x \leq 3 </math> <br> |
− | We can rewrite f(x) as <math>f(x)=(\frac{1}{4}) | + | We can rewrite f(x) as <math>f(x)=(\frac{1}{4}) e^{-x}+(\frac{2}{4}) 4x+(\frac{1}{4}) \frac{1}{3}</math> <br> |
Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br> | Let f<sub>x1</sub> = <math>e^{-x}</math>, f<sub>x2</sub> = 4x, and f<sub>x3</sub> = <math>\frac{1}{3}</math> <br> | ||
Generate U~Unif(0,1)<br> | Generate U~Unif(0,1)<br> | ||
Line 2,363: | Line 2,627: | ||
In general, to write an <b>efficient </b> algorithm for: <br> | In general, to write an <b>efficient </b> algorithm for: <br> | ||
<math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br> | <math>F_{X}(x) = p_{1}F_{X_{1}}(x) + p_{2}F_{X_{2}}(x) + ... + p_{n}F_{X_{n}}(x)</math> <br> | ||
− | We would first | + | We would first calculate <math> {q_i} = \sum_{j=1}^i p_j, \forall i = 1,\dots, n</math> |
− | Then Generate <math> U | + | Then Generate <math> U \sim~ Unif(0,1) </math> <br> |
− | If <math> | + | If <math> U < q_1 </math> sample from <math> f_1 </math> <br> |
− | else if <math> u< | + | else if <math> u<q_i </math> sample from <math> f_i </math> for <math> 1 < i < n </math><br> |
else sample from <math> f_n </math> <br> | else sample from <math> f_n </math> <br> | ||
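When the pdf is split into components such as f<sub>x1</sub>, f<sub>x2</sub> and f<sub>x3</sub>, we use U~U(0,1) to select a component and then sample from that component, for example by inverting its CDF. | When the pdf is split into components such as f<sub>x1</sub>, f<sub>x2</sub> and f<sub>x3</sub>, we use U~U(0,1) to select a component and then sample from that component, for example by inverting its CDF. | ||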
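The cumulative-sum version of the algorithm can be sketched in Matlab as below. The weights and the component CDFs (x, x<sup>2</sup> and x<sup>3</sup> on [0,1], each with weight 1/3) are an illustrative choice, and the variable names are assumptions.

<pre style="font-size:16px">
p = [1/3, 1/3, 1/3];          % mixing weights (illustrative; must sum to 1)
q = cumsum(p);                % q(i) = p(1) + ... + p(i)
N = 10000;                    % number of samples (illustrative choice)
x = zeros(1, N);
for k = 1:N
    u = rand;
    i = find(u < q, 1);       % first index i with u < q(i)
    if isempty(i), i = numel(p); end   % guard against round-off in q(end)
    v = rand;
    x(k) = v ^ (1 / i);       % component i has CDF x^i, so its inverse is v^(1/i)
end
mean(x)                       % should be close to (1/2 + 2/3 + 3/4)/3 = 23/36
</pre>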
+ | <div style="background:#66CCFF;padding:20px;border-radius:5px;"> | ||
=== Example of Decomposition Method === | === Example of Decomposition Method === | ||
− | + | <math>F_x(x) = \frac {1}{3} x+\frac {1}{3} x^2+\frac {1}{3} x^3, 0\leq x\leq 1</math> | |
− | + | Let <math>U =F_x(x) = \frac {1}{3} x+\frac {1}{3} x^2+\frac {1}{3} x^3</math>, solve for x. | |
− | + | <math>P_1=\frac{1}{3}, F_{x1} (x)= x, P_2=\frac{1}{3},F_{x2} (x)= x^2, | |
− | + | P_3=\frac{1}{3},F_{x3} (x)= x^3</math> | |
'''Algorithm:''' | '''Algorithm:''' | ||
− | Generate U | + | Generate <math>\,U \sim Unif [0,1)</math> |
− | Generate V | + | Generate <math>\,V \sim Unif [0,1)</math> |
− | if 0 | + | if <math>0\leq u \leq \frac{1}{3}, x = v</math> |
− | else if u | + | else if <math>u \leq \frac{2}{3}, x = v^{\frac{1}{2}}</math> |
− | else x = v | + | else <math>x=v^{\frac{1}{3}}</math> <br> |
'''Matlab Code:''' | '''Matlab Code:''' | ||
<pre style="font-size:16px"> | <pre style="font-size:16px"> | ||
− | u=rand | + | u=rand % u selects the component |
v=rand | v=rand | ||
if u<1/3 | if u<1/3 | ||
Line 2,405: | Line 2,670: | ||
end | end | ||
</pre> | </pre> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | <span style="margin:0 auto;">=== Example of Decomposition Method(new) ===</span> | |
− | === | + | F<sub>x</sub>(x) = 1/2*x+1/2*x<sup>2</sup>, 0<= x<=1 |
− | + | let U =F<sub>x</sub>(x) = 1/2*x+1/2*x<sup>2</sup>, solve for x. | |
− | + | P<sub>1</sub>=1/2, F<sub>x1</sub>(x)= x, P<sub>2</sub>=1/2,F<sub>x2</sub>(x)= x<sup>2</sup>, | |
− | + | '''Algorithm:''' | |
− | + | Generate U ~ Unif [0,1) | |
− | |||
− | |||
− | |||
− | + | Generate V~ Unif [0,1) | |
− | + | if 0<u<1/2, x = v | |
− | + | else x = v<sup>1/2</sup> | |
− | |||
− | |||
+ | '''Matlab Code:''' | ||
+ | <pre style="font-size:16px"> | ||
+ | u=rand | ||
+ | v=rand | ||
+ | if u<1/2 | ||
+ | x=v | ||
+ | else | ||
+ | x=sqrt(v) | ||
+ | end | ||
+ | </pre> | ||
+ | </div> | ||
− | + | '''Extra Knowledge about Decomposition Method''' | |
− | |||
− | |||
− | |||
+ | There are different types and applications of Decomposition Method | ||
− | + | 1. Primal decomposition | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | 2. Dual decomposition | |
− | |||
− | + | 3. Decomposition with constraints | |
− | |||
− | + | 4. More general decomposition structures | |
− | |||
− | |||
− | |||
− | + | 5. Rate control | |
− | + | 6. Single commodity network flow |
− | + | For More Details, please refer to http://www.stanford.edu/class/ee364b/notes/decomposition_notes.pdf | |
− | + | ===Fundamental Theorem of Simulation=== | |
− | + | Consider two shapes, A and B, where B is a sub-shape (subset) of A. | |
+ | We want to sample uniformly from inside the shape B. | ||
+ | Then we can sample uniformly inside of A, and throw away all samples outside of B, and this will leave us with a uniform sample from within B. | ||
+ | (Basis of the Accept-Reject algorithm) | ||
− | + | The advantage of this method is that we can sample from an unknown distribution using an easy distribution. The disadvantage is that it may need to reject many points, which is inefficient.<br /> |
+ | Inverse each part of partial CDF, the partial CDF is divided by the original CDF, partial range is uniform distribution.<br /> | ||
+ | More specific definition of the theorem can be found here.<ref>http://www.bus.emory.edu/breno/teaching/MCMC_GibbsHandouts.pdf</ref> | ||
− | + | Matlab code: | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | === | + | <pre style="font-size:16px"> |
+ | close all | ||
+ | clear all | ||
+ | ii=1; | ||
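+ | R=1; % radius; R was undefined in the original, so the value 1 is an assumption | ||
+ | while ii<1000 | ||
+ | u1=rand; % uniform for the accept/reject test | ||
+ | u2=rand; % uniform for the candidate | ||
+ | y=R*(2*u2-1); % candidate, uniform on (-R,R) | ||
+ | if (1-u1^2)>=(2*u2-1)^2 | ||
+ | x(ii)=y; | ||
+ | ii=ii+1; | ||
+ | end | ||
+ | end | ||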
+ | </pre> | ||
− | + | ===Question 2=== | |
− | + | Use Acceptance and Rejection Method to sample from <math>f_X(x)=b*x^n*(1-x)^n</math> , <math>n>0</math>, <math>0<x<1</math> | |
− | |||
− | + | Solution: | |
− | + | This is a Beta(n+1, n+1) distribution, where b is the normalizing constant chosen so that <math>\int _{0}^{1}b*x^{n}*(1-x)^{n}dx = 1</math> |
− | + | U<sub>1</sub>~Unif[0,1) |
− | |||
− | |||
− | |||
− | + | U<sub>2</sub>~Unif[0,1) |
− | |||
− | |||
− | |||
− | |||
− | > | + | fx=<math> bx^{1/2}(1-x)^{1/2} <= bx^{-1/2}\sqrt2 ,0<=x<=1/2 </math> |
− | |||
− | |||
− | > | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | The density is maximized at x = 0.5, where <math>x^n(1-x)^n = (1/4)^n</math>. |
+ | So, taking the proposal g to be Unif(0,1), <math>c=b*(1/4)^n</math><br /> | ||
+ | Algorithm: <br /> | ||
+ | 1. Draw <math>U_1</math> from <math>U(0, 1)</math> and <math> U_2</math> from <math>U(0, 1)</math>. <br /> | ||
+ | 2. If <math>U_2 \leq \frac{b\,U_1^n(1-U_1)^n}{b\,(1/4)^n}=(4U_1(1-U_1))^n</math>, <br /> | ||
+ | then set <math>X=U_1</math>; | ||
+ | else return to step 1. | ||
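The algorithm for Question 2 can be sketched in Matlab as follows. The value n = 2 and the number of accepted samples are illustrative assumptions. The constant b cancels in the acceptance ratio, so it never needs to be computed.

<pre style="font-size:16px">
n = 2;                                  % exponent in f(x) = b*x^n*(1-x)^n (illustrative value)
N = 1000;                               % number of accepted samples wanted
x = zeros(1, N);
k = 0;
while k < N
    u1 = rand;                          % candidate from the proposal U(0,1)
    u2 = rand;                          % uniform used for the accept/reject test
    if u2 <= (4 * u1 * (1 - u1)) ^ n    % f(u1)/(c*g(u1)) with c = b*(1/4)^n
        k = k + 1;
        x(k) = u1;
    end
end
hist(x, 20)
</pre>

Since the density is symmetric about 0.5, the sample mean of x should be close to 0.5.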
− | + | Discrete Case: | |
− | + | Most discrete random variables do not have a closed form inverse CDF. Also, its CDF <math>F:X \rightarrow [0,1]</math> is not necessarily onto. This means that not every point in the interval <math> [0,1] </math> has a preimage in the support set of X through the CDF function.<br /> | |
− | + | Let <math>X</math> be a discrete random variable where <math>a \leq X \leq b</math> and <math>a,b \in \mathbb{Z}</math> . <br> | |
− | + | To sample from <math>X</math>, we use the partition method below: <br> | |
− | + | <math>\, \text{Step 1: Generate u from } U \sim Unif[0,1]</math><br> | |
− | + | <math>\, \text{Step 2: Set } x=a, s=P(X=a)</math><br /> | |
− | + | <math>\, \text{Step 3: While } u>s, x=x+1, s=s+P(X=x)</math> <br /> | |
− | + | <math>\, \text{Step 4: Return } x</math><br /> | |
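The four steps above can be sketched in Matlab as follows. The support a, ..., a+3 and the probabilities in pmf are made-up values for illustration only.

<pre style="font-size:16px">
a = 1;                               % smallest value of X (illustrative)
pmf = [0.1 0.2 0.3 0.4];             % P(X=a), P(X=a+1), ... (illustrative; sums to 1)
N = 10000;                           % number of samples (illustrative choice)
x = zeros(1, N);
for k = 1:N
    u = rand;
    j = 1;
    s = pmf(1);
    while u > s && j < numel(pmf)    % second condition guards against round-off
        j = j + 1;
        s = s + pmf(j);
    end
    x(k) = a + j - 1;
end
freq = [mean(x==1), mean(x==2), mean(x==3), mean(x==4)]   % should be close to pmf
</pre>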
− | + | ==Class 8 - Thursday, May 30, 2013== | |
− | + | In this lecture, we will discuss algorithms to generate 3 well-known distributions: Binomial, Geometric and Poisson. For each of these distributions, we will first describe it and state its probability mass function, expectation and variance. Then, we will derive one or more algorithms to sample from each of these distributions, and implement the algorithms in Matlab. <br /> |
− | |||
+ | ===The Bernoulli distribution=== | ||
+ | The Bernoulli distribution is a special case of the binomial distribution, where n = 1. X ~ Bin(1, p) has the same meaning as X ~ Ber(p), where p is the probability of success and 1-p is the probability of failure (we usually define a variate q, q= 1-p). The mean of Bernoulli is p and the variance is p(1-p). Bin(n, p), is the distribution of the sum of n independent Bernoulli trials, Bernoulli(p), each with the same probability p, where 0<p<1. <br> | ||
+ | For example, let X be the event that a coin toss results in a "head" with probability ''p'', then ''X~Bernoulli(p)''. <br> | ||
+ | P(X=1)= p | ||
+ | P(X=0)= q = 1-p | ||
+ | Therefore, P(X=0) + P(X=1) = p + q = 1 | ||
− | + | '''Algorithm: ''' | |
+ | 1) Generate <math>u\sim~Unif(0,1)</math> <br> | ||
+ | 2) If <math>u \leq p</math>, then <math>x = 1 </math><br> | ||
+ | else <math>x = 0</math> <br> | ||
+ | In summary: <br> | ||
+ | when <math> U \leq p, x=1</math> <br> | ||
+ | when <math>U > p, x=0</math><br> | ||
+ | 3) Repeat as necessary | ||
− | + | * '''Matlab Code'''<br /> | |
+ | <pre style="font-size:16px"> | ||
+ | >> p = 0.8 % an arbitrary probability for example | ||
+ | >> for ii = 1:100 | ||
+ | >> u = rand; | ||
+ | >> if u < p | ||
+ | >> x(ii) = 1; | ||
+ | >> else | ||
+ | >> x(ii) = 0; | ||
+ | >> end | ||
+ | >> end | ||
+ | >> hist(x) | ||
+ | </pre> | ||
− | + | ===The Binomial Distribution=== | |
− | + | In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ Bin(n, p). | |
+ | (Acknowledge: https://en.wikipedia.org/wiki/Binomial_distribution) | ||
+ | If X ~ B(n, p), then its pmf is of form: | ||
+ | f(x)=(nCx) p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n<br /> | ||
+ | Or f(x) = <math>(n!/x!(n-x)!)</math> p<sup>x</sup>(1-p)<sup>(n-x)</sup>, x=0,1,...n <br /> | ||
− | < | + | Mean (x) = E(x) = <math> np </math> |
+ | Variance = <math> np(1-p) </math><br/> | ||
− | + | Generate n uniform random number <math>U_1,...,U_n</math> and let X be the number of <math>U_i</math> that are less than or equal to p. | |
− | + | The logic behind this algorithm is that a Binomial random variable counts the successes in n independent Bernoulli trials, each with probability of success p. Thus, we can sample from the distribution by sampling n Bernoulli trials and summing them; the sum represents one binomial sample. In the example below, we generate a 20 by 1000 matrix of Bernoulli outcomes. Summing each column adds up 20 Bernoulli outcomes to produce one binomial sample. There are 1000 columns, so the result is 1000 realizations of a binomial random variable (the output of the sum is a 1 by 1000 vector).<br /> |
+ | To continue with the previous example, let X be the number of heads in a series of ''n'' independent coin tosses - where for each toss, the probability of coming up with a head is ''p'' - then ''X~Bin(n, p)''. <br /> | ||
+ | MATLAB tips: to draw a binomial random number directly, we can use binornd(N,P), where N is the number of trials and P is the probability of success. A comparison on a vector returns a 0/1 vector: if a=[2 3 4], then a<3 produces [1 0 0] and a==3 produces [0 1 0]. If a=[2 6 9 10], then a<4 produces [1 0 0 0], because only the first element (2) is less than 4. We can use this to count how many numbers are less than p.<br /> | ||
− | + | Algorithm for Bernoulli is given as above | |
− | < | + | '''Code'''<br> |
+ | <pre style="font-size:16px"> | ||
+ | >>a=[3 5 8]; | ||
+ | >>a<5 | ||
+ | ans= 1 0 0 | ||
− | + | >>rand(20,1000) | |
+ | >>rand(20,1000)<0.4 | ||
+ | >>A = sum(rand(20,1000)<0.4) % column sums ~ Bin(20, 0.4) | ||
+ | >>hist(A) | ||
+ | >>mean(A) | ||
+ | Note: sum applied to a matrix adds up each column, i.e. it sums along dimension 1 | ||
− | + | >>sum(sum(rand(20,1000)<0.4)>8)/1000 | |
+ | This is an estimate of Pr[A>8]. | ||
− | + | </pre> | |
− | + | [[File:Binomial_example.jpg|300px]] | |
− | + | Remark: a logical comparison such as a<3 returns a 0/1 array marking the entries that satisfy the condition; for a=[2 3 4], a<3 gives [1 0 0] and a==3 gives [0 1 0]. |
+ | This is useful for finding or counting the entries of a matrix that satisfy a condition. | ||
− | + | Relation between Bernoulli Distribution and Binomial Distribution: | |
+ | For instance, suppose we want to find numbers ≤0.3. Each comparison of a uniform with 0.3 is a Bernoulli trial that records whether the number is ≤0.3, and the Binomial counts how many of the numbers are ≤0.3. | ||
+ | ===The Geometric Distribution=== | ||
+ | The geometric distribution is a discrete distribution. There are two types of geometric distribution. The first is the probability distribution of the number X of Bernoulli trials, each with success probability p, needed to get the first success; X takes values in the set { 1, 2, 3, ...}. The other is the probability distribution of the number Y = X − 1 of failures, each with probability 1-p, before the first success; Y takes values in the set { 0, 1, 2, 3, ... }. | ||
− | If | + | For example,<br /> |
− | + | If the first success occurs on the first trial (x = 1), then f(x)=p.<br /> |
+ | If the first trial fails and the first success occurs on the second trial (x = 2), then f(x)= p(1-p).<br /> | ||
+ | If the first two trials fail and the first success occurs on the third trial (x = 3), then f(x)= p(1-p)<sup>2 </sup>, etc.<br /> | ||
+ | If the first k-1 trials fail and the first success occurs on trial k (x = k), then f(k)= p(1-p)<sup>(k-1)</sup><br /> | ||
+ | which is,<br /> | ||
+ | x Pr<br /> | ||
+ | 1 P<br /> | ||
+ | 2 P(1-P)<br /> | ||
+ | 3 P(1-P)<sup>2</sup><br /> | ||
+ | . .<br /> | ||
+ | . .<br /> | ||
+ | . .<br /> | ||
+ | n P(1-P)<sup>(n-1)</sup><br /> | ||
+ | Also, the sequence of the outputs of the probability is a geometric sequence. | ||
− | + | For example, suppose a die is thrown repeatedly until the first time a "6" appears. This is a question of geometric distribution of the number of times on the set { 1, 2, 3, ... } with p = 1/6. | |
− | ''' | + | Generally speaking, if X~G(p) then its pmf is of the form f(x)=(1-p)<sup>(x-1)</sup>*p, x=1,2,...<br /> |
+ | The random variable X is the number of trials required until the first success in a series of independent''' Bernoulli trials'''.<br /> | ||
− | |||
− | |||
− | + | Other properties | |
− | |||
− | + | Probability mass function : P(X=k) = p(1-p)<sup>(k-1)</sup> | |
− | + | Tail probability : P(X>n) = <math>(1-p)^n</math> | |
− | |||
− | |||
− | + | The CDF : P(X≤n) = 1 - <math>(1-p)^n</math> |
− | <math> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | <span style="background:#F5F5DC"> | |
− | |||
+ | Mean of x = 1/p | ||
+ | Var(x) = (1-p)/p^2 | ||
+ | There are two ways to look at a geometric distribution. | ||
+ | <b>1st Method</b> | ||
+ | We look at the number of trials up to and including the first success, so the last trial, in which you succeeded, is counted. This will be used in our course. | ||
− | + | pmf is of form f(x) = (1-p)<sup>(x-1)</sup>*(p), x = 1, 2, 3, ... |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | < | + | <b>2nd Method</b> |
− | + | This involves modeling the number of failures before the first success. This does not include the last trial in which we succeeded. |
− | + | pmf is of form f(x) = ((1-p)^x)*p , x = 0, 1, 2, .... |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | </span> | |
− | |||
− | |||
− | |||
− | + | If Y~Exp(<math>\lambda</math>) then <math>X=\left \lfloor Y \right \rfloor+1</math> is geometric.<br /> | |
+ | Choose e^(-<math>\lambda</math>)=1-p. Then X ~ geo (p) <br /> | ||
− | + | P (X > x) = (1-p)<sup>x</sup>(because first x trials are not successful) <br/> | |
+ | |||
+ | NB: An advantage of using this method is that nothing is rejected. We accept all the points, and the method is more efficient. Also, this method is closer to the inverse transform method as nothing is being rejected. <br /> | ||
− | + | '''Proof''' <br/> | |
− | |||
− | + | <math>P(X>x) = P( \left \lfloor Y \right \rfloor + 1 > x) = P(\left \lfloor Y \right \rfloor > x- 1) = P(Y \geq x) = e^{-\lambda x} </math> <br> | |
− | + | Since p = 1- e<sup>-<math>\lambda</math></sup>, i.e. <math>\lambda</math>= <math>-\log(1-p)</math> (comparing the exponential and geometric distributions, e<sup>-<math>\lambda</math></sup> plays the role of the probability that a single trial fails), then <br> | |
− | |||
− | + | P(X>x) = e<sup>(-<math>\lambda</math> * x)</sup> = e<sup>log(1-p)*x</sup> = (1-p)<sup>x</sup> <br/> | |
− | <math> | ||
− | < | ||
− | + | Note that floor(Y) > x implies Y >= x+1 (since x is an integer) <br/> | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | More generally, rounding an exponential random variable gives a geometric one, which again yields P(X>x)=(1-p)^x: | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | <br> | |
− | + | Suppose X has the exponential distribution with rate parameter <math> \lambda > 0 </math>. <br> | |
− | + | Then <math>\left \lfloor X \right \rfloor </math> and <math>\left \lceil X \right \rceil </math> have geometric distributions on <math> \mathcal{N} </math> and <math> \mathcal{N}_{+} </math> respectively, each with success probability <math> 1-e^ {- \lambda} </math> <br> | |
− | |||
− | |||
− | + | Proof: <br> | |
+ | <math>\text{For } n \in \mathcal{N} </math><br//> | ||
− | + | <math>\begin{align} | |
− | < | + | P(\left \lfloor X \right \rfloor = n)&{}= P( n \leq X < n+1) \\ |
− | + | &{}= F( n+1) - F(n) \\ | |
− | + | \text{By algebra and simplification:} \\ | |
− | + | P(\left \lfloor X \right \rfloor = n)&{}= (e^ {-\lambda})^n \cdot (1 - e^ {-\lambda}) \\ | |
− | + | &{}= Geo (1 - e^ {-\lambda}) \\ | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | \text{Proof of ceiling part follows immediately.} \\ | |
+ | \end{align}</math> <br//> | ||
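The step labelled "by algebra and simplification" can be written out using the exponential CDF <math>F(x) = 1 - e^{-\lambda x}</math>. This is only a worked version of the same argument, not an additional result:

<math>\begin{align}
F(n+1) - F(n) &= (1 - e^{-\lambda (n+1)}) - (1 - e^{-\lambda n}) \\
&= e^{-\lambda n} - e^{-\lambda (n+1)} \\
&= (e^{-\lambda})^n \cdot (1 - e^{-\lambda})
\end{align}</math>

Writing <math>p = 1 - e^{-\lambda}</math>, this is <math>(1-p)^n p</math>, which is the geometric pmf on n = 0, 1, 2, ... (the "2nd Method" form).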
− | |||
− | |||
− | |||
− | |||
− | + | '''Algorithm:''' <br /> | |
+ | 1) Let <math>\lambda = -\log (1-p) </math><br /> | ||
+ | 2) Generate a <math>Y \sim Exp(\lambda )</math> <br /> | ||
+ | 3) Set <math>X = \left \lfloor Y \right \rfloor + 1</math>; then <math>X\sim Geo(p)</math> <br /> | ||
+ | note: <math>\left \lfloor Y \right \rfloor >2 \Rightarrow Y \geq 3</math><br /> | ||
+ | <math> \left \lfloor Y \right \rfloor >5 \Rightarrow Y \geq 6</math><br /> | ||
+ | <br /> | ||
− | + | <math>\left \lfloor Y \right \rfloor>x \Rightarrow Y \geq x+1</math> (x an integer) <br /> | |
− | <math>P(X>x) = P(Y>=x)</math> (from the class notes) | + | <math>P(Y \geq x)</math><br /> | |
+ | Y ~ Exp (<math>\lambda</math>)<br /> | ||
+ | pdf of Y : <math>\lambda e^{-\lambda y}</math><br /> | ||
+ | cdf of Y : <math>1- e^{-\lambda y}</math><br /> | ||
+ | cdf <math>P(Y<x)=1-e^{-\lambda x}</math><br /> | ||
+ | <math>P(Y \geq x)=1-(1- e^{-\lambda x})=e^{-\lambda x}</math><br /> | ||
+ | <math> e^{-\lambda}=1-p \Rightarrow -\log(1-p)=\lambda</math><br /> | ||
+ | <math>P(Y \geq x)=e^{-\lambda x}=e^{\log(1-p)x}=(1-p)^x</math><br /> | ||
+ | <math>E[X]=1/p </math><br /> | ||
+ | <math>Var[X]= (1-p)/p^2</math><br /> | ||
+ | P(X>x)<br /> | ||
+ | =P(floor(y)+1>x)<br /> | ||
+ | =P(floor(y)>x-1)<br /> | ||
+ | =P(y>=x) | ||
+ | |||
+ | use <math>e^{-\lambda}=1-p</math> to figure out the mean and variance. | ||
+ | '''Code'''<br> | ||
+ | <pre style="font-size:16px"> | ||
+ | >>p=0.4; | ||
+ | >>l=-log(1-p); | ||
+ | >>u=rand(1,1000); | ||
+ | >>y=(-1/l)*log(u); | ||
+ | >>x=floor(y)+1; | ||
+ | >>hist(x) | ||
+ | |||
+ | ===Note:=== | ||
+ | mean(x)~E[X]=> 1/p | ||
+ | Var(x)~V[X]=> (1-p)/p^2 | ||
+ | |||
+ | A specific Example: | ||
+ | Consider x=5 | ||
+ | >> sum(x==5)/1000 % chance that the first success occurs on the fifth trial | ||
+ | >> ans = | ||
+ | 0.0780 | ||
+ | >> sum(x>10)/1000 % chance that the first success takes more than 10 trials | ||
+ | >> ans = | ||
+ | 0.0320 | ||
+ | |||
+ | </pre> | ||
+ | |||
+ | Note that the above mean is the average number of trials needed to obtain the first success.<br/> | ||
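The three-step algorithm above can also be sketched in Python (chosen here instead of the course's MATLAB only so that the example is self-contained; the function name <code>sample_geometric</code> is an illustrative assumption, not part of the notes). It draws Y from Exp(λ) by the inverse transform and returns floor(Y)+1.

```python
import math
import random


def sample_geometric(p, rng=random):
    """Draw X ~ Geo(p) on {1, 2, ...} as floor(Y) + 1 with Y ~ Exp(lam)."""
    lam = -math.log(1.0 - p)      # chosen so that exp(-lam) = 1 - p
    u = 1.0 - rng.random()        # uniform on (0, 1], avoids log(0)
    y = -math.log(u) / lam        # inverse-transform draw from Exp(lam)
    return math.floor(y) + 1      # nothing is rejected: every draw is used


if __name__ == "__main__":
    random.seed(1)
    p = 0.4
    xs = [sample_geometric(p) for _ in range(10000)]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print("sample mean", mean, "vs 1/p =", 1 / p)
    print("sample variance", var, "vs (1-p)/p^2 =", (1 - p) / p ** 2)
```

With a large sample, the sample mean and variance should be close to 1/p and (1-p)/p^2 respectively.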
+ | |||
+ | [[File:Geometric_example.jpg|300px]] | ||
+ | |||
+ | <span style="background:#F5F5DC"> | ||
+ | EXAMPLE for geometric distribution: Consider the case of rolling a die: </span> | ||
+ | |||
+ | X=the number of rolls that it takes for the number 5 to appear. | ||
+ | |||
+ | We have X ~Geo(1/6), <math>f(x)=(1/6)*(5/6)^{x-1}</math>, x=1,2,3.... | ||
+ | |||
+ | Now, let <math>Y \sim Exp(\lambda)</math> and set x=floor(Y) +1 | ||
+ | |||
+ | Let <math>e^{-\lambda}=5/6</math> | ||
+ | |||
+ | <math>P(X>x) = P(Y>=x)</math> (from the class notes) | ||
We have <math>e^{-\lambda *x} = (5/6)^x</math> | We have <math>e^{-\lambda *x} = (5/6)^x</math> | ||
Line 2,728: | Line 3,033: | ||
1) Let <math>Y \sim Exp(\lambda)</math> with <math>e^{-\lambda}=5/6</math> | | 1) Let <math>Y \sim Exp(\lambda)</math> with <math>e^{-\lambda}=5/6</math> | |
− | 2) Set X= | + | 2) Set <math>X= \left \lfloor Y \right \rfloor +1 </math>, to generate X |
<math> E[x]=6, Var[X]=5/6 /(1/6^2) = 30 </math> | <math> E[x]=6, Var[X]=5/6 /(1/6^2) = 30 </math> | ||
Line 2,746: | Line 3,051: | ||
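Remark on steps 1 and 2: choose <math>\lambda</math> so that e^(-<math>\lambda</math>)=1-p=5/6, generate Y, and then take the floor of Y to obtain x. | | Remark on steps 1 and 2: choose <math>\lambda</math> so that e^(-<math>\lambda</math>)=1-p=5/6, generate Y, and then take the floor of Y to obtain x. | |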
− | === | + | ===Poisson Distribution=== |
− | + | If <math>\displaystyle X \sim \text{Poi}(\lambda)</math>, its pdf is of the form <math>\displaystyle \, f(x) = \frac{e^{-\lambda}\lambda^x}{x!}</math> , where <math>\displaystyle \lambda </math> is the rate parameter.<br /> | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
+ | definition:In probability theory and statistics, the Poisson distribution (pronounced [pwasɔ̃]) is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. | ||
+ | For instance, suppose someone typically gets 4 pieces of mail per day on average. There will be, however, a certain spread: sometimes a little more, sometimes a little less, once in a while nothing at all. Given only the average rate, for a certain period of observation (pieces of mail per day, phonecalls per hour, etc.), and assuming that the process, or mix of processes, that produces the event flow is essentially random, the Poisson distribution specifies how likely it is that the count will be 3, or 5, or 10, or any other number, during one period of observation. That is, it predicts the degree of spread around a known average rate of occurrence. | ||
+ | (from Wikipedia) | ||
− | + | Understanding of Poisson distribution: | |
− | + | If customers arrive at a bank '''independently''' over time, with the times between arrivals following exponential distributions with rate <math>\lambda</math> per unit of time, then | |
+ | X(t) = # of customers in [0,t] ~ Poi<math>(\lambda t)</math> | ||
− | + | Its mean and variance are<br /> | |
− | + | <math>\displaystyle E[X]=\lambda</math><br /> | |
− | + | <math>\displaystyle Var[X]=\lambda</math><br /> | |
+ | A useful property: If <math>X_i \sim \mathrm{Pois}(\lambda_i),\, i=1,\dots,n</math> are independent and <math>\lambda=\sum_{i=1}^n \lambda_i</math>, then <math>Y = \left( \sum_{i=1}^n X_i \right) \sim \mathrm{Pois}(\lambda)</math> | ||
− | + | A Poisson random variable X can be interpreted as the maximal number of i.i.d. (Independent and Identically Distributed) exponential variables(with parameter <math>\lambda</math>) whose sum does not exceed 1.<br /> | |
+ | This agrees with the usual view of the Poisson distribution as the number of events in a fixed interval: the exponential variables are the waiting times between consecutive events, and X counts how many events occur before the total waiting time exceeds an interval of length 1. | ||
+ | <br /> | ||
+ | <br /> | ||
+ | <math>\displaystyle\text{Let } Y_j \sim \text{Exp}(\lambda), U_j \sim \text{Unif}(0,1)</math><br> | ||
+ | <math>Y_j = -\frac{1}{\lambda}\log(U_j) \text{ from Inverse Transform Method}</math><br><br> | ||
− | == | + | <math>\begin{align} |
+ | X &= \max \{ n: \sum_{j=1}^{n} Y_j \leq 1 \} \\ | ||
+ | &= \max \{ n: \sum_{j=1}^{n} - \frac{1}{\lambda}\log(U_j) \leq 1 \} \\ | ||
+ | &= \max \{ n: \sum_{j=1}^{n} \log(U_j) \geq -\lambda \} \\ | ||
+ | &= \max \{ n: \log(\prod_{j=1}^{n} U_j) \geq -\lambda \} \\ | ||
+ | &= \max \{ n: \prod_{j=1}^{n} U_j \geq e^{-\lambda} \} \\ | ||
+ | &= \min \{ n: \prod_{j=1}^{n} U_j < e^{-\lambda} \} - 1 \\ | ||
+ | \end{align}</math><br><br /> | ||
− | + | Note: From above, we can use Logarithm Rules <math>\log(a)+\log(b)=\log(ab)</math> to generate the result.<br><br /> | |
− | + | '''Algorithm:''' <br /> | |
+ | 1) Set n=1, a=1 <br /> | ||
+ | 2) Generate <math>U_n \sim U(0,1), a=aU_n </math> <br /> | ||
+ | 3) If <math>a \geq e^{-\lambda}</math> , then n=n+1, and go to Step 2. Else, x=n-1 <br /> | ||
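As an illustration, the three steps above can be sketched in Python (used here rather than MATLAB only to keep the example self-contained; <code>sample_poisson_product</code> is a hypothetical name). The running product of uniforms is compared against e^(-λ), exactly as in Step 3.

```python
import math
import random


def sample_poisson_product(lam, rng=random):
    """Draw X ~ Poi(lam): multiply uniforms until the product falls below exp(-lam)."""
    threshold = math.exp(-lam)
    n = 1
    a = rng.random()              # Step 2 with n = 1: a = U_1
    while a >= threshold:         # Step 3: product still at or above e^{-lam}
        n += 1
        a *= rng.random()         # Step 2 again: a = a * U_n
    return n - 1                  # number of uniforms whose product stayed >= e^{-lam}
```

Each sample uses X+1 uniforms, so on average about λ+1 uniforms are needed per draw; for large λ this becomes slow.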
− | + | using inverse-method to proof mean and variance of poisson distribution. | |
− | : | + | ===MATLAB Code for generating Poisson Distribution=== |
− | + | <pre style='font-size:16px'> | |
− | + | >>l=2; N=1000 | |
+ | >>for ii=1:N | ||
+ | n=1; | ||
+ | a=1; | ||
+ | u=rand; | ||
+ | a=a*u; | ||
+ | while a>exp(-l) | ||
+ | n=n+1; | ||
+ | u=rand; | ||
+ | a=a*u; | ||
+ | end | ||
+ | x(ii)=n-1; | ||
+ | end | ||
+ | >>hist(x) | ||
+ | >>sum(x==1)/N % Probability of x=1 | ||
+ | >>sum(x>3)/N % Probability of x > 3 | ||
+ | </pre> | ||
− | + | [[File:Poisson_example.jpg|300px]] | |
− | |||
− | Note: <math>\Gamma(\alpha)=(\alpha-1)! </math> if <math>\alpha</math> is a positive integer. | + | === Another way to generate random variable from poisson distribution === |
+ | <br/> | ||
+ | Note: <math>P(X=x)=\frac {e^{-\lambda}\lambda^x}{x!}, \forall x \in \N</math><br/> | ||
+ | Let <math>\displaystyle p(x) = P(X=x)</math> denote the pmf of <math>\displaystyle X</math>.<br/> | ||
+ | Then ratio is <math>\frac{p(x+1)}{p(x)}=\frac{\lambda}{x+1}, \forall x \in \N</math> <br/> | ||
+ | Therefore, <math>p(x+1)=\frac{\lambda}{x+1}p(x)</math> <br/> | ||
+ | Algorithm: <br/> | ||
+ | 1. Set <math>\displaystyle x=0</math><br/> | ||
+ | 2. Set <math>\displaystyle F=p=e^{-\lambda}</math> <br/> | ||
+ | 3. Generate <math>\displaystyle U \sim~ \text{Unif}(0,1)</math> <br/> | ||
+ | 4. If <math>\displaystyle U<F</math>, output <math>\displaystyle x</math><br/> | ||
+ | Else<br/> | ||
+ | <math>\displaystyle p=\frac{\lambda}{x+1} p</math><br/> | ||
+ | <math>\displaystyle F=F+p</math><br/> | ||
+ | <math>\displaystyle x = x+1</math><br/> | ||
+ | Go to 4.<br/> | ||
+ | |||
+ | This is indeed the inverse-transform method, with a clever way to calculate the CDF on the fly. | ||
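A minimal Python sketch of this second algorithm follows (Python is an assumption of this example, as is the name <code>sample_poisson_inverse</code>). It builds the CDF term by term with the recursion p(x+1) = λ/(x+1) · p(x) and stops at the first x with U < F(x).

```python
import math
import random


def sample_poisson_inverse(lam, rng=random):
    """Draw X ~ Poi(lam) by inverse transform, accumulating the CDF on the fly."""
    x = 0
    p = math.exp(-lam)            # p(0) = e^{-lam}
    F = p                         # F(0) = p(0)
    u = rng.random()
    # The "p > 0.0" guard is an addition for floating-point safety: once p
    # underflows, F can no longer grow, so the loop must stop.
    while u >= F and p > 0.0:
        p *= lam / (x + 1)        # p(x+1) = lam / (x+1) * p(x)
        F += p                    # F(x+1) = F(x) + p(x+1)
        x += 1
    return x
```

Unlike the product-of-uniforms method, only one uniform is needed per sample.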
+ | |||
+ | u=rand(1,1000) | ||
+ | hist(x) | ||
+ | |||
+ | == Class 9 - Tuesday, June 4, 2013 == | ||
+ | |||
+ | === Beta Distribution === | ||
+ | The beta distribution is a continuous probability distribution. <br> | ||
+ | PDF:<math>\displaystyle \text{ } f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1} </math><br> where <math>0 \leq x \leq 1</math> and <math>\alpha</math>>0, <math>\beta</math>>0<br/> | ||
+ | <div style = "align:left; background:#F5F5DC; font-size: 120%"> | ||
+ | Definition: | ||
+ | In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution.<br/> | ||
+ | More can be found at the link: <ref>http://en.wikipedia.org/wiki/Beta_distribution</ref> | ||
+ | </div> | ||
+ | |||
+ | There are two positive shape parameters in this distribution defined as alpha and beta: <br> | ||
+ | -Both parameters are greater than 0, and X is within the interval [0,1]. <br> | ||
+ | -Both alpha and beta appear as exponents in the density (of x and 1-x respectively) and together control the shape of the distribution. <br> | ||
+ | -We use the beta distribution to model the behavior of random variables that are limited to intervals of finite length. <br> | ||
+ | -For example, we can use the beta distribution to analyze the time allocation of sunshine data and variability of soil properties. <br> | ||
+ | |||
+ | If X~Beta(<math>\alpha, \beta</math>) then its p.d.f. is of the form | ||
+ | |||
+ | :<math>\displaystyle \text{ } f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1} </math> where <math>0 \leq x \leq 1</math> and <math>\alpha</math>>0, <math>\beta</math>>0<br> | ||
+ | and | ||
+ | <math>f(x;\alpha,\beta)= 0 </math> otherwise | ||
+ | Note: <math>\Gamma(\alpha)=(\alpha-1)! </math> if <math>\alpha</math> is a positive integer. | ||
+ | |||
+ | Note: Gamma Function Properties | ||
+ | |||
+ | If <math>\alpha=\frac{1}{2}</math>, then <math>\Gamma(\frac {1}{2})=\sqrt\pi </math> | ||
+ | |||
+ | The mean of the beta distribution is <math>\frac{\alpha}{\alpha + \beta}</math>. The variance is <math>\frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha + \beta + 1)}</math> | ||
+ | When <math> \alpha = \beta </math>, the variance equals <math>\frac{1}{4(2\alpha+1)}</math>, so it decreases monotonically as the common value of <math> \alpha = \beta </math> increases. | ||
+ | |||
+ | The cumulative distribution function of the beta distribution is the incomplete beta function ratio, commonly denoted by <math>I_x</math>: <math>F(x) = I_x(\alpha,\beta)</math> | ||
− | To generate random variables of a Beta distribution, there are multiple | + | To generate random variables of a Beta distribution, there are multiple cases depending on the value of <math>\alpha </math> and <math> \beta </math>: |
− | Case 1 | + | '''Case 1:''' If <math>\alpha=1</math> and <math>\beta=1</math> |
:<math>\displaystyle \text{Beta}(1,1) = \frac{\Gamma(1+1)}{\Gamma(1)\Gamma(1)}x^{1-1}(1-x)^{1-1}</math><br> | :<math>\displaystyle \text{Beta}(1,1) = \frac{\Gamma(1+1)}{\Gamma(1)\Gamma(1)}x^{1-1}(1-x)^{1-1}</math><br> | ||
Line 2,800: | Line 3,179: | ||
:<math> = 1 </math><br> | :<math> = 1 </math><br> | ||
+ | Note: 0! = 1. <br> | ||
Hence, the distribution is:<br> | Hence, the distribution is:<br> | ||
:<math>\displaystyle \text{Beta}(1,1) = U (0, 1) </math><br> | :<math>\displaystyle \text{Beta}(1,1) = U (0, 1) </math><br> | ||
+ | If the Question asks for sampling Beta Distribution, we can sample from Uniform Distribution which we already know how to sample from<br> | ||
+ | Algorithm:<br> | ||
+ | Generate U~Unif(0,1)<br> | ||
+ | '''Case 2:''' Either <math>\alpha=1</math> or <math>\beta=1</math> | ||
− | |||
− | + | e.g. <math>\alpha=1</math> | |
− | e.g. <math>\alpha</math> | ||
We don't make any assumption about <math>\beta</math> except that it is a positive integer. <br\> | We don't make any assumption about <math>\beta</math> except that it is a positive integer. <br\> | ||
:<math>\displaystyle \text{f}(x) = \frac{\Gamma(1+\beta)}{\Gamma(1)\Gamma(\beta)}x^{1-1}(1-x)^{\beta-1}=\beta(1-x)^{\beta-1}</math><br> | :<math>\displaystyle \text{f}(x) = \frac{\Gamma(1+\beta)}{\Gamma(1)\Gamma(\beta)}x^{1-1}(1-x)^{\beta-1}=\beta(1-x)^{\beta-1}</math><br> | ||
− | :<math>\beta</math> | + | :<math>\beta=1 </math> |
− | :<math>\displaystyle \text{f}(x) = \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)\Gamma(1)}x^{\alpha-1}(1-x)^{1-1}=\alpha | + | :<math>\displaystyle \text{f}(x) = \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)\Gamma(1)}x^{\alpha-1}(1-x)^{1-1}=\alpha x^{\alpha-1}</math><br> |
− | + | By integrating <math>f(x)</math>, we find the CDF of X is <math>F(x) = x^{\alpha}</math>. | |
− | + | As <math>F^{-1}(u) = u^\frac {1}{\alpha}</math>, using the inverse transform method, <math> X = U^\frac {1}{\alpha} </math> with U ~ U[0,1]. | |
− | |||
− | |||
− | |||
'''Algorithm''' | '''Algorithm''' | ||
Line 2,827: | Line 3,206: | ||
:2. Assign <math>x = u^\frac {1}{\alpha}</math> | :2. Assign <math>x = u^\frac {1}{\alpha}</math> | ||
− | + | After we have simplified this example, we can use other distribution methods to solve the problem. | |
'''MATLAB Code to generate random n variables using the above algorithm''' | '''MATLAB Code to generate random n variables using the above algorithm''' | ||
− | <pre> | + | <pre style='font-size:16px'> |
− | |||
− | |||
+ | x = rand(1,n).^(1/alpha) | ||
</pre> | </pre> | ||
− | Case 3 | + | '''Case 3:'''<br\> To sample from beta in general, we use the property that <br\> |
− | :if | + | :if <math>Y_1</math> follows gamma <math>(\alpha,1)</math><br\> |
− | : | + | : <math>Y_2</math> follows gamma <math>(\beta,1)</math><br\> |
'''Note: | '''Note: | ||
1. <math>\alpha</math> and <math>\beta</math> are shape parameters here and 1 is the scale parameter.<br\>''' | 1. <math>\alpha</math> and <math>\beta</math> are shape parameters here and 1 is the scale parameter.<br\>''' | ||
− | :then <math>Y=\frac { | + | :then <math>Y=\frac {Y_1}{Y_1+Y_2}</math> follows Beta <math>(\alpha,\beta)</math><br\> |
− | 2.Exponential: | + | 2.Exponential: <math>-\frac{1}{\lambda} \log(u)</math> <br\> |
− | 3.Gamma: | + | 3.Gamma: <math>-\frac{1}{\lambda} \log(u_1 * \cdots * u_t)</math><br\> |
'''Algorithm'''<br\> | '''Algorithm'''<br\> | ||
− | *1. Sample from Y1 ~ Gamma (<math>\alpha</math>,1)<br\> | + | *1. Sample from Y1 ~ Gamma (<math>\alpha</math>,1) <math>\alpha</math> is the shape, and 1 is the scale. <br\> |
− | *2. Sample from Y2 ~ Gamma (<math>\beta</math>,1)<br\> | + | *2. Sample from Y2 ~ Gamma (<math>\beta</math>,1) <br\> |
*3. Set | *3. Set | ||
:<math> Y = \frac{Y_1}{Y_1+Y_2}</math><br> | :<math> Y = \frac{Y_1}{Y_1+Y_2}</math><br> | ||
+ | Please see the following example for Matlab code. <br> | ||
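The same three steps can be sketched in Python under the assumption that α and β are positive integers, so that each Gamma variable can be built as a sum of Exp(1) draws (the function names are illustrative only).

```python
import math
import random


def sample_gamma_int(shape, rng=random):
    """Gamma(shape, 1) for a positive integer shape: sum of `shape` Exp(1) draws."""
    # 1 - random() lies in (0, 1], which avoids log(0).
    return sum(-math.log(1.0 - rng.random()) for _ in range(shape))


def sample_beta_from_gammas(alpha, beta, rng=random):
    """Beta(alpha, beta) as Y1 / (Y1 + Y2), Y1 ~ Gamma(alpha, 1), Y2 ~ Gamma(beta, 1)."""
    y1 = sample_gamma_int(alpha, rng)
    y2 = sample_gamma_int(beta, rng)
    return y1 / (y1 + y2)
```

For non-integer shape parameters the sum-of-exponentials construction does not apply, and a general Gamma sampler would be needed instead.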
+ | |||
+ | |||
+ | '''Case 4:'''<br\> Use The Acceptance-Rejection Method <br\> | ||
+ | The beta density is<br /> | ||
+ | <math>\displaystyle \text{Beta}(\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1} </math> where <math>0 \leq x \leq 1</math><br> | ||
+ | Assume <math>\alpha,\beta \geq 1</math> and <math>\alpha+\beta > 2</math>. Then <math>\displaystyle f(x)</math> has the maximum at <math>\frac{\alpha-1}{\alpha+\beta-2}</math>.<br /> | ||
+ | (Please note that we can find the maximizer by taking the derivative of f(x) and setting f'(x)=0; the maximum value is then obtained by evaluating f at that point)<br> | ||
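As a worked version of this step (assuming <math>\alpha,\beta \geq 1</math> and <math>\alpha+\beta>2</math>), it is convenient to differentiate <math>\log f(x)</math>, which has the same maximizer as <math>f(x)</math> on (0,1):

<math>\begin{align}
\frac{d}{dx}\log f(x) &= \frac{\alpha-1}{x} - \frac{\beta-1}{1-x} = 0 \\
(\alpha-1)(1-x) &= (\beta-1)x \\
x &= \frac{\alpha-1}{\alpha+\beta-2}
\end{align}</math>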
+ | Define<br /> | ||
+ | <math> c=f\left(\frac{\alpha-1}{\alpha+\beta-2}\right)</math> and choose <math>\displaystyle g(x)=1</math>.<br /> | ||
+ | The A-R method becomes<br /> | ||
+ | 1.Generate independent <math>\displaystyle U_1</math> and <math>\displaystyle U_2</math> from <math>\displaystyle UNIF[0,1]</math> until <math>\displaystyle cU_2 \leq f(U_1)</math>;<br /> | ||
+ | 2.Return <math>\displaystyle U_1</math>.<br /> | ||
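A possible Python sketch of this A-R scheme is given below (illustrative names; it assumes α, β ≥ 1 with α+β > 2 so that the mode formula applies). The normalizing constant uses the gamma function <code>math.gamma</code>.

```python
import math
import random


def beta_pdf(x, alpha, beta):
    """Beta(alpha, beta) density at x in [0, 1]."""
    k = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return k * x ** (alpha - 1) * (1.0 - x) ** (beta - 1)


def sample_beta_ar(alpha, beta, rng=random):
    """Acceptance-rejection with proposal g(x) = 1 on [0, 1] and c = f(mode)."""
    mode = (alpha - 1) / (alpha + beta - 2)
    c = beta_pdf(mode, alpha, beta)       # maximum of f, so f(x) <= c * g(x)
    while True:
        u1 = rng.random()                 # candidate drawn from g
        u2 = rng.random()                 # uniform used for the accept test
        if c * u2 <= beta_pdf(u1, alpha, beta):
            return u1
```

The expected number of proposals per accepted sample is c, so the method works best when the density is not sharply peaked.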
'''MATLAB Code for generating Beta Distribution''' | '''MATLAB Code for generating Beta Distribution''' | ||
− | <pre> | + | <pre style='font-size:16px'> |
− | Y1 = sum(-log(rand( | + | >>Y1 = sum(-log(rand(10,1000))); %Gamma(10,1), sum 10 exponentials for each of the 1000 samples | ||
− | Y2 = sum(-log(rand( | + | >>Y2 = sum(-log(rand(5,1000))); %Gamma(5,1), sum 5 exponentials for each of the 1000 samples | ||
%NOTE: here, lambda is 1, since the scale parameters for Y1 & Y2 are both 1 | | %NOTE: here, lambda is 1, since the scale parameters for Y1 & Y2 are both 1 | |
− | Y=Y1./(Y1+Y2) #Don't forget to divide elements using "." | + | >>Y=Y1./(Y1+Y2); %Don't forget to divide elementwise using "."; Y follows Beta(10,5) | ||
− | figure | + | >>figure | ||
− | hist(Y1) #Gamma curve | + | >>hist(Y1) %Gamma curve | ||
− | figure | + | >>figure | ||
− | hist(Y2) #Gamma curve | + | >>hist(Y2) %Gamma curve | ||
− | figure | + | >>figure | ||
+ | |||
+ | >>hist(Y) %Do this to check that the shape fits Beta(10,5). | ||
+ | |||
+ | >>disttool %Check the beta plot. | ||
− | |||
</pre> | </pre> | ||
+ | This is the histogram of Y, precisely simulated version of Beta (10,5) | ||
− | [[File: | + | [[File:Beta(10,5)_Simulated.jpg|300px]] |
− | |||
− | + | This is the pdf of various beta distributions | |
+ | |||
+ | [[File:325px-Beta_distribution_pdf.png|300px]] | ||
+ | |||
+ | [[File:untitled.jpg|300px]]<br /> | ||
+ | MATLAB tips: rand(10,1000) produces a 10*1000 matrix whose elements are independent Uniform(0,1) draws, and sum(rand(10,1000)) adds down each column, | ||
+ | producing a 1*1000 row vector. | ||
+ | |||
+ | Example for the code to explain the beta distribution. | ||
+ | |||
+ | |||
+ | '''Another MATLAB Code for generating Beta Distribution using AR method''' | ||
+ | <pre style='font-size:16px'> | ||
+ | >>alpha = 3; | ||
+ | >>beta = 2; | ||
+ | >> k = gamma(alpha+beta)/(gamma(alpha)*gamma(beta)); %normalizing constant, computed with the gamma function | ||
+ | >> t = (alpha - 1)/(alpha + beta -2); %mode of the density | ||
+ | >> c = k*t^(alpha-1)*(1-t)^(beta-1); %c = f(t), the maximum of f | ||
+ | >> u1 = rand; | ||
+ | >> u2 = rand; | ||
+ | >> x = k*u1^(alpha-1)*(1-u1)^(beta-1); %f(u1) | ||
+ | >> while c*u2>x | ||
+ | u1 = rand; | ||
+ | u2 = rand; | ||
+ | x = k*u1^(alpha-1)*(1-u1)^(beta-1); | ||
+ | end | ||
+ | >> u1 %accepted sample from Beta(3,2) | ||
+ | </pre> | ||
=== Random Vector Generation === | === Random Vector Generation === | ||
− | We want to sample from <math>X = (X_1, X_2, </math>…,<math> X_d)</math>, a d-dimensional vector. | + | We want to sample from <math>X = (X_1, X_2, </math>…,<math> X_d)</math>, a d-dimensional vector from a known pdf <math>f(x)</math> and cdf <math>F(x)</math>. |
− | We need to take into account the following two cases: | + | We need to take into account the following two cases: |
====Case 1==== | ====Case 1==== | ||
− | + | if the <math>x_1, x_2 \cdots, x_d</math>'s are independent, then<br/> | |
− | + | <math>f(x) = f(x_1,\cdots, x_d) = f(x_1)\cdots f(x_d)</math><br/> | |
+ | we can sample from each component <math>x_1, x_2,\cdots, x_d</math> individually, and then form a vector.<br/> | ||
+ | |||
+ | based on the property of independence, we can derive the pdf or pmf of <math>x=x_1,x_2,x_3,x_4,x_5,\cdots</math> | ||
+ | |||
+ | ====Case 2==== | ||
+ | If <math>X_1, X_2, \cdots , X_d</math> are not independent<br/> | ||
+ | <math>f(x) = f(x_1, \cdots , x_d) = f(x_1) f(x_2|x_1) \cdots f(x_d|x_{d-1},\cdots ,x_1)</math><br/> | ||
+ | we need to know the conditional distributions of <math>f(x_2|x_1), f(x_3|x_2, x_1),\cdots, f(x_d|x_{d-1}, \cdots, x_1)</math><br/> | ||
+ | This is generally a hard problem: the conditional distributions are often not easy to compute, and how to sample from them depends on the particular distribution. | ||
+ | Each component is sampled conditionally on the components already generated. | ||
+ | <math>f(x_1)</math> is one-dimensional, and the same is true of <math>f(x_2|x_1)</math> and all the other conditional densities. | ||
+ | In general, one could consider the covariance matrix <math> C </math> of random variables <math> X_1</math>,…,<math>X_d </math>. <br> | ||
+ | Suppose we now have the Cholesky factor <math> G</math> of <math> C </math> (i.e. <math> C = GG^T </math>). In MATLAB, chol(C) returns an upper triangular R with R'R = C, so G = chol(C)' <br> | ||
+ | For a d-tuple <math> X := (X_1 ,\ldots , X_d) </math> of uncorrelated random variables with unit variance (for example, independent standard normals), | ||
+ | <math> GX </math> has covariance matrix <math> C </math>. <br/> | ||
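The Cholesky idea can be illustrated with a short Python sketch. It assumes the components of X are independent standard normals (an assumption of this example) and computes the lower-triangular factor by hand so that no external library is needed; the function names are hypothetical.

```python
import math
import random


def cholesky(C):
    """Lower-triangular G with G G^T = C, for a symmetric positive-definite C."""
    d = len(C)
    G = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(G[i][k] * G[j][k] for k in range(j))
            if i == j:
                G[i][j] = math.sqrt(C[i][i] - s)      # diagonal entry
            else:
                G[i][j] = (C[i][j] - s) / G[j][j]     # below-diagonal entry
    return G


def sample_correlated_normal(G, rng=random):
    """Return G Z, where Z has independent N(0, 1) components, so Cov(G Z) = G G^T."""
    d = len(G)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [sum(G[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
```

For example, with C = [[4, 2], [2, 3]] the factor is G = [[2, 0], [1, √2]], and the sample covariance of many draws of GZ should be close to C.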
+ | |||
+ | '''Note''' (Product Rule)<br/> | ||
+ | 1.) All cases can use this (independent or dependent): <math>f(x) = f(x_1, x_2)= f(x_1) f(x_2|x_1)</math> <br/> | ||
+ | 2.) If we determine that <math>x_1</math> and <math> x_2</math> are ''independent'', then we can use <math>f(x) = f(x_1, x_2)= f(x_1)f(x_2)</math> <br/> | ||
+ | *e.g. If late for class=<math>x_1</math> and sick=<math>x_2</math>, then these are dependent variables so we can only use equation 1 (<math>f(x) = f(x_1, x_2)= f(x_1) f(x_2|x_1)</math>)<br/> | ||
+ | *e.g. If late for class=<math>x_1</math> and milk is white=<math>x_2</math>, then these are independent variables so we can use both equations 1 and 2. <br/> | ||
+ | |||
+ | Case 2 gives the joint density of X = (X1,X2,…,Xd), a d-dimensional vector, when the components are not independent: it is written as a product of conditional densities. | ||
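To make the product rule concrete, here is a small Python sketch for a made-up dependent pair (this particular distribution is an assumption chosen for illustration, not an example from the lecture): X1 ~ U(0,1) and, given X1 = x1, X2 ~ U(0, x1). We sample X1 from its marginal first and then X2 from its conditional distribution.

```python
import random


def sample_dependent_pair(rng=random):
    """Sample (X1, X2) with f(x1, x2) = f(x1) * f(x2 | x1).

    Hypothetical example: X1 ~ U(0, 1) and X2 | X1 = x1 ~ U(0, x1).
    """
    x1 = rng.random()             # draw from the marginal f(x1)
    x2 = x1 * rng.random()        # draw from the conditional f(x2 | x1)
    return x1, x2
```

Here E[X2] = E[X1]/2 = 1/4, which gives a quick check on the simulated values.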
+ | |||
+ | ====Example==== | ||
+ | Generate uniform random vectors | ||
+ | |||
+ | 1) x = (x<sub>1</sub>, …, x<sub>d</sub>) from the d-dimensional rectangle <br/> | ||
+ | 2) D = { (x<sub>1</sub>, …, x<sub>d</sub>) : a<sub>i</sub> <= x<sub>i</sub> <= b<sub>i</sub> , i = 1, …, d} <br/> | ||
+ | |||
+ | Algorithm: <br/> | ||
+ | 1) For i = 1 to d <br/> | ||
+ | 2) U<sub>i</sub> ~ U(0,1) <br/> | ||
+ | 3) x<sub>i</sub> = a<sub>i</sub> + U<sub>i</sub>(b<sub>i</sub>-a<sub>i</sub>) <br/> | ||
+ | 4) End <br/> | ||
+ | |||
+ | *Note: x<sub>i</sub> = a<sub>i</sub> + U<sub>i</sub>(b<sub>i</sub>-a<sub>i</sub>) denotes X<sub>i</sub> ~U(a<sub>i</sub>,b<sub>i</sub>) <br/> | ||
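The loop above translates directly into a short Python sketch (the function name is an illustrative assumption), with one independent uniform per coordinate and separate bounds for each dimension.

```python
import random


def sample_rectangle(a, b, rng=random):
    """Uniform point in the rectangle {x : a[i] <= x[i] <= b[i]}."""
    # Each coordinate is sampled independently: x_i = a_i + U_i * (b_i - a_i).
    return [ai + rng.random() * (bi - ai) for ai, bi in zip(a, b)]
```

For instance, <code>sample_rectangle([1, 2], [4, 6])</code> returns a point with first coordinate in [1, 4] and second coordinate in [2, 6].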
+ | |||
+ | An example of the 2-D case is given below: | ||
+ | |||
+ | <pre style='font-size:14px'> | ||
+ | >>a=[1 2]; | ||
+ | >>b=[4 6]; | ||
+ | >>for i=1:2 | ||
+ | u(i) = rand(); | ||
+ | x(i) = a(i) + (b(i) - a(i))*u(i); | ||
+ | end | ||
+ | |||
+ | >>hold on % retain the current graph when adding new graphs | ||
+ | >>rectangle('Position',[1 2 3 4]) % draw the boundary of the rectangle | ||
+ | >>axis([0 10 0 10]) % change the size of axes | ||
+ | >>plot(x(1),x(2),'.') | ||
+ | |||
+ | </pre> | ||
+ | [[File:2d_ex.jpg|300px]] | ||
+ | |||
+ | ==== Matlab Code: ==== | ||
+ | |||
+ | <pre style='font-size:14px'> | ||
+ | function x = urectangle (d,n,a,b) | ||
+ | for ii = 1:d; | ||
+ | u(ii,:) = rand(1,n); | ||
+ | x(ii,:) = a+ u(ii,:)*(b-a); | ||
+ | %keyboard #makes the function stop at this step so you can evaluate the variables | ||
+ | end | ||
+ | |||
+ | >>x=urectangle(2, 100, 2, 5); | ||
+ | >>scatter(x(1,:),x(2,:)) | ||
+ | |||
+ | >>x=urectangle(2, 10000, 2, 5); %generate 10000 numbers (instead of 100) | ||
+ | >>x=urectangle(3, 10000, 2, 5); %changed to 3-dimensional | ||
+ | >>scatter3(x(1,:), x(2,:), x(3,:)) | ||
+ | >>axis square | ||
+ | </pre> | ||
+ | [[File:Urectangle_2d.jpg|300px]][[File:Urectangle_3d.jpg|300px]] | ||
+ | |||
+ | === Vector Acceptance-Rejection Method === | ||
+ | |||
+ | The acceptance-rejection method can be extended to n-dimensional cases, with the same concept: | ||
+ | |||
+ | If a random vector is to be generated uniformly from G, an irregular n-dimensional region, and W is a regular n-dimensional region that contains G and encloses it as tightly as possible, then the acceptance-rejection method can be applied as follows: | ||
+ | |||
+ | 1. Sample from the regular shape W | ||
+ | |||
+ | 2. Accept sample points if they are inside G | ||
+ | |||
+ | <br>'''Example:''' <br> | ||
+ | Generate a random vector Z that is uniformly distributed over region G | ||
+ | |||
+ | G: d-dimensional unit ball, <math>G = \big\{{x: \sum_{i}{x_i}^2 \leq 1}\big\}</math> | ||
+ | |||
+ | W: d-dimensional hypercube, <math>W = \big\{{-1 \leq x_i \leq 1}\big\}_{i=1}^d</math> | ||
+ | |||
+ | '''Procedure:'''<br /> | ||
+ | Step 1: <math>U_1 \sim~ U(0,1),\cdots, U_d \sim~ U(0,1)</math><br /> | ||
+ | Step 2: <math>X_1 = 1 - 2U_1, \cdots, X_d = 1 - 2U_d, R = \sum_i X_i^2</math><br /> | ||
+ | Step 3: If <math>R \leq 1, Z=(X_1, ..... , X_d)</math><br /> | ||
+ | Else go to step 1 | ||
+ | |||
+ | This is an example of vector A/R: the regular shape W plays the role of the proposal distribution g(x), and G plays the role of the target distribution f(x). <br\> | ||
+ | |||
+ | Suppose we sample uniformly from the proposal region W, and let Aw and Ag denote the areas of W and G; then g(x)=1/Aw and f(x)=1/Ag. | ||
+ | |||
+ | |||
+ | The following is a picture relating to the example | ||
+ | |||
+ | [[File:Untitled.jpg]] | ||
+ | |||
+ | Matlab code: | ||
+ | <pre style='font-size:16px'> | ||
+ | function output = Unitball(d,n) | ||
+ | u = rand(d,n); | ||
+ | z = 1- 2 *u; | ||
+ | R = sum(z.^2); | ||
+ | jj=1; | ||
+ | |||
+ | for ii=1:n | ||
+ | |||
+ | if R(ii)<=1 | ||
+ | |||
+ | x(:,jj)=z(:,ii); | ||
+ | jj=jj+1; | ||
+ | |||
+ | end | ||
+ | |||
+ | end | ||
+ | |||
+ | output = x; | ||
+ | |||
+ | end | ||
+ | </pre> | ||
+ | |||
+ | ==Class 10 - Thursday June 6th 2013 == | ||
+ | MATLAB code for using Acceptance/Rejection Method to sample from a d-dimensional unit ball. | ||
+ | G: d-dimensional unit ball G | ||
+ | W: d-dimensional Hypercube | ||
+ | |||
+ | <pre style='font-size:16px'> | ||
+ | 1) U1~UNIF(0,1) | ||
+ | U2~UNIF(0,1) | ||
+ | ... | ||
+ | Ud~UNIF(0,1) | ||
+ | 2) X1 = 1-2U1 | ||
+ | X2 = 1-2U2 | ||
+ | ... | ||
+ | Xd = 1-2Ud | ||
+ | R = sum(Xi^2) | ||
+ | 3) If R<=1 | ||
+ | X = (X1,X2,...,Xd), | ||
+ | else go to step 1 | ||
+ | </pre> | ||
+ | |||
+ | ==== Code: ==== | ||
+ | |||
+ | <pre style='font-size:16px'> | ||
+ | function output = Unitball(d,n) | ||
+ | |||
+ | u = rand(d,n); | ||
+ | z = 1- 2 *u; | ||
+ | R = sum(z.^2); | ||
+ | jj=1; | ||
+ | |||
+ | for ii=1:n | ||
+ | |||
+ | if R(ii)<=1 | ||
+ | |||
+ | x(:,jj)=z(:,ii); | ||
+ | jj=jj+1; | ||
+ | |||
+ | end | ||
+ | |||
+ | end | ||
+ | |||
+ | output = x; | ||
+ | |||
+ | end | ||
+ | |||
+ | >> data = Unitball(d, n) | ||
+ | >> scatter(data(1,:), data(2,:)) %plot 2d graph | ||
+ | |||
+ | R(ii) computes the sum of the square of each element of a vector, so if it is less than 1, | ||
+ | then the vector is in the unit ball. | ||
+ | |||
+ | x(:,jj) means all the numbers in the jj column. | ||
+ | |||
+ | z(:,ii) means all the numbers in the ii column starting from 1st column until the nth | ||
+ | column, which is the last one. | ||
+ | |||
+ | higher dimension, less efficient and we need more data points | ||
+ | |||
+ | Save the file under the function's name (Unitball.m). | ||
+ | |||
+ | </pre> | ||
+ | |||
+ | |||
+ | [[File:Unitball.png|400px]] | ||
+ | |||
+ | <pre style='font-size:16px'> | ||
+ | Execution: | ||
+ | |||
+ | >>[x]=Unitball(2,10000); | ||
+ | >>scatter(x(1,:),x(2,:)); %plot 2D circle | ||
+ | >>axis square; %make the x-y axis has same size | ||
+ | >>size(x) | ||
+ | |||
+ | ans = | ||
+ | |||
+ | 2 7839 | ||
+ | |||
+ | >>scatter(x(1,:),x(2,:)) | ||
+ | </pre> | ||
+ | |||
+ | In scatter(x(1,:),x(2,:)), x(1,:) denotes all the entries in the first row of x, which are used as the horizontal coordinates; x(2,:) gives the vertical coordinates. | ||
+ | |||
+ | '''Calculate the efficiency:''' | ||
+ | <pre style='font-size:16px'> | ||
+ | |||
+ | >>c=7839/10000 %Efficiency = points accepted / total points | ||
+ | |||
+ | c = | ||
+ | |||
+ | 0.7839 | ||
+ | |||
+ | </pre> | ||
+ | |||
+ | We can use the above program to calculate what fraction of the points sampled from the square fall inside the circle. | ||
+ | |||
+ | '''Estimate <math>\displaystyle \pi</math>''' | ||
+ | :*We know the radius is 1 | ||
+ | :*Then the area of the square is <math>(1-(-1))^2=4</math><br\> | ||
+ | :*Then the area of the circle is <math>\pi</math><br\> | ||
+ | :*<math>\pi</math> is approximated to be <math>4\times c=4 \times 0.7839=3.1356</math> in the above example <br\> | ||
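The same estimate can be sketched in Python (an illustration with hypothetical function names, not the course's MATLAB function). The first function is the vector A/R procedure for the unit ball; the second estimates π as 4 times the fraction of points from the square [-1,1]^2 that land in the unit circle.

```python
import random


def sample_unit_ball(d, rng=random):
    """Return (point, tries): a uniform point in the d-dimensional unit ball
    and the number of proposals drawn from the cube [-1, 1]^d to obtain it."""
    tries = 0
    while True:
        tries += 1
        x = [1.0 - 2.0 * rng.random() for _ in range(d)]   # X_i = 1 - 2 U_i
        if sum(xi * xi for xi in x) <= 1.0:                # R <= 1: accept
            return x, tries


def estimate_pi(n, rng=random):
    """Estimate pi as 4 * (fraction of n square points inside the unit circle)."""
    accepted = 0
    for _ in range(n):
        x1 = 1.0 - 2.0 * rng.random()
        x2 = 1.0 - 2.0 * rng.random()
        if x1 * x1 + x2 * x2 <= 1.0:
            accepted += 1
    return 4.0 * accepted / n
```

The estimate is random, so repeated runs give slightly different values; the error shrinks roughly like 1/sqrt(n).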
+ | |||
+ | <pre style='font-size:16px'> | ||
+ | >> 4*size(x,2)/10000 | ||
+ | |||
+ | ans = | ||
+ | |||
+ | 3.1356 | ||
+ | |||
+ | >> [x]=Unitball(3,10000); | ||
+ | >> scatter3(x(1,:),x(2,:),x(3,:)) %plot 3d ball | ||
+ | >> axis square | ||
+ | >> size(x,2)/10000 %returns the size of the dimension of X specified by scalar 2 | ||
+ | |||
+ | ans = | ||
+ | |||
+ | 0.5231 | ||
+ | |||
+ | >> [x]=Unitball(5,10000); | ||
+ | >> size(x,2)/10000 | ||
+ | |||
+ | ans = | ||
+ | |||
+ | 0.1648 | ||
+ | |||
+ | </pre> | ||
+ | 3d unit ball | ||
+ | |||
+ | [[File:3-dimensional unitball.jpg|400px]] | ||
+ | |||
+ | Note that the A/R constant C (the reciprocal of the acceptance rate computed above) grows very quickly as d increases, which results in a lower acceptance rate and more points being rejected. So this method is not efficient for large values of d. | ||
+ | |||
+ | In practice, when we need to vectorize a high-quality image or gene data, d would have to be very large, so the A/R method is not an efficient way to solve the problem. | ||
+ | |||
+ | === Efficiency === | ||
+ | |||
+ | In the above example, the efficiency of the vector A/R is equal to the ratio | ||
+ | |||
+ | <math>\frac{1}{C}=\frac{\text{volume of hyperball}}{\text{volume of hypercube}}= \frac{1}{\max_x \frac{f(x)}{g(x)}} </math> | ||
+ | |||
+ | In general, the efficiency can be thought of as the total number of points accepted divided by the total number of points generated. | ||
+ | |||
+ | As the dimension '''increases''', the efficiency of the algorithm '''decreases''' exponentially. | ||
+ |||
+ | For example, when approximating the value of <math>\pi</math> with <math>d \text{ (dimension)} =2</math>, the efficiency is around 0.7869; when <math>d=3</math>, the efficiency is around 0.5244; when <math>d=10</math>, the efficiency is around 0.0026, which is already close to 0. | ||
+ |||
+ | A 'C' value of 1 implies an acceptance rate of 100% (the most efficient scenario), but as we sample from higher dimensions, 'C' usually gets larger. Thus, the Acceptance-Rejection Method is not efficient for generating high-dimensional vectors. | ||
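The efficiency figures above can be compared with the exact volume ratio. The following is a minimal sketch, assuming the proposal is uniform on the cube [-1,1]^d and using the standard formula for the volume of the d-dimensional unit ball; the variable names are illustrative and not taken from the lecture.

```matlab
% Exact acceptance rate of vector A/R for the unit ball inside [-1,1]^d.
% Assumption: ball volume = pi^(d/2) / gamma(d/2 + 1), cube volume = 2^d.
d = [2 3 5 10];
ballVolume = pi.^(d/2) ./ gamma(d/2 + 1);   % volume of the d-dimensional unit ball
cubeVolume = 2.^d;                          % volume of the enclosing hypercube
efficiency = ballVolume ./ cubeVolume;      % expected fraction of points accepted
C = 1 ./ efficiency;                        % expected number of draws per accepted point
disp([d' efficiency' C'])                   % one row per dimension
```

The simulated acceptance rates quoted above should fluctuate around the values in the second column.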
+ | |||
+ | <span style="color:red;padding:0 auto;"><br>The end of midterm coverage</span> | ||
+ | <div style="border:1px solid #cccccc;border-radius:10px;box-shadow: 0 5px 15px 1px rgba(0, 0, 0, 0.6), 0 0 200px 1px rgba(255, 255, 255, 0.5);padding:20px;margin:20px;background:#FFFFAD;"> | ||
+ | <h2 style="text-align:center;">Summary of vector acceptance-rejection sampling</h2> | ||
+ | <p><b>Problem:</b> <math> f(x_1, x_2, ...x_n)</math> is difficult to sample from</p> | ||
+ | <p><b>Plan:</b></p> | ||
+ | Let W represent the sample space covered by <math> f(x_1, x_2, ...x_n)</math> | ||
+ | <ol> | ||
+ | <li>Draw <math>\vec{y}=(y_1,y_2,\ldots,y_n)\sim g()</math>, where g has sample space G that contains W. g is a distribution that is easy to sample from (e.g. uniform).</li> | ||
+ | <li>If <math>\vec{y} \in W </math> then <math>\vec{x}=\vec{y} </math><br /> else go to step 1.</li> | ||
+ | </ol> | ||
+ | <p>x will have the desired distribution.</p> | ||
+ | |||
+ | </div> | ||
+ | |||
+ | ==== Stochastic Process ==== | ||
+ | A stochastic process (also called a random process) is a collection of random variables, | ||
+ | <math>\big\{X_t:t\in T\big\}</math>, where the set X of values that each variable can take is called the state space and T is called the index set. | ||
+ | |||
+ | '''Definition:''' In probability theory, a stochastic process /stoʊˈkæstɪk/, or sometimes random process (widely used) is a collection of random variables; this is often used to represent the evolution of some random value, or system, over time. This is the probabilistic counterpart to a deterministic process (or deterministic system). Instead of describing a process which can only evolve in one way (as in the case, for example, of solutions of an ordinary differential equation), in a stochastic or random process there is some indeterminacy: even if the initial condition (or starting point) is known, there are several (often infinitely many) directions in which the process may evolve. (from Wikipedia) | ||
+ | |||
+ | A stochastic process is non-deterministic. This means that even if we know the initial condition (state), and we know some of the states that may follow, the exact value of the final state remains uncertain. | ||
+ | |||
+ | We can illustrate this with an example of speech: if "I" is the first word in a sentence, the set of words that could follow would be limited (eg. like, want, am), and the same happens for the third word and so on. The words then have some probabilities among them such that each of them is a random variable, and the sentence would be a collection of random variables. <br> | ||
+ | Also, different stochastic processes have different properties. | ||
+ |||
+ | In this course, we study two stochastic process models: | ||
+ | |||
+ | 1. Poisson Process - This is a continuous-time counting process that satisfies a few properties listed in the next section. The Poisson process is understood to be a good model for events such as incoming phone calls, traffic accidents, and goals during a game of hockey or soccer. It is also an example of a birth-death process.<br> | ||
+ | 2. Markov Process - This is a stochastic process that satisfies the Markov property, which can be understood as the memoryless property. The property states that the jump to a future state depends only on the current state of the process, and not on the process's history. This model is used to model random walks exhibited by particles, the health state of a life insurance policyholder, decision making by a memoryless mouse in a maze, etc. <br> | ||
+ | |||
+ | |||
+ | =====Example===== | ||
+ | The state space is the set of English words, and <math>x_t</math> are words that are said. Another example involves the stock market: the set of all non | ||
− | + | Start by sampling from <math>x_1</math>: | |
+ | <math>\displaystyle Y_1 \sim f(x_1 | x_{t_2}, ..., x_{t_d})</math><br /> | ||
− | + | <math>\displaystyle Y_i \sim f(x_i | Y_1, ..., Y_{i-1}, x_{t_{i+1}}, ..., x_{t_d})</math>, where <math>i=2, ..., d</math><br /> | |
− | |||
− | + | <math>\displaystyle Y_d \sim f(x_d | Y_1, ..., Y_{d-1})</math><br /> | |
− | + | ===Example:=== | |
− | |||
− | + | Consider a biased die | |
+ | <math>\pi</math>= [0.1, 0.1, 0.3, 0.3, 0.1, 0.1] | ||
− | + | We use a <math>6 \times 6 </math> matrix <math> \mathbf{Q} </math> as the proposal distribution <br> |
+ | And we use U(0,1) distribution. | ||
− | + | <math> \mathbf{Q} = | |
+ | \begin{bmatrix} | ||
+ | 1/6 & 1/6 & \cdots & 1/6 \\ | ||
+ | 1/6 & 1/6 & \cdots & 1/6 \\ | ||
+ | \vdots & \vdots & \ddots & \vdots \\ | ||
+ | 1/6 & 1/6 & \cdots & 1/6 | ||
+ | \end{bmatrix} | ||
+ | </math> <br/> | ||
− | |||
− | ==== | + | '''Algorithm''' <br> |
+ | 1. Set the initial state <math>x_t=5</math><br /> | ||
+ | 2. Y~unif[1,2,...,6]<br /> | ||
+ | 3. <math> r_{ij} = \min \{\frac{\pi_j q_{ji}}{\pi_i q_{ij}}, 1\} = \min \{\frac{\pi_j 1/6}{\pi_i 1/6}, 1\} = \min \{\frac{\pi_j}{\pi_i}, 1\}</math><br> | ||
+ | 4. U~Unif(0,1)<br/> | ||
+ | if <math>u \leq r_{ij}</math>, X<sub>t+1</sub>=Y<br /> | ||
+ | else X<sub>t+1</sub>=X<sub>t</sub><br /> | ||
+ | go back to 2<br> | ||
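The algorithm above can be sketched in MATLAB as follows. This is an illustrative sketch rather than the lecture's own code: the chain length n and the variable names are assumptions, and randi(6) plays the role of the uniform proposal Q.

```matlab
% Metropolis-Hastings sketch for the biased die with a uniform proposal.
% Assumptions: chain length n and the variable names are illustrative.
target = [0.1 0.1 0.3 0.3 0.1 0.1];   % stationary distribution pi
n = 10000;                            % number of steps in the chain
x = zeros(1, n);
x(1) = 5;                             % step 1: initial state
for t = 1:n-1
    y = randi(6);                             % step 2: Y ~ unif{1,...,6}
    r = min(target(y) / target(x(t)), 1);     % step 3: q cancels since Q is symmetric
    if rand <= r                              % step 4: accept with probability r
        x(t+1) = y;
    else
        x(t+1) = x(t);
    end
end
freq = accumarray(x', 1, [6 1])' / n;  % empirical frequency of each face
disp(freq)
```

For a long chain the empirical frequencies should be close to the entries of the target vector.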
− | + | ===Monte Carlo Integration=== | |
− | + | *It is a technique used for numerical integration using random numbers.<br/> | |
− | + | *This method is one of the Monte Carlo methods that numerically computes definite integrals. <br/> | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | *The above integral can be rewritten as following:<br> | |
+ | <math>I = \int_a^b h(x)dx = \int_a^b h(x)(b-a) \frac{1}{b-a} dx </math>, where <math> \frac{1}{b-a} </math> is the density of <math> U(a,b) </math> <br/> | ||
− |||
− | x= | + | So we have <math> w(x)= h(x)(b-a) </math> and <math>\hat{I} = \frac{1}{n} \sum_{i=1}^n w(x_i),\ x_i \sim U(a,b)</math><br /> |
− | x | ||
− | |||
− | |||
− | </ | ||
− | |||
− | + | For the case where we do not have finite bounds on the integration, we have | |
+ | <math>I = \int h(x)f(x)dx</math><br /> | ||
− | + | <math>\hat{I} = \frac{1}{n} \sum _{i=1}^n h(x_i) , \text{where} \ x_i \sim f</math> | |
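As a small illustration of the finite-interval case, the sketch below estimates the integral of x^2 over [0,2], whose exact value is 8/3. The integrand, the bounds, and the sample size are assumptions chosen for the example, not values from the lecture.

```matlab
% Monte Carlo integration sketch on a finite interval.
% Assumptions: h(x) = x^2 on [0,2] is an illustrative integrand; exact value is 8/3.
a = 0; b = 2;
n = 100000;                      % number of uniform draws
h = @(x) x.^2;                   % integrand
x = a + (b - a) * rand(1, n);    % x_i ~ U(a,b)
Ihat = (b - a) * mean(h(x));     % average of h(x_i)(b-a)
disp(Ihat)
```

The factor (b-a) appears because the U(a,b) density 1/(b-a) has been divided out of the integrand.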
− | + | ===Importance Sampling=== | |
+ | Importance Sampling is a useful technique for variance reduction.<br /> | ||
− | + | Using importance sampling, we have:<br/> | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
+ | <math>I=\int_{a}^{b}f(x)dx = \int_{a}^{b}f(x)(b-a) \times \frac{1}{b-a}dx</math> <br /> | ||
− | + | If g(x) is another probability density function, <br /> | |
− | |||
− | + | <math>I = \int h(x)f(x)\,dx =\int\frac{h(x)f(x)}{g(x)}\times g(x)\,dx</math>, where <math>w(x) = \frac{h(x)f(x)}{g(x)}</math><br /> | |
− | + | To approximate I,<br/> | |
− | + | <math>\widehat{I}=\frac{1}{n}\sum_{i=1}^{n}w(x_i)</math> with <math>x_i \sim g</math>, and <math>g^{*}(x) = \frac{|h(x)|f(x)}{\int |h(x)|f(x)dx}</math>, where <math> h(x) \geq 0 </math> for all x <br> |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | '''Note:''' g(x) should be chosen carefully so that its distribution would minimize the variance. |
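To make the role of g concrete, here is a hedged sketch that estimates the tail probability P(Z > 3) for a standard normal Z. The choice of problem and of the proposal g = N(3,1) are assumptions made for illustration; the reference value 0.00135 is the standard normal tail probability.

```matlab
% Importance sampling sketch: estimate P(Z > 3) for Z ~ N(0,1).
% Assumptions: h(x) = 1{x > 3}, f = N(0,1) density, g = N(3,1) density (illustrative choice).
n = 100000;
x = 3 + randn(1, n);                  % draws from g = N(3,1)
ratio = exp(-3 * x + 4.5);            % f(x)/g(x) = exp(-x^2/2 + (x-3)^2/2)
w = (x > 3) .* ratio;                 % w(x) = h(x) f(x) / g(x)
Ihat = mean(w);                       % importance sampling estimate
disp(Ihat)                            % compare with about 0.00135
```

Sampling directly from f would place almost no draws above 3, whereas g concentrates the draws where h is non-zero, which is what reduces the variance.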
Latest revision as of 09:46, 30 August 2017
If you use ideas, plots, text, code and other intellectual property developed by someone else in your `wikicoursenote' contribution , you have to cite the original source. If you copy a sentence or a paragraph from work done by someone else, in addition to citing the original source you have to use quotation marks to identify the scope of the copied material. Evidence of copying or plagiarism will cause a failing mark in the course.
Example of citing the original source
Assumptions Underlying Principal Component Analysis can be found here<ref>http://support.sas.com/publishing/pubcat/chaps/55129.pdf</ref>
Contents
- 1 Important Notes
- 2 Introduction, Class 1 - Tuesday, May 7
- 3 Class 2 - Thursday, May 9
- 4 Class 3 - Tuesday, May 14
- 5 Summary of Inverse Transform Method
- 6 Class 4 - Thursday, May 16
- 7 Class 5 - Tuesday, May 21
- 8 Acceptance-Rejection Method
- 9 Class 6 - Thursday, May 23
- 10 Class 7 - Tuesday, May 28
- 11 Class 8 - Thursday, May 30, 2013
- 12 Class 9 - Tuesday, June 4, 2013
- 13 Class 10 - Thursday June 6th 2013
- 14 Summary of vector acceptance-rejection sampling
- 15 Class 11 - Tuesday，June 11, 2013
- 16 Class 12 - Thursday，June 13, 2013
- 16.1 Midterm Review
- 16.2 Multiplicative Congruential Algorithm
- 16.3 Inverse Transformation Method
- 16.4 Acceptance-Rejection Method
- 16.5 Multivariate
- 16.6 Vector A/R Method
- 16.7 Common distribution
- 16.8 Exponential
- 16.9 Normal
- 16.10 Gamma
- 16.11 Bernoulli
- 16.12 Binomial
- 16.13 Beta Distribution
- 16.14 Geometric
- 16.15 Poisson
- 17 Class 13 - Tuesday June 18th 2013
- 18 Class 14 - Thursday June 20th 2013
- 19 Class 15 - Tuesday June 25th 2013
- 20 Class 16 - Thursday June 27th 2013
- 21 Class 17 - Tuesday July 2nd 2013
- 22 Class 18 - Thursday July 4th 2013
- 23 Class 19 - Tuesday July 9th 2013
- 24 Class 20 - Thursday July 11th 2013
- 25 Class 21 - Tuesday July 16, 2013
- 26 Class 22, Thursday, July 18, 2013
- 27 Class 23, Tuesday July 23
- 28 Class 24, Thursday, July 25, 2013
Important Notes
To distinguish between the material covered in class and additional material that you have added to the course, use the following convention. For anything that is not covered in the lecture, write:
In the news recently was a story that captures some of the ideas behind PCA. Over the past two years, Scott Golder and Michael Macy, researchers from Cornell University, collected 509 million Twitter messages from 2.4 million users in 84 different countries. The data they used were words collected at various times of day and they classified the data into two different categories: positive emotion words and negative emotion words. Then, they were able to study this new data to evaluate subjects' moods at different times of day, while the subjects were in different parts of the world. They found that the subjects generally exhibited positive emotions in the mornings and late evenings, and negative emotions mid-day. They were able to "project their data onto a smaller dimensional space" using PCA. Their paper, "Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures," is available in the journal Science.<ref>http://www.pcworld.com/article/240831/twitter_analysis_reveals_global_human_moodiness.html</ref>.
Assumptions Underlying Principal Component Analysis can be found here<ref>http://support.sas.com/publishing/pubcat/chaps/55129.pdf</ref>
Introduction, Class 1 - Tuesday, May 7
Course Instructor: Ali Ghodsi
Lecture:
001: T/Th 8:30-9:50am MC1085
002: T/Th 1:00-2:20pm DC1351
Tutorial:
2:30-3:20pm Mon M3 1006
Office Hours:
Friday at 10am, M3 4208
Midterm
Monday June 17,2013 from 2:30pm-3:20pm
Final
Saturday August 10,2013 from 7:30pm-10:00pm
TA(s):
TA | Day | Time | Location |
---|---|---|---|
Lu Cheng | Monday | 3:30-5:30 pm | M3 3108, space 2 |
Han ShengSun | Tuesday | 4:00-6:00 pm | M3 3108, space 2 |
Yizhou Fang | Wednesday | 1:00-3:00 pm | M3 3108, space 1 |
Huan Cheng | Thursday | 3:00-5:00 pm | M3 3111, space 1 |
Wu Lin | Friday | 11:00-1:00 pm | M3 3108, space 1 |
Four Fundamental Problems
1 Classification: Given input object X, we have a function which will take this input X and identify which 'class (Y)' it belongs to (Discrete Case)
i.e taking value from x, we could predict y.
(For example, if you have 40 images of oranges and 60 images of apples (represented by x), you can estimate a function that takes the images and states what type of fruit it is - note Y is discrete in this case.)
2 Regression: Same as classification, except that y is continuous rather than discrete. Results from regression are often used for prediction, forecasting, etc. (Examples: stock prices, height, weight.)
(A simple practice might be investigating the hypothesis that higher levels of education cause higher levels of income.)
3 Clustering: Use common features of objects in same class or group to form clusters.(in this case, x is given, y is unknown; For example, clustering by provinces to measure average height of Canadian men.)
4 Dimensionality Reduction (also known as Feature extraction, Manifold learning): Used when we have a variable in high dimension space and we want to reduce the dimension
Applications
Most useful when structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity
Examples:
- Computer Vision, Computer Graphics, Finance (fraud detection), Machine Learning
- Search and recommendation (eg. Google, Amazon)
- Automatic speech recognition, speaker verification
- Text parsing
- Face identification
- Tracking objects in video
- Financial prediction(e.g. credit cards)
- Fraud detection
- Medical diagnosis
Course Information
Prerequisite: (One of CS 116, 126/124, 134, 136, 138, 145, SYDE 221/322) and (STAT 230 with a grade of at least 60% or STAT 240) and (STAT 231 or 241)
Antirequisite: CM 361/STAT 341, CS 437, 457
General Information
- No required textbook
- Recommended: "Simulation" by Sheldon M. Ross
- Computing parts of the course will be done in Matlab, but prior knowledge of Matlab is not essential (will have a tutorial on it)
- First midterm will be held on Monday, June 17 from 2:30 to 3:30
- Announcements and assignments will be posted on Learn.
- Other course material on: http://wikicoursenote.com/wiki/
- Log on to both Learn and wikicoursenote frequently.
- Email all questions and concerns to UWStat340@gmail.com. Do not use your personal email address! Do not email instructor or TAs about the class directly to their personal accounts!
Wikicourse note (complete at least 12 contributions to get 10% of final mark):
When applying for an account in the wikicourse note, please use the quest account as your login name while the uwaterloo email as the registered email. This is important as the quest id will be used to identify the students who make the contributions.
Example:
User: questid
Email: questid@uwaterloo.ca
After making the account request, wait several hours before logging into the account using the password stated in the email. During the first login, students will be asked to create a new password for their account.
As a technical/editorial contributor: Make contributions within 1 week and do not copy the notes on the blackboard.
All contributions are now considered general contributions. You must contribute to 50% of lectures for full marks.
- A general contribution can be correctional (fixing mistakes) or technical (expanding content, adding examples, etc.) but at least half of your contributions should be technical for full marks.
Do not submit copyrighted work without permission, cite original sources. Each time you make a contribution, check mark the table. Marks are calculated on an honour system, although there will be random verifications. If you are caught claiming to contribute but have not, you will not be credited.
Wikicoursenote contribution form : https://docs.google.com/forms/d/1Sgq0uDztDvtcS5JoBMtWziwH96DrBz2JiURvHPNd-xs/viewform
- you can submit your contributions multiple times.
- you will be able to edit the response right after submitting
- send email to make changes to an old response : uwstat340@gmail.com
Tentative Topics
- Random variable and stochastic process generation
- Discrete-Event Systems
- Variance reduction
- Markov Chain Monte Carlo
Class 2 - Thursday, May 9
Generating Random Numbers
Introduction
Simulation is the imitation of a process or system over time. Computational power has introduced the possibility of using simulation study to analyze models used to describe a situation.
In order to perform a simulation study, we should:
1 Use a computer to generate (pseudo*) random numbers (rand in MATLAB).
2 Use these numbers to generate values of random variable from distributions: for example, set a variable in terms of uniform u ~ U(0,1).
3 Using the concept of discrete events, we show how the random variables can be used to generate the behavior of a stochastic model over time. (Note: A stochastic model is the opposite of deterministic model, where there are several directions the process can evolve to)
4 After continually generating the behavior of the system, we can obtain estimators and other quantities of interest.
The building block of a simulation study is the ability to generate a random number. This random number is a value from a random variable distributed uniformly on (0,1). There are many different methods of generating a random number:
Physical Method: Roulette wheel, lottery balls, dice rolling, card shuffling etc.
Numerically/Arithmetically: Use of a computer to successively generate pseudorandom numbers. The sequence of numbers can appear to be random; however, the numbers are calculated deterministically from an equation, which is what makes them pseudorandom.
(Source: Ross, Sheldon M. Simulation. San Diego: Academic, 1997. Print.)
- We use the prefix pseudo because computer generates random numbers based on algorithms, which suggests that generated numbers are not truly random. Therefore pseudo-random numbers is used.
In general, a deterministic model produces specific results given certain inputs by the model user, contrasting with a stochastic model which encapsulates randomness and probabilistic events.
A computer cannot generate truly random numbers because computers can only run algorithms, which are deterministic in nature. They can, however, generate pseudo random numbers.
Pseudo Random Numbers are the numbers that seem random but are actually determined by a relative set of original values. It is a chain of numbers pre-set by a formula or an algorithm, and the value jump from one to the next, making it look like a series of independent random events. The flaw of this method is that, eventually the chain returns to its initial position and pattern starts to repeat, but if we make the number set large enough we can prevent the numbers from repeating too early. Although the pseudo random numbers are deterministic, these numbers have a sequence of value and all of them have the appearances of being independent uniform random variables. Being deterministic, pseudo random numbers are valuable and beneficial due to the ease to generate and manipulate.
When a trial is repeated many times, the average result gets close to the expected value, which makes the trials look deterministic. However, the result of each individual trial is random, so the behaviour resembles that of pseudo random numbers.
Mod
Let [math]n \in \N[/math] and [math]m \in \N^+[/math], then by Division Algorithm,
[math]\exists q, \, r \in \N \;\text{with}\; 0\leq r \lt m, \; \text{s.t.}\; n = mq+r[/math],
where [math]q[/math] is called the quotient and [math]r[/math] the remainder. Hence we can define a binary function
[math]\mod : \N \times \N^+ \rightarrow \N [/math] given by [math]r:=n \mod m[/math] which returns the remainder after division by m.
Generally, mod means taking the remainder after division by m.
We say that n is congruent to r mod m if n = mq + r for some integer q.
Values are between 0 and m-1.
If y = ax + b with [math]0 \leq b \lt a[/math], then [math]b:=y \mod a[/math].
Example 1:
[math]30 = 4 \cdot 7 + 2[/math]
[math]2 := 30\mod 7[/math]
[math]25 = 8 \cdot 3 + 1[/math]
[math]1: = 25\mod 3[/math]
[math]-3=5\cdot (-1)+2[/math]
[math]2:=-3\mod 5[/math]
Example 2:
If [math]23 = 3 \cdot 6 + 5[/math]
Then equivalently, [math]5 := 23\mod 6[/math]
If [math]31 = 31 \cdot 1[/math]
Then equivalently, [math]0 := 31\mod 31[/math]
If [math]-37 = 40\cdot (-1)+ 3[/math]
Then equivalently, [math]3 := -37\mod 40[/math]
Example 3:
[math]77 = 3 \cdot 25 + 2[/math]
[math]2 := 77\mod 3[/math]
[math]25 = 25 \cdot 1 + 0[/math]
[math]0: = 25\mod 25[/math]
Note: [math]\mod[/math] here is different from the modulo congruence relation in [math]\Z_m[/math], which is an equivalence relation instead of a function.
The modulo operation is useful for determining whether an integer divided by another integer produces a non-zero remainder. Both integers must satisfy [math]n = mq + r[/math], where [math]m[/math], [math]r[/math], [math]q[/math], and [math]n[/math] are all integers and [math]0 \leq r \lt m[/math]. The same relation holds when [math]n[/math] or [math]q[/math] is a negative integer; see the third case in Example 1 and in Example 2.
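The examples above can be checked with MATLAB's built-in mod function, which returns a remainder between 0 and m-1 for a positive divisor m, including when the first argument is negative. The variable names below are illustrative only.

```matlab
% Checking the mod examples; mod(n, m) returns r with n = m*q + r and 0 <= r < m.
r1 = mod(30, 7);      % 30 = 4*7 + 2
r2 = mod(25, 3);      % 25 = 8*3 + 1
r3 = mod(-3, 5);      % -3 = 5*(-1) + 2
r4 = mod(-37, 40);    % -37 = 40*(-1) + 3
r5 = mod(31, 31);     % 31 = 31*1 + 0
disp([r1 r2 r3 r4 r5])
```

Note that rem(-3, 5) would instead return -3, because rem takes the sign of the first argument; mod is the function that matches the definition used here.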
Mixed Congruential Algorithm
We define the Linear Congruential Method to be [math]x_{k+1}=(ax_k + b) \mod m[/math], where [math]x_k, a, b, m \in \N, \;\text{with}\; a, m \neq 0[/math]. Given a seed (i.e. an initial value [math]x_0 \in \N[/math]), we can obtain values for [math]x_1, \, x_2, \, \cdots, x_n[/math] inductively. The Multiplicative Congruential Method, invented by Berkeley professor D. H. Lehmer, refers to the special case where [math]b=0[/math], and the Mixed Congruential Method is the case where [math]b \neq 0[/math]. The name "mixed" arises from the fact that the method has both a multiplicative and an additive term.
An interesting fact about Linear Congruential Method is that it is one of the oldest and best-known pseudo random number generator algorithms. It is very fast and requires minimal memory to retain state. However, this method should not be used for applications that require high randomness. They should not be used for Monte Carlo simulation and cryptographic applications. (Monte Carlo simulation will consider possibilities for every choice of consideration, and it shows the extreme possibilities. This method is not precise enough.)
"Source: STAT 340 Spring 2010 Course Notes"
First consider the following algorithm
[math]x_{k+1}=x_{k} \mod m[/math]
For example: if [math]x_{0}=5[/math] and [math]x_{n}=3x_{n-1} \mod 150[/math], find [math]x_{1},x_{8},x_{9}[/math].
[math]x_{n}=(3^n \cdot 5) \mod 150[/math]
[math]x_{1}=15,x_{8}=105,x_{9}=15[/math]
Example
[math]\text{Let }x_{0}=10,\,m=3[/math]
- [math]\begin{align} x_{1} &{}= 10 &{}\mod{3} = 1 \\ x_{2} &{}= 1 &{}\mod{3} = 1 \\ x_{3} &{}= 1 &{}\mod{3} =1 \\ \end{align}[/math]
[math]\ldots[/math]
Excluding [math]x_{0}[/math], this example generates a series of ones. In general, excluding [math]x_{0}[/math], the algorithm above will always generate a series of the same number, which is less than [math]m[/math]. Hence, it has a period of 1. The period can be described as the length of a sequence before it repeats. We want a large period with a sequence that is random looking. We can modify this algorithm to form the Multiplicative Congruential Algorithm.
[math]x_{k+1}=(a \cdot x_{k} + b) \mod m [/math] (a little tip: [math](a \cdot b)\mod c = \big((a\mod c)\cdot(b\mod c)\big) \mod c[/math])
Example
[math]\text{Let }a=2,\, b=1, \, m=3, \, x_{0} = 10[/math]
[math]\begin{align}
\text{Step 1: } 0&{}=(2\cdot 10 + 1) &{}\mod 3 \\
\text{Step 2: } 1&{}=(2\cdot 0 + 1) &{}\mod 3 \\
\text{Step 3: } 0&{}=(2\cdot 1 + 1) &{}\mod 3 \\
\end{align}[/math]
[math]\ldots[/math]
This example generates a sequence with a repeating cycle of two integers.
If we choose the numbers properly, we could get a sequence of "random" numbers. How do we find the values of [math]a,b,[/math] and [math]m[/math]? At the very least [math]m[/math] should be a very large, preferably prime number. The larger [math]m[/math] is, the longer the sequence can run before it repeats. This is easier to do in Matlab, where the command rand() generates random numbers which are uniformly distributed on the interval (0,1). Early versions of Matlab used [math]a=7^5, b=0, m=2^{31}-1[/math], as recommended in a 1988 paper, "Random Number Generators: Good Ones Are Hard To Find" by Stephen K. Park and Keith W. Miller. (The important part is that [math]m[/math] should be large and prime.)
Note: [math]\frac {x_{n+1}}{m-1}[/math] is an approximation to the value of a U(0,1) random variable.
MatLab Instruction for Multiplicative Congruential Algorithm:
Before you start, you need to clear all existing defined variables and operations:
>>clear all
>>close all
>>a=17
>>b=3
>>m=31
>>x=5
>>mod(a*x+b,m)
ans=26
>>x=mod(a*x+b,m)
(Note:
1. Keep repeating this command over and over again and you will get random numbers – this is how the command rand works in a computer.
2. There is a function in MATLAB called RAND to generate a random number between 0 and 1.
For example, in MATLAB, we can use rand(1,1000) to generate 1000 numbers between 0 and 1. This is essentially a vector with 1 row and 1000 columns, with each entry a random number between 0 and 1.
3. If we would like to generate 1000 or more numbers, we could use a for loop.)
(Note on MATLAB commands:
1. clear all: clears all variables.
2. close all: closes all figures.
3. who: displays all defined variables.
4. clc: clears screen.
5. ; : prevents the results from printing.
6. disttool: displays a graphing tool.)
>>a=13
>>b=0
>>m=31
>>x(1)=1
>>for ii=2:1000
x(ii)=mod(a*x(ii-1)+b,m);
end
>>size(x)
ans=1 1000
>>hist(x)
(Note: The semicolon after the x(ii)=mod(a*x(ii-1)+b,m) ensures that Matlab will not print the entire vector of x. It will instead calculate it internally and you will be able to work with it. Adding the semicolon to the end of this line reduces the run time significantly.)
This algorithm involves three integer parameters [math]a, b,[/math] and [math]m[/math] and an initial value, [math]x_0[/math], called the seed. A sequence of numbers is defined by [math]x_{k+1} = (ax_k+ b) \mod m[/math].
Note: For some bad [math]a[/math] and [math]b[/math], the histogram may not look uniformly distributed.
Note: In MATLAB, hist(x) will generate a graph representing the distribution. Use this function after you run the code to check the real sample distribution.
Example: [math]a=13, b=0, m=31[/math]
The first 30 numbers in the sequence are a permutation of the integers from 1 to 30, and then the sequence repeats itself, so it is important to choose [math]m[/math] large to keep the sequence from repeating too early. Values are between [math]0[/math] and [math]m-1[/math]. If the values are normalized by dividing by [math]m-1[/math], then the results are approximately uniformly distributed in the interval [0,1]. There is only a finite number of values (30 possible values in this case). In MATLAB, you can use the function "hist(x)" to see if the sequence looks uniformly distributed. We saw that the values from 1 to 30 had the same frequency in the histogram, so we can conclude that they are uniformly distributed.
If [math]x_0=1[/math], then
- [math]x_{k+1} = 13x_{k}\mod{31}[/math]
So,
- [math]\begin{align} x_{0} &{}= 1 \\ x_{1} &{}= 13 \times 1 + 0 &{}\mod{31} = 13 \\ x_{2} &{}= 13 \times 13 + 0 &{}\mod{31} = 14 \\ x_{3} &{}= 13 \times 14 + 0 &{}\mod{31} =27 \\ \end{align}[/math]
etc.
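The claim that the first 30 values form a permutation of 1 to 30 can be checked directly. The sketch below uses the same a, b, m and seed as the example; the variable names are illustrative.

```matlab
% Checking the period of the generator x_{k+1} = 13 x_k mod 31 with seed 1.
a = 13; b = 0; m = 31;
x = zeros(1, 31);
x(1) = 1;                                 % x(1) holds the seed x_0
for k = 2:31
    x(k) = mod(a * x(k-1) + b, m);        % x(k) holds x_{k-1} in the notation above
end
isPermutation = isequal(sort(x(1:30)), 1:30);   % true when every value 1..30 appears once
u = x / (m - 1);                          % normalized values in [0,1]
disp(x(1:4))                              % first values: 1 13 14 27
disp(isPermutation)
```

The last entry x(31) equals the seed again, which shows that the period is 30.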
For example, with [math]a = 3, b = 2, m = 4, x_0 = 1[/math], we have:
- [math]x_{k+1} = (3x_{k} + 2)\mod{4}[/math]
So,
- [math]\begin{align}
x_{0} &{}= 1 \\
x_{1} &{}= 3 \times 1 + 2 \mod{4} = 1 \\
x_{2} &{}= 3 \times 1 + 2 \mod{4} = 1 \\
\end{align}[/math]
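Another example: with [math]a = 3, b = 2, m = 5, x_0 = 1[/math], the sequence is [math]x_1 = 0, x_2 = 2, x_3 = 3, x_4 = 1[/math], after which it repeats with period 4.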
FAQ:
1.Why is it 1 to 30 instead of 0 to 30 in the example above?
[math]b = 0[/math] so in order to have [math]x_k[/math] equal to 0, [math]x_{k-1}[/math] must be 0 (since [math]a=13[/math] is relatively prime to 31). However, the seed is 1. Hence, we will never observe 0 in the sequence.
Alternatively, {0} and {1,2,...,30} are two orbits of the left multiplication by 13 in the group [math]\Z_{31}[/math].
2.Will the number 31 ever appear?Is there a probability that a number never appears?
The number 31 will never appear. When you perform the operation [math]\mod m[/math], the largest possible answer that you could receive is [math]m-1[/math]. Whether or not a particular number in the range from 0 to [math]m - 1[/math] appears in the above algorithm will be dependent on the values chosen for [math]a, b[/math] and [math]m[/math].
Examples:[From Textbook]
[math]\text{If }x_0=3 \text{ and } x_n=(5x_{n-1}+7)\mod 200[/math], [math]\text{find }x_1,\cdots,x_{10}[/math].
Solution:
[math]\begin{align}
x_1 &{}= (5 \times 3+7) &{}\mod{200} &{}= 22 \\
x_2 &{}= 117 &{}\mod{200} &{}= 117 \\
x_3 &{}= 592 &{}\mod{200} &{}= 192 \\
x_4 &{}= 2967 &{}\mod{200} &{}= 167 \\
x_5 &{}= 14842 &{}\mod{200} &{}= 42 \\
x_6 &{}= 74217 &{}\mod{200} &{}= 17 \\
x_7 &{}= 371092 &{}\mod{200} &{}= 92 \\
x_8 &{}= 1855467 &{}\mod{200} &{}= 67 \\
x_9 &{}= 9277342 &{}\mod{200} &{}= 142 \\
x_{10} &{}= 46386717 &{}\mod{200} &{}= 117 \\
\end{align}[/math]
Comments:
Matlab code:
a=5; b=7; m=200; x(1)=3;
for ii=2:1000
x(ii)=mod(a*x(ii-1)+b,m);
end
size(x);
hist(x)
Typically, it is good to choose [math]m[/math] such that [math]m[/math] is large, and [math]m[/math] is prime. Careful selection of parameters '[math]a[/math]' and '[math]b[/math]' also helps generate relatively "random" output values, where it is harder to identify patterns. For example, when we used a composite (non prime) number such as 40 for [math]m[/math], our results were not satisfactory in producing an output resembling a uniform distribution.
The computed values are between 0 and [math]m-1[/math]. If the values are normalized by dividing by [math]m-1[/math], their result is numbers uniformly distributed on the interval [math]\left[0,1\right][/math] (similar to computing from uniform distribution).
From the example shown above, if we want to create a large group of random numbers, it is better to have large, prime [math]m[/math] so that the generated random values will not repeat after several iterations. Note: the period for this example is 8: from '[math]x_2[/math]' to '[math]x_9[/math]'.
There has been research on how to choose the parameters so that the sequence looks uniform. Many programs give you the option to choose the seed. Sometimes the seed is chosen by the computer.
Theorem (extra knowledge)
Let c be a non-zero constant (the increment, written b above). Then for any seed x0, the LCG has the maximal period m if and only if
(i) m and c are coprime;
(ii) (a-1) is divisible by every prime factor of m;
(iii) if m is divisible by 4, then a-1 is also divisible by 4.
We want our LCG to have a large cycle. We call a cycle with m elements the maximal period. We can make it bigger by making m big and prime. Recall: any integer greater than 1 can be written as a product of primes. Define coprime: two numbers X and Y are coprime if they do not share any prime factors.
Example:
[math]X_n=(15X_{n-1} + 4) \mod 7[/math]
(i) m=7, c=4 -> coprime;
(ii) a-1=14, and a-1 is divisible by 7;
(iii) does not apply, since 7 is not divisible by 4.
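The three conditions can be checked numerically for this example and compared with a brute-force run of the generator. This is a sketch under the assumption that c denotes the increment; the variable names are illustrative.

```matlab
% Checking the three full-period conditions for X_n = (15 X_{n-1} + 4) mod 7.
a = 15; c = 4; m = 7;
p = unique(factor(m));                              % distinct prime factors of m
cond1 = gcd(c, m) == 1;                             % (i) m and c are coprime
cond2 = all(mod(a - 1, p) == 0);                    % (ii) a-1 divisible by each prime factor
cond3 = (mod(m, 4) ~= 0) || (mod(a - 1, 4) == 0);   % (iii) only binds when 4 divides m
fullPeriod = cond1 && cond2 && cond3;

% Brute-force check: run m steps from seed 0 and count the distinct values.
seq = zeros(1, m);
x = 0;
for k = 1:m
    x = mod(a * x + c, m);
    seq(k) = x;
end
disp(fullPeriod)
disp(numel(unique(seq)))                            % equals m when the period is maximal
```

Both checks agree for this example: the conditions hold and all 7 residues appear before the sequence repeats.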
(The extra knowledge stops here)
In this part, I learned how to use R code to work out the relationship between the division of two integers and their remainder. When we use R to generate such values over a range such as (1:1000), the histogram looks like a uniform distribution.
Summary of Multiplicative Congruential Algorithm
Problem: generate Pseudo Random Numbers.
Plan:
- Choose integers a, b, m (a large prime) and [math]x_{0}[/math] (the seed).
- [math]x_{k+1}=(ax_{k}+b) \mod m[/math]
Matlab Instruction:
>>clear all >>close all >>a=17 >>b=3 >>m=31 >>x=5 >>mod(a*x+b,m) ans=26 >>x=mod(a*x+b,m)
Another algorithm for generating pseudo random numbers is the multiply-with-carry (MWC) method. Its simplest form is similar to the linear congruential generator. They differ in that the parameter b changes at every step in the MWC algorithm. It is as follows:
1.) [math]x_{k+1} = (ax_{k} + b_{k}) \bmod m[/math]
2.) [math]b_{k+1} = \lfloor (ax_{k} + b_{k})/m \rfloor[/math]
3.) set k to k + 1 and go to step 1
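The three steps above can be sketched in MATLAB as follows. The values a = 17, m = 31, x0 = 5 and b0 = 3 are borrowed from the congruential example purely for illustration; they are assumptions and are not recommended MWC parameters.

```matlab
% Multiply-with-carry sketch; parameter values are illustrative only.
a = 17; m = 31;
x = 5;  b = 3;          % seed x_0 and initial carry b_0 (assumed values)
n = 10;
out = zeros(1, n);
for k = 1:n
    t = a*x + b;        % common term used by both updates
    x = mod(t, m);      % step 1: new state
    b = floor(t / m);   % step 2: new carry
    out(k) = x;
end
disp(out)
```

Both updates use the old values of x and b, which is why the common term t is computed first.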
Source
Inverse Transform Method
Now that we know how to generate random numbers, we use these values to sample from distributions such as the exponential. However, to easily use this method, the probability distribution being sampled must have a cumulative distribution function (cdf) [math]F[/math] with a tractable (that is, easily found) inverse [math]F^{-1}[/math].
Theorem:
If we want to generate the value of a random variable X, we first generate a random number U, uniformly distributed over (0,1).
Let [math]F:\R \rightarrow \left[0,1\right][/math] be a cdf. If [math]U \sim U\left[0,1\right][/math], then the random variable given by [math]X:=F^{-1}\left(U\right)[/math]
follows the distribution function [math]F\left(\cdot\right)[/math],
where [math]F^{-1}\left(u\right):=\inf F^{-1}\big(\left[u,+\infty\right)\big) = \inf\{x\in\R | F\left(x\right) \geq u\}[/math] is the generalized inverse.
Note: [math]F[/math] need not be invertible everywhere on the real line, but if it is, then the generalized inverse is the same as the inverse in the usual case. We only need it to be invertible on the range of F(x), [0,1].
Proof of the theorem:
First consider the case where [math]F[/math] is continuous and strictly increasing, so that [math]F^{-1}[/math] is the usual inverse:
- [math]P(X\leq x)[/math]
[math]= P(F^{-1}(U)\leq x)[/math] (since [math]X= F^{-1}(U)[/math] by the inverse method)
[math]= P(F(F^{-1}(U))\leq F(x))[/math] (since [math]F [/math] is monotonically increasing)
[math]= P(U\leq F(x)) [/math]
[math]= F(x) , \text{ where } 0 \leq F(x) \leq 1 [/math] (since [math] P(U\leq a)= a[/math] for [math]U \sim U(0,1), a \in [0,1][/math])
This is the c.d.f. of X.
In the general case, the generalized inverse satisfies [math]F^{-1}\left(u\right) \leq x \Leftrightarrow u \leq F\left(x\right)[/math].
Hence [math]P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)[/math], since [math]U[/math] is uniform on the unit interval.
This completes the proof.
Therefore, in order to generate a random variable X~F, we can generate U according to U(0,1) and then make the transformation x=[math] F^{-1}(U) [/math]
Note that we can apply F to both sides in the proof of the inverse transform only because the CDF of X is monotonic. A monotonic function is one that is either non-decreasing for all x, or non-increasing for all x. Of course, this holds true for all CDFs, since they are non-decreasing by definition.
In short, what the theorem tells us is that we can use a random number [math]U[/math] from [math]U(0,1)[/math] to randomly pick a probability level on the CDF of X, then apply the inverse of the CDF to map that probability back to the domain, which gives us a value of the random variable X.
Example 1 - Exponential: [math] f(x) = \lambda e^{-\lambda x}[/math]
Calculate the CDF:
[math] F(x)= \int_0^x f(t) dt = \int_0^x \lambda e ^{-\lambda t}\ dt[/math]
[math] = \frac{\lambda}{-\lambda}\, e^{-\lambda t}\, \Big|_{0}^{x} [/math]
[math] = -e^{-\lambda x} + e^0 =1 - e^{- \lambda x} [/math]
Solve the inverse:
[math] y=1-e^{- \lambda x} \Rightarrow 1-y=e^{- \lambda x} \Rightarrow x=-\frac {\ln(1-y)}{\lambda}[/math]
[math] y=-\frac {\ln(1-x)}{\lambda} \Rightarrow F^{-1}(x)=-\frac {\ln(1-x)}{\lambda}[/math]
Note that 1 − U is also uniform on (0, 1) and thus −log(1 − U) has the same distribution as −logU.
Steps:
Step 1: Draw U ~U[0,1];
Step 2: [math] x=\frac{-ln(U)}{\lambda} [/math]
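Example 2 - Normal distribution (square of a standard normal):
Let [math]Z \sim N(0,1)[/math] and [math]Y = Z^2[/math]. The cdf of Y is
[math]G(y) = P(Y \leq y) = P(-\sqrt{y} \leq Z \leq \sqrt{y}) = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz = 2\int_{0}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz[/math]
The pdf is [math]g(y) = G'(y)[/math], which is the pdf of [math]\chi^2(1)[/math].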
MatLab Code (for Example 1, the exponential distribution):
>>u=rand(1,1000);
>>hist(u) % this will generate a fairly uniform diagram
% let λ=2 in this example; however, you can choose another value for λ
>>x=(-log(1-u))/2;
>>size(x) % 1000 in size
>>figure
>>hist(x) % exponential
Example 2 - Continuous Distribution:
[math] f(x) = \dfrac {\lambda } {2}e^{-\lambda \left| x-\theta \right| } \text{ for } -\infty \lt x \lt \infty , \lambda \gt 0 [/math]
Calculate the CDF:
[math] F(x)= \frac{1}{2} e^{-\lambda (\theta - x)} , for \ x \le \theta [/math]
[math] F(x) = 1 - \frac{1}{2} e^{-\lambda (x - \theta)}, for \ x \gt \theta [/math]
Solve for the inverse:
[math]F^{-1}(y)= \theta + \ln(2y)/\lambda, \text{ for } 0 \lt y \le 0.5[/math]
[math]F^{-1}(y)= \theta - \ln(2(1-y))/\lambda, \text{ for } 0.5 \lt y \lt 1[/math]
Algorithm:
Steps:
Step 1: Draw U ~ U[0, 1];
Step 2: Compute [math]X = F^{-1}(U)[/math], i.e. [math]X = \theta + \frac {1}{\lambda} \ln(2U)[/math] for U < 0.5, else [math]X = \theta -\frac {1}{\lambda} \ln(2(1-U))[/math]
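A minimal MATLAB sketch of these two steps is given below. The parameter values lambda = 2 and theta = 1 and the sample size are assumptions chosen only for illustration.

```matlab
% Inverse-transform sampling from the double exponential density above.
lambda = 2; theta = 1;      % illustrative parameter values (assumed)
n = 10000;
u = rand(1, n);
x = zeros(1, n);
lower = (u < 0.5);          % selects which branch of the inverse cdf applies
x(lower)  = theta + log(2*u(lower)) / lambda;
x(~lower) = theta - log(2*(1 - u(~lower))) / lambda;
sample_mean = mean(x);      % should be close to theta
sample_median = median(x);  % should also be close to theta
```

Since the density is symmetric about theta, both the sample mean and the sample median give a simple sanity check on the output.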
Example 3 - [math]F(x) = x^5[/math]:
Given a CDF of X: [math]F(x) = x^5[/math] for [math]0 \le x \le 1[/math], generate X from U~U[0,1].
Sol:
Let [math]y=x^5[/math], solve for x: [math]x=y^\frac {1}{5}[/math]. Therefore, [math]F^{-1} (y) = y^\frac {1}{5}[/math]
Hence, to obtain a value of x from F(x), we first draw 'u' from a uniform distribution, then apply the inverse function of F(x), and set
[math]x= u^\frac{1}{5}[/math]
Algorithm:
Steps:
Step 1: Draw U ~ U[0, 1];
Step 2: X=U^(1/5);
Example 4 - BETA(1,β):
Given u~U[0,1], generate x from BETA(1,β)
Solution:
[math]F(x)= 1-(1-x)^\beta[/math],
[math]u= 1-(1-x)^\beta[/math]
Solve for x:
[math](1-x)^\beta = 1-u[/math],
[math]1-x = (1-u)^\frac {1}{\beta}[/math],
[math]x = 1-(1-u)^\frac {1}{\beta}[/math]
let β=3, use Matlab to construct N=1000 observations from Beta(1,3)
MatLab Code:
>> u = rand(1,1000);
>> x = 1-(1-u).^(1/3); % .^ is needed because u is a vector
>> hist(x,50)
>> mean(x)
Example 5 - Estimating [math]\pi[/math]:
Let's use rand() and the Monte Carlo method to estimate [math]\pi[/math]:
N = total number of points
[math]N_{c}[/math] = total number of points inside the circle
Prob[(x,y) lies in the circle] = [math]\frac {Area(circle)}{Area(square)}[/math]
If we take a square with side length 2, the inscribed circle has radius 1 and area [math]\pi (\frac {2}{2})^2 =\pi[/math], while the square has area 4.
Thus [math]\pi \approx 4(\frac {N_c}{N})[/math]
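The estimate can be sketched in MATLAB as follows, assuming the square is [-1,1] x [-1,1] and the circle is the unit circle centred at the origin. The sample size N is an arbitrary choice.

```matlab
% Monte Carlo estimate of pi: points uniform on the square [-1,1] x [-1,1].
N = 100000;                   % number of points (arbitrary choice)
x = 2*rand(1, N) - 1;         % uniform on (-1,1)
y = 2*rand(1, N) - 1;
Nc = sum(x.^2 + y.^2 <= 1);   % points that fall inside the unit circle
pi_hat = 4 * Nc / N;
disp(pi_hat)
```

The estimate improves slowly: its standard error shrinks in proportion to one over the square root of N.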
For example, for UNIF(a,b):
[math]y = F(x) = (x - a)/ (b - a) [/math]
[math]x = (b - a ) y + a[/math]
[math]X = a + ( b - a) U[/math]
where U is UNIF(0,1)
Limitations:
1. This method is limited since not all CDFs are invertible, and the generalized inverse is hard to work with.
2. It may be impractical since some CDFs and/or integrals are not easy to compute, such as those of the Gaussian distribution.
We learned how to prove that applying the inverse cdf to a uniform random variable gives a sample from F, and how to use the uniform distribution to obtain a value of x from F(x). We can also use the uniform distribution in the inverse method to sample from other distributions. In the [math]\pi[/math] example the points are drawn uniformly, so every point of the square, whether inside or outside the circle, is equally likely. Then, we can look at the histogram to determine what kind of distribution the sample resembles.
Probability Distribution Function Tool in MATLAB
disttool % shows different distributions
This command allows users to explore different types of distribution and see how changes to the parameters affect the plot of either a CDF or PDF.
For example, changing the values of mu and sigma changes the location and spread of the graph.
Class 3 - Tuesday, May 14
Recall the Inverse Transform Method
Let U~Unif(0,1); then the random variable [math]X = F^{-1}(U)[/math] has distribution F.
To sample X with CDF F(x):
1) Draw [math]U \sim Unif [0,1] [/math]
2) Set [math]X = F^{-1}(U)[/math]
Note: CDF of a U(a,b) random variable is:
- [math] F(x)= \begin{cases} 0 & \text{for }x \lt a \\[8pt] \frac{x-a}{b-a} & \text{for }a \le x \lt b \\[8pt] 1 & \text{for }x \ge b \end{cases} [/math]
Thus, for [math] U [/math] ~ [math]U(0,1) [/math], we have [math]P(U\leq 1) = 1[/math] and [math]P(U\leq 1/2) = 1/2[/math].
More generally, we see that [math]P(U\leq a) = a[/math].
For this reason, we had [math]P(U\leq F(x)) = F(x)[/math].
Reminder:
This is only for uniform distribution [math] U~ \sim~ Unif [0,1] [/math]
[math] P (U \le 1) = 1 [/math]
[math] P (U \le 0.5) = 0.5 [/math]
[math] P (U \le a) = a [/math]
Note that a single point carries no probability mass (i.e. [math]u \leq 0.5[/math] has the same probability as [math]u \lt 0.5[/math]). More formally, [math] P(X = x) = F(x)- \lim_{s \to x^-}F(s)[/math], which equals zero for any continuous random variable.
Limitations of the Inverse Transform Method
Though this method is very easy to use and apply, it does have a major disadvantage/limitation:
- We need to find the inverse cdf [math] F^{-1}(\cdot) [/math]. In some cases the inverse function does not exist, or is difficult to find because it requires a closed form expression for F(x).
For example, it is too difficult to find the inverse cdf of the Gaussian distribution, so we must find another method to sample from the Gaussian distribution.
In conclusion, we need to find another way of sampling from more complicated distributions
Discrete Case
The same technique can be used for discrete case. We want to generate a discrete random variable x, that has probability mass function:
- [math]\begin{align}P(X = x_i) &{}= p_i \end{align}[/math]
- [math]x_0 \leq x_1 \leq x_2 \dots \leq x_n[/math]
- [math]\sum p_i = 1[/math]
Algorithm for applying Inverse Transformation Method in Discrete Case (Procedure):
1. Define a probability mass function for [math]x_{i}[/math] where i = 0,1,....,k. Note: k could grow infinitely.
2. Generate a uniform random number U, [math] U \sim Unif [0,1] [/math]
3. If [math]U\leq p_{0}[/math], deliver [math]X = x_{0}[/math]
4. Else, if [math]U\leq p_{0} + p_{1} [/math], deliver [math]X = x_{1}[/math]
5. Repeat the process until we reach [math]U\leq p_{0} + p_{1} + ......+ p_{k}[/math], and deliver [math]X = x_{k}[/math]
Note that after generating a random U, the value of X can be determined by finding the interval [math](F(x_{j-1}),F(x_{j})][/math] in which U lies.
In summary:
Generate a discrete r.v. X that has pmf:
[math]P(X=x_i)=p_i[/math], [math]x_0\lt x_1\lt x_2\lt \dots[/math]
1. Draw U~U(0,1);
2. If [math]F(x_{i-1})\lt U\leq F(x_i)[/math], set [math]X=x_i[/math].
Example 3.0:
Generate a random variable from the following probability function:
x | -2 | -1 | 0 | 1 | 2 |
f(x) | 0.1 | 0.5 | 0.07 | 0.03 | 0.3 |
Answer:
1. Gen U~U(0,1)
2. If U < 0.5 then output -1
else if U < 0.8 then output 2
else if U < 0.9 then output -2
else if U < 0.97 then output 0 else output 1
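The answer above can be sketched in MATLAB as follows. The values are listed in decreasing order of probability, as in the answer, and the cumulative sums of the probabilities give the thresholds 0.5, 0.8, 0.9, 0.97 and 1. The variable names and the sample size are assumptions made for illustration.

```matlab
% Discrete inverse transform for Example 3.0, values ordered by probability.
vals  = [-1 2 -2 0 1];
probs = [0.5 0.3 0.1 0.07 0.03];
F = cumsum(probs);                 % thresholds 0.5, 0.8, 0.9, 0.97, 1
F(end) = 1;                        % guard against rounding in the last sum
n = 10000;
x = zeros(1, n);
for ii = 1:n
    u = rand;
    x(ii) = vals(find(u < F, 1));  % first threshold that exceeds u
end
freq_minus1 = mean(x == -1);       % should be close to 0.5
```

Ordering the values by decreasing probability does not change the distribution of the output; it only reduces the expected number of comparisons.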
Example 3.1 (from class): (Coin Flipping Example)
We want to simulate a coin flip. We have U~U(0,1) and X = 0 or X = 1.
We can define X as a function of U so that:
If [math]U\leq 0.5[/math], then X = 0
and if [math]0.5 \lt U\leq 1[/math], then X =1.
This allows the probability of Heads occurring to be 0.5 and is a good generator of a random coin flip.
[math] U~ \sim~ Unif [0,1] [/math]
- [math]\begin{align} P(X = 0) &{}= 0.5\\ P(X = 1) &{}= 0.5\\ \end{align}[/math]
The answer is:
- [math] x = \begin{cases} 0, & \text{if } U\leq 0.5 \\ 1, & \text{if } 0.5 \lt U \leq 1 \end{cases}[/math]
- Code
>>for ii=1:1000
    u=rand;
    if u<0.5
        x(ii)=0;
    else
        x(ii)=1;
    end
end
>>hist(x)
Note: The role of semi-colon in Matlab: Matlab will not print out the results if the line ends in a semi-colon and vice versa.
Example 3.2 (From class):
Suppose we have the following discrete distribution:
- [math]\begin{align} P(X = 0) &{}= 0.3 \\ P(X = 1) &{}= 0.2 \\ P(X = 2) &{}= 0.5 \end{align}[/math]
The cumulative distribution function (cdf) for this distribution is then:
- [math] F(x) = \begin{cases} 0, & \text{if } x \lt 0 \\ 0.3, & \text{if } 0 \le x \lt 1 \\ 0.5, & \text{if } 1 \le x \lt 2 \\ 1, & \text{if } x \ge 2 \end{cases}[/math]
Then we can generate numbers from this distribution like this, given [math]U \sim~ Unif[0, 1][/math]:
- [math] x = \begin{cases} 0, & \text{if } U\leq 0.3 \\ 1, & \text{if } 0.3 \lt U \leq 0.5 \\ 2, & \text{if } 0.5 \lt U\leq 1 \end{cases}[/math]
"Procedure"
1. Draw U~U(0,1)
2. if U<=0.3 deliver x=0
3. else if 0.3<U<=0.5 deliver x=1
4. else (0.5<U<=1) deliver x=2
Can you find a faster way to run this algorithm? Consider:
- [math] x = \begin{cases} 2, & \text{if } U\leq 0.5 \\ 1, & \text{if } 0.5 \lt U \leq 0.7 \\ 0, & \text{if } 0.7 \lt U\leq 1 \end{cases}[/math]
The logic for this is that U is most likely to fall into the largest range. Thus by checking the largest range first (in this case the one with probability 0.5) we reduce the expected number of comparisons and improve the run time of this algorithm. Could this algorithm be improved further using the same logic?
- Code (as shown in class)
Use Editor window to edit the code
>>close all
>>clear all
>>for ii=1:1000
    u=rand;
    if u<=0.3
        x(ii)=0;
    elseif u<=0.5
        x(ii)=1;
    else
        x(ii)=2;
    end
end
>>size(x)
>>hist(x)
The algorithm above generates a 1×1000 vector containing 0's, 1's and 2's in differing proportions. Because of the criteria for assigning 0, 1 or 2, the proportions of 0, 1 and 2 correspond to their respective probabilities. So plotting the histogram (frequency of 0, 1 and 2) doesn't give us the pmf itself but a frequency histogram showing the proportions of each, which has approximately the same shape as the pmf.
Example 3.3: Generating a random variable from pdf
- [math] f_{X}(x) = \begin{cases} 2x, & \text{if } 0\leq x \leq 1 \\ 0, & \text{otherwise} \end{cases}[/math]
- [math] F_{X}(x) = \begin{cases} 0, & \text{if } x \lt 0 \\ \int_{0}^{x}2sds = x^{2}, & \text{if } 0\leq x \leq 1 \\ 1, & \text{if } x \gt 1 \end{cases}[/math]
- [math]\begin{align} U = X^{2}, \quad X = F_{X}^{-1}(U)= U^{\frac{1}{2}}\end{align}[/math]
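As a short illustration, the transformation can be applied to a vector of uniform draws in MATLAB. The sample size is an arbitrary choice, and the comparison with 2/3 uses the fact that the mean of this density is the integral of 2x^2 over [0,1].

```matlab
% Inverse-transform sampling from f(x) = 2x on [0,1].
u = rand(1, 10000);
x = sqrt(u);        % X = F^{-1}(U) = U^(1/2)
m1 = mean(x);       % E[X] = 2/3 for this density
```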
Example 3.4: Generating a Bernoulli random variable
- [math]\begin{align} P(X = 1) = p, P(X = 0) = 1 - p\end{align}[/math]
- [math] F(x) = \begin{cases} 0, & \text{if } x \lt 0 \\ 1-p, & \text{if } 0 \le x \lt 1 \\ 1, & \text{if } x \ge 1 \end{cases}[/math]
1. Draw [math] U~ \sim~ Unif [0,1] [/math]
2. [math]
X = \begin{cases}
0, & \text{if } 0 \lt U \lt 1-p \\
1, & \text{if } 1-p \le U \lt 1
\end{cases}[/math]
Example 3.5: Generating Binomial(n,p) Random Variable
Use the recursion [math] P\left( X=i+1\right) =\dfrac {n-i} {i+1}\dfrac {p} {1-p}P\left( X=i\right) [/math]
Step 1: Generate a random number [math]U[/math].
Step 2: [math]c = \frac {p}{(1-p)}[/math], [math]i = 0[/math], [math]pr = (1-p)^n[/math], [math]F = pr[/math]
Step 3: If U<F, set X = i and stop,
Step 4: [math] pr = \, {\frac {c(n-i)}{(i+1)}} {pr}, F = F +pr, i = i+1[/math]
Step 5: Go to step 3
- Note: These steps can be found in Simulation 5th Ed. by Sheldon Ross.
- Note: Another method views the Binomial as a sum of n independent Bernoulli random variables. Generate n uniform random numbers U1, ..., Un and set X equal to the number of Ui that are less than or equal to p. To use this method, n random numbers are needed and n comparisons need to be done. On the other hand, the inverse transformation method is simpler because only one random number needs to be generated, and it makes on average about 1 + np comparisons.
Step 1: Generate n uniform numbers U1 ... Un.
Step 2: X = [math]\sum_{i=1}^{n} I(U_i \leq p)[/math], where p is the probability of success and I is the indicator function.
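The inverse transform steps 1 to 5 for the Binomial can be sketched in MATLAB as below. The values n = 10, p = 0.3 and the number of samples are assumptions for illustration, and the extra condition i < n is a guard added here against floating-point rounding; it is not part of the steps as stated.

```matlab
% Inverse-transform sampling of Binomial(n,p) values via the pmf recursion.
n = 10; p = 0.3;               % illustrative parameter values (assumed)
nsamp = 5000;
X = zeros(1, nsamp);
c = p / (1 - p);
for s = 1:nsamp
    U = rand;                  % step 1
    i = 0;                     % step 2
    pr = (1 - p)^n;            % P(X = 0)
    F = pr;
    while U >= F && i < n      % step 3; i < n guards against rounding
        pr = c * (n - i) / (i + 1) * pr;   % step 4: P(X = i+1) from P(X = i)
        F = F + pr;
        i = i + 1;
    end
    X(s) = i;
end
mean_X = mean(X);              % should be close to n*p
```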
Example 3.6: Generating a Poisson random variable
"Let X ~ Poi(u). Write an algorithm to generate X. The pmf of a Poisson is:
- [math]\begin{align} f(x) = \frac {\, e^{-u} u^x}{x!} \end{align}[/math]
We know that
- [math]\begin{align} P_{x+1} = \frac {\, e^{-u} u^{x+1}}{(x+1)!} \end{align}[/math]
The ratio is [math]\begin{align} \frac {P_{x+1}}{P_x} = ... = \frac {u}{{x+1}} \end{align}[/math] Therefore, [math]\begin{align} P_{x+1} = \, {\frac {u}{x+1}} P_x\end{align}[/math]
Algorithm:
1) Generate U ~ U(0,1)
2) [math]\begin{align} x = 0 \end{align}[/math]
[math]\begin{align} F = P(X = 0) = e^{-u}u^0/{0!} = e^{-u} = p \end{align}[/math]
3) If U<F, output x
Else, [math]\begin{align} p = (u/(x+1)) \cdot p \end{align}[/math]
[math]\begin{align} F = F + p \end{align}[/math]
[math]\begin{align} x = x + 1 \end{align}[/math]
4) Go to 3"
Acknowledgements: This is an example from Stat 340 Winter 2013
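A MATLAB sketch of the Poisson algorithm, using the recursion [math]P_{x+1} = \frac{u}{x+1} P_x[/math], is given below. The rate is written mu in the code to avoid a clash with the uniform draw U. The value mu = 4, the sample size, and the guard p > 0 (which stops the loop if the probabilities underflow) are assumptions added for illustration.

```matlab
% Inverse-transform sampling of Poisson(mu) values via P_{x+1} = mu/(x+1) * P_x.
mu = 4;                        % rate parameter, written u in the text (assumed value)
nsamp = 5000;
X = zeros(1, nsamp);
for s = 1:nsamp
    U = rand;
    x = 0;
    p = exp(-mu);              % P(X = 0)
    F = p;
    while U >= F && p > 0      % p > 0 guards against underflow in the far tail
        p = mu / (x + 1) * p;  % P(X = x+1) from P(X = x)
        F = F + p;
        x = x + 1;
    end
    X(s) = x;
end
mean_X = mean(X);              % should be close to mu
```

Note that U is drawn once per sample and the loop returns to the comparison with F, not to the draw of U.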
Example 3.7: Generating Geometric Distribution:
Consider Geo(p) where p is the probability of success, and define the random variable X as the total number of trials required to achieve the first success, x=1,2,3,....
We have pmf: [math]P(X=x_i) = \, p (1-p)^{x_{i}-1}[/math]
We have CDF: [math]F(x)=P(X \leq x)=1-P(X\gt x) = 1-(1-p)^x[/math], where P(X>x) means we get at least x failures before we observe the first success.
Now consider the inverse transform:
- [math] x = \begin{cases} 1, & \text{if } U\leq p \\ 2, & \text{if } p \lt U \leq 1-(1-p)^2 \\ 3, & \text{if } 1-(1-p)^2 \lt U\leq 1-(1-p)^3 \\ \vdots \\ k, & \text{if } 1-(1-p)^{k-1} \lt U\leq 1-(1-p)^k \\ \vdots \end{cases}[/math]
Note: Unlike the continuous case, the discrete inverse-transform method can always be used for any discrete distribution (but it may not be the most efficient approach)
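For the geometric case the search over intervals can be avoided. The condition [math]1-(1-p)^{k-1} \lt U \leq 1-(1-p)^k[/math] is equivalent to [math]k-1 \lt \frac{\ln(1-U)}{\ln(1-p)} \leq k[/math], so [math]k = \left\lceil \frac{\ln(1-U)}{\ln(1-p)} \right\rceil[/math]. The MATLAB sketch below uses this closed form; p = 0.25 and the sample size are assumed values for illustration.

```matlab
% Geometric(p) sampling: X = smallest k with 1 - (1-p)^k >= U.
p = 0.25;                              % illustrative success probability (assumed)
n = 10000;
u = rand(1, n);
x = ceil(log(1 - u) ./ log(1 - p));    % closed-form inverse of the cdf
m1 = mean(x);                          % E[X] = 1/p
```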
General Procedure
1. Draw U ~ U(0,1)
2. If [math]U \leq P_{0}[/math] deliver [math]x = x_{0}[/math]
3. Else if [math]U \leq P_{0} + P_{1}[/math] deliver [math]x = x_{1}[/math]
4. Else if [math]U \leq P_{0} + P_{1} + P_{2} [/math] deliver [math]x = x_{2}[/math]
...
Else if [math]U \leq P_{0} + ... + P_{k} [/math] deliver [math]x = x_{k}[/math]
===Inverse Transform Algorithm for Generating a Binomial(n,p) Random Variable(from textbook)===
step 1: Generate a random number U
step 2: c=p/(1-p),i=0, pr=(1-p)^{n}, F=pr.
step 3: If U<F, set X=i and stop.
step 4: pr =[c(n-i)/(i+1)]pr, F=F+pr, i=i+1.
step 5: Go to step 3.
Problems
Though this method is very easy to use and apply, it does have a major disadvantage/limitation:
We need to find the inverse cdf [math] F^{-1}(\cdot) [/math]. In some cases the inverse function does not exist, or is difficult to find because it requires a closed form expression for F(x).
For example, it is too difficult to find the inverse cdf of the Gaussian distribution, so we must find another method to sample from the Gaussian distribution.
In conclusion, we need to find another way of sampling from more complicated distributions
Flipping a coin is a discrete case of the uniform distribution, and the code below shows an example of flipping a coin 1000 times; the proportion of each outcome is close to the expected value 0.5.
Example 2, as another discrete distribution, shows that we can sample values such as 0, 1 and 2, each with its own probability.
Example 3 uses the inverse method to figure out the probability range of each value of the random variable.
Summary of Inverse Transform Method
Problem:generate types of distribution.
Plan:
Continuous case:
- find CDF F
- find the inverse F^{-1}
- Generate a list of uniformly distributed numbers {u}
- {F^{-1}(u)} is what we want
Matlab Instruction
>>u=rand(1,1000);
>>hist(u)
>>x=(-log(1-u))/2;
>>size(x)
>>figure
>>hist(x)
Discrete case:
- generate a list of uniformly distributed numbers {u}
- set [math]d_{i}=x_{i}[/math] if [math] F(x_{i-1})\lt U\leq F(x_i) [/math]
- {d_{i}} is what we want
Matlab Instruction
>>for ii=1:1000
    u=rand;
    if u<0.5
        x(ii)=0;
    else
        x(ii)=1;
    end
end
>>hist(x)
Generalized Inverse-Transform Method
Valid for any CDF F(x): return X=min{x:F(x)[math]\geq[/math] U}, where U~U(0,1)
1. Continuous, possibly with flat spots (i.e. not strictly increasing)
2. Discrete
3. Mixed continuous-discrete
Advantages of Inverse-Transform Method
Inverse transform method preserves monotonicity and correlation
which helps in
1. Variance reduction methods ...
2. Generating truncated distributions ...
3. Order statistics ...
Acceptance-Rejection Method
Although the inverse transformation method does allow us to transform our uniform distribution, it has two limitations:
- Not all functions have inverse functions (i.e. restrictions on the ranges of x and y can prevent an inverse from being defined)
- For some distributions, such as the Gaussian, it is too difficult to find the inverse
To generate random samples from these distributions, we will use different methods, such as the Acceptance-Rejection Method. This method can be used even when the inverse cdf is not available. The basic idea is to find an alternative probability distribution with density function g(x) that is easy to sample from;
Suppose we want to draw random sample from a target density function f(x), x∈S_{x}, where S_{x} is the support of f(x). If we can find some constant c(≥1) (In practice, we prefer c as close to 1 as possible) and a density function g(x) having the same support S_{x} so that f(x)≤cg(x), ∀x∈S_{x}, then we can apply the procedure for Acceptance-Rejection Method. Typically we choose a density function that we already know how to sample from for g(x).
The main logic behind the Acceptance-Rejection Method is that:
1. We want to generate sample points from a target distribution, say f(x), that we cannot sample from directly.
2. We use [math]\,cg(x)[/math] to generate points so that we have more points than f(x) could ever generate for all x. (where c is a constant, and g(x) is a known distribution)
3. For each value of x, we accept and reject some points based on a probability, which will be discussed below.
Note: If the red line was only g(x) as opposed to [math]\,c g(x)[/math] (i.e. c=1), then [math]g(x) \geq f(x)[/math] for all values of x if and only if g and f are the same function. This is because the pdf g(x) integrates to 1 and the pdf f(x) integrates to 1, hence [math]g(x) \gt f(x)[/math] cannot hold for all x.
Also remember that [math]\,c g(x)[/math] always generates a higher probability than what we need. Thus we need an approach for getting the proper probabilities.
c must be chosen so that [math]f(x)\leqslant c g(x)[/math] for all value of x. c can only equal 1 when f and g have the same distribution. Otherwise:
Either use a software package to test if [math]f(x)\leqslant c g(x)[/math] for an arbitrarily chosen c > 0, or:
1. Find first and second derivatives of f(x) and g(x).
2. Identify and classify all local and absolute maximums and minimums, using the First and Second Derivative Tests, as well as all inflection points.
3. Verify that [math]f(x)\leqslant c g(x)[/math] at all the local maximums as well as the absolute maximums.
4. Verify that [math]f(x)\leqslant c g(x)[/math] at the tail ends by calculating [math]\lim_{x \to +\infty} \frac{f(x)}{\, c g(x)}[/math] and [math]\lim_{x \to -\infty} \frac{f(x)}{\, c g(x)}[/math] and seeing that they are both < 1. Use of L'Hopital's Rule should make this easy, since both f and g are p.d.f's, resulting in both of them approaching 0.
Efficiency: the number of iterations N needed to successfully generate X (that is, the number of times steps 1 and 2 of the procedure need to be called) is a random variable and has a geometric distribution with success probability [math]p=P(U \leq f(Y)/(cg(Y)))[/math], [math]P(N=n)=(1-p)^{n-1}p ,n \geq 1[/math]. Thus on average the number of iterations required is given by [math] E(N)=\frac{1}{p}[/math]
c should be close to the maximum of f(x)/g(x), not just some arbitrarily picked large number. Otherwise, the Acceptance-Rejection method will have more rejections (since the acceptance probability [math]\frac{f(x)}{c g(x)}[/math] will be close to zero). This will render our algorithm inefficient.
The expected number of iterations of the algorithm required to generate one X is c.
Note:
1. Values around x_{1} will be sampled more often under cg(x) than under f(x). There will be more samples than we actually need; if [math]\frac{f(y)}{\, c g(y)}[/math] is small, the acceptance-rejection technique must be applied to these points to get the right amount. In the region above x_{1}, we should accept less and reject more.
2. Values around x_{2}: the number of samples that are drawn and the number we need are much closer. So in the region above x_{2}, we accept more. There, cg(x) and f(x) are comparable.
3. The constant c is needed because we need to adjust the height of g(x) to ensure that it is above f(x). Besides that, it is best to keep the number of rejected variates small for maximum efficiency.
Another way to understand why the acceptance probability is [math]\frac{f(y)}{\, c g(y)}[/math] is by thinking of areas. From the graph above, we see that the target function is under the proposed function c g(y). Therefore, [math]\frac{f(y)}{\, c g(y)}[/math] is the proportion of the area under c g(y) that also lies under f(y). Therefore we say we accept sample points for which u is less than [math]\frac{f(y)}{\, c g(y)}[/math], because then the sample points are guaranteed to fall in the part of the area under c g(y) that lies under f(y).
There are 2 cases that are possible:
-Sample of points is more than enough, [math]c g(x) \geq f(x) [/math] with [math]c g(x)[/math] much larger than [math]f(x)[/math]
-Similar or the same amount of points, [math]c g(x) \geq f(x) [/math] with [math]c g(x)[/math] close to [math]f(x)[/math]
There is 1 case that is not possible:
-Less than enough points, such that [math] c g(x) [/math] is smaller than [math] f(x) [/math], [math]c g(x) \lt f(x)[/math]
Procedure
1. Draw Y~g(.)
2. Draw U~U(0,1) (Note: U and Y are independent)
3. If [math]u\leq \frac{f(y)}{cg(y)}[/math] (which happens with probability [math]P(accepted|y)[/math]) then x=y, else return to Step 1
Note: Recall [math]P(U\leq a)=a[/math]. Thus by comparing u and [math]\frac{f(y)}{\, c g(y)}[/math], we can get a probability of accepting y at these points. For instance, at some points that cg(x) is much larger than f(x), the probability of accepting x=y is quite small.
ie. At X_{1}, low probability to accept the point since f(x) is much smaller than cg(x).
At X_{2}, high probability to accept the point. [math]P(U\leq a)=a[/math] in Uniform Distribution.
Note: Since U is uniform between 0 and 1, it is at most 1. For [math]\frac{f(y)}{c g(y)}[/math] to be a valid acceptance probability it must also be at most 1, so the constant c must satisfy [math]c\geq \frac{f(y)}{g(y)}[/math] for all y.
This part introduces the relationship between cg(x) and f(x), explains why that relationship is needed, and shows how it is used to reject some sample points.
We also learn how to read the graph to see where points above a given x are likely to be accepted or rejected.
In the example, x1 is a point where most samples are rejected and x2 is a point where most samples are accepted.
Some notes on the constant C
1. C is chosen such that [math] c g(y)\geq f(y)[/math], that is, [math] c g(y)[/math] will always dominate [math]f(y)[/math]. Because of this, C will always be greater than or equal to one, and will equal one if and only if the proposal distribution and the target distribution are the same. It is normally best to choose C such that the absolute maxima of both [math] c g(y)[/math] and [math] f(y)[/math] are the same.
2. [math] \frac {1}{C} [/math] is the area under [math] f(y)[/math] over the area under [math] c g(y)[/math] and is the acceptance rate of the points generated. For example, if [math] \frac {1}{C} = 0.7[/math] then on average, 70 percent of all points generated are accepted.
3. C is the average number of times Y is generated from g for each accepted value.
Theorem
Let [math]f: \R \rightarrow [0,+\infty][/math] be a well-defined pdf, and [math]\displaystyle Y[/math] be a random variable with pdf [math]g: \R \rightarrow [0,+\infty][/math] such that [math]\exists c \in \R^+[/math] with [math]f \leq c \cdot g[/math]. If [math]\displaystyle U \sim~ U(0,1)[/math] is independent of [math]\displaystyle Y[/math], then the random variable defined as [math]X := Y \vert U \leq \frac{f(Y)}{c \cdot g(Y)}[/math] has pdf [math]\displaystyle f[/math], and the condition [math]U \leq \frac{f(Y)}{c \cdot g(Y)}[/math] is denoted by "Accepted".
Proof
Recall the conditional probability formulas:
[math]\begin{align}
P(A|B)=\frac{P(A \cap B)}{P(B)}, \text{ or }P(A|B)=\frac{P(B|A)P(A)}{P(B)} \text{ for pmf}
\end{align}[/math]
[math]P(y|accepted)=f(y)=\frac{P(accepted|y)P(y)}{P(accepted)}[/math]
based on the concept from procedure-step1:
[math]P(y)=g(y)[/math]
[math]P(accepted|y)=\frac{f(y)}{cg(y)}[/math]
(the larger the value is, the larger the chance it will be selected)
[math]
\begin{align}
P(accepted)&=\int_s\ P(accepted|s)P(s)ds\\
&=\int_s\ \frac{f(s)}{cg(s)}g(s)ds\\
&=\frac{1}{c} \int_s\ f(s) ds\\
&=\frac{1}{c}
\end{align}[/math]
Therefore:
[math]\begin{align}
P(x)&=P(y|accepted)\\
&=\frac{\frac{f(y)}{cg(y)}g(y)}{1/c}\\
&=\frac{\frac{f(y)}{c}}{1/c}\\
&=f(y)\end{align}[/math]
Here is an alternative introduction of Acceptance-Rejection Method
Comments:
-Acceptance-Rejection Method is not good for all cases. The limitation with this method is that sometimes many points will be rejected. One obvious disadvantage is that it could be very hard to pick [math]g(y)[/math] and the constant [math]c[/math] in some cases. We have to pick the SMALLEST C such that [math]f(x) \leq cg(x)[/math], else the algorithm will not be efficient. This is because with a larger C, [math]f(x)/cg(x)[/math] becomes smaller, the probability that [math]u \leq f(x)/cg(x)[/math] goes down, and many points are rejected, making the algorithm inefficient.
-Note: When [math]f(y)[/math] is very different than [math]g(y)[/math], it is less likely that the point will be accepted as the ratio above would be very small and it will be difficult for [math]U[/math] to be less than this small value.
An example would be when the target function ([math]f[/math]) has a spike or several spikes in its domain - this would force the known distribution ([math]g[/math]) to have density at least as large as the spikes, making the value of [math]c[/math] larger than desired. As a result, the algorithm would be highly inefficient.
Acceptance-Rejection Method
Example 1 (discrete case)
We wish to generate X~Bi(2,0.5), assuming that we cannot generate this directly.
We use a discrete distribution DU[0,2] to approximate this.
[math]f(x)=Pr(X=x)=\binom{2}{x}(0.5)^2\,[/math]
[math]x[/math] | 0 | 1 | 2 |
[math]f(x)[/math] | 1/4 | 1/2 | 1/4 |
[math]g(x)[/math] | 1/3 | 1/3 | 1/3 |
[math]f(x)/g(x)[/math] | 3/4 | 3/2 | 3/4 |
[math]f(x)/(cg(x))[/math] | 1/2 | 1 | 1/2 |
Since we need [math]c \geq f(x)/g(x)[/math] for every x,
we choose [math]c=\max_x f(x)/g(x)=3/2[/math]
Therefore, the algorithm is:
1. Generate [math]u,v \sim U(0,1)[/math]
2. Set [math]y= \lfloor 3u \rfloor[/math] (This is using the uniform distribution to generate DU[0,2])
3. If [math](y=0)[/math] and [math](v\lt \tfrac{1}{2})[/math], output 0
Else if [math](y=2) [/math] and [math](v\lt \tfrac{1}{2})[/math], output 2
Else if [math]y=1[/math], output 1
Otherwise, return to step 1
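The algorithm for this discrete example can be sketched in MATLAB as follows. The pmf vectors f and g hold the table values above, indexed by y+1 because MATLAB indices start at 1. The sample size and variable names are assumptions for illustration.

```matlab
% Acceptance-rejection for X ~ Bi(2,0.5) with proposal DU[0,2].
f = [1/4 1/2 1/4];          % target pmf at 0, 1, 2
g = [1/3 1/3 1/3];          % proposal pmf at 0, 1, 2
c = max(f ./ g);            % smallest valid constant, 3/2
n = 10000;
x = zeros(1, n);
for ii = 1:n
    accepted = false;
    while ~accepted
        y = floor(3*rand);                       % proposal from DU[0,2]
        v = rand;
        accepted = (v <= f(y+1) / (c*g(y+1)));   % accept with prob f/(c*g)
    end
    x(ii) = y;
end
freq1 = mean(x == 1);       % should be close to 1/2
```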
An elaboration of “c”
c is the expected number of times the code runs to output 1 random variable. Remember that when [math]u \lt \tfrac{f(x)}{cg(x)}[/math] is not satisfied, we need to go over the code again.
Proof
Let [math]f(x)[/math] be the function we wish to generate from, but we cannot use inverse transform method to generate directly.
Let [math]g(x)[/math] be the helper function
Let [math]cg(x)\geq f(x)[/math]
Since we need to generate y from [math]g(x)[/math],
[math]Pr(\text{select } y)=g(y)[/math]
[math]Pr(\text{output } y|\text{selected } y)=Pr(u\lt f(y)/(cg(y)))= f(y)/(cg(y))[/math] (Since u~Unif(0,1))
[math]Pr(\text{output})=\sum_{i=1}^{n} Pr(\text{output } y_i|\text{selected } y_i)Pr(\text{select } y_i)=\sum_{i=1}^{n} \frac{f(y_i)}{cg(y_i)}g(y_i)=1/c[/math]
Since we are asking for the expected number of trials until the first success, this is a geometric distribution with probability of success 1/c
Therefore, [math]E(X)=1/(1/c)=c[/math]
Acknowledgements: Some materials have been borrowed from notes from Stat340 in Winter 2013.
We use conditional probability to prove that, given a point is accepted, its pdf is the same as the target pdf. The example shows how to choose c for the two functions [math]g(x)[/math] and [math]f(x)[/math].
Example of Acceptance-Rejection Method
Generating a random variable having p.d.f.
[math]\displaystyle f(x) = 20x(1 - x)^3, 0\lt x \lt 1 [/math]
Since this random variable (which is beta with parameters (2,4)) is concentrated in the interval (0, 1), let us consider the acceptance-rejection method with
[math]\displaystyle g(x) = 1,0\lt x\lt 1[/math]
To determine the constant c such that f(x)/g(x) <= c, we use calculus to determine the maximum value of
[math]\displaystyle f(x)/g(x) = 20x(1 - x)^3 [/math]
Differentiation of this quantity yields
[math]\displaystyle d/dx[f(x)/g(x)]=20*[(1-x)^3-3x(1-x)^2][/math]
Setting this equal to 0 shows that the maximal value is attained when x = 1/4,
and thus,
[math]\displaystyle f(x)/g(x)\lt = 20*(1/4)*(3/4)^3=135/64=c [/math]
Hence,
[math]\displaystyle f(x)/cg(x)=(256/27)*(x*(1-x)^3)[/math]
and thus the simulation procedure is as follows:
1) Generate two random numbers U1 and U2 .
2) If U_{2}<(256/27)*U_{1}*(1-U_{1})^{3}, set X=U_{1} and stop. Otherwise return to step 1). The average number of times that step 1) will be performed is c = 135/64.
(The above example is from http://www.cs.bgu.ac.il/~mps042/acceptance.htm, example 2.)
In this example the derivative is used to find the maximum of f(x)/g(x), which gives the best (smallest) constant c for the acceptance-rejection method.
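The two-step procedure for [math]f(x) = 20x(1-x)^3[/math] can be sketched in code. This is a minimal Python sketch, not code from the lecture (which uses MATLAB); the function name `sample_beta24` and the fixed seed are assumptions made for illustration. It also counts the trials per accepted value so that the average can be compared with c = 135/64.

```python
import random

def sample_beta24(rng):
    """Return one draw from f(x) = 20 x (1 - x)^3 and the number of trials it took."""
    trials = 0
    while True:
        trials += 1
        u1 = rng.random()  # candidate from g = Unif(0, 1)
        u2 = rng.random()  # uniform used for the accept/reject decision
        # f(u1) / (c * g(u1)) with c = 135/64 simplifies to (256/27) * u1 * (1 - u1)^3
        if u2 < (256 / 27) * u1 * (1 - u1) ** 3:
            return u1, trials

rng = random.Random(0)  # fixed seed, only so the run is reproducible
draws = [sample_beta24(rng) for _ in range(20000)]
mean_x = sum(x for x, _ in draws) / len(draws)
mean_trials = sum(t for _, t in draws) / len(draws)
print(mean_x, mean_trials)  # to be compared with 1/3 (the Beta(2,4) mean) and 135/64
```

The sample mean should be close to 1/3 and the average number of trials close to 135/64, up to simulation noise.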
Another Example of Acceptance-Rejection Method
Generate a random variable from:
[math]\displaystyle f(x)=3*x^2, 0\lt x\lt 1 [/math]
Assume g(x) to be uniform over interval (0,1), where 0< x <1
Therefore:
[math]\displaystyle c = max(f(x)/(g(x)))= 3[/math]
The best constant c is max(f(x)/g(x)); this choice makes the area above f(x) and below c*g(x) as small as possible.
Because g is uniform on (0,1), g(x) is 1 everywhere, so max(g(x)) is 1.
[math]\displaystyle f(x)/(cg(x))= x^2[/math]
Acknowledgement: this is example 1 from http://www.cs.bgu.ac.il/~mps042/acceptance.htm
Class 4 - Thursday, May 16
Goals
- When we want to sample from a target distribution [math]f(x)[/math], we first need to find a proposal distribution [math]g(x)[/math] that is easy to sample from.
- The relationship between the proposal distribution and the target distribution is [math] c \cdot g(x) \geq f(x) [/math], where c is a constant. This means that the graph of f(x) lies under the graph of [math] c \cdot g(x)[/math].
- The chance of acceptance is lower when the gap between [math]f(x)[/math] and [math] c \cdot g(x)[/math] is big, and vice-versa. We use [math] c [/math] to keep [math] \frac {f(x)}{c \cdot g(x)} [/math] at or below 1 (so [math]f(x) \leq c \cdot g(x)[/math]). Therefore, we must find the constant [math] c [/math] that achieves this.
- In other words, [math]c[/math] is chosen to make sure [math] c \cdot g(x) \geq f(x) [/math]. However, it does not make sense to choose [math]c[/math] arbitrarily large. We need to choose [math]c[/math] such that [math]c \cdot g(x)[/math] fits [math]f(x)[/math] as tightly as possible. This means that we must find the minimum c such that the graph of f(x) lies under the graph of c*g(x).
- The constant c cannot be a negative number.
How to find C:
[math]\begin{align}
&c \cdot g(x) \geq f(x)\\
&c\geq \frac{f(x)}{g(x)} \\
&c= \max \left(\frac{f(x)}{g(x)}\right)
\end{align}[/math]
If [math]f[/math] and [math] g [/math] are continuous, we can find the extremum by taking the derivative and solving for [math]x_0[/math] such that:
[math] 0=\frac{d}{dx}\frac{f(x)}{g(x)}|_{x=x_0}[/math]
Thus [math] c = \frac{f(x_0)}{g(x_0)} [/math]
Note: This procedure is called the Acceptance-Rejection Method.
The Acceptance-Rejection method involves finding a distribution that we know how to sample from, g(x), and multiplying g(x) by a constant c so that [math]c \cdot g(x)[/math] is always greater than or equal to f(x). Mathematically, we want [math] c \cdot g(x) \geq f(x) [/math].
This means c has to be greater than or equal to [math]\frac{f(x)}{g(x)}[/math] for every x. So the smallest possible c that satisfies the condition is the maximum value of [math]\frac{f(x)}{g(x)}[/math].
If c is too large, the chance of acceptance of generated values will be small, and the algorithm loses efficiency. Therefore, it is best to use the smallest possible c such that [math] c g(x) \geq f(x)[/math].
Important points:
- For this method to be efficient, the constant c must be selected so that the rejection rate is low. (The efficiency for this method is [math]\left ( \frac{1}{c} \right )[/math])
- It is easy to show that the expected number of accepted values is [math] \frac{\text{Total Number of Trials}} {c} [/math].
- Recall that 1/c is the acceptance rate (not the rejection rate).
- Let [math]X[/math] be the number of trials for an acceptance, [math] X \sim~ Geo(\frac{1}{c})[/math]
- [math]\mathbb{E}[X] = \frac{1}{\frac{1}{c}} = c [/math]
- The number of trials needed to generate a sample size of [math]N[/math] follows a negative binomial distribution. The expected number of trials needed is then [math]cN[/math].
- So far, the only distribution we know how to sample from is the UNIFORM distribution.
Procedure:
1. Choose [math]g(x)[/math] (simple density function that we know how to sample, i.e. Uniform so far)
The easiest case is [math]U~ \sim~ Unif [0,1] [/math]. However, in other cases we need to generate UNIF(a,b). We may need to perform a linear transformation on the [math]U~ \sim~ Unif [0,1] [/math] variable.
2. Find a constant c such that [math] c \cdot g(x) \geq f(x) [/math] for all x; if no such c exists, return to step 1 and choose a different [math]g(x)[/math].
Recall the general procedure of Acceptance-Rejection Method
- Let [math]Y \sim~ g(y)[/math]
- Let [math]U \sim~ Unif [0,1] [/math]
- If [math]U \leq \frac{f(Y)}{c \cdot g(Y)}[/math] then X=Y; else go back and draw a new Y and U (This is not the way to find c. This is the general procedure.)
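The general procedure can be written as one reusable function. The following is a Python sketch under stated assumptions rather than the course's MATLAB code: the name `accept_reject` and its argument list are hypothetical, and the caller is assumed to supply a valid c with [math]c \cdot g(x) \geq f(x)[/math]. It is illustrated with the earlier example [math]f(x)=3x^2[/math], g uniform on (0,1), c = 3.

```python
import random

def accept_reject(f, g_pdf, g_sample, c, n, rng):
    """Draw n values from density f using proposal g and a constant c >= max f/g.

    Returns the accepted values and the total number of candidates generated.
    """
    accepted = []
    trials = 0
    while len(accepted) < n:
        y = g_sample(rng)   # Y ~ g
        u = rng.random()    # U ~ Unif[0, 1)
        trials += 1
        if u <= f(y) / (c * g_pdf(y)):
            accepted.append(y)
    return accepted, trials

xs, trials = accept_reject(
    f=lambda x: 3 * x * x,           # target density on (0, 1)
    g_pdf=lambda x: 1.0,             # Unif(0, 1) density
    g_sample=lambda r: r.random(),   # how to draw from g
    c=3.0,
    n=10000,
    rng=random.Random(1),            # fixed seed for reproducibility
)
print(sum(xs) / len(xs), trials / len(xs))  # sample mean and trials per accepted value
```

With c = 3 the number of candidates per accepted value should be close to 3, in line with the geometric argument above.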
Example:
Generate a random variable from the pdf
[math] f(x) =
\begin{cases}
2x, & \mbox{if }0 \leqslant x \leqslant 1 \\
0, & \mbox{otherwise}
\end{cases} [/math]
We can note that this is a special case of Beta(2,1), where the Beta(a,b) density is
[math]beta(a,b)=\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{(a-1)}(1-x)^{(b-1)}[/math]
where [math]\Gamma(n) = (n - 1)![/math] if n is a positive integer, and in general
[math]\Gamma(z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt[/math]
Aside: Beta function
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function defined by
[math]B(x,y)=\int_0^1 \! {t^{(x-1)}}{(1-t)^{(y-1)}}\,dt[/math]
[math]beta(2,1)= \frac{\Gamma(3)}{(\Gamma(2)\Gamma(1))}x^1 (1-x)^0 = 2x[/math]
[math]g=U(0,1)[/math], i.e. [math]g(x)=1[/math] for [math]0 \leq x \leq 1[/math]
[math]y \sim g[/math]
[math]f(x)\leq c\cdot g(x)[/math]
[math]c\geq \frac{f(x)}{g(x)}[/math]
[math]c = \max \frac{f(x)}{g(x)} [/math]
[math]c = \max \frac{2x}{1}, 0 \leq x \leq 1[/math]
The ratio 2x is largest at x = 1, which gives c=2
Note that c is a scalar greater than 1.
g(x) is the proposal dist, c*g(x) is the envelope, and f(x) is the target dist.
Note: g is the Unif(0,1) density, so its height is 1 and it only covers the lower half of f, whose values run from 0 to 2 on the y-axis. Thus we need to multiply by c to ensure that [math]c\cdot g[/math] covers the entire area under f(x). In this case, c=2, which raises the envelope to height 2 so that it covers f(x).
Comment:
From the picture above, we can observe that the area under f(x)=2x is half of the area under c*g(x)=2, the scaled pdf of UNIF(0,1). This is why, in order to sample 1000 points from f(x), we need to sample approximately 2000 points from UNIF(0,1).
In general, if we want to sample n points from a distribution with pdf f(x), we need to draw approximately [math]n\cdot c[/math] points from the proposal distribution (g(x)) in total.
Step
- Draw y~U(0,1)
- Draw u~U(0,1)
- if [math]u \leq \frac{(2\cdot y)}{(2\cdot 1)}, u \leq y,[/math] then [math] x=y[/math]
- Else go to Step 1
Note: In the above example, we sample 2 numbers. If second number (u) is less than or equal to first number (y), then accept x=y, if not then start all over.
Matlab Code
>>close all
>>clear all
>>ii=1; % ii: index of the next accepted sample
>>jj=0; % jj: number of candidates generated
>>while ii<=1000
  y=rand;
  u=rand;
  jj=jj+1;
  if u<=y
    x(ii)=y;
    ii=ii+1;
  end
end
>>hist(x) % histogram of the accepted samples
>>jj
jj = 2024 % should be around 2000
- *Note: The reason that a for loop is not used is that we need to continue the looping until we get 1000 successful samples. We will reject some samples during the process and therefore do not know the number of y we are going to generate.
- *Note2: In this example, we used c=2, which means we accept half of the points we generate on average. Generally speaking, 1/c would be the probability of acceptance, and an indicator of the efficiency of your chosen proposal distribution and algorithm.
- *Note3: We use while instead of for when looping because we do not know how many iterations are required to generate 1000 successful samples. We can view this as a negative binomial distribution so while the expected number of iterations required is n * c, it will likely deviate from this amount. We expect 2000 in this case.
- *Note4: If c=1, we will accept all points, which is the ideal situation. However, this is essentially impossible because if c = 1 then our distributions f(x) and g(x) must be identical, so we will have to be satisfied with as close to 1 as possible.
Use Inverse Method for this Example
- [math]F(x)=\int_0^x \! 2s\,ds={x^2}-0={x^2}[/math]
- [math]y=x^2[/math]
- [math]x=\sqrt y[/math]
- [math] F^{-1}\left (\, x \, \right) =\sqrt x[/math]
- Procedure
- 1: Draw [math] U~ \sim~ Unif [0,1] [/math]
- 2: [math] x=F^{-1}\left (\, u\, \right) =\sqrt u[/math]
Matlab Code
>>u=rand(1,1000);
>>x=u.^0.5;
>>hist(x)
Matlab Tip: A period, ".", means "element-wise" and is placed before an operator to apply it to each element of a vector or matrix. In the above example, to take the square root of every element in U, the notation U.^0.5 is used. If you instead want the matrix square root of a square matrix U, the period is omitted, i.e. for B=U^0.5 we have [math]B*B=U[/math]. Similarly, for two 1 X 3 vectors a=[1 2 3] and b=[2 3 4], a.*b=[2 6 12] gives their element-wise product, but a*b gives an error since the inner matrix dimensions must agree.
Example for A-R method:
Given [math] f(x)= \frac{3}{4} (1-x^2), -1 \leq x \leq 1 [/math], use A-R method to generate random number
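Let g be the density of U(-1,1), so g(x)=1/2 for [math]-1 \leq x \leq 1[/math]
Let y ~ g. We need [math] cg(x)\geq f(x)[/math], i.e. [math]c\cdot\frac{1}{2} \geq \frac{3}{4} (1-x^2)[/math], so [math]c=\max 2\cdot\frac{3}{4} (1-x^2) = \frac{3}{2} [/math] (attained at x=0)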
The process:
- 1: Draw U1 ~ U(0,1)
- 2: Draw U2 ~ U(0,1)
- 3: let [math] y = U1*2 - 1 [/math]
- 4: if [math]U2 \leq \frac { \frac{3}{4} (1-y^2)} { \frac{3}{4}} = {1-y^2}[/math], then x=y (the ratio [math]\frac{3}{4}(1-y^2) / \frac{3}{4}[/math] comes from [math]f(y) / (cg(y))[/math] with [math]cg(y)=\frac{3}{2}\cdot\frac{1}{2}=\frac{3}{4}[/math])
- 5: else: return to step 1
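The five steps above translate almost line by line into code. This is a Python sketch for illustration only (the lecture code is MATLAB); `sample_parabola` is a hypothetical name and the seed is arbitrary. The sample moments are compared with the exact values E[X] = 0 and E[X^2] = 1/5 for this density.

```python
import random

def sample_parabola(rng):
    """One draw from f(x) = (3/4)(1 - x^2) on [-1, 1] by acceptance-rejection."""
    while True:
        u1 = rng.random()
        u2 = rng.random()
        y = 2 * u1 - 1        # Y ~ Unif(-1, 1), by the linear transformation of U1
        if u2 <= 1 - y * y:   # f(y) / (c g(y)) = 1 - y^2 with c = 3/2, g = 1/2
            return y

rng = random.Random(3)  # fixed seed for reproducibility
samples = [sample_parabola(rng) for _ in range(20000)]
m1 = sum(samples) / len(samples)                  # should be near 0
m2 = sum(s * s for s in samples) / len(samples)   # should be near 1/5
print(m1, m2)
```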
Example of Acceptance-Rejection Method
[math] f(x) = 3x^2, 0\lt x\lt 1 [/math]
[math] g(x)=1, 0\lt x\lt 1 [/math]
[math]c = \max \frac{f(x)}{g(x)} = \max \frac{3x^2}{1} = 3 [/math]
[math]\frac{f(x)}{c \cdot g(x)} = x^2[/math]
1. Generate two uniform numbers in the unit interval [math]U_1, U_2 \sim~ U(0,1)[/math]
2. If [math]U_2 \leqslant {U_1}^2[/math], accept [math]\begin{align}U_1\end{align}[/math] as the random variable with pdf [math]\begin{align}f\end{align}[/math], if not return to Step 1
We can also use [math]g(x)=2x[/math] for a more efficient algorithm
[math]c = \max \frac{f(x)}{g(x)} = \max \frac {3x^2}{2x} = \max \frac {3x}{2} = \frac{3}{2}[/math], so [math]\frac{f(x)}{c \cdot g(x)} = x[/math]. Use the inverse method to sample from [math]g(x)[/math]: [math]G(x)=x^2[/math], so generate [math]U[/math] from [math]U(0,1)[/math] and set [math]y=\sqrt{u}[/math]
1. Generate two uniform numbers in the unit interval [math]U_1, U_2 \sim~ U(0,1)[/math]
2. Set [math]Y=\sqrt{U_1}[/math]. If [math]U_2 \leq Y[/math], accept [math]Y[/math] as the random variable with pdf [math]f[/math], if not return to Step 1
- Note: the function [math]q(x) = c * g(x)[/math] is called an envelope or majorizing function.
To obtain a better proposal function [math]g(x)[/math], we can first assume a new [math]q(x)[/math] and then solve for the normalizing constant by integrating.
In the previous example, we first assume [math]q(x) = 3x[/math]. To find the normalizing constant, we need to solve [math]k \int_0^1 3x \, dx = 1[/math], which gives us k = 2/3. So, [math]g(x) = k*q(x) = 2x[/math].
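To see the gain in efficiency, the two proposals for [math]f(x)=3x^2[/math] can be compared by counting trials. This Python sketch is an illustration with hypothetical function names, not lecture code. For the proposal g(x)=2x it draws [math]Y=\sqrt{U}[/math] and accepts when the second uniform is at most Y, because [math]f(y)/(c \cdot g(y)) = 3y^2/(1.5 \cdot 2y) = y[/math].

```python
import math
import random

def sample_with_uniform(rng):
    """Proposal g(x) = 1 on (0, 1), c = 3: accept Y when U <= Y^2."""
    trials = 0
    while True:
        trials += 1
        y = rng.random()
        if rng.random() <= y * y:
            return y, trials

def sample_with_linear(rng):
    """Proposal g(x) = 2x on (0, 1), c = 3/2: accept Y when U <= Y."""
    trials = 0
    while True:
        trials += 1
        y = math.sqrt(rng.random())   # inverse transform, since G(x) = x^2
        if rng.random() <= y:
            return y, trials

rng = random.Random(2)  # fixed seed for reproducibility
n = 20000
res_u = [sample_with_uniform(rng) for _ in range(n)]
res_l = [sample_with_linear(rng) for _ in range(n)]
avg_trials_u = sum(t for _, t in res_u) / n   # expected to be near c = 3
avg_trials_l = sum(t for _, t in res_l) / n   # expected to be near c = 3/2
mean_u = sum(x for x, _ in res_u) / n         # both means should be near 3/4
mean_l = sum(x for x, _ in res_l) / n
print(avg_trials_u, avg_trials_l, mean_u, mean_l)
```

Both samplers target the same density, but the tighter envelope needs roughly half as many candidates per accepted value.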
Possible Limitations
-This method could be computationally inefficient depending on the rejection rate. We may have to sample many points before we get the 1000 accepted points. In the example we did in class relating to [math]f(x)=2x[/math], we had to sample around 2070 points before we finally accepted 1000 sample points.
-If the form of the proposal distribution, g, is very different from target distribution, f, then c is very large and the algorithm is not computationally efficient.
Acceptance - Rejection Method Application on Normal Distribution
[math]X \sim~ N(\mu,\sigma^2), \text{ or } X = \sigma Z + \mu, Z \sim~ N(0,1) [/math]
[math]\vert Z \vert[/math] has probability density function of
[math]f(x) = \frac{2}{\sqrt{2\pi}} e^{-x^2/2}, x \geq 0[/math]
[math]g(x) = e^{-x}, x \geq 0[/math]
Take h(x) = f(x)/g(x) and solve for h'(x) = 0 to find x so that h(x) is maximum.
Hence x=1 maximizes h(x) => [math]c = \sqrt{2e/\pi}[/math]
Thus [math]\frac{f(y)}{cg(y)} = e^{-(y-1)^2/2}[/math]
It is also useful to learn how to use code to calculate the constant c for given f(x) and g(x).
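A possible implementation of this normal sampler is sketched below in Python (not the lecture's MATLAB). The function names are hypothetical. Two steps are assumptions added for the sketch and are not spelled out in the notes: the Exp(1) candidate is drawn by inverse transform, and a random sign is attached to the accepted value of |Z| to obtain Z.

```python
import math
import random

def sample_standard_normal(rng):
    """Z ~ N(0, 1): sample |Z| by acceptance-rejection, then attach a random sign."""
    while True:
        # Candidate Y ~ Exp(1) by inverse transform; 1 - U lies in (0, 1], so log is safe.
        y = -math.log(1.0 - rng.random())
        u = rng.random()
        if u <= math.exp(-((y - 1) ** 2) / 2):   # f(y) / (c g(y)) with c = sqrt(2e/pi)
            # Assumption for this sketch: the sign is +1 or -1 with probability 1/2 each.
            sign = 1 if rng.random() < 0.5 else -1
            return sign * y

def sample_normal(mu, sigma, rng):
    """X = sigma * Z + mu, as in the first line of this section."""
    return sigma * sample_standard_normal(rng) + mu

rng = random.Random(4)  # fixed seed for reproducibility
zs = [sample_standard_normal(rng) for _ in range(20000)]
z_mean = sum(zs) / len(zs)
z_var = sum((z - z_mean) ** 2 for z in zs) / (len(zs) - 1)
print(z_mean, z_var)  # to be compared with 0 and 1
```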
How to transform [math]U(0,1)[/math] to [math]U(a, b)[/math]
1. Draw U from [math]U(0,1)[/math]
2. Take [math]Y=(b-a)U+a[/math]
3. Now Y follows [math]U(a,b)[/math]
Example: Generate a random variable z from the Semicircular density [math]f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}, -R\leq x\leq R[/math].
-> Proposal distribution: UNIF(-R, R)
-> We know how to generate using [math] U \sim UNIF (0,1) [/math] Let [math] Y= 2RU-R=R(2U-1)[/math], therefore Y follows [math]U(-R,R)[/math]
-> In order to maximize the function we must maximize the top and minimize the bottom.
Now, we need to find c:
Since c=max[f(x)/g(x)], where
[math]f(x)= \frac{2}{\pi R^2} \sqrt{R^2-x^2}[/math], [math]g(x)=\frac{1}{2R}[/math], [math]-R\leq x\leq R[/math]
Thus, we have to maximize R^2-x^2.
=> When x=0, it will be maximized.
Therefore, c=4/pi. * Note: This also means that the probability of accepting a point is [math]\pi/4[/math].
We will accept a point y with probability f(y)/[cg(y)], where [math]\frac{f(y)}{cg(y)}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-y^{2}}}{\frac{4}{\pi} \frac{1}{2R}}=\frac{\frac{2}{\pi R^{2}} \sqrt{R^{2}-R^{2}(2U-1)^{2}}}{\frac{2}{\pi R}}[/math]
- Note: Y= R(2U-1)
We can also get Y= R(2U-1) by using the formula y = a+(b-a)*u, which transforms U(0,1) to U(a,b). Letting a=-R and b=R, and substituting into the formula y = a+(b-a)*u, we get Y= R(2U-1).
Thus, [math]\frac{f(y)}{cg(y)}=\sqrt{1-(2U-1)^{2}}[/math] (this is the probability of accepting the point Y)
The algorithm to generate random variable x is then:
1. Draw [math]\ U[/math] from [math]\ U(0,1)[/math]
2. Draw [math]\ U_{1}[/math] from [math]\ U(0,1)[/math]
3. If [math]U_{1} \leq \sqrt{1-(2U-1)^2}[/math], set [math]x = Y = R(2U-1)[/math]
else return to step 1.
The condition can be rewritten without the square root:
[math] U_{1} \leq \sqrt{(1-(2U-1)^2)}[/math]
[math]\ U_{1}^2 \leq 1 - (2U -1)^2[/math]
[math]\ U_{1}^2 - 1 \leq -(2U - 1)^2[/math]
[math]\ 1 - U_{1}^2 \geq (2U - 1)^2[/math]
One more example about AR method
(In this example, we will see how to determine the value of c when c is a function with unknown parameters instead of a value)
Let [math]f(x)=x e^{-x}, x \gt 0 [/math]
Use [math]g(x)=a e^{-ax}[/math] (with 0 < a < 1) to generate random variable
Solution: First of all, we need to find c
[math]cg(x)\geq f(x)[/math]
[math]c\geq \frac{f(x)}{g(x)}[/math]
[math]\frac{f(x)}{g(x)}=\frac{x}{a} * e^{-(1-a)x}[/math]
take derivative with respect to x, and set it to 0 to get the maximum,
[math]\frac{1}{a} * e^{-(1-a)x} - \frac{x}{a} * e^{-(1-a)x} * (1-a) = 0 [/math]
[math]x=\frac {1}{1-a}[/math]
[math]\frac {f(x)}{g(x)} = \frac {e^{-1}}{a*(1-a)} [/math]
[math]\frac {f(0)}{g(0)} = 0[/math]
[math]\frac {f(\infty)}{g(\infty)} = 0[/math]
therefore, [math]c= \frac {e^{-1}}{a*(1-a)}[/math]
In order to minimize c, we need to find the appropriate a
Take derivative with respect to a and set it to be zero,
We could get [math]a= \frac {1}{2}[/math]
[math]c=\frac{4}{e}[/math]
Procedure:
1. Generate u, v ~ unif(0,1)
2. Generate y from g: since g is exponential with rate a = 1/2 (mean 2), let y=-2*ln(u)
3. If [math]v\lt \frac{f(y)}{c\cdot g(y)}[/math], output y
Else, go to 1
Acknowledgements: The example above is from Stat 340 Winter 2013 notes.
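The procedure for [math]f(x)=xe^{-x}[/math] can be sketched as follows. This is a Python illustration with a hypothetical function name, not the original course code. With a = 1/2 the proposal density is [math]g(y)=\tfrac{1}{2}e^{-y/2}[/math], so the inverse transform gives y = -2 ln(u), and c = 4/e.

```python
import math
import random

def sample_x_exp_minus_x(rng):
    """One draw from f(x) = x e^{-x}, x > 0, using an exponential proposal with rate a = 1/2."""
    a = 0.5
    c = 4 / math.e
    while True:
        u = 1.0 - rng.random()     # in (0, 1], avoids log(0)
        v = rng.random()
        y = -math.log(u) / a       # equals -2 ln(u): Exp with rate 1/2
        f_y = y * math.exp(-y)
        g_y = a * math.exp(-a * y)
        if v < f_y / (c * g_y):
            return y

rng = random.Random(5)  # fixed seed for reproducibility
ys = [sample_x_exp_minus_x(rng) for _ in range(20000)]
y_mean = sum(ys) / len(ys)
print(y_mean)  # f is the Gamma(2, 1) density, whose mean is 2
```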
Summary of how to find the value of c
Let [math]h(x) = \frac {f(x)}{g(x)}[/math], and then we have the following:
1. First, take the derivative of h(x) with respect to x, set it to zero, and solve to get x_{1};
2. Plug x_{1} into h(x) and get the value (or a function) of c, denote as c_{1};
3. Check the endpoints of x and sub the endpoints into h(x);
4. (if c_{1} is a value, then we can ignore this step) Since we want the smallest value of c such that [math]f(x) \leq c\cdot g(x)[/math] for all x, we want the unknown parameter that minimizes c.
So we take the derivative of c_{1} with respect to the unknown parameter (ie k=unknown parameter), set it to zero, and solve to get the value of k.
Then we substitute k to get the value of c_{1}. (Double check that [math]c_1 \geq 1[/math].)
5. Pick the maximum value of h(x) to be the value of c.
In the two examples above, we generate a candidate y from the proposal g together with a uniform v, and figure out [math]c=max\frac {f(y)}{g(y)} [/math]. If [math]v\lt \frac {f(y)}{c\cdot g(y)}[/math], output y.
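When the derivative is awkward to work with, c can also be approximated numerically. The sketch below, in Python with a hypothetical helper name, evaluates h(x) = f(x)/g(x) on an evenly spaced grid and keeps the largest value. This is an assumption-laden shortcut rather than the method from lecture: a grid can slightly underestimate the true maximum, so in practice one might round the result up a little. It is checked against the earlier example [math]f(x)=20x(1-x)^3[/math], g(x)=1, where calculus gives x = 1/4 and c = 135/64.

```python
def max_ratio_on_grid(f, g, lo, hi, steps=100000):
    """Approximate the maximiser and maximum of f(x)/g(x) on [lo, hi] with a grid search."""
    best_x = lo
    best_h = f(lo) / g(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        h = f(x) / g(x)
        if h > best_h:
            best_x, best_h = x, h
    return best_x, best_h

x0, c = max_ratio_on_grid(
    f=lambda x: 20 * x * (1 - x) ** 3,
    g=lambda x: 1.0,
    lo=0.0,
    hi=1.0,
)
print(x0, c, 135 / 64)  # grid estimate next to the exact constant
```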
Summary of when to use the Accept Rejection Method
1) When the inverse cdf cannot be computed or is too difficult to compute.
2) When f(x) can be evaluated, at least up to a normalizing constant.
3) When there is a constant c such that [math]f(x)\leq c\cdot g(x)[/math] for all x.
4) When a uniform draw is available.
Interpretation of 'C'
We can use the value of c to calculate the acceptance rate by [math]\tfrac{1}{c}[/math].
For instance, assume c=1.5, then we can tell that 66.7% of the points will be accepted ([math]\tfrac{1}{1.5} = 0.667[/math]). We can also say that the efficiency of the method is 66.7%.
Likewise, if the minimum possible value for c is [math]\tfrac{4}{3}[/math], then [math]1/ \tfrac{4}{3}[/math] of the generated random variables will be accepted. Thus the efficiency of the algorithm is 75%.
In order to ensure the algorithm is as efficient as possible, the 'C' value should be as close to one as possible, such that [math]\tfrac{1}{c}[/math] approaches 1 => 100% acceptance rate.
>> close all
>> clear all
>> ii=1;
>> jj=0;
>> while ii<1000
  y=rand;
  u=rand;
  jj=jj+1; % count every candidate, accepted or not
  if u<=y
    x(ii)=y;
    ii=ii+1;
  end
end
Class 5 - Tuesday, May 21
Recall the example in the last lecture. The following code will generate a random variable required by the question.
- Code
>>close all
>>clear all
>>ii=1;
>>R=1; % Note: R is a constant that we can change, i.e. if we changed R=4 then we would have a density between -4 and 4
>>while ii<1000
  u1 = rand;
  u2 = rand;
  y = R*(2*u2-1);
  if (1-u1^2)>=(2*u2-1)^2
    x(ii) = y;
    ii = ii + 1; % Note for beginner programmers: this step increases the ii value for the next time through the while loop
  end
end
>>hist(x,20) % 20 is the number of bars
>>hist(x,30) % 30 is the number of bars
calculate process:
[math]u_{1} \leq \sqrt{1-(2u-1)^2} [/math]
[math](u_{1})^2 \leq 1-(2u-1)^2 [/math]
[math](u_{1})^2 -1 \leq -(2u-1)^2 [/math]
[math]1-(u_{1})^2 \geq (2u-1)^2 [/math]
MATLAB tips: hist(x,y) plots a histogram of variable x, where y is the number of bars in the graph.
Discrete Examples
- Example 1
Generate random variable [math]X[/math] according to p.m.f
[math]\begin{align}
P(x &=1) &&=0.15 \\
P(x &=2) &&=0.25 \\
P(x &=3) &&=0.3 \\
P(x &=4) &&=0.1 \\
P(x &=5) &&=0.2 \\
\end{align}[/math]
The discrete case is analogous to the continuous case. Suppose we want to generate an X that is a discrete random variable with pmf f(x)=P(X=x). Suppose also that we use the discrete uniform distribution as our proposal distribution, then [math] g(x)= P(X=x) =0.2 [/math] for all x in 1, 2, 3, 4, 5.
The following algorithm then yields our X:
Step 1 Draw discrete uniform distribution of 1, 2, 3, 4 and 5, [math]Y \sim~ g[/math].
Step 2 Draw [math]U \sim~ U(0,1)[/math].
Step 3 If [math]U \leq \frac{f(Y)}{c \cdot g(Y)}[/math], then X = Y ;
Else return to Step 1.
c can be found by maximizing the ratio [math] \frac{f(x)}{g(x)} [/math]. Since [math] g(x) [/math] is constant here, this amounts to maximizing [math] f(x) [/math].
- [math]c = max \frac{f(x)}{g(x)} = \frac {0.3}{0.2} = 1.5 [/math]
Note: The maximum is attained at x=3, where [math]f(3)=P(X=3)=0.3[/math] (the highest probability among the discrete probabilities in the question)
- [math]\frac{f(x)}{cg(x)} = \frac{f(x)}{1.5 \cdot 0.2} = \frac{f(x)}{0.3} [/math]
Note: The U is independent from y in Step 2 and 3 above. The constant c is an indicator of the rejection rate or efficiency of the algorithm. It represents the average number of trials needed per accepted value. Thus, a higher c would mean that the algorithm is comparatively inefficient.
In the acceptance-rejection method for this pmf, the uniform proposal assigns the same probability to every value, and there are 5 possible values (1,2,3,4,5), so g(x) is 0.2
Remember that we always want to choose [math] cg [/math] to be equal to or greater than [math] f [/math], but as close as possible.
Limitations: If the form of the proposal dist g is very different from target dist f, then c is very large and the algorithm is not computationally efficient.
- Code for example 1
>>close all
>>clear all
>>p=[.15 .25 .3 .1 .2]; % This is a vector holding the pmf values
>>ii=1;
>>while ii < 1000
  y=unidrnd(5); % generates random numbers for the discrete uniform distribution with maximum 5
  u=rand;
  if u<= p(y)/0.3
    x(ii)=y;
    ii=ii+1;
  end
end
>>hist(x)
unidrnd(k) draws from the discrete uniform distribution of integers [math]1,2,3,...,k[/math]. If this function is not built in to your MATLAB, then we can apply a simple transformation to the rand function, for example ceil(k*rand), to make it work like the unidrnd(k) function.
The acceptance rate is [math]\frac {1}{c}[/math], so the lower the c, the more efficient the algorithm. Theoretically, c equals 1 is the best case because all samples would be accepted; however it would only be true when the proposal and target distributions are exactly the same, which would never happen in practice.
For example, if c = 1.5, the acceptance rate would be [math]\frac {1}{1.5}=\frac {2}{3}[/math]. Thus, in order to generate 1000 random values, on average, a total of 1500 iterations would be required.
A histogram of the 1000 generated values shows the shape of f(x); the more values we generate, the closer the observed frequencies get to the stated probabilities.
- Example 2
p(x=1)=0.1
p(x=2)=0.3
p(x=3)=0.6
Let g be the uniform distribution of 1, 2, or 3
g(x)= 1/3
[math]c=max(\tfrac{p_{x}}{g(x)})=0.6/(\tfrac{1}{3})=1.8[/math]
Hence [math]\tfrac{p(x)}{cg(x)} = p(x)/(1.8 (\tfrac{1}{3}))= \tfrac{p(x)}{0.6}[/math]
1. y~g
2. u~U(0,1)
3. If [math]u \leq \frac{p(y)}{cg(y)}[/math], set x = y. Else go to 1.
- Code for example 2
>>close all
>>clear all
>>p=[.1 .3 .6]; % This is a vector holding the pmf values
>>ii=1;
>>while ii < 1000
  y=unidrnd(3); % generates random numbers for the discrete uniform distribution with maximum 3
  u=rand;
  if u<= p(y)/0.6
    x(ii)=y;
    ii=ii+1;
  end
end
>>hist(x)
- Example 3
Suppose [math]\begin{align}p_{x} = e^{-3}3^{x}/x! , x\geq 0\end{align}[/math] (Poisson distribution)
First: Try the first few [math]\begin{align}p_{x}'s\end{align}[/math]: 0.0498, 0.149, 0.224, 0.224, 0.168, 0.101, 0.0504, 0.0216, 0.0081, 0.0027 for [math]\begin{align} x = 0,1,2,3,4,5,6,7,8,9 \end{align}[/math]
Proposed distribution: Use the geometric distribution for [math]\begin{align}g(x)\end{align}[/math];
[math]\begin{align}g(x)=p(1-p)^{x}\end{align}[/math], choose [math]\begin{align}p=0.25\end{align}[/math]
Look at [math]\begin{align}p_{x}/g(x)\end{align}[/math] for the first few numbers: 0.199 0.797 1.59 2.12 2.12 1.70 1.13 0.647 0.324 0.144 for [math]\begin{align} x = 0,1,2,3,4,5,6,7,8,9 \end{align}[/math]
We want [math]\begin{align}c=max(p_{x}/g(x))\end{align}[/math] which is approximately 2.12
The general procedures to generate [math]\begin{align}p(x)\end{align}[/math] is as follows:
1. Generate [math]\begin{align}U_{1} \sim~ U(0,1); U_{2} \sim~ U(0,1)\end{align}[/math]
2. [math]j = \lfloor \frac{\ln(U_{1})}{\ln(0.75)} \rfloor[/math], which is a draw from g taking the values 0, 1, 2, ... (no 1 is added, since g is defined for [math]x \geq 0[/math]);
3. if [math]U_{2} \lt \frac{p_{j}}{cg(j)}[/math], set [math]X = j[/math], else go to step 1.
Note: In this case, [math]\begin{align}f(x)/g(x)\end{align}[/math] is extremely difficult to differentiate so we were required to test points. If the function is very easy to differentiate, we can calculate the max as if it were a continuous function then check the two surrounding points for which is the highest discrete value.
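The Poisson example can be sketched as follows. This Python code is an illustration with hypothetical names, not the original lecture code. Two choices are assumptions of the sketch: c is found by testing the points x = 0, ..., 49 rather than only the first ten, and the geometric candidate is computed as [math]\lfloor \ln(U_1)/\ln(0.75) \rfloor[/math] without adding 1, so that its support starts at 0 and matches [math]g(x)=p(1-p)^x, x \geq 0[/math].

```python
import math
import random

def poisson3_pmf(x):
    """Target pmf p_x = e^{-3} 3^x / x!."""
    return math.exp(-3) * 3 ** x / math.factorial(x)

def geom_pmf(x, p=0.25):
    """Proposal pmf g(x) = p (1 - p)^x for x = 0, 1, 2, ..."""
    return p * (1 - p) ** x

# The ratio p_x / g(x) rises to its peak at x = 3 and x = 4 and then decreases,
# so testing the first 50 points is enough to find the maximum.
C = max(poisson3_pmf(x) / geom_pmf(x) for x in range(50))

def sample_poisson3(rng):
    while True:
        u1 = 1.0 - rng.random()   # in (0, 1], avoids log(0)
        u2 = rng.random()
        j = math.floor(math.log(u1) / math.log(0.75))   # geometric draw on 0, 1, 2, ...
        if u2 < poisson3_pmf(j) / (C * geom_pmf(j)):
            return j

rng = random.Random(6)  # fixed seed for reproducibility
counts = [sample_poisson3(rng) for _ in range(20000)]
print(C, sum(counts) / len(counts))  # c close to 2.12, sample mean close to 3
```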
- Example 4 (Hypergeometric & Binomial)
Suppose f(x) is hypergeometric: there are 10 white balls and 5 red balls, we select 3 balls without replacement, and X is the number of red balls selected.
Choose g(x) such that it is binomial distribution, Bin(3, 1/3). Find the rejection constant, c
Solution:
For hypergeometric: [math]P(X=0) =\binom{10}{3}/\binom{15}{3} =0.2637, P(x=1)=\binom{10}{2} * \binom{5}{1} /\binom{15}{3}=0.4945, P(X=2)=\binom{10}{1} * \binom{5}{2} /\binom{15}{3}=0.2198,[/math]
[math]P(X=3)=\binom{5}{3}/\binom{15}{3}= 0.02198[/math]
For Binomial g(x): P(X=0) = (2/3)^3=0.2963; P(X=1)= 3*(1/3)*(2/3)^2 = 0.4444, P(X=2)=3*(1/3)^2*(2/3)=0.2222, P(X=3)=(1/3)^3=0.03704
Find the value of f/g for each X
X=0: 0.8898; X=1: 1.1127; X=2: 0.9891; X=3: 0.5934
Choose the maximum which is c=1.1127
The maximum ratio occurs at X=1, where f(x) is 0.4945 and g(x) is 0.4444, so c is 1.1127. With this c, c*g(x) is at least f(x) at every point, so the graph of c*g(x) covers all of f(x). Because c is close to 1, the rejection rate is low: about 1/c, or roughly 90%, of the candidates are accepted.
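The table of ratios in this example is easy to reproduce in code. The following Python sketch (an illustration, not lecture code) builds both pmfs with `math.comb`, computes f/g at each value of X, and takes the maximum as c.

```python
from math import comb

# Hypergeometric target: 5 red and 10 white balls, 3 drawn, X = number of red balls.
f = [comb(5, x) * comb(10, 3 - x) / comb(15, 3) for x in range(4)]
# Binomial proposal Bin(3, 1/3).
g = [comb(3, x) * (1 / 3) ** x * (2 / 3) ** (3 - x) for x in range(4)]

ratios = [fx / gx for fx, gx in zip(f, g)]
c = max(ratios)
print(ratios, c, 1 / c)  # ratios at X = 0..3, the constant c, and the acceptance rate
```

The maximum is attained at X = 1, which agrees with the hand calculation up to rounding.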
Limitation: If the shape of the proposed distribution g is very different from the target distribution f, then the rejection rate will be high (High c value). Computationally, the algorithm is always right; however it is inefficient and requires many iterations.
Here is an example:
In the above example, we need to move c*g(x) to the peak of f to cover the whole f. Thus c will be very large and 1/c will be small, which makes the acceptance rate low and leads to very time consuming sampling.
The higher the rejection rate, the more points will be rejected.
More on rejection/acceptance rate: 1/c is the acceptance rate. As c decreases (note: the minimum value of c is 1), the acceptance rate increases. In Example 1, 1/c=1/1.5≈66.67%. Around 67% of points generated will be accepted.
Acceptance-Rejection Method
Problem: The CDF is not invertible or it is difficult to find the inverse.
Plan:
- Draw y~g(.)
- Draw u~Unif(0,1)
- If [math]u\leq \frac{f(y)}{cg(y)}[/math]then set x=y. Else return to Step 1
x will have the desired distribution.
Matlab Example
close all
clear all
ii=1;
R=1;
while ii<1000
  u1 = rand;
  u2 = rand;
  y = R*(2*u2-1);
  if (1-u1^2)>=(2*u2-1)^2
    x(ii) = y;
    ii = ii + 1;
  end
end
hist(x,20)
Recall that,
Suppose we have an efficient method for simulating a random variable having probability mass function {q(j),j>=0}. We can use this as the basis for simulating from the distribution having mass function {p(j),j>=0} by first simulating a random variable Y having mass function {q(j)} and then accepting this simulated value with a probability proportional to p(Y)/q(Y).
Specifically, let c be a constant such that p(j)/q(j)<=c for all j such that p(j)>0
We now have the following technique, called the acceptance-rejection method, for simulating a random variable X having mass function p(j)=P{X=j}.
Sampling from commonly used distributions
Please note that this is not a general technique as is that of acceptance-rejection sampling. Later, we will generalize the distributions for multidimensional purposes.
- Gamma
The CDF of the Gamma distribution [math]Gamma(t,\lambda)[/math] is (t denotes the shape, [math]\lambda[/math] denotes the rate):
[math] F(x) = \int_0^{x} \frac{\lambda^{t} e^{-\lambda y}y^{t-1}}{(t-1)!} \mathrm{d}y, \; \forall x \in (0,+\infty)[/math], where [math]t \in \N^+ \text{ and } \lambda \in (0,+\infty)[/math].
Note that the CDF of the Gamma distribution does not have a closed form.
The gamma distribution is often used to model waiting times between a certain number of events. It can also be expressed as the sum of n independent and identically distributed exponential random variables. This distribution has two parameters: the number of exponential terms n, and the rate parameter [math]\lambda[/math]. The density involves the Gamma function, [math]\Gamma [/math], which has some very useful properties. "Source: STAT 340 Spring 2010 Course Notes"
Neither Inverse Transformation nor Acceptance-Rejection Method can be easily applied to Gamma distribution. However, we can use additive property of Gamma distribution to generate random variables.
- Additive Property
If [math]X_1, \dots, X_t[/math] are independent exponential random variables with hazard rate [math] \lambda [/math] (in other words, [math] X_i\sim~ Exp (\lambda) [/math], where [math]Exp (\lambda)= Gamma (1, \lambda)[/math]), then [math]\Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) [/math]
Side notes: if [math] X\sim~ Gamma(a,\lambda)[/math] and [math] Y\sim~ Gamma(B,\lambda)[/math] are independent gamma random variables, then [math]\frac{X}{X+Y}[/math] has a distribution of [math] Beta(a,B). [/math]
If we want to sample from the Gamma distribution, we can consider sampling from [math]t[/math] independent exponential distributions using the Inverse Method for each [math] X_i[/math] and add them up. Note that this only works for the specific set of gamma distributions where t is a positive integer.
According to this property, a random variable that follows Gamma distribution is the sum of i.i.d (independent and identically distributed) exponential random variables. Now we want to generate 1000 values of [math]Gamma(20,10)[/math] random variables, so we need to obtain the value of each one by adding 20 values of [math]X_i \sim~ Exp(10)[/math]. To achieve this, we generate a 20-by-1000 matrix whose entries follow [math]Exp(10)[/math] and add the rows together.
[math] x_1 \sim~Exp(\lambda)[/math]
[math]x_2 \sim~Exp(\lambda)[/math]
...
[math]x_t \sim~Exp(\lambda)[/math]
[math]x_1+x_2+...+x_t \sim~ Gamma(t,\lambda)[/math]
>>l=1;
>>u=rand(1,1000);
>>x=-(1/l)*log(u);
>>hist(x)
- Procedure
- Sample independently from a uniform distribution [math]t[/math] times, giving [math] U_1,\dots,U_t \sim~ U(0,1)[/math]
- Use the Inverse Transform Method, [math] X_i = -\frac {1}{\lambda}\log(1-U_i)[/math], giving [math] X_1,\dots,X_t \sim~Exp(\lambda)[/math]
- Use the additive property,[math] X = \Sigma_{i=1}^t X_i \sim~ Gamma (t, \lambda) [/math]
- Note for Procedure
- If [math]U\sim~U(0,1)[/math], then [math]U[/math] and [math]1-U[/math] will have the same distribution (both follows [math]U(0,1)[/math])
- This is because the range for [math]1-U[/math] is still [math](0,1)[/math], and their densities are identical over this range.
- Let [math]Y=1-U[/math], [math]Pr(Y\leq y)=Pr(1-U\leq y)=Pr(U\geq 1-y)=1-Pr(U\leq 1-y)=1-(1-y)=y[/math], thus [math]1-U\sim~U(0,1)[/math]
- Code
>>close all
>>clear all
>>lambda = 1;
>>u = rand(20, 1000); % Note: this command generates a 20x1000 matrix (which means we generate 1000 numbers for each X_i with t=20); all the elements are generated by rand
>>x = (-1/lambda)*log(1-u); % Note: log(1-u) is essentially the same as log(u) only if u~U(0,1)
>>xx = sum(x); % Note: sum(x) will sum all elements in the same column. size(xx) can help you to verify
>>size(x) % Note: see the size of x if we forget it (the answer is 20 1000)
>>hist(x(1,:)) % Note: the graph of the first exponential distribution
>>hist(xx)
x and u are both 20-by-1000 matrices. Since u and 1 - u have the same distribution when u~unif(0, 1), we can substitute 1-u with u to simplify the equation. Alternatively, the following command does the same thing as the previous commands.
- Code
>>close all
>>clear all
>>lambda = 1;
>>xx = sum((-1/lambda)*log(rand(20, 1000))); % This is a simple way to put the code in one line. Here we can use either log(u) or log(1-u) since U~U(0,1)
>>hist(xx)
In the matrix rand(20,1000) means 20 row with 1000 numbers for each. use the code to show the generalize the distributions for multidimensional purposes in different cases, su