Latest revision as of 09:45, 30 August 2017
Importance Sampling and Markov Chain Monte Carlo - June 4, 2009
Remark 4:
[math]\displaystyle{ I = \displaystyle\int h(x)f(x)\,dx }[/math]
- [math]\displaystyle{ = \displaystyle\int \ h(x)\frac{f(x)}{g(x)}g(x)\,dx }[/math]
- [math]\displaystyle{ = \displaystyle\sum_{i=1}^{N} h(x_i)b(x_i) }[/math] where [math]\displaystyle{ \displaystyle b(x_i) = \frac{f(x_i)}{g(x_i)} }[/math]
- [math]\displaystyle{ =\displaystyle \frac{\int\ h(x)f(x)\,dx}{\int f(x) dx} }[/math]
Apply the idea of importance sampling to both numerator and denominator:
- [math]\displaystyle{ =\displaystyle \frac{\int\ h(x)\frac{f(x)}{g(x)}g(x)\,dx}{\int\frac{f(x)}{g(x)}g(x) dx} }[/math]
- [math]\displaystyle{ = \displaystyle\frac{\sum_{i=1}^{N} h(x_i)b(x_i)}{\sum_{i=1}^{N} b(x_i)} }[/math]
- [math]\displaystyle{ = \displaystyle\sum_{i=1}^{N} h(x_i)b'(x_i) }[/math] where [math]\displaystyle{ \displaystyle b'(x_i) = \frac{b(x_i)}{\sum_{i=1}^{N} b(x_i)} }[/math]
The above results in the following form of Importance Sampling:
- [math]\displaystyle{ \hat{I} = \displaystyle\sum_{i=1}^{N} b'(x_i)h(x_i) }[/math] where [math]\displaystyle{ \displaystyle b'(x_i) = \frac{b(x_i)}{\sum_{i=1}^{N} b(x_i)} }[/math]
This identity is especially important and useful when f is known only up to a proportionality constant. This is often the case in the Bayesian approach, where f is a posterior density function.
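As an illustration (our sketch, not from the original notes), the self-normalized estimator can be coded in a few lines of Python. Here the target f is the standard normal with its constant deliberately dropped, the proposal g is N(0, 2^2), and we estimate E[X^2] = 1; the function name and the choices of h and g are ours, picked only for the demo.

```python
import math
import random

def self_normalized_is(h, f_unnorm, n=200_000, seed=0):
    """I-hat = sum_i b'(x_i) h(x_i), where b'(x_i) = b(x_i) / sum_j b(x_j)
    and b(x) = f(x)/g(x); any constant in f cancels in the ratio."""
    rng = random.Random(seed)
    sigma = 2.0                          # proposal g = N(0, sigma^2)
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    def g(x):                            # normalized proposal density
        return math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    b = [f_unnorm(x) / g(x) for x in xs]
    total = sum(b)
    return sum(bi * h(xi) for bi, xi in zip(b, xs)) / total

# f known only up to a constant: exp(-x^2/2) instead of the full N(0,1) density
est = self_normalized_is(h=lambda x: x * x,
                         f_unnorm=lambda x: math.exp(-x * x / 2))
print(est)   # should be close to E[X^2] = 1
```

Because the weights are normalized by their own sum, the missing constant of f never matters, which is exactly why this form is popular for posterior densities.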
Example of Importance Sampling
Estimate [math]\displaystyle{ I = \displaystyle\ Pr (Z\gt 3) }[/math] when [math]\displaystyle{ Z \sim~ N(0,1) }[/math]
- [math]\displaystyle{ I = \displaystyle\int^\infty_3 f(x)\,dx \approx 0.0013 }[/math]
- [math]\displaystyle{ = \displaystyle\int^\infty_{-\infty} h(x)f(x)\,dx }[/math]
- Define [math]\displaystyle{ h(x) = \begin{cases} 0, & \text{if } x \lt 3 \\ 1, & \text{if } 3 \leq x \end{cases} }[/math]
Approach I: Monte Carlo
- [math]\displaystyle{ \hat{I} = \frac{1}{N}\displaystyle\sum_{i=1}^Nh(x_i) }[/math] where [math]\displaystyle{ X \sim~ N(0,1) }[/math]
The idea here is to sample from the standard normal distribution and count the number of observations greater than 3.

The variability will be high in this case if using Monte Carlo, since this is a low-probability (tail) event, and different runs may give significantly different values. For example, one run of 1000 samples may give only 3 occurrences (an estimate of 0.003), while another gives 5 occurrences (0.005), and so on.
This example can be illustrated in MATLAB using the code below (100 runs of 100 samples each):
format long
for i = 1:100
    a(i) = sum(randn(100,1) >= 3)/100;   % fraction of N(0,1) draws >= 3
end
meanMC = mean(a)
varMC = var(a)
On running this, we get [math]\displaystyle{ meanMC = 0.0005 }[/math] and [math]\displaystyle{ varMC \approx 1.31313 * 10^{-5} }[/math]
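For readers without MATLAB, here is an equivalent crude Monte Carlo run in Python (our sketch; the variable names mirror the MATLAB code above):

```python
import random
import statistics

rng = random.Random(1)
a = []
for _ in range(100):                              # 100 independent runs
    draws = [rng.gauss(0.0, 1.0) for _ in range(100)]
    a.append(sum(x >= 3 for x in draws) / 100)    # fraction of tail hits
meanMC = statistics.mean(a)
varMC = statistics.variance(a)
print(meanMC, varMC)
```

The exact numbers change from seed to seed, which is precisely the high run-to-run variability being discussed.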
Approach II: Importance Sampling
- [math]\displaystyle{ \hat{I} = \frac{1}{N}\displaystyle\sum_{i=1}^Nh(x_i)\frac{f(x_i)}{g(x_i)} }[/math] where [math]\displaystyle{ f(x) }[/math] is standard normal and [math]\displaystyle{ g(x) }[/math] needs to be chosen wisely so that it is similar to the target distribution.
- Let [math]\displaystyle{ g(x) \sim~ N(4,1) }[/math]
- [math]\displaystyle{ b(x) = \frac{f(x)}{g(x)} = e^{(8-4x)} }[/math]
- [math]\displaystyle{ \hat{I} = \frac{1}{N}\displaystyle\sum_{i=1}^Nb(x_i)h(x_i) }[/math]
This example can be illustrated in Matlab using the code below:
for j = 1:100
    N = 100;
    x = randn(N,1) + 4;            % sample from g = N(4,1)
    for ii = 1:N
        h = x(ii) >= 3;            % indicator h(x)
        b = exp(8 - 4*x(ii));      % weight b(x) = f(x)/g(x)
        w(ii) = h*b;
    end
    I(j) = sum(w)/N;
end
MEAN = mean(I)
VAR = var(I)
Running the above code gave us [math]\displaystyle{ MEAN \approx 0.001353 }[/math] and [math]\displaystyle{ VAR \approx 9.666 * 10^{-8} }[/math]. The variance is very close to 0 and much smaller than the variability observed with crude Monte Carlo.
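The same importance sampling run translated to Python (our sketch; names mirror the MATLAB code above):

```python
import math
import random
import statistics

rng = random.Random(2)
I = []
for _ in range(100):                                  # 100 independent runs
    N = 100
    xs = [rng.gauss(4.0, 1.0) for _ in range(N)]      # sample from g = N(4,1)
    # h(x) = 1{x >= 3}, b(x) = f(x)/g(x) = exp(8 - 4x)
    w = [(x >= 3) * math.exp(8.0 - 4.0 * x) for x in xs]
    I.append(sum(w) / N)
MEAN = statistics.mean(I)
VAR = statistics.variance(I)
print(MEAN, VAR)
```

With the proposal centered near the tail event, almost every draw contributes information, which is why the variance drops by roughly two orders of magnitude.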
Markov Chain Monte Carlo (MCMC)
Before we tackle Markov chain Monte Carlo methods, which are essentially a 'class of algorithms for sampling from probability distributions based on constructing a Markov chain' [MCMC, Wikipedia], we will first give a formal definition of a Markov chain.
Consider the same integral: [math]\displaystyle{ I = \displaystyle\int h(x)f(x)\,dx }[/math]
Idea: If [math]\displaystyle{ X_1, X_2,...X_N }[/math] is a Markov Chain with stationary distribution f(x), then under some conditions
- [math]\displaystyle{ \hat{I} = \frac{1}{N}\displaystyle\sum_{i=1}^Nh(x_i)\xrightarrow{P}\int h(x)f(x)\,dx = I }[/math]
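To make this idea concrete, here is a small Python sketch (ours, not from the lecture): a two-state Markov chain whose stationary distribution works out to (2/3, 1/3), with the running average of h(x) = x converging to the stationary expectation 1/3. The transition matrix is an arbitrary choice for the demo.

```python
import random

# Two-state chain; solving pi P = pi gives stationary distribution (2/3, 1/3)
P = [[0.9, 0.1],
     [0.2, 0.8]]
rng = random.Random(0)
N = 200_000
x = 0
total = 0
for _ in range(N):
    x = 0 if rng.random() < P[x][0] else 1   # one step of the chain
    total += x                               # accumulate h(x) = x
I_hat = total / N                            # ergodic average -> pi(1) = 1/3
print(I_hat)
```

The samples are dependent, unlike in plain Monte Carlo, but the time average still converges to the expectation under the stationary distribution, which is the "under some conditions" claim above.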
Stochastic Process:
A stochastic process is a collection of random variables [math]\displaystyle{ \{ X_t : t \in T \} }[/math]

- State Space Set: X is the set from which the random variables [math]\displaystyle{ X_t }[/math] take their values.
- Indexed Set: T is the set from which t takes its values; it can be discrete or continuous in general, but in this course we are interested only in the discrete case.
Example 1
i.i.d random variables
- [math]\displaystyle{ \{ X_t : t \in T \}, X_t \in X }[/math]
- [math]\displaystyle{ X = \{0, 1, 2, 3, 4, 5, 6, 7, 8\} \rightarrow }[/math]State Space
- [math]\displaystyle{ T = \{1, 2, 3, 4, 5\} \rightarrow }[/math]Indexed Set
Example 2
- [math]\displaystyle{ X_t }[/math]: price of a stock
- [math]\displaystyle{ t }[/math]: opening date of the market
Basic Fact: In general, if we have random variables [math]\displaystyle{ X_1,...,X_n }[/math]

- [math]\displaystyle{ \displaystyle f(X_1,...,X_n)= f(X_1)f(X_2|X_1)f(X_3|X_2,X_1)...f(X_n|X_{n-1},...,X_1) }[/math]
- [math]\displaystyle{ \displaystyle f(X_1,...,X_n)= \prod_{i = 1}^n f(X_i|Past_i) }[/math] where [math]\displaystyle{ \displaystyle Past_i = (X_{i-1}, X_{i-2},...,X_1) }[/math]
Markov Chain:
A special form of stochastic process in which [math]\displaystyle{ X_t }[/math] depends only on [math]\displaystyle{ X_{t-1} }[/math]. That is, the joint density factors as

- [math]\displaystyle{ \displaystyle f(X_1,...,X_n)= f(X_1)f(X_2|X_1)f(X_3|X_2)...f(X_n|X_{n-1}) }[/math]
Transition Probability:
The probability of going from one state to another state.
- [math]\displaystyle{ p_{ij} = \Pr(X_{t+1}=x_j\mid X_t= x_i). \, }[/math]
Transition Matrix:
For N states, the transition matrix P is an [math]\displaystyle{ N \times N }[/math] matrix with entries [math]\displaystyle{ p_{ij} }[/math] as below:
- [math]\displaystyle{ P=\left(\begin{matrix}p_{1,1}&p_{1,2}&\dots&p_{1,j}&\dots\\ p_{2,1}&p_{2,2}&\dots&p_{2,j}&\dots\\ \vdots&\vdots&\ddots&\vdots&\ddots\\ p_{i,1}&p_{i,2}&\dots&p_{i,j}&\dots\\ \vdots&\vdots&\ddots&\vdots&\ddots \end{matrix}\right) }[/math]
Transition Matrix Example:
Consider a random walk on n states. The probability of moving one state to the right is Pr(heads) = p, and the probability of moving one state to the left is Pr(tails) = q = 1 - p. The two end states are absorbing: once the chain reaches either end, it stays there. The transition matrix is shown below:
- [math]\displaystyle{ P=\left(\begin{matrix}1&0&0&\dots&0&0\\ q&0&p&0&\dots&0\\ 0&q&0&p&\dots&0\\ \vdots&\ddots&\ddots&\ddots&\ddots&\vdots\\ 0&\dots&0&q&0&p\\ 0&0&0&\dots&0&1 \end{matrix}\right) }[/math]
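A Python sketch (ours) that builds this matrix for n states and checks that each row is a probability distribution; it follows the text's convention that p is the probability of a step to the right:

```python
def random_walk_matrix(n, p):
    """n-state random walk: right with prob p, left with prob q = 1 - p;
    the two end states are absorbing."""
    q = 1.0 - p
    P = [[0.0] * n for _ in range(n)]
    P[0][0] = 1.0            # left end: absorbing
    P[n - 1][n - 1] = 1.0    # right end: absorbing
    for i in range(1, n - 1):
        P[i][i - 1] = q      # step left
        P[i][i + 1] = p      # step right
    return P

P = random_walk_matrix(5, 0.5)
print(all(abs(sum(row) - 1.0) < 1e-12 for row in P))   # every row sums to 1
```

The row-sum check is a useful sanity test for any hand-built transition matrix.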