a Deeper Look into Importance Sampling: Difference between revisions
Line 21: | Line 21: | ||
We can see that if <math>\displaystyle g(x) \rightarrow 0 </math>, then <math>\displaystyle E[(w(x))^2] \rightarrow \infty </math>. This occurs if <math>\displaystyle g </math> has a thinner tail than <math>\displaystyle f </math> then <math>\frac{h^2(x)f^2(x)}{g(x)} </math> could be infinitely large. | We can see that if <math>\displaystyle g(x) \rightarrow 0 </math>, then <math>\displaystyle E[(w(x))^2] \rightarrow \infty </math>. This occurs if <math>\displaystyle g </math> has a thinner tail than <math>\displaystyle f </math> then <math>\frac{h^2(x)f^2(x)}{g(x)} </math> could be infinitely large. | ||
=====Remark 1===== | |||
It is evident that <math>\displaystyle g(x) </math> should be chosen such that it has a thicker tail than <math>\displaystyle f(x) </math>. | It is evident that <math>\displaystyle g(x) </math> should be chosen such that it has a thicker tail than <math>\displaystyle f(x) </math>. | ||
If <math>\displaystyle f</math> is large over set <math>\displaystyle A</math> but </math>\displaystyle g</math> is small, then <math>\displaystyle \frac{f}{g} </math> would be large and it would result in a large variance. | If <math>\displaystyle f</math> is large over set <math>\displaystyle A</math> but </math>\displaystyle g</math> is small, then <math>\displaystyle \frac{f}{g} </math> would be large and it would result in a large variance. | ||
=====Remark 2===== | |||
It is useful if we can choose <math>\displaystyle g </math> to be similar to <math>\displaystyle f</math> in terms of shape. Analytically, we can show that there is a <math>\displaystyle g</math> that would result in a variance that is minimized. | It is useful if we can choose <math>\displaystyle g </math> to be similar to <math>\displaystyle f</math> in terms of shape. Analytically, we can show that there is a <math>\displaystyle g</math> that would result in a variance that is minimized. | ||
Revision as of 22:53, 3 June 2009
A Deeper Look into Importance Sampling - June 2, 2009
From last class, we have determined that an integral can be written in the form [math]\displaystyle{ I = \displaystyle\int h(x)f(x)\,dx }[/math] [math]\displaystyle{ = \displaystyle\int \frac{h(x)f(x)}{g(x)}g(x)\,dx }[/math] We continue our discussion of Importance Sampling here.
Importance Sampling
We can see that the integral [math]\displaystyle{ \displaystyle\int \frac{h(x)f(x)}{g(x)}g(x)\,dx = \int \frac{f(x)}{g(x)}h(x) g(x)\,dx }[/math] is just [math]\displaystyle{ \displaystyle E_g(h(x)) \rightarrow }[/math]the expectation of h(x) with respect to g(x), where [math]\displaystyle{ \displaystyle \frac{f(x)}{g(x)} }[/math] is a weight [math]\displaystyle{ \displaystyle\beta(x) }[/math]. In the case where [math]\displaystyle{ \displaystyle f \gt g }[/math], a greater weight for [math]\displaystyle{ \displaystyle\beta(x) }[/math] will be assigned. Thus, the points with more weight are deemed more important, hence "importance sampling". This can be seen as a variance reduction technique.
Problem
The method of Importance Sampling is simple but can lead to some problems. The [math]\displaystyle{ \displaystyle \hat I }[/math] estimated by Importance Sampling could have infinite standard error.
Given [math]\displaystyle{ \displaystyle I= \int w(x) g(x) dx }[/math] [math]\displaystyle{ = \displaystyle E_g(w(x)) }[/math] [math]\displaystyle{ = \displaystyle \frac{1}{N}\sum_{i=1}^{N} w(x_i) }[/math] where [math]\displaystyle{ \displaystyle w(x)=\frac{f(x)h(x)}{g(x)} }[/math].
Obtaining the second moment, [math]\displaystyle{ \displaystyle E[(w(x))^2] = \int (\frac{h(x)f(x)}{g(x)})^2 g(x) dx }[/math] [math]\displaystyle{ \displaystyle = \int \frac{h^2(x) f^2(x)}{g^2(x)} g(x) dx }[/math] [math]\displaystyle{ \displaystyle = \int \frac{h^2(x)f^2(x)}{g(x)} dx }[/math]
We can see that if [math]\displaystyle{ \displaystyle g(x) \rightarrow 0 }[/math], then [math]\displaystyle{ \displaystyle E[(w(x))^2] \rightarrow \infty }[/math]. This occurs if [math]\displaystyle{ \displaystyle g }[/math] has a thinner tail than [math]\displaystyle{ \displaystyle f }[/math] then [math]\displaystyle{ \frac{h^2(x)f^2(x)}{g(x)} }[/math] could be infinitely large.
Remark 1
It is evident that [math]\displaystyle{ \displaystyle g(x) }[/math] should be chosen such that it has a thicker tail than [math]\displaystyle{ \displaystyle f(x) }[/math]. If [math]\displaystyle{ \displaystyle f }[/math] is large over set [math]\displaystyle{ \displaystyle A }[/math] but </math>\displaystyle g</math> is small, then [math]\displaystyle{ \displaystyle \frac{f}{g} }[/math] would be large and it would result in a large variance.
Remark 2
It is useful if we can choose [math]\displaystyle{ \displaystyle g }[/math] to be similar to [math]\displaystyle{ \displaystyle f }[/math] in terms of shape. Analytically, we can show that there is a [math]\displaystyle{ \displaystyle g }[/math] that would result in a variance that is minimized.
Theorem: The choice of [math]\displaystyle{ \displaystyle g }[/math] that minimizes variance of I\hat is [math]\displaystyle{ \displaystyle g^*(x)=\abs{h(x)} }[/math]