# binomial Probability Monte Carlo Sampling June 2 2009

##### Example 1:

You are given two independent Binomial distributions with probabilities $\displaystyle p_1\text{, }p_2$. Using a Monte Carlo simulation, approximate the value of $\displaystyle \delta$, where $\displaystyle \delta = p_1 - p_2$.

$\displaystyle X \sim BIN(n, p_1)$; $\displaystyle Y \sim BIN(n, p_2)$; $\displaystyle \delta = p_1 - p_2$

So $\displaystyle f(p_1, p_2 | x,y) = \frac{f(x, y|p_1, p_2)*f(p_1,p_2)}{f(x,y)}$ where $\displaystyle f(x,y)$ is a flat distribution and the expected value of $\displaystyle \delta$ is as follows:

$\displaystyle \hat{\delta} = \int\int\delta f(p_1,p_2|X,Y)\,dp_1dp_2$

Since X, Y are independent, we can split the conditional probability distribution:

$\displaystyle f(p_1,p_2|X,Y) \propto f(p_1|X)f(p_2|Y)$

We need to find conditional distribution functions for $\displaystyle p_1, p_2$ to draw samples from. In order to get a distribution for the probability 'p' of a Binomial, we have to divide the Binomial distribution by n. This new distribution has the same shape as the original, but is scaled. A Beta distribution is a suitable approximation. Let

$\displaystyle f(p_1 | X) \sim \text{Beta}(x+1, n-x+1)$ and $\displaystyle f(p_2 | Y) \sim \text{Beta}(y+1, n-y+1)$, where
$\displaystyle \text{Beta}(\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}p^{\alpha-1}(1-p)^{\beta-1}$

Process:

1. Draw samples for $\displaystyle p_1$ and $\displaystyle p_2$: $\displaystyle (p_1,p_2)^{(1)}$, $\displaystyle (p_1,p_2)^{(2)}$, ..., $\displaystyle (p_1,p_2)^{(n)}$;
2. Compute $\displaystyle \delta = p_1 - p_2$ in order to get n values for $\displaystyle \delta$;
3. $\displaystyle \hat{\delta}=\frac{\displaystyle\sum_{\forall i}\delta^{(i)}}{N}$.

Matlab Code:

The Matlab code for recreating the above example is as follows:
n=100; %number of trials for X
m=100; %number of trials for Y
x=80; %number of successes for X trials
y=60; %number of successes for y trials
p1=betarnd(x+1, n-x+1, 1, 1000);
p2=betarnd(y+1, m-y+1, 1, 1000);
delta=p1-p2;
mean(delta);


The mean in this example is given by 0.1938.

A 95% confidence interval for $\delta$ is represented by the interval between the 2.5% and 97.5% quantiles which covers 95% of the probability distribution. In Matlab, this can be calculated as follows:

q1=quantile(delta,0.025);
q2=quantile(delta,0.975);


The interval is approximately $95% CI \approx (0.06606, 0.32204)$

The histogram of delta is:

Note: In this case, we can also find $E(\delta)$ analytically since $E(\delta) = E(p_1 - p_2) = E(p_1) - E(p_2) = \frac{x+1}{n+2} - \frac{y+1}{m+2} \approx 0.1961$. Compare this with the maximum likelihood estimate for $\delta$: $\frac{x}{n} - \frac{y}{m} = 0.2$.

##### Example 2:

We conduct an experiment by giving rats one of ten possible doses of a drug, where each subsequent dose is more lethal than the previous one:

$\displaystyle x_1\lt x_2\lt ...\lt x_{10}$

For each dose $\displaystyle x_i$ we test n rats and observe $\displaystyle Y_i$, the number of rats that survive. Therefore,

$\displaystyle Y_i \sim~ BIN(n, p_i)$
.

We can assume that the probability of death grows with the concentration of drug given, i.e. $\displaystyle p_1\lt p_2\lt ...\lt p_{10}$. Estimate the dose at which the animals have at least 50% chance of dying.

Let $\displaystyle \delta=x_j$ where $\displaystyle j=min\{i|p_i\geq0.5\}$
We are interested in $\displaystyle \delta$ since any higher concentrations are known to have a higher death rate.

Solving this analytically is difficult:

$\displaystyle \delta = g(p_1, p_2, ..., p_{10})$ where g is an unknown function
$\displaystyle \hat{\delta} = \int \int..\int_A \delta f(p_1,p_2,...,p_{10}|Y_1,Y_2,...,Y_{10})\,dp_1dp_2...dp_{10}$
where $\displaystyle A=\{(p_1,p_2,...,p_{10})|p_1\leq p_2\leq ...\leq p_{10} \}$

Process: Monte Carlo
We assume that

1. Draw $\displaystyle p_i \sim~ BETA(y_i+1, n-y_i+1)$
2. Keep sample only if it satisfies $\displaystyle p_1\leq p_2\leq ...\leq p_{10}$, otherwise discard and try again.
3. Compute $\displaystyle \delta$ by finding the first $\displaystyle p_i$ sample with over 50% deaths.
4. Repeat process n times to get n estimates for $\displaystyle \delta_1, \delta_2, ..., \delta_N$.
5. $\displaystyle \bar{\delta} = \frac{\displaystyle\sum_{\forall i} \delta_i}{N}$.