f11Stat946ass: Difference between revisions

From statwiki
Jump to navigation Jump to search
(Created page with "===problem 1=== Recall the definition of the family of probability distributions associated with a directed graphical model: <center><math>p(x) = \prod_{i=1}^{n}f_i(x_i,x_{\pi_i...")
 
No edit summary
Line 2: Line 2:
Recall the definition of the family of probability distributions associated with a directed graphical model:
Recall the definition of the family of probability distributions associated with a directed graphical model:


<center><math>p(x) = \prod_{i=1}^{n}f_i(x_i,x_{\pi_i})</math></center>.
<center><math>p(x) = \prod_{i=1}^{n}f_i(x_i,x_{\pi_i}).</math></center>


Prove by induction that <math>p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})</math>.
Prove by induction that <math>p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})</math>.
Line 17: Line 17:
Now, we need to show that this statement, <math>p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})</math>, holds for <math>n=1</math> (and <math>n=2</math>, just for the sake of more clarity).
Now, we need to show that this statement, <math>p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})</math>, holds for <math>n=1</math> (and <math>n=2</math>, just for the sake of more clarity).


<center><math>\begin{matrix}
<center><math>\begin{align}
n=1:~p(x)=p(x_1)=p(x_1|\emptyset)=f_1(x_1,x_{\pi_1}), \mbox{ where } x_{\pi_1}=\emptyset \rightarrow  p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}),\\
n=&1:~p(x)=p(x_1)=p(x_1|\emptyset)=f_1(x_1,x_{\pi_1}), \mbox{ where } x_{\pi_1}=\emptyset \rightarrow  p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}),\\
n=2:~p(x)=p(x_1,x_2)=p(x_1)p(x_2|x_1)=p(x_1|\emptyset)p(x_2|x_1)=f_1(x_1,x_{\pi_2})f_2(x_2,x_{\pi_2})\rightarrow  p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}).
n=&2:~p(x)=p(x_1,x_2)=p(x_1)p(x_2|x_1)=p(x_1|\emptyset)p(x_2|x_1)=f_1(x_1,x_{\pi_2})f_2(x_2,x_{\pi_2})\rightarrow  p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}).
\end{matrix}</math></center>
\end{align}</math></center>


Where in the latter case, without loss of generality, we have assumed that <math>x_1</math> is the only parent of <math>x_2</math>. This assumption that we've taken, not only does not reduce from the generality of the proof (as we have initially assumed that the graph is directed and in a directed graph one of two connected nodes/random variables should be a parent of the other), but it also abides by our earlier assumption, that parents come before children. This leaves no parents for <math>x_1</math>, similar to the former case in the above.
Where in the latter case, without loss of generality, we have assumed that <math>x_1</math> is the only parent of <math>x_2</math>. This assumption that we've taken, not only does not reduce from the generality of the proof (as we have initially assumed that the graph is directed and in a directed graph one of two connected nodes/random variables should be a parent of the other), but it also abides by our earlier assumption, that parents come before children. This leaves no parents for <math>x_1</math>, similar to the former case in the above.
Line 26: Line 26:
Now, assuming that the statement <math>p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})</math> holds for <math>n=k</math>,
Now, assuming that the statement <math>p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})</math> holds for <math>n=k</math>,


<center><math>\begin{matrix}
<center><math>\begin{align}
n=k: p(x)=p(x_1,x_2,\ldots,x_k)=\prod_{i=1}^{k}f_i(x_i,x_{\pi_i})=\prod_{i=1}^{k}p(x_i|x_{\pi_i}).~\flat
n=k: p(x)=p(x_1,x_2,\ldots,x_k)=\prod_{i=1}^{k}f_i(x_i,x_{\pi_i})=\prod_{i=1}^{k}p(x_i|x_{\pi_i}).~\flat
\end{matrix}</math></center>
\end{align}</math></center>


we need to show that it also holds for <math>n=k+1</math>.
we need to show that it also holds for <math>n=k+1</math>.


<center><math>\begin{matrix}
<center><math>\begin{align}
n=k+1: p(x)&=p(x_1,x_2,\ldots,x_{k+1})=\prod_{i=1}^{k+1}f_i(x_i,x_{\pi_i}) \\
n=k+1: p(x)&=p(x_1,x_2,\ldots,x_{k+1})=\prod_{i=1}^{k+1}f_i(x_i,x_{\pi_i}) \\
&=f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}f_i(x_i,x_{\pi_i}) \\
&=f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}f_i(x_i,x_{\pi_i}) \\
&\stackrel{\flat}{=}f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i}).
&\stackrel{\flat}{=}f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i}).
\end{matrix}</math></center>
\end{align}</math></center>


On the other hand and according to <math>\sharp</math> we have:
On the other hand according to <math>\sharp</math> we have:


<center><math>\begin{matrix}
<math>\begin{align}
p(x_1,\ldots,x_{k+1})=p(x_1,\ldots,x_{k})p(x_{k+1}|x_1,\ldots,x_{k})
p(x_1,\ldots,x_{k+1})=p(x_1,\ldots,x_{k})p(x_{k+1}|x_1,\ldots,x_{k})
\end{matrix}</math></center>
\end{align}</math>


Thus,
Thus,


<center><math>\begin{matrix}
<center><math>\begin{align}
f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i})=p(x_{k+1}|x_1,\ldots,x_{k})p(x_1,\ldots,x_{k}).
f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i})=p(x_{k+1}|x_1,\ldots,x_{k})p(x_1,\ldots,x_{k}).
\end{matrix}</math></center>
\end{align}</math></center>


Once more using <math>\flat</math>, we have:
Once more using <math>\flat</math>, we have:


<center><math>\begin{matrix}
<center><math>\begin{align}
f_{k+1}(x_{k+1},x_{\pi_{k+1}})=p(x_{k+1}|x_1,\ldots,x_{k}),
f_{k+1}(x_{k+1},x_{\pi_{k+1}})=p(x_{k+1}|x_1,\ldots,x_{k}),
\end{matrix}</math></center>
\end{align}</math></center>


and so:
and so:


<center><math>\begin{matrix}
<center><math>\begin{align}
p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}).~\triangle
p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})~\triangle.
\end{matrix}</math></center>
\end{align}</math></center>
 
===problem 2===
For a given variable <math>X_i</math> in a graphical model, what is the minimal set of nodes that renders <math>X_i</math> conditionally independent of all of the other variables? That is, what is the smallest set <math>C</math>, such that <math>X_i\perp X_{V\setminus \{i\}\cup C}|X_C</math>? (Note that <math>V</math> is the set of all nodes in the graph, so that <math>V\setminus \{i\}\cup C</math> is the set of all nodes in the graph, excluding <math>C</math> and <math>i</math>).
* Do the problem for an undirected graph.
* Do the problem for a directed graph.
 
====solution 1====
'''a.''' As in directed graphs, the relationship between each two nodes, <math>X_i</math> and <math>X_j</math>, can be of one of the child-parent or parent-child forms. Let's assume that we show the set of parents of the node <math>X_i</math> like this: <math>X_{\pi_i}</math>. So, a general node can be thought of as in Fig. 1. In this figure, all the possible relationships for the node <math>X_4</math> and its immediate neighbours and those of the depth one has been shown.
 
[[File:ass1-2a.png|thumb|right|Fig.1 A directed graph!]]
 
From the graph in the figure we have:
<center><math>\begin{align}
p(x) = p(X_1)p(X_2|X_1)p(X_3|X_2)p(X_4|X_2)p(X_6|X_4,X_5)p(X_7|X_6)\\
\end{align}</math></center>
 
So, we can derive:
<center><math>\begin{align}
p(X_1,X_2,X_4)&=p(X_1)p(X_2|X_1)p(X_4|X_2)\sum_{X_3}p(X_3|X_2)\sum_{X_5}\sum_{X_6}p(X_6|X_4,X_5)\sum_{X_7}p(X_7|X_6)\\
&=p(X_1)p(X_2|X_1)p(X_4|X_2)\\
\rightarrow p(X_4|X_1,X_2) &= \frac{p(X_1,X_2,X_4)}{p(X_1,X_2)}=\frac{p(X_1)p(X_1,X_2)p(X_4,X_2)}{p(X_1,X_2)p(X_1)p(X_2)}=p(X_4|X_2)~\rightarrow~X_4 \perp X_1 | X_2.
\end{align}</math></center>
 
Also we have:
<center><math>\begin{align}
p(X_2,X_3,X_4)&=p(X_4|X_2)p(X_3|X_2)\sum_{X_1}p(X_1)p(X_2|X_1)\sum_{X_5}\sum_{X_6}p(X_6|X_4,X_5)\sum_{X_7}p(X_7|X_6)\\
&=p(X_4|X_2)p(X_3|X_2)p(X_2)\\
\rightarrow p(X_4,X_3|X_2) &= \frac{p(X_2,X_3,X_4)}{p(X_2)}=\frac{p(X_4|X_2)p(X_3|X_2)p(X_2)}{p(X_2)}=p(X_4|X_2)p(X_3|X_2)~\rightarrow~X_4 \perp X_3 | X_2.
\end{align}</math></center>
 
So, by induction we can say that given the parents of a node, it will be conditionally independent of all of the nodes that come before. It is assumed that parents always come before children.
 
Now, if we check the independence of the nodes, from those there is a path to <math>X_4</math> -in general- but not through parents of <math>X_4</math>, we will have:
 
<center><math>\begin{align}
X_4 \perp X_7 | X_6,\\
X_4 \cancel{\perp} X_5 | X_6.
\end{align}</math></center>
 
And if the children of <math>X_4</math> are not given, we will have:
 
<center><math>\begin{align}
X_4 \cancel{\perp} X_7,\\
X_4 \perp X_5.
\end{align}</math></center>
 
So, as of the set <math>C</math>, we can state:
 
<center><math>
C_{X_i}=\pi_{X_i}\cup\{k,\pi_{X_k}|i\in \pi_{X_k}\}
</math></center>

Revision as of 13:50, 6 October 2011

problem 1

Recall the definition of the family of probability distributions associated with a directed graphical model:

[math]\displaystyle{ p(x) = \prod_{i=1}^{n}f_i(x_i,x_{\pi_i}). }[/math]

Prove by induction that [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math].

solution 1

According to the product rule we have:

[math]\displaystyle{ p(x_1,\ldots,x_{k+1})=p(x_1,\ldots,x_{k})p(x_{k+1}|x_1,\ldots,x_{k}).~\sharp }[/math]

This is the most general case for a directed graph, as we can represent each and every graphical model with a fully connected graph.

Now, we need to show that this statement, [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math], holds for [math]\displaystyle{ n=1 }[/math] (and [math]\displaystyle{ n=2 }[/math], just for the sake of more clarity).

[math]\displaystyle{ \begin{align} n=&1:~p(x)=p(x_1)=p(x_1|\emptyset)=f_1(x_1,x_{\pi_1}), \mbox{ where } x_{\pi_1}=\emptyset \rightarrow p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}),\\ n=&2:~p(x)=p(x_1,x_2)=p(x_1)p(x_2|x_1)=p(x_1|\emptyset)p(x_2|x_1)=f_1(x_1,x_{\pi_2})f_2(x_2,x_{\pi_2})\rightarrow p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}). \end{align} }[/math]

Where in the latter case, without loss of generality, we have assumed that [math]\displaystyle{ x_1 }[/math] is the only parent of [math]\displaystyle{ x_2 }[/math]. This assumption that we've taken, not only does not reduce from the generality of the proof (as we have initially assumed that the graph is directed and in a directed graph one of two connected nodes/random variables should be a parent of the other), but it also abides by our earlier assumption, that parents come before children. This leaves no parents for [math]\displaystyle{ x_1 }[/math], similar to the former case in the above.

Now, assuming that the statement [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math] holds for [math]\displaystyle{ n=k }[/math],

[math]\displaystyle{ \begin{align} n=k: p(x)=p(x_1,x_2,\ldots,x_k)=\prod_{i=1}^{k}f_i(x_i,x_{\pi_i})=\prod_{i=1}^{k}p(x_i|x_{\pi_i}).~\flat \end{align} }[/math]

we need to show that it also holds for [math]\displaystyle{ n=k+1 }[/math].

[math]\displaystyle{ \begin{align} n=k+1: p(x)&=p(x_1,x_2,\ldots,x_{k+1})=\prod_{i=1}^{k+1}f_i(x_i,x_{\pi_i}) \\ &=f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}f_i(x_i,x_{\pi_i}) \\ &\stackrel{\flat}{=}f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i}). \end{align} }[/math]

On the other hand according to [math]\displaystyle{ \sharp }[/math] we have:

[math]\displaystyle{ \begin{align} p(x_1,\ldots,x_{k+1})=p(x_1,\ldots,x_{k})p(x_{k+1}|x_1,\ldots,x_{k}) \end{align} }[/math]

Thus,

[math]\displaystyle{ \begin{align} f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i})=p(x_{k+1}|x_1,\ldots,x_{k})p(x_1,\ldots,x_{k}). \end{align} }[/math]

Once more using [math]\displaystyle{ \flat }[/math], we have:

[math]\displaystyle{ \begin{align} f_{k+1}(x_{k+1},x_{\pi_{k+1}})=p(x_{k+1}|x_1,\ldots,x_{k}), \end{align} }[/math]

and so:

[math]\displaystyle{ \begin{align} p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})~\triangle. \end{align} }[/math]

problem 2

For a given variable [math]\displaystyle{ X_i }[/math] in a graphical model, what is the minimal set of nodes that renders [math]\displaystyle{ X_i }[/math] conditionally independent of all of the other variables? That is, what is the smallest set [math]\displaystyle{ C }[/math], such that [math]\displaystyle{ X_i\perp X_{V\setminus \{i\}\cup C}|X_C }[/math]? (Note that [math]\displaystyle{ V }[/math] is the set of all nodes in the graph, so that [math]\displaystyle{ V\setminus \{i\}\cup C }[/math] is the set of all nodes in the graph, excluding [math]\displaystyle{ C }[/math] and [math]\displaystyle{ i }[/math]).

  • Do the problem for an undirected graph.
  • Do the problem for a directed graph.

solution 1

a. As in directed graphs, the relationship between each two nodes, [math]\displaystyle{ X_i }[/math] and [math]\displaystyle{ X_j }[/math], can be of one of the child-parent or parent-child forms. Let's assume that we show the set of parents of the node [math]\displaystyle{ X_i }[/math] like this: [math]\displaystyle{ X_{\pi_i} }[/math]. So, a general node can be thought of as in Fig. 1. In this figure, all the possible relationships for the node [math]\displaystyle{ X_4 }[/math] and its immediate neighbours and those of the depth one has been shown.

File:ass1-2a.png
Fig.1 A directed graph!

From the graph in the figure we have:

[math]\displaystyle{ \begin{align} p(x) = p(X_1)p(X_2|X_1)p(X_3|X_2)p(X_4|X_2)p(X_6|X_4,X_5)p(X_7|X_6)\\ \end{align} }[/math]

So, we can derive:

[math]\displaystyle{ \begin{align} p(X_1,X_2,X_4)&=p(X_1)p(X_2|X_1)p(X_4|X_2)\sum_{X_3}p(X_3|X_2)\sum_{X_5}\sum_{X_6}p(X_6|X_4,X_5)\sum_{X_7}p(X_7|X_6)\\ &=p(X_1)p(X_2|X_1)p(X_4|X_2)\\ \rightarrow p(X_4|X_1,X_2) &= \frac{p(X_1,X_2,X_4)}{p(X_1,X_2)}=\frac{p(X_1)p(X_1,X_2)p(X_4,X_2)}{p(X_1,X_2)p(X_1)p(X_2)}=p(X_4|X_2)~\rightarrow~X_4 \perp X_1 | X_2. \end{align} }[/math]

Also we have:

[math]\displaystyle{ \begin{align} p(X_2,X_3,X_4)&=p(X_4|X_2)p(X_3|X_2)\sum_{X_1}p(X_1)p(X_2|X_1)\sum_{X_5}\sum_{X_6}p(X_6|X_4,X_5)\sum_{X_7}p(X_7|X_6)\\ &=p(X_4|X_2)p(X_3|X_2)p(X_2)\\ \rightarrow p(X_4,X_3|X_2) &= \frac{p(X_2,X_3,X_4)}{p(X_2)}=\frac{p(X_4|X_2)p(X_3|X_2)p(X_2)}{p(X_2)}=p(X_4|X_2)p(X_3|X_2)~\rightarrow~X_4 \perp X_3 | X_2. \end{align} }[/math]

So, by induction we can say that given the parents of a node, it will be conditionally independent of all of the nodes that come before. It is assumed that parents always come before children.

Now, if we check the independence of the nodes, from those there is a path to [math]\displaystyle{ X_4 }[/math] -in general- but not through parents of [math]\displaystyle{ X_4 }[/math], we will have:

[math]\displaystyle{ \begin{align} X_4 \perp X_7 | X_6,\\ X_4 \cancel{\perp} X_5 | X_6. \end{align} }[/math]

And if the children of [math]\displaystyle{ X_4 }[/math] are not given, we will have:

[math]\displaystyle{ \begin{align} X_4 \cancel{\perp} X_7,\\ X_4 \perp X_5. \end{align} }[/math]

So, as of the set [math]\displaystyle{ C }[/math], we can state:

[math]\displaystyle{ C_{X_i}=\pi_{X_i}\cup\{k,\pi_{X_k}|i\in \pi_{X_k}\} }[/math]