f11Stat946ass: Difference between revisions

From statwiki
Jump to navigation Jump to search
Line 167: Line 167:
&+(0.4*0.2+0.8*0.8)p(X_2=1,X_1)] \\
&+(0.4*0.2+0.8*0.8)p(X_2=1,X_1)] \\
&= p(X_1)(0.56p(X_2=0,X_1)+0.72p(X_2=1,X_1)), \\
&= p(X_1)(0.56p(X_2=0,X_1)+0.72p(X_2=1,X_1)), \\
\rightarrow p(X_1=0,X_4=1)&=0.5(0.56*0.6+0.72+0.4)=0.728,\\
\rightarrow p(X_1=0,X_4=1)&=0.5(0.56*0.6+0.72*0.4)=0.312,\\
\rightarrow p(X_4=1)&=\sum_{X_1}p(X_1,X_4=1) \\
\rightarrow p(X_4=1)&=\sum_{X_1}p(X_1,X_4=1) \\
&= 0.5(0.56*(0.6+0.2)+0.72*(0.4+0.8))=0.656,\\
&= 0.5(0.56*(0.6+0.2)+0.72*(0.4+0.8))=0.656,\\
\rightarrow p(X_1=0|X_4=1)&=\frac{p(X_1=0,X_4=1)}{p(X_4=1)}\\
\rightarrow p(X_1=0|X_4=1)&=\frac{p(X_1=0,X_4=1)}{p(X_4=1)}\\
&=\frac{0.728}{0.656}= 1.1097561!!!
&=\frac{0.312}{0.656}= 0.552.
\end{align}</math></center>
\end{align}</math></center>



Revision as of 18:37, 10 October 2011

problem 1

Recall the definition of the family of probability distributions associated with a directed graphical model:

[math]\displaystyle{ p(x) = \prod_{i=1}^{n}f_i(x_i,x_{\pi_i}). }[/math]

Prove by induction that [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math].

solution 1

According to the product rule we have:

[math]\displaystyle{ p(x_1,\ldots,x_{k+1})=p(x_1,\ldots,x_{k})p(x_{k+1}|x_1,\ldots,x_{k}).~\sharp }[/math]

This is the most general case for a directed graph, as we can represent each and every graphical model with a fully connected graph.

Now, we need to show that this statement, [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math], holds for [math]\displaystyle{ n=1 }[/math] (and [math]\displaystyle{ n=2 }[/math], just for the sake of more clarity).

[math]\displaystyle{ \begin{align} n=&1:~p(x)=p(x_1)=p(x_1|\emptyset)=f_1(x_1,x_{\pi_1}), \mbox{ where } x_{\pi_1}=\emptyset \rightarrow p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}),\\ n=&2:~p(x)=p(x_1,x_2)=p(x_1)p(x_2|x_1)=p(x_1|\emptyset)p(x_2|x_1)=f_1(x_1,x_{\pi_2})f_2(x_2,x_{\pi_2})\rightarrow p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}). \end{align} }[/math]

Where in the latter case, without loss of generality, we have assumed that [math]\displaystyle{ x_1 }[/math] is the only parent of [math]\displaystyle{ x_2 }[/math]. This assumption that we've taken, not only does not reduce from the generality of the proof (as we have initially assumed that the graph is directed and in a directed graph one of two connected nodes/random variables should be a parent of the other), but it also abides by our earlier assumption, that parents come before children. This leaves no parents for [math]\displaystyle{ x_1 }[/math], similar to the former case in the above.

Now, assuming that the statement [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math] holds for [math]\displaystyle{ n=k }[/math],

[math]\displaystyle{ \begin{align} n=k: p(x)=p(x_1,x_2,\ldots,x_k)=\prod_{i=1}^{k}f_i(x_i,x_{\pi_i})=\prod_{i=1}^{k}p(x_i|x_{\pi_i}).~\flat \end{align} }[/math]

we need to show that it also holds for [math]\displaystyle{ n=k+1 }[/math].

[math]\displaystyle{ \begin{align} n=k+1: p(x)&=p(x_1,x_2,\ldots,x_{k+1})=\prod_{i=1}^{k+1}f_i(x_i,x_{\pi_i}) \\ &=f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}f_i(x_i,x_{\pi_i}) \\ &\stackrel{\flat}{=}f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i}). \end{align} }[/math]

On the other hand according to [math]\displaystyle{ \sharp }[/math] we have:

[math]\displaystyle{ \begin{align} p(x_1,\ldots,x_{k+1})=p(x_1,\ldots,x_{k})p(x_{k+1}|x_1,\ldots,x_{k}) \end{align} }[/math]

Thus,

[math]\displaystyle{ \begin{align} f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i})=p(x_{k+1}|x_1,\ldots,x_{k})p(x_1,\ldots,x_{k}). \end{align} }[/math]

Once more using [math]\displaystyle{ \flat }[/math], we have:

[math]\displaystyle{ \begin{align} f_{k+1}(x_{k+1},x_{\pi_{k+1}})=p(x_{k+1}|x_1,\ldots,x_{k}), \end{align} }[/math]

and so:

[math]\displaystyle{ \begin{align} p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})~\triangle. \end{align} }[/math]

problem 2

For a given variable [math]\displaystyle{ X_i }[/math] in a graphical model, what is the minimal set of nodes that renders [math]\displaystyle{ X_i }[/math] conditionally independent of all of the other variables? That is, what is the smallest set [math]\displaystyle{ C }[/math], such that [math]\displaystyle{ X_i\perp X_{V\setminus \{i\}\cup C}|X_C }[/math]? (Note that [math]\displaystyle{ V }[/math] is the set of all nodes in the graph, so that [math]\displaystyle{ V\setminus \{i\}\cup C }[/math] is the set of all nodes in the graph, excluding [math]\displaystyle{ C }[/math] and [math]\displaystyle{ i }[/math]).

  • Do the problem for an undirected graph.
  • Do the problem for a directed graph.

solution 1

a. As in directed graphs, the relationship between each two nodes, [math]\displaystyle{ X_i }[/math] and [math]\displaystyle{ X_j }[/math], can be of one of the child-parent or parent-child forms. Let's assume that we show the set of parents of the node [math]\displaystyle{ X_i }[/math] like this: [math]\displaystyle{ X_{\pi_i} }[/math]. So, a general node can be thought of as in Fig. 1. In this figure, all the possible relationships for the node [math]\displaystyle{ X_4 }[/math] and its immediate neighbours and those of the depth one has been shown.

File:ass1-2a.png
Fig.1 A directed graph!

From the graph in the figure we have:

[math]\displaystyle{ \begin{align} p(x) = p(X_1)p(X_2|X_1)p(X_3|X_2)p(X_4|X_2)p(X_6|X_4,X_5)p(X_7|X_6)\\ \end{align} }[/math]

So, we can derive:

[math]\displaystyle{ \begin{align} p(X_1,X_2,X_4)&=p(X_1)p(X_2|X_1)p(X_4|X_2)\sum_{X_3}p(X_3|X_2)\sum_{X_5}\sum_{X_6}p(X_6|X_4,X_5)\sum_{X_7}p(X_7|X_6)\\ &=p(X_1)p(X_2|X_1)p(X_4|X_2)\\ \rightarrow p(X_4|X_1,X_2) &= \frac{p(X_1,X_2,X_4)}{p(X_1,X_2)}=\frac{p(X_1)p(X_1,X_2)p(X_4,X_2)}{p(X_1,X_2)p(X_1)p(X_2)}=p(X_4|X_2)~\rightarrow~X_4 \perp X_1 | X_2. \end{align} }[/math]

Also we have:

[math]\displaystyle{ \begin{align} p(X_2,X_3,X_4)&=p(X_4|X_2)p(X_3|X_2)\sum_{X_1}p(X_1)p(X_2|X_1)\sum_{X_5}\sum_{X_6}p(X_6|X_4,X_5)\sum_{X_7}p(X_7|X_6)\\ &=p(X_4|X_2)p(X_3|X_2)p(X_2)\\ \rightarrow p(X_4,X_3|X_2) &= \frac{p(X_2,X_3,X_4)}{p(X_2)}=\frac{p(X_4|X_2)p(X_3|X_2)p(X_2)}{p(X_2)}=p(X_4|X_2)p(X_3|X_2)~\rightarrow~X_4 \perp X_3 | X_2. \end{align} }[/math]

So, by induction we can say that given the parents of a node, it will be conditionally independent of all of the nodes that come before. It is assumed that parents always come before children.

Now, if we check the independence of the nodes, from those there is a path to [math]\displaystyle{ X_4 }[/math] -in general- but not through parents of [math]\displaystyle{ X_4 }[/math], we will have:

[math]\displaystyle{ \begin{align} X_4 \perp X_7 | X_6,\\ X_4 \cancel{\perp} X_5 | X_6. \end{align} }[/math]

And if the children of [math]\displaystyle{ X_4 }[/math] are not given, we will have:

[math]\displaystyle{ \begin{align} X_4 \cancel{\perp} X_7,\\ X_4 \perp X_5. \end{align} }[/math]

So, as of the set [math]\displaystyle{ C }[/math], we can state:

[math]\displaystyle{ C_{X_i}=\pi_{X_i}\cup\{k,\pi_{X_k}|i\in \pi_{X_k}\} }[/math]

b. As of an undirected graph, we will have:

[math]\displaystyle{ C_{X_i}=\{j|j\mbox{ is an immediate neighbour of }i\} }[/math]

problem 3

Consider the directed graphical model below.

Fig.2 problem 3 - graphical model
Fig.2 problem 3 - graphical model

(a) Moralize the graph, and show the resulting undirected graphical model.

(b) Invoke UndirectedGraphEliminate on your moral graph, using the elimination ordering (7, 8, 9, 6, 3, 5, 4, 2, 1), and show the resulting reconstituted graph (the graph that includes all the edges added during the elimination process). You do not have to show the intermediate steps of the algorithm.

(c) Report (b), but using the elimination ordering (7, 6, 8, 9, 4, 3, 2, 5, 1).

(d) Which of the two elimination orderings, if used in Eliminate to calculate [math]\displaystyle{ p(x_1|x_7) }[/math], do you think would result in a more time-efficient calculation? Which would result in a more space-efficient calculation? Briefly explain why.

solution 1

(a) Here is the resulting moralized undirected graph (Fig. 3).

Fig.3 moralized undirected graphical model
Fig.3 moralized undirected graphical model

(b) Here is the resulting reconstituted graph, after a (7, 8, 9, 6, 3, 5, 4, 2, 1) order elimination. (Fig. 4)

Fig.4 Resulting reconstituted graph after (7, 8, 9, 6, 3, 5, 4, 2, 1) order elimination.
Fig.4 Resulting reconstituted graph after (7, 8, 9, 6, 3, 5, 4, 2, 1) order elimination.

(c) Here is the resulting reconstituted graph, after a (7, 6, 8, 9, 4, 3, 2, 5, 1) order elimination. (Fig. 5)

Fig.5 Resulting reconstituted graph after (7, 6, 8, 9, 4, 3, 2, 5, 1) order elimination.
Fig.5 Resulting reconstituted graph after (7, 6, 8, 9, 4, 3, 2, 5, 1) order elimination.

(d) It is more computationally efficient to remove those nodes, which have less number of edges, as it takes less number of summations over their neighbours. So, starting with single-edge nodes would be a good idea, which makes the first ordering more computationally efficient. (What does space-efficient means, though?)

problem 5

In this problem we will work through the mechanics of using Eliminate for inference in a directed graphical model. Consider the directed graphical model below over the binary variables [math]\displaystyle{ X_1,X_2,X_3, }[/math] and [math]\displaystyle{ X_4 }[/math].

Fig.6 problem 5
Fig.6 problem 5

Let [math]\displaystyle{ p(X_1=0)=p(X_1=1)=0.5 }[/math]. For the local conditional probabilities, [math]\displaystyle{ p(x_{i+1}|x_i) }[/math], use the following matrix.

We will use eliminate to compute [math]\displaystyle{ p(X_1=0|X_4=1) }[/math].

(a) Write out the summations involved in carrying out the elimination algorithm to find [math]\displaystyle{ p(x_1,x_4) }[/math] and [math]\displaystyle{ p(x_4) }[/math], as in Equations (3.8) and (3.15) in Chapter 3 of the text. Use the elimination ordering (4,3,2,1).

solution 1

[math]\displaystyle{ \begin{align} p(x) &= p(X_1)p(X_2|X_1)p(X_3|X_2)p(X_4|X_3). \\ \rightarrow p(X_1,X_4=1)&=p(X_1)\sum_{X_2}p(X_2|X_1)\sum_{X_3}p(X_3|X_2)p(X_4=1|X_3) \\ &= p(X_1)\sum_{X_2}p(X_2|X_1)(0.4p(X_3=0,X_2)+0.8p(X_3=1,X_2)) \\ &= p(X_1)[(0.4*0.6+0.8*0.4)p(X_2=0,X_1)\\ &+(0.4*0.2+0.8*0.8)p(X_2=1,X_1)] \\ &= p(X_1)(0.56p(X_2=0,X_1)+0.72p(X_2=1,X_1)), \\ \rightarrow p(X_1=0,X_4=1)&=0.5(0.56*0.6+0.72*0.4)=0.312,\\ \rightarrow p(X_4=1)&=\sum_{X_1}p(X_1,X_4=1) \\ &= 0.5(0.56*(0.6+0.2)+0.72*(0.4+0.8))=0.656,\\ \rightarrow p(X_1=0|X_4=1)&=\frac{p(X_1=0,X_4=1)}{p(X_4=1)}\\ &=\frac{0.312}{0.656}= 0.552. \end{align} }[/math]

Problem 6

For the two graphs in Figure 1, determine which variables can be reached using the Bayes ball algorithm if we start at node A.

Fig.1 Bayes ball examples.
Fig.1 Bayes ball examples.


Solution 1

First graph: Starting from node A, the ball passes to nodes B & C. Node B blocks the ball, while node C passes the ball to nodes E & F. In turn, node E passes the ball to node G which passes the ball to nodes I & D. Node F passes the ball to node H which passes the ball to the already visited node, I. Therefore all nodes are visited.

More formally, we can say that the variables reached using the Bayes ball algorithm are:

A

B

C

E

F

G

H

D

I


Second graph: Again, if we start from node A, it passes the ball to node G. As G receives the ball from a parent, it can only pass the ball to its children, hence G passes the ball to J. J is shaded, therefore it bounces the ball back to its parent G. This time G receives the ball from a child and can pass it everywhere and then the ball is passed to node D which passes the ball to nodes B & H. Node H doesn't have children and can't pass the ball anywhere, while node B passes the ball to node E which passes the ball to node I. Node I doesn't have the children and this is where the Bayes ball stops.

We can now list the variables reached using the Bayes ball algorithm in the second graph as:

A

G

J

D

B

H

E

I