f11Stat946ass

From statwiki
Revision as of 00:27, 12 October 2011 by Fewzee (talk | contribs) (→‎solution 1)
Jump to navigation Jump to search

problem 1

Recall the definition of the family of probability distributions associated with a directed graphical model:

[math]\displaystyle{ p(x) = \prod_{i=1}^{n}f_i(x_i,x_{\pi_i}). }[/math]

Prove by induction that [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math].

solution 1

According to the product rule we have:

[math]\displaystyle{ p(x_1,\ldots,x_{k+1})=p(x_1,\ldots,x_{k})p(x_{k+1}|x_1,\ldots,x_{k}).~\sharp }[/math]

This is the most general case for a directed graph, as we can represent each and every graphical model with a fully connected graph.

Now, we need to show that this statement, [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math], holds for [math]\displaystyle{ n=1 }[/math] (and [math]\displaystyle{ n=2 }[/math], just for the sake of more clarity).

[math]\displaystyle{ \begin{align} n=&1:~p(x)=p(x_1)=p(x_1|\emptyset)=f_1(x_1,x_{\pi_1}), \mbox{ where } x_{\pi_1}=\emptyset \rightarrow p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}),\\ n=&2:~p(x)=p(x_1,x_2)=p(x_1)p(x_2|x_1)=p(x_1|\emptyset)p(x_2|x_1)=f_1(x_1,x_{\pi_2})f_2(x_2,x_{\pi_2})\rightarrow p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}). \end{align} }[/math]

Where in the latter case, without loss of generality, we have assumed that [math]\displaystyle{ x_1 }[/math] is the only parent of [math]\displaystyle{ x_2 }[/math]. This assumption that we've taken, not only does not reduce from the generality of the proof (as we have initially assumed that the graph is directed and in a directed graph one of two connected nodes/random variables should be a parent of the other), but it also abides by our earlier assumption, that parents come before children. This leaves no parents for [math]\displaystyle{ x_1 }[/math], similar to the former case in the above.

Now, assuming that the statement [math]\displaystyle{ p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i}) }[/math] holds for [math]\displaystyle{ n=k }[/math],

[math]\displaystyle{ \begin{align} n=k: p(x)=p(x_1,x_2,\ldots,x_k)=\prod_{i=1}^{k}f_i(x_i,x_{\pi_i})=\prod_{i=1}^{k}p(x_i|x_{\pi_i}).~\flat \end{align} }[/math]

we need to show that it also holds for [math]\displaystyle{ n=k+1 }[/math].

[math]\displaystyle{ \begin{align} n=k+1: p(x)&=p(x_1,x_2,\ldots,x_{k+1})=\prod_{i=1}^{k+1}f_i(x_i,x_{\pi_i}) \\ &=f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}f_i(x_i,x_{\pi_i}) \\ &\stackrel{\flat}{=}f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i}). \end{align} }[/math]

On the other hand according to [math]\displaystyle{ \sharp }[/math] we have:

[math]\displaystyle{ \begin{align} p(x_1,\ldots,x_{k+1})=p(x_1,\ldots,x_{k})p(x_{k+1}|x_1,\ldots,x_{k}) \end{align} }[/math]

Thus,

[math]\displaystyle{ \begin{align} f_{k+1}(x_{k+1},x_{\pi_{k+1}})\prod_{i=1}^{k}p(x_i|x_{\pi_i})=p(x_{k+1}|x_1,\ldots,x_{k})p(x_1,\ldots,x_{k}). \end{align} }[/math]

Once more using [math]\displaystyle{ \flat }[/math], we have:

[math]\displaystyle{ \begin{align} f_{k+1}(x_{k+1},x_{\pi_{k+1}})=p(x_{k+1}|x_1,\ldots,x_{k}), \end{align} }[/math]

and so:

[math]\displaystyle{ \begin{align} p(x_i|x_{\pi_i})=f_i(x_i,x_{\pi_i})~\triangle. \end{align} }[/math]

problem 2

For a given variable [math]\displaystyle{ X_i }[/math] in a graphical model, what is the minimal set of nodes that renders [math]\displaystyle{ X_i }[/math] conditionally independent of all of the other variables? That is, what is the smallest set [math]\displaystyle{ C }[/math], such that [math]\displaystyle{ X_i\perp X_{V\setminus \{i\}\cup C}|X_C }[/math]? (Note that [math]\displaystyle{ V }[/math] is the set of all nodes in the graph, so that [math]\displaystyle{ V\setminus \{i\}\cup C }[/math] is the set of all nodes in the graph, excluding [math]\displaystyle{ C }[/math] and [math]\displaystyle{ i }[/math]).

  • Do the problem for an undirected graph.
  • Do the problem for a directed graph.

solution 1

a. As in directed graphs, the relationship between each two nodes, [math]\displaystyle{ X_i }[/math] and [math]\displaystyle{ X_j }[/math], can be of one of the child-parent or parent-child forms. Let's assume that we show the set of parents of the node [math]\displaystyle{ X_i }[/math] like this: [math]\displaystyle{ X_{\pi_i} }[/math]. So, a general node can be thought of as in Fig. 1. In this figure, all the possible relationships for the node [math]\displaystyle{ X_4 }[/math] and its immediate neighbours and those of the depth one has been shown.

File:ass1-2a.png
Fig.1 A directed graph!

From the graph in the figure we have:

[math]\displaystyle{ \begin{align} p(x) = p(X_1)p(X_2|X_1)p(X_3|X_2)p(X_4|X_2)p(X_6|X_4,X_5)p(X_7|X_6)\\ \end{align} }[/math]

So, we can derive:

[math]\displaystyle{ \begin{align} p(X_1,X_2,X_4)&=p(X_1)p(X_2|X_1)p(X_4|X_2)\sum_{X_3}p(X_3|X_2)\sum_{X_5}\sum_{X_6}p(X_6|X_4,X_5)\sum_{X_7}p(X_7|X_6)\\ &=p(X_1)p(X_2|X_1)p(X_4|X_2)\\ \rightarrow p(X_4|X_1,X_2) &= \frac{p(X_1,X_2,X_4)}{p(X_1,X_2)}=\frac{p(X_1)p(X_1,X_2)p(X_4,X_2)}{p(X_1,X_2)p(X_1)p(X_2)}=p(X_4|X_2)~\rightarrow~X_4 \perp X_1 | X_2. \end{align} }[/math]

Also we have:

[math]\displaystyle{ \begin{align} p(X_2,X_3,X_4)&=p(X_4|X_2)p(X_3|X_2)\sum_{X_1}p(X_1)p(X_2|X_1)\sum_{X_5}\sum_{X_6}p(X_6|X_4,X_5)\sum_{X_7}p(X_7|X_6)\\ &=p(X_4|X_2)p(X_3|X_2)p(X_2)\\ \rightarrow p(X_4,X_3|X_2) &= \frac{p(X_2,X_3,X_4)}{p(X_2)}=\frac{p(X_4|X_2)p(X_3|X_2)p(X_2)}{p(X_2)}=p(X_4|X_2)p(X_3|X_2)~\rightarrow~X_4 \perp X_3 | X_2. \end{align} }[/math]

So, by induction we can say that given the parents of a node, it will be conditionally independent of all of the nodes that come before. It is assumed that parents always come before children.

Now, if we check the independence of the nodes, from those there is a path to [math]\displaystyle{ X_4 }[/math] -in general- but not through parents of [math]\displaystyle{ X_4 }[/math], we will have:

[math]\displaystyle{ \begin{align} X_4 \perp X_7 | X_6,\\ X_4 \cancel{\perp} X_5 | X_6. \end{align} }[/math]

And if the children of [math]\displaystyle{ X_4 }[/math] are not given, we will have:

[math]\displaystyle{ \begin{align} X_4 \cancel{\perp} X_7,\\ X_4 \perp X_5. \end{align} }[/math]

So, as of the set [math]\displaystyle{ C }[/math], we can state:

[math]\displaystyle{ C_{X_i}=\pi_{X_i}\cup\{k,\pi_{X_k}|i\in \pi_{X_k}\} }[/math]

b. As of an undirected graph, we will have:

[math]\displaystyle{ C_{X_i}=\{j|j\mbox{ is an immediate neighbour of }i\} }[/math]

problem 3

Consider the directed graphical model below.

Fig.2 problem 3 - graphical model
Fig.2 problem 3 - graphical model

(a) Moralize the graph, and show the resulting undirected graphical model.

(b) Invoke UndirectedGraphEliminate on your moral graph, using the elimination ordering (7, 8, 9, 6, 3, 5, 4, 2, 1), and show the resulting reconstituted graph (the graph that includes all the edges added during the elimination process). You do not have to show the intermediate steps of the algorithm.

(c) Report (b), but using the elimination ordering (7, 6, 8, 9, 4, 3, 2, 5, 1).

(d) Which of the two elimination orderings, if used in Eliminate to calculate [math]\displaystyle{ p(x_1|x_7) }[/math], do you think would result in a more time-efficient calculation? Which would result in a more space-efficient calculation? Briefly explain why.

solution 1

(a) Here is the resulting moralized undirected graph (Fig. 3).

Fig.3 moralized undirected graphical model
Fig.3 moralized undirected graphical model

(b) Here is the resulting reconstituted graph, after a (7, 8, 9, 6, 3, 5, 4, 2, 1) order elimination. (Fig. 4)

Fig.4 Resulting reconstituted graph after (7, 8, 9, 6, 3, 5, 4, 2, 1) order elimination.
Fig.4 Resulting reconstituted graph after (7, 8, 9, 6, 3, 5, 4, 2, 1) order elimination.

(c) Here is the resulting reconstituted graph, after a (7, 6, 8, 9, 4, 3, 2, 5, 1) order elimination. (Fig. 5)

Fig.5 Resulting reconstituted graph after (7, 6, 8, 9, 4, 3, 2, 5, 1) order elimination.
Fig.5 Resulting reconstituted graph after (7, 6, 8, 9, 4, 3, 2, 5, 1) order elimination.

(d) It is more computationally efficient to remove those nodes, which have less number of edges, as it takes less number of summations over their neighbours. So, starting with single-edge nodes would be a good idea, which makes the first ordering more computationally efficient. (What does space-efficient means, though?)

problem 4

Consider an undirected tree. We know that it is not possible in genereal to parameterize a distribution on a tree using marginal probabilities as the potentials. It is, however, possible to parameterize such a distribution using ratios of marginal probabilities. In particular let:

[math]\displaystyle{ \begin{align} \psi(x_i)&=p(x_i),\\ \psi(x_i,x_j)&=\frac{p(x_i,x_j)}{p(x_i)p(x_j)}. \end{align} }[/math]

where [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] are neighbors in the tree, where [math]\displaystyle{ p(x_i) }[/math] and [math]\displaystyle{ p(x_i,x_j) }[/math] are a consistent set of marginal probabilities. Show that this setting of potential yields a parameterization of a joint probability distribution on the tree under which [math]\displaystyle{ p(x_i) }[/math] and [math]\displaystyle{ p(x_i,x_j) }[/math] are marginals. What is [math]\displaystyle{ Z }[/math] under this parameterization?

solution 1

Based on the definition of probability function which has to satisfy the following:

[math]\displaystyle{ \begin{align} 0\le p(x) \le 1,~p(x)\in R,\mbox{ and},~\sum_{x}p(x)=1, \end{align} }[/math]

one can prove that the proposed potential function, [math]\displaystyle{ \psi }[/math], is non-negative and is real-valued. These two are all we need for a potential function and besides these, it can be defined totally arbitrarily, while it meets the independence characteristics of the random variables.

According to the given potential function and the definition of [math]\displaystyle{ p(x) }[/math], probability distribution function for an undirected graphical model, here is how one can define the joint probability distribution:

[math]\displaystyle{ \begin{align} p(x)&=\frac{1}{Z}\prod_{x_k\in X}p(x_k)\prod_{c\in C}\prod_{x_i,x_j\in c}\frac{p(x_i,x_j)}{p(x_i)p(x_j)},\\ Z &= \sum_{x}\prod_{x_k\in X}p(x_k)\prod_{c\in C}\prod_{x_i,x_j\in c}\frac{p(x_i,x_j)}{p(x_i)p(x_j)}. \end{align} }[/math]

where [math]\displaystyle{ C }[/math] is the set of all maximal cliques in the tree and [math]\displaystyle{ i\neq j }[/math].

problem 5

In this problem we will work through the mechanics of using Eliminate for inference in a directed graphical model. Consider the directed graphical model below over the binary variables [math]\displaystyle{ X_1,X_2,X_3, }[/math] and [math]\displaystyle{ X_4 }[/math].

Fig.6 problem 5
Fig.6 problem 5

Let [math]\displaystyle{ p(X_1=0)=p(X_1=1)=0.5 }[/math]. For the local conditional probabilities, [math]\displaystyle{ p(x_{i+1}|x_i) }[/math], use the following matrix.

We will use eliminate to compute [math]\displaystyle{ p(X_1=0|X_4=1) }[/math].

(a) Write out the summations involved in carrying out the elimination algorithm to find [math]\displaystyle{ p(x_1,x_4) }[/math] and [math]\displaystyle{ p(x_4) }[/math], as in Equations (3.8) and (3.15) in Chapter 3 of the text. Use the elimination ordering (4,3,2,1).

(b) Using eliminate (with [math]\displaystyle{ x_4=1 }[/math]), write down (as tables) the resulting intermediate terms, and [math]\displaystyle{ p(x_1,\bar{x}_4) }[/math].

(c) Compute [math]\displaystyle{ p(X_1=0|X_4=1) }[/math].

(d) You can alternatively do parts (a), (b), and (c) using Matlab or Splus.

solution 1

[math]\displaystyle{ \begin{align} p(x) &= p(X_1)p(X_2|X_1)p(X_3|X_2)p(X_4|X_3). \\ \rightarrow p(X_1,X_4=1)&=p(X_1)\sum_{X_2}p(X_2|X_1)\sum_{X_3}p(X_3|X_2)p(X_4=1|X_3) \\ &= p(X_1)\sum_{X_2}p(X_2|X_1)\underbrace{(0.4p(X_3=0,X_2)+0.8p(X_3=1,X_2))}_{m_3(x_2)} \\ &= p(X_1)\underbrace{[(0.4*0.6+0.8*0.4)p(X_2=0,X_1)+(0.4*0.2+0.8*0.8)p(X_2=1,X_1)]}_{m_2(x_1)} \\ &= p(X_1)(0.56p(X_2=0,X_1)+0.72p(X_2=1,X_1)), \\ \rightarrow p(X_1=0,X_4=1)&=0.5(0.56*0.6+0.72*0.4)=0.312,\\ \rightarrow p(X_4=1)&=\sum_{X_1}p(X_1,X_4=1) \\ &= 0.5(0.56*(0.6+0.2)+0.72*(0.4+0.8))=0.656,\\ \rightarrow p(X_1=0|X_4=1)&=\frac{p(X_1=0,X_4=1)}{p(X_4=1)}\\ &=\frac{0.312}{0.656}= 0.552. \end{align} }[/math]

Problem 6

For the two graphs in Figure 1, determine which variables can be reached using the Bayes ball algorithm if we start at node A.

Fig.1 Bayes ball examples.
Fig.1 Bayes ball examples.


Solution 1

First graph: Starting from node A, the ball passes to nodes B & C. Node B blocks the ball, while node C passes the ball to nodes E & F. In turn, node E passes the ball to node G which passes the ball to nodes I & D. Node F passes the ball to node H which passes the ball to the already visited node, I. Therefore all nodes are visited.

More formally, we can say that the variables reached using the Bayes ball algorithm are:

A

B

C

E

F

G

H

D

I


Second graph: Again, if we start from node A, it passes the ball to node G. As G receives the ball from a parent, it can only pass the ball to its children, hence G passes the ball to J. J is shaded, therefore it bounces the ball back to its parent G. This time G receives the ball from a child and can pass it everywhere and then the ball is passed to node D which passes the ball to nodes B & H. Node H doesn't have children and can't pass the ball anywhere, while node B passes the ball to node E which passes the ball to node I. Node I doesn't have the children and this is where the Bayes ball stops.

We can now list the variables reached using the Bayes ball algorithm in the second graph as:

A

G

J

D

B

H

E

I