contributions on Quantifying Cancer Progression with Conjunctive Bayesian Networks

From statwiki

Motivation

Tumor progression is characterized by a sequence of genetic mutations that arise from the activation of oncogenes and the inactivation of tumor suppressor genes. The temporal order of these mutations is still largely unknown, because carcinogenesis is a slow process that takes several years. Biologically motivated mathematical models such as "evolutionary dynamics" have been used to describe the sequence of events. Several statistical models, such as oncogenetic trees, network trees, and probabilistic network models, have been used to model disease progression, in particular cancer. Genetic events do not necessarily follow a single linear or tree-shaped order. An event may require several earlier events, so a single node can have multiple parents. This led to a more general framework than tree models, called conjunctive Bayesian networks. In simple terms, a conjunctive Bayesian network is a directed acyclic graph in which a node may have multiple parents. One major advantage of this model is that it can represent multiple mutational pathways.

Introduction

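A partially ordered set (poset) [math]\displaystyle{ P }[/math] is a set together with a binary relation [math]\displaystyle{ \leq }[/math] that has the following properties:

  • Reflexivity: [math]\displaystyle{ p \leq p }[/math]
  • Antisymmetry: [math]\displaystyle{ p \leq q }[/math] and [math]\displaystyle{ q \leq p }[/math] imply [math]\displaystyle{ p = q }[/math]
  • Transitivity: [math]\displaystyle{ p \leq q }[/math] and [math]\displaystyle{ q \leq r }[/math] imply [math]\displaystyle{ p \leq r }[/math]

We write [math]\displaystyle{ p \lt q }[/math] when [math]\displaystyle{ p \leq q }[/math] and [math]\displaystyle{ p \neq q }[/math].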
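The three axioms can be checked mechanically for a small relation. The sketch below assumes the non-strict relation [math]\displaystyle{ \leq }[/math] is given as a set of pairs (p, q) meaning p ≤ q; the strict relation [math]\displaystyle{ \lt }[/math] is the same set without the pairs (p, p). The function name `is_partial_order` and the four-event example are hypothetical illustrations, not taken from the paper.

```python
from itertools import product


def is_partial_order(elements, leq):
    """Check reflexivity, antisymmetry and transitivity.

    `leq` is a set of pairs (p, q) meaning p <= q (the non-strict order).
    """
    reflexive = all((p, p) in leq for p in elements)
    # Two distinct elements may not precede each other.
    antisymmetric = all(
        p == q or not ((p, q) in leq and (q, p) in leq)
        for p, q in product(elements, repeat=2)
    )
    # Every chain p <= q <= r must be closed by p <= r.
    transitive = all(
        (p, r) in leq
        for (p, q1) in leq
        for (q2, r) in leq
        if q1 == q2
    )
    return reflexive and antisymmetric and transitive


# Hypothetical events: 1 and 2 must both precede 3; 4 is unconstrained.
events = {1, 2, 3, 4}
strict = {(1, 3), (2, 3)}                  # pairs with p < q
leq = strict | {(e, e) for e in events}    # add the reflexive pairs

print(is_partial_order(events, leq))       # True
print(is_partial_order(events, strict))    # False: not reflexive
```

Note that in this example event 3 has two predecessors, which is the kind of constraint a conjunctive Bayesian network can express and a tree cannot.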

In this model, [math]\displaystyle{ P }[/math] denotes the set of mutational events, and the order relation encodes constraints on the order in which they occur. For example, [math]\displaystyle{ p \lt q }[/math] means that mutation [math]\displaystyle{ q }[/math] can occur only after mutation [math]\displaystyle{ p }[/math] has occurred. We call [math]\displaystyle{ p }[/math] a parent of [math]\displaystyle{ q }[/math], written [math]\displaystyle{ p\rightarrow q }[/math], if [math]\displaystyle{ p \lt q }[/math] and there is no [math]\displaystyle{ r \in P }[/math] such that [math]\displaystyle{ p\lt r\lt q }[/math]. The set of all parents of [math]\displaystyle{ q }[/math] is denoted by [math]\displaystyle{ pa(q) }[/math].

We now construct the distributive lattice of order ideals of [math]\displaystyle{ P }[/math], denoted by [math]\displaystyle{ J(P) }[/math]. A subset [math]\displaystyle{ S\subseteq P }[/math] is an order ideal, written [math]\displaystyle{ S\in J(P) }[/math], if and only if for every [math]\displaystyle{ q\in S }[/math] and every [math]\displaystyle{ p \in P }[/math] with [math]\displaystyle{ p \lt q }[/math], we also have [math]\displaystyle{ p\in S }[/math]. In words, a set of mutations belongs to [math]\displaystyle{ J(P) }[/math] exactly when it contains every mutation that must precede each of its members; these are the genotypes compatible with the constraints. Order ideals are closed under union and intersection, so [math]\displaystyle{ J(P) }[/math], ordered by inclusion, is a distributive lattice.
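The definitions of [math]\displaystyle{ pa(q) }[/math] and [math]\displaystyle{ J(P) }[/math] translate directly into code. The sketch below assumes the strict order [math]\displaystyle{ \lt }[/math] is supplied as a transitively closed set of pairs (p, q) meaning p < q, and it enumerates [math]\displaystyle{ J(P) }[/math] by brute force over all subsets, which is only practical for a handful of events. The function names `parents` and `order_ideals` and the example poset are hypothetical.

```python
from itertools import combinations


def parents(elements, less):
    """Return pa(q) for each q: all p < q with no r satisfying p < r < q."""
    return {
        q: {
            p
            for p in elements
            if (p, q) in less
            and not any((p, r) in less and (r, q) in less for r in elements)
        }
        for q in elements
    }


def order_ideals(elements, less):
    """Enumerate J(P): subsets S containing every p with p < q for each q in S."""
    elems = sorted(elements)
    ideals = []
    for k in range(len(elems) + 1):
        for subset in combinations(elems, k):
            s = set(subset)
            # S is an order ideal if no required predecessor is missing.
            if all(p in s for (p, q) in less if q in s):
                ideals.append(frozenset(s))
    return ideals


# Hypothetical example: 1 < 2, 1 < 3, 2 < 4, 3 < 4 (plus the implied 1 < 4).
events = {1, 2, 3, 4}
less = {(1, 2), (1, 3), (2, 4), (3, 4), (1, 4)}

pa = parents(events, less)
J = order_ideals(events, less)
print(sorted(pa[4]))   # [2, 3]
print(len(J))          # 6
```

In this example event 4 has two parents, 2 and 3, which a tree model cannot represent, and only 6 of the 16 possible subsets of events are compatible genotypes.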

Notes

In earlier work on modelling the accumulative evolutionary process, Beerenwinkel (one of the authors) made the following assumptions:

1. Substitutions do not occur independently. There are preferred evolutionary pathways in which mutations are fixed.

2. The fixation of mutations in the population is permanent; that is, substitutions are irreversible.

3. At each time point, the virus population is dominated by a single strain, and clones are independent (and sometimes erroneous) copies of this genotype.

Improvements

As mentioned in the paper, the model could be improved by using separate parameters [math]\displaystyle{ \varepsilon^+ }[/math] and [math]\displaystyle{ \varepsilon^- }[/math] for false positives and false negatives in the error model, instead of a single error rate. Beerenwinkel and Drton have developed this idea.

Let [math]\displaystyle{ \varepsilon^+ = (\varepsilon_1^+,...,\varepsilon_M^+) \in [0, 1]^M }[/math] and [math]\displaystyle{ \varepsilon^- = (\varepsilon_1^-,...,\varepsilon_M^-) \in [0, 1]^M }[/math] be parameter vectors containing the mutation-specific probabilities of observing a false positive and a false negative, respectively. A false positive is a mutation observed in a clone derived from a virus population that is in the wild-type state for that mutation at the given time point; a false negative is a mutation that is not observed in a clone although the population is in the mutant state. The false positive and false negative rates thus summarize how individual clones differ from the population state, and hence they quantify the expected genetic diversity of the virus population. Conditional on the hidden state [math]\displaystyle{ X_{jm} }[/math], the probabilities of observing or not observing mutation [math]\displaystyle{ m }[/math] in clone [math]\displaystyle{ k }[/math] at time point [math]\displaystyle{ t_j }[/math] are as follows:

[math]\displaystyle{ \theta^l(\varepsilon_m^+, \varepsilon_m^-) = \begin{array}{c|cc} & y_{jkm}=0 & y_{jkm}=1 \\ \hline x_{jm}=0 & 1-\varepsilon_m^+ & \varepsilon_m^+ \\ x_{jm}=1 & \varepsilon_m^- & 1-\varepsilon_m^- \end{array} }[/math]

The rows of this matrix are indexed by the hidden state and the columns by the observed value, so its entries are the conditional probabilities:

[math]\displaystyle{ \theta^l(\varepsilon_m^+, \varepsilon_m^-)_{x_{jm},\,y_{jkm}} = \Pr(Y_{jkm}=y_{jkm} \mid X_{jm}=x_{jm}) }[/math]

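These conditional probabilities serve as the observation (emission) model for the hidden states, and the rest of the model is completed accordingly.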
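As an illustration, the sketch below builds the matrix [math]\displaystyle{ \theta^l }[/math] for each mutation and uses it to score the observed clones at a single time point [math]\displaystyle{ t_j }[/math]. It assumes that clones (as in assumption 3 above) and, additionally, mutations are conditionally independent given the hidden state, so the probability is a product of matrix entries. The function names `theta` and `clone_likelihood` and the toy error rates are hypothetical.

```python
def theta(eps_plus, eps_minus):
    """Error matrix for one mutation m.

    Row index: hidden population state x (0 = wild type, 1 = mutant).
    Column index: value y observed in a clone (0 = absent, 1 = present).
    """
    return [[1.0 - eps_plus, eps_plus],
            [eps_minus, 1.0 - eps_minus]]


def clone_likelihood(x, clones, eps_plus, eps_minus):
    """Probability of the observed clones at one time point given hidden state x.

    x:       list of M hidden states X_{jm} (0/1)
    clones:  list of K clones, each a list of M observations Y_{jkm} (0/1)
    Assumes clones and mutations are conditionally independent given x.
    """
    prob = 1.0
    for m, x_m in enumerate(x):
        th = theta(eps_plus[m], eps_minus[m])
        for clone in clones:
            prob *= th[x_m][clone[m]]
    return prob


# Hypothetical toy data: M = 2 mutations, K = 2 clones.
eps_plus = [0.1, 0.2]     # false positive rate per mutation
eps_minus = [0.05, 0.3]   # false negative rate per mutation
x = [1, 0]                # population is mutant at m = 0, wild type at m = 1
clones = [[1, 0],
          [0, 0]]         # second clone misses mutation 0 (a false negative)

p = clone_likelihood(x, clones, eps_plus, eps_minus)
print(round(p, 4))        # 0.0304
```

Here the value is 0.95 × 0.05 × 0.8 × 0.8: the single false negative at mutation 0 contributes the small factor [math]\displaystyle{ \varepsilon_1^- = 0.05 }[/math], which is how the separate error rates penalize each kind of discrepancy differently.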