From statwiki
Revision as of 19:06, 16 November 2015 by Ali.MSH

Genetic Application of Deep Learning

This presentation is based on the paper [Hui Y. Xiong et al., Science 347, 2015], which demonstrates the value of deep learning methods in the genetic study of disease, where machine-learning approaches enable precise annotation of disease mechanisms. These techniques have been applied to a wide variety of diseases, including several cancers, and have led to important findings on mutation-driven splicing. To reach this goal, various intronic and exonic disease mutations were taken into account to detect mutation variants. This procedure should help in the prognosis, diagnosis, and/or control of a wide variety of diseases.


Whole-genome sequencing has been used for some time to detect the genetic sources of disease or unwanted malignancies. The idea is to identify a hierarchy of mutations that lead to such diseases by looking at genetic variations in the genome, particularly those that occur outside protein-coding regions. In the present paper, a computational method is given to detect genetic variants that influence RNA splicing. RNA splicing is a modification of precursor messenger RNA (pre-mRNA) in which the introns are removed and the exons are joined together. Any disruption of this important step of gene expression can lead to various diseases, such as cancers and neurological disorders.
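To make the splicing step concrete, here is a toy Python sketch that is not part of the paper's method: a pre-mRNA is represented as a string, exons are given as hypothetical 0-based, half-open coordinates, and splicing simply concatenates the exon segments while discarding the introns. The sequence, the coordinates and the function name `splice` are illustrative assumptions. The example also shows an alternatively spliced (cassette) exon, which can be either included or skipped.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Keep the exon segments of a pre-mRNA and join them, dropping the introns.

    Exons are hypothetical 0-based, half-open [start, end) intervals.
    """
    ordered = sorted(exons)
    # Overlapping exons would make the joined transcript ambiguous.
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if end1 > start2:
            raise ValueError("exons must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Toy pre-mRNA: exon 1 | intron | cassette exon | intron | exon 3
# (each toy intron starts with GU and ends with AG, the canonical splice-site dinucleotides)
pre_mrna = "AUGGCC" + "GUAAAG" + "CCAGAA" + "GUCCAG" + "UAA"

included = splice(pre_mrna, [(0, 6), (12, 18), (24, 27)])  # cassette exon kept
skipped = splice(pre_mrna, [(0, 6), (24, 27)])             # cassette exon skipped
print(included, skipped)
```

Here the exon coordinates are supplied by hand; in the cell, which exons are kept depends on regulatory features of the RNA.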

Materials and Methods

The human splicing regulatory model is inferred with a Bayesian machine learning method, using 10,698 cassette exons as training cases. The goal is to maximize an information-theoretic code quality measure
[math]CQ=\sum_e \sum_t \left[ D_{KL}(q_{t,e} \,\|\, r_t) - D_{KL}(q_{t,e} \,\|\, p_{t,e}) \right][/math]
where [math]q_{t,e}[/math] is the target splicing pattern for exon e in tissue t, [math]r_t[/math] is the prediction of an optimized guesser that ignores the RNA features, [math]p_{t,e}[/math] is the regulatory model's prediction for exon e in tissue t, and [math]D_{KL}[/math] is the Kullback-Leibler divergence between two distributions. Because the first term does not depend on [math]p_{t,e}[/math], maximizing CQ is equivalent to minimizing [math]D_{KL}(q_{t,e} \,\|\, p_{t,e})[/math]; in this sense CQ is, up to a constant, a log-likelihood function of [math]p_{t,e}[/math].
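As a sketch only, assuming each splicing pattern is a discrete probability distribution over a small number of outcomes (the exact encoding is not described here), CQ can be computed as below. The dictionaries keyed by (tissue, exon), the helper names `kl_divergence` and `code_quality`, and the example numbers are hypothetical; divergences are in nats, and the small floor `eps` is an added assumption that avoids taking the log of zero.

```python
import math
from typing import Dict, Sequence, Tuple

Dist = Sequence[float]  # a discrete probability distribution


def kl_divergence(q: Dist, p: Dist, eps: float = 1e-12) -> float:
    """D_KL(q || p) in nats; p is floored at eps so log(0) never occurs."""
    if len(q) != len(p):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi > 0.0:  # terms with q_i = 0 contribute nothing
            total += qi * math.log(qi / max(pi, eps))
    return total


def code_quality(
    q: Dict[Tuple[str, str], Dist],  # target pattern q_{t,e}, keyed by (tissue, exon)
    p: Dict[Tuple[str, str], Dist],  # model prediction p_{t,e}, keyed by (tissue, exon)
    r: Dict[str, Dist],              # guesser prediction r_t, keyed by tissue only
) -> float:
    """CQ = sum_e sum_t [D_KL(q_{t,e} || r_t) - D_KL(q_{t,e} || p_{t,e})]."""
    return sum(
        kl_divergence(q_te, r[t]) - kl_divergence(q_te, p[(t, e)])
        for (t, e), q_te in q.items()
    )


# Hypothetical example: two exons in one tissue, two outcomes each.
q = {("brain", "exon1"): [0.9, 0.1], ("brain", "exon2"): [0.2, 0.8]}
r = {"brain": [0.5, 0.5]}
p = {("brain", "exon1"): [0.8, 0.2], ("brain", "exon2"): [0.3, 0.7]}
print(code_quality(q, p, r))
```

If the model merely reproduces the guesser (p = r), CQ is zero; a model that matches the targets exactly (p = q) attains the maximum, the sum of the divergences D_KL(q || r).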

Each model is a two-layer neural network with sigmoidal hidden units. In this case study, nonlinear and context-dependent relationships between the RNA features and splicing are taken into account. The RNA features provide the inputs to at most 30 hidden variables, each of which is a sigmoidal nonlinearity applied to a weighted sum of its inputs. A softmax function is then applied to the hidden variables to produce the prediction. The tissues are trained jointly, each with its own disjoint set of output units.
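The forward pass of such a network can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation: it uses randomly initialized weights instead of Bayesian training, assumes three output classes per tissue, and the class name `SplicingNet`, the tissue names and the feature vector are made up. It also assumes the hidden layer is shared across tissues, which is one reading of training the tissues jointly with disjoint output units.

```python
import math
import random
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max to avoid overflow
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]


class SplicingNet:
    """Hypothetical sketch: shared sigmoid hidden layer, one softmax head per tissue."""

    MAX_HIDDEN = 30  # "at most 30 hidden variables"

    def __init__(self, n_features: int, n_hidden: int, tissues: Sequence[str],
                 n_outputs: int = 3, seed: int = 0) -> None:
        if not 1 <= n_hidden <= self.MAX_HIDDEN:
            raise ValueError(f"n_hidden must be between 1 and {self.MAX_HIDDEN}")
        rng = random.Random(seed)
        self.n_features = n_features
        # Random weights stand in for the Bayesian training described in the paper.
        self.w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        # Disjoint output units: each tissue gets its own weights and biases.
        self.w_out = {t: [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                          for _ in range(n_outputs)] for t in tissues}
        self.b_out = {t: [0.0] * n_outputs for t in tissues}

    def hidden(self, x: Sequence[float]) -> List[float]:
        # Each hidden unit: sigmoid of a weighted sum of the RNA features.
        if len(x) != self.n_features:
            raise ValueError("wrong number of RNA features")
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_hidden, self.b_hidden)]

    def predict(self, x: Sequence[float]) -> Dict[str, List[float]]:
        # Softmax over a linear combination of the shared hidden units, per tissue.
        h = self.hidden(x)
        return {
            t: softmax([sum(w * hj for w, hj in zip(row, h)) + b
                        for row, b in zip(self.w_out[t], self.b_out[t])])
            for t in self.w_out
        }


net = SplicingNet(n_features=4, n_hidden=10, tissues=["brain", "heart", "muscle"])
preds = net.predict([1.0, 0.0, 0.5, 0.2])  # hypothetical RNA feature vector
for tissue, dist in preds.items():
    print(tissue, [round(v, 3) for v in dist])
```

In this sketch only the final softmax layer is tissue-specific, so every tissue's prediction is built from the same hidden representation of the RNA features.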

Genome-wide Analysis

Spinal Muscular Atrophy




[1] Hui Y. Xiong et al., "The human splicing code reveals new insights into the genetic determinants of disease," Science 347, 2015.