# Background

Gene regulation is the process of controlling which genes in a cell's DNA are turned 'on' (expressed) or 'off' (not expressed). When a gene is expressed, a functional product such as a protein is created. Even though all the cells of a multicellular organism (e.g., a human) contain the same DNA, different types of cells in that organism may express very different sets of genes. As a result, each cell type has distinct functionality. In other words, how a cell operates depends upon the genes expressed in that cell. Many factors, including 'chromatin modification marks', influence which genes are expressed in a cell.

The function of chromatin is to efficiently wrap DNA around histones into a condensed volume that fits into the nucleus of a cell, and to protect the DNA structure and sequence during cell division and replication. Different chemical modifications of the histones in chromatin, known as histone marks, change the spatial arrangement of the condensed DNA structure, which in turn affects the expression of genes in the region neighboring the histone mark. Histone marks can promote (obstruct) the turning on of a gene by making the gene region accessible (restricted). The section of DNA where histone marks can potentially have an impact is known as the DNA flanking region or 'gene region', which is considered to cover 10k base pairs centered at the transcription start site (TSS) (i.e., 5k base pairs in each direction). Unlike genetic mutations, histone modifications are reversible [1]. Therefore, understanding the influence of histone marks on gene regulation can assist in developing drugs for genetic diseases.

# Introduction

Advances in genomic technologies now make it possible to profile genome-wide chromatin mark signals. Biologists can therefore measure gene expression and the chromatin signals of the 'gene region' for different cell types across the whole human genome. The Roadmap Epigenome Project (REMC, publicly available) [2] recently released 2,804 genome-wide datasets covering 100 separate "normal" (not diseased) human cells/tissues, among which 166 datasets are gene expression reads and the rest are signal reads of various histone marks. The goal is to understand which histone marks are the most important and how they interact in gene regulation for each cell type.

Signal reads for histone marks are high-dimensional and spatially structured. A histone modification mark can exert its influence anywhere in the gene region (the 10k bp centered at the TSS), so it is important to understand how the impact of the mark on gene expression varies over the gene region, that is, how histone signals across the gene region affect gene expression. Several types of histone marks in human chromatin can influence gene regulation. Researchers have found five standard histone proteins. These five proteins can be altered in different combinations by different chemical modifications, resulting in a large number of distinct histone modification marks. Different histone modification marks can act as a module, interacting with each other to influence gene expression.

This paper proposes an attention-based deep learning model, AttentiveChrome [3], to find how chromatin factors (histone modification marks) contribute to the gene expression of a particular cell. AttentiveChrome uses a hierarchy of multiple LSTM modules to discover interactions between the signals of each histone mark and to learn dependencies among the marks in expressing a gene. The authors include two levels of soft attention: (1) one to attend to the most relevant signals of a histone mark, and (2) one to attend to the important marks and their interactions.

## Main Contributions

The contributions of this work can be summarized as follows:

• More accurate predictions than the state-of-the-art baselines, measured on REMC datasets for 56 different cell types.
• Better interpretation than state-of-the-art methods for visualizing deep learning models. The authors compute the correlation of the model's attention scores with the mark signal from REMC.
• Like earlier applications of attention models, which indirectly hint at the parts of the input the model deemed important, AttentiveChrome can explain its decisions by hinting at “what” and “where” it has focused.
• According to the authors, this is the first time an attention-based deep learning approach has been applied to a problem in molecular biology.

# Previous Works

Machine learning algorithms for classifying gene expression from histone modification signals have been surveyed in [15]. These algorithms range from linear regression, support vector machines, and random forests to rule-based learning and CNNs. To accommodate the spatially structured, high-dimensional input data (histone modification signals), these studies applied different feature selection strategies. The authors' preceding study, DeepChrome [4], incorporated the best position selection strategy, in which the positions most highly correlated with gene expression are considered the best positions. DeepChrome can learn the relationships between the histone marks, and this CNN-based model outperforms all earlier works. However, these approaches either (1) failed to model the spatial dependencies among the marks, or (2) required additional feature analysis. Only AttentiveChrome is reported to satisfy all eight desirable metrics of a model.
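The best position selection idea described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the correlation measure and selects a single bin per mark; the function names `pearson` and `best_bin` and the toy data are hypothetical and are not taken from the surveyed studies.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)

def best_bin(signals, expression):
    """Index of the bin whose signal correlates most strongly
    (in absolute value) with expression across genes.

    signals: list of per-gene lists, each of length T (one histone mark).
    expression: list of per-gene expression values.
    """
    num_bins = len(signals[0])
    scores = [
        abs(pearson([gene[t] for gene in signals], expression))
        for t in range(num_bins)
    ]
    return max(range(num_bins), key=lambda t: scores[t])

# Toy data (assumed): 4 genes, 3 bins; bin 1 tracks expression exactly.
signals = [[5, 1, 0], [3, 2, 1], [4, 3, 0], [5, 4, 1]]
expression = [10, 20, 30, 40]
selected = best_bin(signals, expression)
```

A strategy of this kind reduces each mark to one position, which is why it cannot capture how the signal varies across the whole gene region.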

# AttentiveChrome: Model Formulation

The authors propose an end-to-end architecture that can simultaneously attend and predict. The method uses recurrent neural networks (RNNs) composed of LSTM units to model the sequential spatial dependencies of the gene regions and to predict the gene expression level. The embedding vector $h_t$ output by an LSTM module encodes the learned representation of the feature dependencies from the first time step up to $t$. For this task, each bin position of the gene region is considered a time step.

The proposed AttentiveChrome framework contains the following five modules:

• Bin-level LSTM encoder encoding the bin positions of the gene region (one for each HM mark)
• Bin-level $\alpha$-Attention across all bin positions (one for each HM mark)
• HM-level LSTM encoder (one encoder encoding all HM marks)
• HM-level $\beta$-Attention among all HM marks (one)
• The final classification module

Figure 1 (Supplementary Figure 2) presents the overview of the proposed AttentiveChrome framework.

## Input and Output

Each dataset contains the gene expression labels and the histone signal reads for one specific cell type. The authors evaluated AttentiveChrome on 56 different cell types. For each gene, every histone mark has one feature/input vector containing the signal reads surrounding the gene’s TSS position (the gene region) for that mark. The label of this input denotes the expression of the specific gene. This study considers binary labels, where $+1$ denotes that the gene is expressed (on) and $-1$ denotes that the gene is not expressed (off). The authors reuse the feature inputs and outputs of their previous work DeepChrome [4]. The input feature is represented by a matrix $\textbf{X}$ of size $M \times T$, where $M$ is the number of HM marks considered in the input and $T$ is the number of bin positions used to represent the gene region. The $j^{th}$ row of the matrix $\textbf{X}$, $x^j$, represents sequentially structured signals from the $j^{th}$ HM mark, where $j\in \{1, \cdots, M\}$. Therefore, $x^j_t$ in the matrix $\textbf{X}$ represents the value from the $t^{th}$ bin belonging to the $j^{th}$ HM mark, where $t\in \{1, \cdots, T\}$. If the training set contains $N_{tr}$ labeled pairs, the $n^{th}$ pair is specified as $(X^n, y^n)$, where $X^n$ is a matrix of size $M \times T$, $y^n \in \{ -1, +1 \}$ is the binary label, and $n \in \{ 1, \cdots, N_{tr} \}$.
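To make the shape of one sample concrete, the following sketch builds an $M \times T$ matrix from per-base signals. The bin width of 100 bp (giving $T = 100$), the summing of reads within a bin, the three placeholder mark names, and the toy read counts are all assumptions made for illustration; only the 10k bp region and the $\pm 1$ labels come from the text.

```python
def bin_signal(per_base_signal, bin_size):
    """Sum a per-base signal into consecutive, non-overlapping bins."""
    return [
        sum(per_base_signal[start:start + bin_size])
        for start in range(0, len(per_base_signal), bin_size)
    ]

REGION_BP = 10_000   # gene region: 5k bp on each side of the TSS (from the text)
BIN_SIZE = 100       # assumed bin width, giving T = 100 bins
MARKS = ["mark_A", "mark_B", "mark_C"]  # placeholder mark names, M = 3

# Toy per-base read counts: each mark has a 200 bp block of reads
# starting at a different offset inside the gene region.
raw = {
    mark: [1 if 1000 * k <= pos < 1000 * k + 200 else 0
           for pos in range(REGION_BP)]
    for k, mark in enumerate(MARKS)
}

X = [bin_signal(raw[mark], BIN_SIZE) for mark in MARKS]  # M x T matrix
y = +1                # +1 = expressed (on), -1 = not expressed (off)
sample = (X, y)       # one labeled pair (X^n, y^n)
```

Each row of `X` plays the role of $x^j$, and `X[j][t]` corresponds to $x^j_t$ with zero-based indices.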

Figure 2 exhibits the input feature, and the output of AttentiveChrome for a particular gene (one sample).

## Bin-Level Encoder (one LSTM for each HM)

The sequentially ordered elements (each element is a bin position) of the gene region of the $n^{th}$ gene are represented by the $j^{th}$ row vector $x^j$. The authors treat each bin position as a time step for the LSTM. This study uses a bidirectional LSTM to model the overall dependencies among the $T$ bin positions in the gene region. The bidirectional LSTM contains two LSTMs:

• A forward LSTM, $\overrightarrow{LSTM_j}$, to model $x^j$ from $x_1^j$ to $x_T^j$, which outputs the embedding vector $\overrightarrow{h^j_t}$, of size $d$ for each bin $t$
• A reverse LSTM, $\overleftarrow{LSTM_j}$, to model $x^j$ from $x_T^j$ to $x_1^j$, which outputs the embedding vector $\overleftarrow{h^j_t}$, of size $d$ for each bin $t$

The final output of this layer, the embedding vector at the $t^{th}$ bin for the $j^{th}$ HM, $h^j_t$, is obtained by concatenating the two vectors from both directions and therefore has size $2d$: $h^j_t = [ \overrightarrow{h^j_t}, \overleftarrow{h^j_t}]$.
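The bidirectional encoding can be sketched in plain Python as follows. This is a minimal illustration with a standard LSTM cell and randomly initialized, untrained parameters; the helper names (`make_lstm`, `run_lstm`, `bidirectional_encode`), the embedding size $d = 4$, and the toy signal are assumptions, not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_lstm(input_size, hidden_size, rng):
    """Random parameters for one LSTM: per gate, a weight matrix over the
    concatenated [input, previous hidden] vector, plus a zero bias."""
    def gate():
        rows = [[rng.uniform(-0.1, 0.1)
                 for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
        return rows, [0.0] * hidden_size
    return {name: gate() for name in ("input", "forget", "output", "cell")}

def affine(params, vec):
    weights, bias = params
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def run_lstm(lstm, sequence, hidden_size):
    """Run an LSTM over a sequence of input vectors; return h_t per step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        z = list(x) + h
        i = [sigmoid(a) for a in affine(lstm["input"], z)]
        f = [sigmoid(a) for a in affine(lstm["forget"], z)]
        o = [sigmoid(a) for a in affine(lstm["output"], z)]
        g = [math.tanh(a) for a in affine(lstm["cell"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs.append(h)
    return outputs

def bidirectional_encode(forward, backward, sequence, hidden_size):
    """Concatenate forward and reverse embeddings at every bin position."""
    fwd = run_lstm(forward, sequence, hidden_size)
    # Feed the bins in reverse, then flip the outputs back to bin order.
    bwd = run_lstm(backward, sequence[::-1], hidden_size)[::-1]
    return [hf + hb for hf, hb in zip(fwd, bwd)]

rng = random.Random(0)
d = 4                                       # assumed size per direction
x_j = [[0.0], [2.0], [5.0], [1.0], [0.0]]   # toy signal of one mark, T = 5
forward_lstm = make_lstm(1, d, rng)
backward_lstm = make_lstm(1, d, rng)
h_j = bidirectional_encode(forward_lstm, backward_lstm, x_j, d)
```

In this sketch each concatenated vector in `h_j` has length $2d$: the first half summarizes bins $1$ to $t$ and the second half summarizes bins $T$ down to $t$.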

## Bin-Level $\alpha$-attention

Each bin contributes differently to the encoding of the entire $j^{th}$ mark. To highlight the most important bins for prediction, a soft attention weight vector $\alpha^j$ of size $T$ is learned for each $j$. To calculate the soft weight $\alpha^j_t$ for each $t$, the embedding vectors $\{h^j_1, \cdots, h^j_T \}$ of all the bins are used. The following equation is used:

$\alpha^j_t = \frac{exp(\textbf{W}_b h^j_t)}{\sum_{i=1}^T{exp(\textbf{W}_b h^j_i)}}$

The parameter $\textbf{W}_b$ is learned jointly with the rest of the model during training. The $j^{th}$ HM mark can therefore be represented by $m^j = \sum_{t=1}^T{\alpha^j_t \times h^j_t}$. Here, $h^j_t$ is the embedding vector and $\alpha^j_t$ is the importance weight of the $t^{th}$ bin in the representation of the $j^{th}$ HM mark. Intuitively, $\textbf{W}_b$ will learn the cell type.
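The attention weights and the resulting mark representation can be computed as in the sketch below. It treats $\textbf{W}_b$ as a single context vector so that $\textbf{W}_b h^j_t$ is a scalar score, which is an assumption about its shape; the toy embeddings and the function name `bin_attention` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bin_attention(h_j, w_b):
    """Return (alpha_j, m_j) for one mark.

    h_j: list of T embedding vectors h^j_t.
    w_b: context vector W_b, same length as each h^j_t.
    """
    scores = [sum(w * h for w, h in zip(w_b, h_t)) for h_t in h_j]
    alpha_j = softmax(scores)
    size = len(h_j[0])
    # m^j is the attention-weighted sum of the bin embeddings.
    m_j = [sum(a * h_t[k] for a, h_t in zip(alpha_j, h_j))
           for k in range(size)]
    return alpha_j, m_j

# Toy embeddings for T = 3 bins (assumed values).
h_j = [[0.1, 0.0], [0.9, 0.4], [0.2, 0.1]]
w_b = [2.0, 1.0]
alpha_j, m_j = bin_attention(h_j, w_b)
```

Because the weights in `alpha_j` are non-negative and sum to one, `m_j` is a convex combination of the bin embeddings, and the largest weight marks the bin the model attends to most.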

## HM-level Encoder (one LSTM)

Studies have observed that HMs work cooperatively to provoke or subdue gene expression [5]. The HM-level encoder uses one bidirectional LSTM to capture this relationship between the HMs. Because the authors did not find that any specific ordering of the HMs had an influence, an arbitrary (random) ordering is assumed in order to formulate the sequential dependency. The representation $m^j$ of the $j^{th}$ HM, $HM_j$, calculated by the bin-level attention layer, is the input to this step. This set-based encoder outputs an embedding vector $s^j$ of size $d'$, which is the encoding for the $j^{th}$ HM.

$s^j = [ \overrightarrow{LSTM_s}(m^j), \overleftarrow{LSTM_s}(m^j) ]$

The dependencies between the $j^{th}$ HM and the other HM marks are encoded in $s^j$, whereas $m^j$ from the previous step encodes the bin dependencies of the $j^{th}$ HM.

## HM-Level $\beta$-attention

This second soft attention level finds the important HM marks for classifying a gene’s expression by learning an importance weight $\beta^j$ for each $HM_j$, where $j \in \{ 1, \cdots, M \}$. The equation is

$\beta^j = \frac{exp(\textbf{W}_s s^j)}{\sum_{i=1}^M{exp(\textbf{W}_s s^i)}}$

The HM-level context parameter $\textbf{W}_s$ is trained jointly with the rest of the model. Intuitively, $\textbf{W}_s$ learns how significant the HMs are for a cell type. Finally, the entire gene region is encoded in a hidden representation $\textbf{v}$, using the weighted sum of the embeddings of all HM marks.

$\textbf{v} = \sum_{j=1}^M{\beta^j \times s^j}$
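As a small worked example of the HM-level attention, assume $M = 3$ marks with hypothetical scores $\textbf{W}_s s^1 = 2$ and $\textbf{W}_s s^2 = \textbf{W}_s s^3 = 0$ (values chosen only for illustration). The softmax then gives

$\beta^1 = \frac{e^{2}}{e^{2} + e^{0} + e^{0}} \approx 0.787, \quad \beta^2 = \beta^3 = \frac{e^{0}}{e^{2} + e^{0} + e^{0}} \approx 0.107$

$\textbf{v} \approx 0.787 \times s^1 + 0.107 \times s^2 + 0.107 \times s^3$

The weights sum to one (up to rounding), so $\textbf{v}$ is dominated by the embedding of the mark with the highest score while still retaining a contribution from the others.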

## End-to-end training

The embedding vector $\textbf{v}$ is fed to a simple classification module, $f(\textbf{v}) = \text{softmax}(\textbf{W}_c\textbf{v}+b_c)$, where $\textbf{W}_c$ and $b_c$ are learnable parameters. The output is the probability of the gene expression being high (expressed) or low (suppressed). The whole model, including the attention modules, is differentiable, so it can be trained end to end with backpropagation. The negative log-likelihood loss is minimized during learning.
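The classification module and its loss can be sketched as follows. The ordering of the two output classes (index 0 for off, index 1 for on), the parameter values, and the function names are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(v, W_c, b_c):
    """f(v) = softmax(W_c v + b_c); returns [P(off), P(on)] (assumed order)."""
    logits = [sum(w * x for w, x in zip(row, v)) + b
              for row, b in zip(W_c, b_c)]
    return softmax(logits)

def nll_loss(probs, label):
    """Negative log-likelihood of the true label (-1 = off, +1 = on)."""
    class_index = 1 if label == +1 else 0
    return -math.log(probs[class_index])

# Toy gene-region embedding and classifier parameters (assumed values).
v = [0.5, -1.0, 0.25]
W_c = [[0.2, 0.1, -0.3], [-0.4, 0.3, 0.8]]
b_c = [0.0, 0.1]
probs = classify(v, W_c, b_c)
loss = nll_loss(probs, +1)
```

The loss is small when the model assigns high probability to the true label and grows without bound as that probability approaches zero, which is the signal backpropagated through the attention and LSTM layers.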

# Related Works/Studies

In the last few years, deep learning models have achieved unprecedented success in diverse research fields. Though not as rapidly as in other fields, deep learning based algorithms are gaining popularity among bioinformaticians.

## Attention-based Deep Models

The idea of attention in deep learning is adapted from the human visual perception system. Humans tend to focus on some parts of a scene more than others while perceiving it. Augmenting deep neural networks with this mechanism has achieved excellent results in several research areas. Various types of attention models, e.g., soft [6], location-aware [7], and hard [8, 9] attention, have been proposed in the literature. In the soft attention model, a soft weight vector is calculated over all the feature vectors. The magnitude of each weight reflects the degree of importance of the corresponding feature in the prediction.

## Visualization and Apprehension of Deep Models

Prior studies mostly focused on interpreting convolutional neural networks (CNNs) for image classification through visualization techniques based on deconvolution [10], saliency maps [11, 12], and class optimization [12]. Some recent works [13, 14] tried to understand recurrent neural networks (RNNs) for text-based problems. By looking into the features the model attends to, we can interpret the output of a deep model.

# Conclusion

The paper introduces an attention-based approach called "AttentiveChrome" that addresses both understanding and prediction, with several advantages over previous architectures, including higher accuracy than state-of-the-art baselines and clearer interpretation than saliency maps and class optimization. Finally, according to the authors, this is the first implementation of deep attention to understand gene regulation.

# References

[1] Andrew J Bannister and Tony Kouzarides. Regulation of chromatin by histone modifications. Cell research, 21(3):381–395, 2011.

[2] Anshul Kundaje, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Michael J Ziller, et al. Integrative analysis of 111 reference human epigenomes. Nature, 518(7539):317–330, 2015.

[3] Ritambhara Singh, et al. Attend and predict: Understanding gene regulation by selective attention on chromatin. In Advances in Neural Information Processing Systems, 2017.

[4] Ritambhara Singh, Jack Lanchantin, Gabriel Robins, and Yanjun Qi. Deepchrome: deep-learning for predicting gene expression from histone modifications. Bioinformatics, 32(17):i639–i648, 2016.

[5] Joanna Boros, Nausica Arnoult, Vincent Stroobant, Jean-François Collet, and Anabelle Decottignies. Polycomb repressive complex 2 and h3k27me3 cooperate with h3k9 methylation to maintain heterochromatin protein 1α at chromatin. Molecular and cellular biology, 34(19):3662–3674, 2014.

[6] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[7] Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 577–585. Curran Associates, Inc., 2015.

[8] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1412–1421, Lisbon, Portugal, September 2015. Association for Computational Linguistics.

[9] Huijuan Xu and Kate Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, 2016.

[10] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014, pages 818–833. Springer, 2014.

[11] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. volume 11, pages 1803–1831, 2010.

[12] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. 2013.

[13] Andrej Karpathy, Justin Johnson, and Fei-Fei Li. Visualizing and understanding recurrent networks. 2015.

[14] Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. Visualizing and understanding neural models in nlp. 2015.

[15] Xianjun Dong and Zhiping Weng. The correlation between histone modifications and gene expression. Epigenomics, 5(2):113–116, 2013.