Bayesian Network as a Decision Tool for Predicting ALS Disease

Presented by: Bsodjahi

== Introduction ==  
To propose the best decision tool for Amyotrophic Lateral Sclerosis (ALS) prediction, Hasan Aykut et al. present in this paper a comparative empirical study of the predictive performance of 8 supervised Machine Learning classifiers, namely Bayesian Networks, Artificial Neural Networks, Logistic Regression, Naïve Bayes, J48, Support Vector Machines, KStar, and K-Nearest Neighbor. Using a dataset consisting of blood plasma protein levels and independent personal features, they predicted ALS patients with each classifier and found that Bayesian Networks offer the best results on various metrics, such as accuracy (88.7%) and Area Under the Curve (AUC, 97%).


Our summary of the paper commences with a review of the previous works underpinning its motivation; next we present the dataset and the methodological approach that the authors used; then we analyze the results, followed finally by the conclusion.


== Previous Work and Motivation ==  
ALS is a nervous system disease that progressively affects nerve cells in the brain and the spinal cord, impairing the patient's upper and lower motor functions and resulting in the loss of muscle control. Its origin is still unknown, though in some instances it is thought to be hereditary. Sadly, at this point in time it is not curable: the progressive degeneration of the muscles cannot be halted once started [1] and inexorably results in the patient's death within 2-5 years [2].


The symptoms that ALS patients exhibit are not distinctively unique to ALS, as they are similar to those of a host of other neurological disorders. Furthermore, because the impact on the patient's motor skills is usually not noticeable in the early stages [3], diagnosis at that time is a challenge. One of the main diagnostic protocols, known as the El Escorial criteria, involves a battery of tests taking 3-6 months. This is a considerable amount of time, since a quicker diagnosis would allow earlier medical monitoring, conducive to improving the patient's living conditions and possibly extending survival.


Given the need for a more timely yet effective diagnosis, the authors of this paper proposed to apply Machine Learning and to identify, among a list of candidate methods, the approach that yields the most accurate prediction.


== Dataset ==  
The table below gives an overview of the participants' features for the data, which were collected in a prior experimental study. There are 204 data points overall, of which about 50% come from ALS patients; the rest consists of Parkinson's disease patients, a Neurological Control group, and a Control group of healthy participants.


[[File:Table 1.png|center]]


== Study Methods ==


Figure 1 below shows the global architecture of the modelling process in the comparative machine learning performance study. Though all parts are important, the last two are of most interest to us here, and we focus on those.


[[File:Aykut et al Figure 1.png|center]]


<div align="center">Figure 1: Modelling process with machine learning methods</div>


We provide here an overview of Bayesian Networks, since they were not covered in the setting of our course. Bayesian Networks are graph-based statistical models that represent probabilistic relationships among variables and are mathematically formulated as:


<div align="center">Figure 1(a): Inception module, naïve version</div>
[[File: BN Formula.png | center]]


Their structure is a Directed Acyclic Graph (DAG), i.e., a graph composed of nodes representing variables and arrows representing the direction of the dependency, so that the joint distribution factorizes into the product of each variable's probability conditioned on its parents. BNs are easily interpretable, especially for a non-technical audience, and are well adopted in areas such as biology and medicine [4-5].
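
As a toy illustration of this factorization, the snippet below encodes a three-node DAG and computes a joint probability as the product of each node's conditional probability given its parents. The variables, structure, and numbers are made up for illustration and are not taken from the paper.

<pre>
# Toy Bayesian Network: each node has a set of parents and a conditional
# probability table (CPT). All structure and probabilities are illustrative only.
cpts = {
    "Age":     {(): {"old": 0.4, "young": 0.6}},          # P(Age)
    "Protein": {(): {"high": 0.3, "low": 0.7}},           # P(Protein)
    "Disease": {                                           # P(Disease | Age, Protein)
        ("old", "high"):   {"ALS": 0.5,  "control": 0.5},
        ("old", "low"):    {"ALS": 0.2,  "control": 0.8},
        ("young", "high"): {"ALS": 0.3,  "control": 0.7},
        ("young", "low"):  {"ALS": 0.05, "control": 0.95},
    },
}
parents = {"Age": (), "Protein": (), "Disease": ("Age", "Protein")}

def joint_probability(assignment):
    """P(x_1, ..., x_n) = product over nodes i of P(x_i | parents(x_i))."""
    p = 1.0
    for node, table in cpts.items():
        parent_values = tuple(assignment[par] for par in parents[node])
        p *= table[parent_values][assignment[node]]
    return p

print(joint_probability({"Age": "old", "Protein": "high", "Disease": "ALS"}))
# 0.4 * 0.3 * 0.5 = 0.06
</pre>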


<div align="center">Figure 1(b): Inception module with dimension reductions</div>
Generally, the choice of an algorithm is informed by the nature of the dataset, which also suggests the most appropriate criteria for evaluating the technique's performance. For instance, the dataset in this study is characterized on one hand by 4 classes (versus the typical 2-class setting) and on the other by the imbalance in the number of participants in each group. Because of this latter characteristic, Hasan Aykut et al. included the Geometric Mean and Youden's index in their evaluation criteria, since these two are known to resist the impact of imbalanced data on the performance evaluation. Table 2 below shows the evaluation criteria formulae.


[[File: Criteria Formula.png | center]]
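
To see why the imbalance-robust criteria matter, the snippet below computes accuracy, sensitivity, specificity, the Geometric Mean, and Youden's index from a hypothetical binary confusion matrix (the counts are made up, not from the paper; the paper applies these criteria to its own multi-class results, and its exact averaging may differ from this two-class sketch). Note how a reasonable-looking accuracy can coexist with poor minority-class performance, which the Geometric Mean and Youden's index expose.

<pre>
import math

# Hypothetical binary confusion-matrix counts (illustrative only):
tp, fn = 90, 10   # majority class correctly / incorrectly classified
tn, fp = 5, 15    # minority class correctly / incorrectly classified

sensitivity = tp / (tp + fn)                    # true positive rate
specificity = tn / (tn + fp)                    # true negative rate
accuracy    = (tp + tn) / (tp + tn + fp + fn)
g_mean      = math.sqrt(sensitivity * specificity)  # Geometric Mean
youden      = sensitivity + specificity - 1.0       # Youden's index

print(f"Accuracy:    {accuracy:.3f}")     # 0.792 -- inflated by the majority class
print(f"Sensitivity: {sensitivity:.3f}")  # 0.900
print(f"Specificity: {specificity:.3f}")  # 0.250
print(f"G-Mean:      {g_mean:.3f}")       # 0.474 -- flags the poor minority class
print(f"Youden:      {youden:.3f}")       # 0.150
</pre>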


== Results Presentation ==
The resulting Bayesian Network from the study is presented below; visually, it shows the dependency of the class prediction, i.e., Patient Type, on all the features, some of which also have dependencies among themselves. For example, we can see that the number of patients is dependent on age.


<div align="center">Figure 1(a): Inception module, naïve version</div>
[[File: BN Network.png | center]]


<div align="center">Figure 2: Bayesian Network model of the dataset</div>


<div align="center">Figure 1(b): Inception module with dimension reductions</div>
Table 3 summarizes the performance comparison of all 8 techniques through the lens of the 11 evaluation criteria, and Bayesian Networks come out on top in all of them, including 88.2% on the Geometric Mean, 88.3% on Youden's index, 88.7% on accuracy, and 97% on the weighted ROC area.


[[File: Performance Comparison Table.png | center]]
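
Since the ROC area is reported as a single weighted number for a 4-class problem, a one-vs-rest AUC averaged with class-prevalence weights is the usual way to obtain it. The sketch below shows one way such a score could be computed with scikit-learn on synthetic stand-in data; it is an assumption about the computation, not the authors' exact procedure, and Naïve Bayes is used only as a convenient stand-in classifier.

<pre>
# Weighted one-vs-rest ROC AUC for a 4-class problem (synthetic stand-in data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=204, n_classes=4, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

model = GaussianNB().fit(X_train, y_train)   # stand-in classifier
probs = model.predict_proba(X_test)          # shape: (n_samples, 4)

# Each class is scored against the rest, then the per-class AUCs are averaged
# with weights proportional to class prevalence ("weighted" average).
weighted_auc = roc_auc_score(y_test, probs, multi_class="ovr", average="weighted")
print(f"Weighted one-vs-rest AUC: {weighted_auc:.3f}")
</pre>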


== Conclusion and Critiques ==
In this study, Hasan Aykut et al. conducted a technical evaluation to determine which of 8 algorithms is the best decision tool for ALS prediction. Eleven evaluation criteria were used, and Bayesian Networks produced the best results, with, among other measures, 88.7% on accuracy and 97% on the ROC area, a measure more adequate for imbalanced datasets. In addition to achieving the best predictive performance in this study, Bayesian Networks are also easily interpretable by a non-technical audience, justifying their adoption in domains such as biology and medicine.


From our perspective, given the nature of the dataset, which is not only imbalanced but also has 4 different classes, we would suggest Convolutional Neural Networks as another method to explore for further accuracy improvement, though interpretability in that case would be more of a challenge. Because the origin of ALS is still unknown to date and the disease is very difficult to diagnose in its early stages, this is a very important area of research. It should rally more members of the Machine Learning and Artificial Intelligence community into further collaboration with the medical community to develop a more effective and rapid diagnostic decision tool that can detect this neurological disorder at its inception, as this could contribute to determining its origin and therefore allow the development of preventative and curative measures.
 


== References ==
[1] Rowland, L.P.; Shneider, N.A. Amyotrophic Lateral Sclerosis. N. Engl. J. Med. 2001, 344, 1688–1700. [[https://www.nejm.org/doi/full/10.1056/NEJM200105313442207 CrossRef]] [[https://pubmed.ncbi.nlm.nih.gov/11386269/ PubMed]]


[2] Hardiman, O.; van den Berg, L.H.; Kiernan, M.C. Clinical Diagnosis and Management of Amyotrophic Lateral Sclerosis. Nat. Rev. Neurol. 2011, 7, 639–649. [[http://doi.org/10.1038/nrneurol.2011.153 CrossRef]] [[http://www.ncbi.nlm.nih.gov/pubmed/21989247 PubMed]]
 


[3] Swinnen, B.; Robberecht, W. The Phenotypic Variability of Amyotrophic Lateral Sclerosis. Nat. Rev. Neurol. 2014, 10, 661. [[http://doi.org/10.1038/nrneurol.2014.184 CrossRef]] [[http://www.ncbi.nlm.nih.gov/pubmed/24126629 PubMed]]


[4] Bandyopadhyay, S.; Wolfson, J.; Vock, D.M.; Vazquez-Benitez, G.; Adomavicius, G.; Elidrisi, M.; Johnson, P.E.; O'Connor, P.J. Data Mining for Censored Time-to-Event Data: A Bayesian Network Model for Predicting Cardiovascular Risk from Electronic Health Record Data. Data Min. Knowl. Discov. 2015, 29, 1033–1069. [[http://doi.org/10.1007/s10618-014-0386-6 CrossRef]]


[5] Kanwar, M.K.; Lohmueller, L.C.; Kormos, R.L.; Teuteberg, J.J.; Rogers, J.G.; Lindenfeld, J.; Bailey, S.H.; McIlvennan, C.K.; Benza, R.; Murali, S.; et al. A Bayesian Model to Predict Survival after Left Ventricular Assist Device Implantation. JACC Heart Fail. 2018, 6, 771–779. [[http://doi.org/10.1016/j.jchf.2018.03.016 CrossRef]] [[http://www.ncbi.nlm.nih.gov/pubmed/30098967 PubMed]]
