http://wiki.math.uwaterloo.ca/statwiki/api.php?action=feedcontributions&user=V23joshi&feedformat=atomstatwiki - User contributions [US]2022-05-20T15:00:39ZUser contributionsMediaWiki 1.28.3http://wiki.math.uwaterloo.ca/statwiki/index.php?title=F21-STAT_441/841_CM_763-Proposal&diff=51221F21-STAT 441/841 CM 763-Proposal2021-12-09T07:03:09Z<p>V23joshi: </p>
<hr />
<div>Use this format (Don’t remove Project 0)<br />
<br />
Project # 0 Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Title: Making a String Telephone<br />
<br />
Description: We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).<br />
<br />
--------------------------------------------------------------------<br />
Project # 1 Group members:<br />
<br />
Feng, Jared<br />
<br />
Huang, Xipeng<br />
<br />
Xu, Mingwei<br />
<br />
Yu, Tingzhou<br />
<br />
Title: Patch-Based Convolutional Neural Network for Cancers Classification<br />
<br />
Description: In this project, we consider classifying three classes (tumor types) of cancers based on pathological data. We will follow the paper ''Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification''.<br />
--------------------------------------------------------------------<br />
Project # 2 Group members:<br />
<br />
Anderson, Eric<br />
<br />
Wang, Chengzhi<br />
<br />
Zhong, Kai<br />
<br />
Zhou, Yi Jing<br />
<br />
Title: Application of Neural Networks<br />
<br />
Description: Using neural networks to determine content/intent of emails.<br />
<br />
--------------------------------------------------------------------<br />
Project # 3 Group members:<br />
<br />
Chopra, Kanika<br />
<br />
Rajcoomar, Yush<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Title: Cancer Classification<br />
<br />
Description: We will be classifying three tumour types based on pathological data. <br />
<br />
--------------------------------------------------------------------<br />
Project # 4 Group members:<br />
<br />
Li, Shao Zhong<br />
<br />
Kerr, Hannah <br />
<br />
Wong, Ann Gie<br />
<br />
Title: Classification of text<br />
<br />
Description: Being able to grade test answers automatically can save a lot of time and teaching resources. Unlike the multiple-choice format, where grading is easily automated, formats involving text answers test knowledge more thoroughly but still require human evaluation and marking, which becomes a bottleneck for teaching resources and personnel at a scale of thousands of students. We will use classification techniques and machine learning to develop an automated way to predict the correctness of text answers with good accuracy, which graders can use to reduce the time and manual effort needed in the grading process.<br />
<br />
--------------------------------------------------------------------<br />
Project # 5 Group members:<br />
<br />
Chin, Jessie Man Wai<br />
<br />
Ooi, Yi Lin<br />
<br />
Shi, Yaqi<br />
<br />
Ngew, Shwen Lyng<br />
<br />
Title: The Application of Classification in Accelerated Underwriting (Insurance)<br />
<br />
Description: Accelerated Underwriting (AUW), also called “express underwriting,” is a faster and easier process for people in good health to obtain life insurance. The traditional underwriting process is often painful for both customers and insurers. Customers have to complete different types of questionnaires and provide different medical tests involving blood, urine, saliva and other medical results. Underwriters, on the other hand, have to manually go through every single policy to assess the risk of each applicant. AUW allows people who are deemed “healthy” to forgo medical exams. Since COVID-19, it has become a more pressing topic, as traditional underwriting cannot be performed under stay-at-home orders. However, this places a burden on the insurance company to better estimate risk with fewer test results. <br />
<br />
This is where data science kicks in. With different classification methods, we can address the underwriting process’ five pain points: labor, speed, efficiency, pricing and mortality. This allows us to better estimate the risk and classify the clients for whether they are eligible for accelerated underwriting. For the final project, we use the data from one of the leading US insurers to analyze how we can classify our clients for AUW using the method of classification. We will be using factors such as health data, medical history, family history as well as insurance history to determine the eligibility.<br />
<br />
--------------------------------------------------------------------<br />
Project # 6 Group members:<br />
<br />
Wang, Carolyn<br />
<br />
Cyrenne, Ethan<br />
<br />
Nguyen, Dieu Hoa<br />
<br />
Sin, Mary Jane<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
Project # 7 Group members:<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Chatoor, Amanda<br />
<br />
Prathap Das, Sutej<br />
<br />
Title: PetFinder.my - Pawpularity Contest [https://www.kaggle.com/c/petfinder-pawpularity-score/overview]<br />
<br />
Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.<br />
<br />
--------------------------------------------------------------------<br />
Project # 8 Group members:<br />
<br />
Xu, Siming<br />
<br />
Yan, Xin<br />
<br />
Duan, Yishu<br />
<br />
Di, Xibei<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 9 Group members:<br />
<br />
Loke, Chun Waan<br />
<br />
Chong, Peter<br />
<br />
Osmond, Clarice<br />
<br />
Li, Zhilong<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 10 Group members:<br />
<br />
O'Farrell, Ethan<br />
<br />
D'Astous, Justin<br />
<br />
Hamed, Waqas<br />
<br />
Vladusic, Stefan<br />
<br />
Title: Pawpularity (Kaggle)<br />
<br />
Description: Predicting the popularity of animal photos based on photo metadata<br />
--------------------------------------------------------------------<br />
Project # 11 Group members:<br />
<br />
JunBin, Pan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 12 Group members:<br />
<br />
Kar Lok, Ng<br />
<br />
Muhan (Iris), Li<br />
<br />
Wu, Mingze<br />
<br />
Title: NFL Health & Safety - Helmet Assignment competition (Kaggle Competition)<br />
<br />
Description: Assigning players to the helmets seen in footage of head collisions in football plays.<br />
--------------------------------------------------------------------<br />
Project # 13 Group members:<br />
<br />
Livochka, Anastasiia<br />
<br />
Wong, Cassandra<br />
<br />
Evans, David<br />
<br />
Yalsavar, Maryam<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 14 Group Members:<br />
<br />
Zeng, Mingde<br />
<br />
Lin, Xiaoyu<br />
<br />
Fan, Joshua<br />
<br />
Rao, Chen Min<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 15 Group Members:<br />
<br />
Huang, Yuying<br />
<br />
Anugu, Ankitha<br />
<br />
Dave, Meet Hemang<br />
<br />
Chen, Yushan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 16 Group Members:<br />
<br />
Wang, Lingshan<br />
<br />
Li, Yifan<br />
<br />
Liu, Ziyi<br />
<br />
Title: Implement and Improve CNN in Multi-Class Text Classification<br />
<br />
Description: We will apply a Convolutional Neural Network (CNN) to classify real-world data (building an efficient classifier for case study interview materials) and improve the CNN algorithm in the context of text classification, supported by a real-world dataset. Implementing the CNN allows us to further analyze the efficiency and practicality of the algorithm.<br />
The dataset is composed of case study HTML files containing case information that can be classified into multiple industry categories. We will implement a multi-class classifier to break down the information contained in each case material into pre-determined subcategories (e.g., behavior questions, consulting questions, questions for new business/market entry, etc.). We will attempt to process the complicated data into several data types (e.g., HTML, JSON, pandas data frames, etc.) and choose the most efficient raw data processing logic based on runtime and algorithm optimization.<br />
--------------------------------------------------------------------<br />
Project # 17 Group members:<br />
<br />
Malhi, Dilmeet<br />
<br />
Joshi, Vansh<br />
<br />
Syamala, Aavinash <br />
<br />
Islam, Sohan<br />
<br />
Title: Kaggle project: PetFinder.my - Pawpularity Contest<br />
<br />
Description: In this competition, we will analyze raw images provided by PetFinder.my to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 18 Group members:<br />
<br />
Yuwei, Liu<br />
<br />
Daniel, Mao<br />
<br />
Title: Sartorius - Cell Instance Segmentation (Kaggle) [https://www.kaggle.com/c/sartorius-cell-instance-segmentation]<br />
<br />
Description: Detect single neuronal cells in microscopy images<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project #19 Group members:<br />
<br />
Samuel, Senko<br />
<br />
Tyler, Verhaar<br />
<br />
Zhang, Bowen<br />
<br />
Title: NBA Game Prediction<br />
<br />
Description: We will build a win/loss classifier for NBA games using player and game data and also incorporating alternative data (ex. sports betting data).<br />
<br />
-------------------------------------------------------------------<br />
<br />
Project #20 Group members:<br />
<br />
Mitrache, Christian<br />
<br />
Renggli, Aaron<br />
<br />
Saini, Jessica<br />
<br />
Mossman, Alexandra<br />
<br />
Title: Classification and Deep Learning for Healthcare Provider Fraud Detection Analysis<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 21 Group members:<br />
<br />
Wang, Kun<br />
<br />
Title: TBD<br />
<br />
Description : TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 22 Group members:<br />
<br />
Guray, Egemen<br />
<br />
Title: Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network<br />
<br />
Description : I will build a prediction system to predict road signs in the German Traffic Sign Dataset using CNN.<br />
--------------------------------------------------------------------<br />
<br />
Project # 23 Group members:<br />
<br />
Bsodjahi<br />
<br />
Title: Modeling Pseudomonas aeruginosa bacteria state through its genes expression activity<br />
<br />
Description : Label Pseudomonas aeruginosa gene expression data through unsupervised learning (e.g., the EM algorithm) and then model the bacterial state as a function of its gene expression</div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F21&diff=50695stat441F212021-11-23T05:46:37Z<p>V23joshi: </p>
<hr />
<div><br />
<br />
== [[F20-STAT 441/841 CM 763-Proposal| Project Proposal ]] ==<br />
<br />
<!--[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]--><br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="250pt"|Name <br />
|width="15pt"|Paper number <br />
|width="700pt"|Title<br />
|width="15pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|width="30pt"|Link to the video<br />
|-<br />
|Sep 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary] || [https://youtu.be/JWozRg_X-Vg?list=PLehuLRPyt1HzXDemu7K4ETcF0Ld_B5adG&t=539]<br />
|-<br />
|Week of Nov 16 || Ali Ghodsi || || || || ||<br />
|-<br />
|Week of Nov 22 || Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu|| || Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification || [http://proceedings.mlr.press/v139/bai21c/bai21c.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization Summary] ||<br />
|-<br />
|Week of Nov 29 || Kanika Chopra, Yush Rajcoomar || || Automatic Bank Fraud Detection Using Support Vector Machines || [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.5804&rep=rep1&type=pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Automatic_Bank_Fraud_Detection_Using_Support_Vector_Machines Summary] ||<br />
|-<br />
|Week of Nov 22 || Zeng Mingde, Lin Xiaoyu, Fan Joshua, Rao Chen Min || || Do Vision Transformers See Like Convolutional Neural Networks? || [https://proceedings.neurips.cc/paper/2021/file/652cf38361a209088302ba2b8b7f51e0-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Do_Vision_Transformers_See_Like_CNN Summary] ||<br />
|-<br />
|Week of Nov 22 || Justin D'Astous, Waqas Hamed, Stefan Vladusic, Ethan O'Farrell || || A Probabilistic Approach to Neural Network Pruning || [http://proceedings.mlr.press/v139/qian21a/qian21a.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Summary_of_A_Probabilistic_Approach_to_Neural_Network_Pruning Summary] ||<br />
|-<br />
|Week of Nov 22 || Cassandra Wong, Anastasiia Livochka, Maryam Yalsavar, David Evans || || Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification || [https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hou_Patch-Based_Convolutional_Neural_CVPR_2016_paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Patch_Based_Convolutional_Neural_Network_for_Whole_Slide_Tissue_Image_Classification Summary] ||<br />
|-<br />
|Week of Nov 29 || Jessie Man Wai Chin, Yi Lin Ooi, Yaqi Shi, Shwen Lyng Ngew || || CatBoost: unbiased boosting with categorical features || [https://proceedings.neurips.cc/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=CatBoost:_unbiased_boosting_with_categorical_features Summary] ||<br />
|-<br />
|Week of Nov 29 || Eric Anderson, Chengzhi Wang, Kai Zhong, YiJing Zhou || || || || ||<br />
|-<br />
|Week of Nov 29 || Ethan Cyrenne, Dieu Hoa Nguyen, Mary Jane Sin, Carolyn Wang || || || || ||<br />
|-<br />
|Week of Nov 29 || Bowen Zhang, Tyler Magnus Verhaar, Sam Senko || || Deep Double Descent: Where Bigger Models and More Data Hurt || [https://arxiv.org/pdf/1912.02292.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Double_Descent_Where_Bigger_Models_and_More_Data_Hurt Summary] ||<br />
|-<br />
|Week of Nov 29 || Chun Waan Loke, Peter Chong, Clarice Osmond, Zhilong Li|| || || || ||<br />
|-<br />
|Week of Nov 22 || Ann Gie Wong, Curtis Li, Hannah Kerr || || The Detection of Black Ice Accidents for Preventative Automated Vehicles Using Convolutional Neural Networks || [https://www.mdpi.com/2079-9292/9/12/2178/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=The_Detection_of_Black_Ice_Accidents_Using_CNNs&fbclid=IwAR0K4YdnL_hdRnOktmJn8BI6-Ra3oitjJof0YwluZgUP1LVFHK5jyiBZkvQ Summary] ||<br />
|-<br />
|Week of Nov 22 || Yuwei Liu, Daniel Mao|| || Depthwise Convolution Is All You Need for Learning Multiple Visual Domains || [https://arxiv.org/abs/1902.00927 Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Depthwise_Convolution_Is_All_You_Need_for_Learning_Multiple_Visual_Domains Summary] ||<br />
|-<br />
|Week of Nov 29 || Lingshan Wang, Yifan Li, Ziyi Liu || || Deep Learning for Extreme Multi-label Text Classification || [https://dl.acm.org/doi/pdf/10.1145/3077136.3080834 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Learning_for_Extreme_Multi-label_Text_Classification Summary]||<br />
|-<br />
|Week of Nov 29 || Kar Lok Ng, Muhan (Iris) Li || || || || ||<br />
|-<br />
|Week of Nov 29 ||Kun Wang || || Convolutional neural network for diagnosis of viral pneumonia and COVID-19 alike diseases|| [https://doi-org.proxy.lib.uwaterloo.ca/10.1111/exsy.12705 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_neural_network_for_diagnosis_of_viral_pneumonia_and_COVID-19_alike_diseases Summary] ||<br />
|-<br />
|Week of Nov 29 ||Egemen Guray || || || || ||<br />
|-<br />
|Week of Nov 29 ||Bsodjahi || || Bayesian Network as a Decision Tool for Predicting ALS Disease || [https://www.mdpi.com/2076-3425/11/2/150/pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 ||Xin Yan, Yishu Duan, Xibei Di || || Predicting Hurricane Trajectories Using a Recurrent Neural Network || [https://arxiv.org/pdf/1802.02548.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Predicting_Hurricane_Trajectories_Using_a_Recurrent_Neural_Network Summary]||<br />
|-<br />
|Week of Nov 29 ||Ankitha Anugu, Yushan Chen, Yuying Huang || || A Game Theoretic Approach to Class-wise Selective Rationalization || [https://arxiv.org/pdf/1910.12853.pdf Paper] || ||<br />
|-<br />
|Week of Nov 29 ||Aavinash Syamala, Dilmeet Malhi, Sohan Islam, Vansh Joshi || || Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree || [https://www.hindawi.com/journals/sp/2021/5560465/ Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Research_on_Multiple_Classification_Based_on_Improved_SVM_Algorithm_for_Balanced_Binary_Decision_Tree Summary]||</div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Research_on_Multiple_Classification_Based_on_Improved_SVM_Algorithm_for_Balanced_Binary_Decision_Tree&diff=50694Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree2021-11-23T05:45:50Z<p>V23joshi: Created page with "== Presented By == Aavinash Syamala, Dilmeet Malhi, Sohan Islam, Vansh Joshi == Introduction == According to the Support Vector Machine, if we have a set of m samples: S = {(..."</p>
<hr />
<div>== Presented By ==<br />
Aavinash Syamala, Dilmeet Malhi, Sohan Islam, Vansh Joshi<br />
<br />
== Introduction ==<br />
According to the Support Vector Machine (SVM) framework, suppose we have a set of m samples S = {(x1,y1), (x2,y2), …, (xm,ym)}, where xi ∈ R^n and yi ∈ {-1,1} for all i. If a hyperplane wx + b = 0 separates the set exactly and the distance from the hyperplane to the nearest sample is maximal, then it is called the optimal hyperplane, also known as the maximum margin hyperplane.<br />
<br />
[[File: hyperplane.png | center]]<br />
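<br />
To make the margin idea concrete, the following minimal sketch computes the distance from a hyperplane to its nearest sample and compares two candidate hyperplanes. The toy samples, the two candidates, and the function name are hypothetical; the sketch only compares candidates and does not solve for the optimal hyperplane.<br />
<br />
```python
import math

def geometric_margin(w, b, samples):
    """Smallest value of y * (w.x + b) / ||w|| over all labelled samples.

    A positive result means the hyperplane w.x + b = 0 separates the set exactly,
    and the value is the distance to the nearest sample.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in samples
    )

# Hypothetical toy data: two samples per class in R^2.
samples = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((2.0, 2.0), 1), ((3.0, 1.0), 1)]

# Two hypothetical separating hyperplanes, given as (w, b).
candidates = {"x1 = 1": ((1.0, 0.0), -1.0), "x1 = 1.5": ((1.0, 0.0), -1.5)}

margins = {name: geometric_margin(w, b, samples) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)  # candidate with the larger margin
print(margins, best)
```
<br />
Both candidates separate the toy set, but the SVM criterion prefers the one whose nearest sample is farther away.<br />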
<br />
<br />
A key strength of the SVM over many other classification methods is that kernel methods let it classify in higher dimensions. With a kernel method, we map the samples from the original space to a higher-dimensional space and then construct a hyperplane that separates them there. <br />
<br />
Using the kernel method leads us to the following minimization problem: <br />
<br />
[[File: formula.png | center]]<br />
<br />
And finally, our final decision function is:<br />
<br />
[[File: final.png | center]]<br />
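<br />
As a rough illustration, assuming the decision function takes the usual kernel SVM form sign(Σ αi yi K(xi, x) + b), it can be evaluated as below. The RBF kernel, the support vectors, the multipliers αi, and the bias are all made-up values for demonstration, not results from the paper.<br />
<br />
```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Return +1 or -1 from the sign of sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + b
    return 1 if score >= 0 else -1

# Hypothetical trained model: one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]
b = 0.0

pred_near_positive = decision((1.8, 2.1), support_vectors, labels, alphas, b)
pred_near_negative = decision((0.2, -0.1), support_vectors, labels, alphas, b)
print(pred_near_positive, pred_near_negative)
```
<br />
Only kernel evaluations against the support vectors are needed, so the higher-dimensional mapping never has to be computed explicitly.<br />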
<br />
== Previous Work ==<br />
<br />
Currently, there are two main multi-classification approaches for SVM: the direct method and the indirect method. The objective function of the direct method is difficult to solve, especially with a large number of samples, which increases the computational difficulty and training time and results in poor classification accuracy. For this reason, indirect methods are preferred in practice. <br />
<br />
There are many indirect classification methods based on SVM: the one-versus-one method, one-versus-all method, directed acyclic graph method, error-correcting output codes method, binary decision tree method, and hierarchical multiclass SVM algorithm. The principle of the SVM multiclass classification method based on a Decision Tree (DT) is to construct a decision tree recursively so that each layer separates one or more classes from the rest. The “error accumulation” phenomenon during construction means that an error occurring at a certain node spreads to the next layer of nodes, further expanding the classification error at the next level. Therefore, the higher the node where the error occurs, the larger the scope of the error’s influence.<br />
<br />
The key to reducing error accumulation is to separate the easily distinguishable classes first and reduce the error rate of the upper nodes as much as possible. The binary tree classification algorithm based on SVM uses the minimum distance method as the between-classes separability measure. The separation measure is defined in [1] as “the distance between the centre of two types of samples and the ratio of the variance sum of the two types of samples themselves.” However, this measure does not take the “between-classes variance” into consideration, which could have a profound impact on the result.<br />
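<br />
One way to read the measure from [1] is as the distance between the two class centres divided by the sum of the two class variances. The sketch below follows that reading; taking "variance" to mean the mean squared distance to the class centre is an assumption, and the function names and toy classes are hypothetical.<br />
<br />
```python
import math

def centre(samples):
    """Coordinate-wise mean of a list of points."""
    n = len(samples)
    return tuple(sum(col) / n for col in zip(*samples))

def spread(samples):
    """Mean squared Euclidean distance of the samples to their class centre."""
    c = centre(samples)
    return sum(math.dist(x, c) ** 2 for x in samples) / len(samples)

def centre_separability(class_a, class_b):
    """Centre distance over summed class spread (undefined if both spreads are 0)."""
    return math.dist(centre(class_a), centre(class_b)) / (spread(class_a) + spread(class_b))

# Hypothetical toy classes in R^2.
class_a = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
class_b = [(9.0, 1.0), (11.0, 1.0)]
class_c = [(4.0, 1.0), (6.0, 1.0)]

sep_ab = centre_separability(class_a, class_b)  # far apart: larger value
sep_ac = centre_separability(class_a, class_c)  # closer: smaller value
print(sep_ab, sep_ac)
```
<br />
The measure depends only on each class's centre and its own spread, so it cannot see how the samples of one class sit relative to the samples of the other.<br />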
<br />
== The Design of IBDT-SVM Algorithm == <br />
<br />
At each step, the algorithm uses the new between-classes separability measure to pick out the classes with the greatest difference and trains a classifier on them. It then assigns the remaining classes according to the class-grouping-by-majority principle to form new training samples and trains the classifier again. This is the basis of the IBDT-SVM algorithm.<br />
<br />
<br />
'''1. The Improved Between-Classes Separability Measure'''<br />
<br />
To solve the existing problems in the traditional between-classes separability measure, and inspired by three-way decision clustering, this paper proposes a between-classes separability measure based on q neighbours. The new measure considers the following three factors:<br />
<br />
(1) The Between-Classes Variance: Considering one sample’s q neighbours in the other neighbour class, its value indicates the degree of separation of one class object from another class.<br />
<br />
(2) Class Variance: It reflects the compactness of distribution of samples of the class itself and is inversely proportional to the between-classes separability measure; that is, the smaller the value, the greater the separation of the class from other classes.<br />
<br />
(3) Between-Class Distance: It is the distance between two class centres and is proportional to the between-classes separability measure; the greater the value, the greater the separability of the two classes of samples.<br />
<br />
'''2. Class Grouping Algorithm Based on the Principle of Class-Grouping by Majority'''<br />
<br />
The IBDT-SVM multiclassification algorithm proposed in this paper improves the between-classes separability measure by considering the between-classes distance, class variance, and between-classes variance. According to the improved measure, the two classes with the highest separability are found first and a classification model is trained on them. Then, the class-grouping-by-majority principle is used to group the other classes into these two groups, which serve as the training samples to retrain the classifier. The algorithm repeats this at each decision surface until each output is a leaf node, that is, a single class. This ensures that the classes are separated as far as possible at each classification node and that the remaining classes are grouped as reasonably as possible. For data with an uneven or sparse distribution, the error caused by classifying by the minimum distance between class centres can be avoided.<br />
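<br />
The grouping step can be sketched as follows, assuming that "class-grouping by majority" means each remaining class joins the side on which most of its samples are predicted by the classifier trained on the two most separable classes. The threshold classifier, the tie-breaking rule, and all names are hypothetical stand-ins.<br />
<br />
```python
from collections import Counter

def group_by_majority(classify, remaining):
    """Assign each remaining class to the side (+1 or -1) most of its samples fall on.

    classify: callable mapping a sample to +1 or -1.
    remaining: dict mapping class label -> list of samples.
    Ties go to the positive side (an arbitrary choice for this sketch).
    """
    positive, negative = [], []
    for label, samples in remaining.items():
        votes = Counter(classify(x) for x in samples)
        (positive if votes[1] >= votes[-1] else negative).append(label)
    return positive, negative

# Hypothetical classifier already trained on the two most separable classes.
def old_classifier(x):
    return 1 if x[0] > 5 else -1

remaining = {
    "C": [(6.0, 1.0), (7.0, 2.0), (4.5, 1.0)],  # two of three samples on the + side
    "D": [(1.0, 1.0), (2.0, 0.0), (5.5, 3.0)],  # two of three samples on the - side
}
positive, negative = group_by_majority(old_classifier, remaining)
print(positive, negative)
```
<br />
Whole classes are moved together, so a few samples on the wrong side of the boundary do not split a class across the two groups.<br />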
<br />
'''3. The IBDT Algorithm'''<br />
<br />
The algorithm is as follows:<br />
<br />
Assume that there are M classes of input data and that class i contains ni samples, so n1 + n2 + … + nM = n. The IBDT process is:<br />
<br />
Step 1. Set the initial q value and calculate the between-classes separability measure for every pair of the M classes.<br />
<br />
Step 2. Find the two classes with the maximum between-classes separability measure and denote them class max1 and class max2.<br />
<br />
Step 3. Train a classifier using class max1 and class max2 as the training samples; this is denoted the old classifier.<br />
<br />
Step 4. According to the class-grouping-by-majority principle described above, assign the remaining classes to class max1 or class max2, forming two major classes {max1, Mi1, Mi2, Mi3, …, Mij} and {max2, Mij+1, Mij+2, Mij+3, …, MiM-2}. Mark these as positive and negative samples, respectively, and train the SVM classifier on them; this is denoted the new classifier.<br />
<br />
Step 5. Repeat the process on each of the two major classes until every group contains a single category.<br />
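<br />
Steps 1 to 5 can be sketched as follows. This is a rough illustration, not the authors' implementation: the separability measure is a stand-in (centre distance over summed class spread, without the q-neighbour term), and a nearest-centroid rule replaces SVM training so that the example runs with the standard library alone. All function names and the toy data are hypothetical.<br />
<br />
```python
import math
from itertools import combinations

def centre(samples):
    return tuple(sum(col) / len(samples) for col in zip(*samples))

def spread(samples):
    c = centre(samples)
    return sum(math.dist(x, c) ** 2 for x in samples) / len(samples)

def separability(a, b):
    # Stand-in measure; the small constant avoids division by zero.
    return math.dist(centre(a), centre(b)) / (spread(a) + spread(b) + 1e-12)

def train(pos, neg):
    # Stand-in for SVM training: nearest-centroid rule returning +1 or -1.
    cp, cn = centre(pos), centre(neg)
    return lambda x: 1 if math.dist(x, cp) <= math.dist(x, cn) else -1

def build_tree(classes):
    """classes: dict label -> list of samples. Returns a nested dict or a leaf label."""
    if len(classes) == 1:
        return next(iter(classes))  # leaf: a single category
    # Steps 1-2: most separable pair of classes.
    max1, max2 = max(combinations(classes, 2),
                     key=lambda p: separability(classes[p[0]], classes[p[1]]))
    # Step 3: old classifier trained on that pair only.
    old = train(classes[max1], classes[max2])
    # Step 4: group the remaining classes by majority vote, then retrain.
    groups = {1: [max1], -1: [max2]}
    for label, samples in classes.items():
        if label not in (max1, max2):
            votes = sum(old(x) for x in samples)  # ties go to the positive side
            groups[1 if votes >= 0 else -1].append(label)
    pos = [x for label in groups[1] for x in classes[label]]
    neg = [x for label in groups[-1] for x in classes[label]]
    new = train(pos, neg)
    # Step 5: recurse on each major class.
    return {"classifier": new,
            1: build_tree({l: classes[l] for l in groups[1]}),
            -1: build_tree({l: classes[l] for l in groups[-1]})}

def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree[tree["classifier"](x)]
    return tree

# Hypothetical toy data: four well-separated classes in R^2.
classes = {
    "A": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "B": [(10, 0), (10, 1), (11, 0), (11, 1)],
    "C": [(2, 5), (3, 5), (2, 6), (3, 6)],
    "D": [(9, 5), (10, 5), (9, 6), (10, 6)],
}
tree = build_tree(classes)
print([predict(tree, p) for p in [(0.5, 0.2), (10.4, 0.8), (2.4, 5.6), (9.6, 5.4)]])
```
<br />
Only the new classifier is stored at each node; the old classifier is used solely to decide how the remaining classes are grouped.<br />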
<br />
== The Numerical Experiments and Results ==<br />
<br />
Five multiclass datasets are chosen, and 10-fold cross-validation is used to calculate the classification accuracy of each model. The models include the OVO, OVA, BDT-SVM, and VBDT-SVM algorithms and the newly proposed IBDT-SVM algorithm. The datasets have different sample sizes, numbers of categories, and attributes, to show how well the models perform on a wider range of data with different statistical properties. The OVO algorithm performs well on most datasets except Breast Tissue, where the lack of data could have affected predictions, whereas IBDT-SVM performs comparatively well on all datasets irrespective of their properties. This suggests that, overall, the IBDT-SVM algorithm has greater stability and better classification accuracy than the other models. The numerical results are summarized in Table 2 below, which reports the average classification accuracy over all 10 experiments for each dataset.<br />
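<br />
For readers unfamiliar with the evaluation protocol, a minimal sketch of 10-fold cross-validation with averaged accuracy is given below. The splitting scheme (shuffled, not stratified), the toy one-dimensional data, and the threshold model are assumptions for illustration; the paper's exact protocol may differ.<br />
<br />
```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds (requires n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, k=10, seed=0):
    """Average held-out accuracy over k folds; fit(X, y) must return a predictor."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

def fit_threshold(X, y):
    """Hypothetical stand-in model: threshold halfway between the two class means."""
    m0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    m1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    threshold = (m0 + m1) / 2
    return lambda x: 1 if x > threshold else 0

# Hypothetical, trivially separable 1-D data: 20 samples per class.
X = list(range(20)) + list(range(100, 120))
y = [0] * 20 + [1] * 20

mean_accuracy = cross_val_accuracy(X, y, fit_threshold, k=10)
print(mean_accuracy)
```
<br />
Each sample is held out exactly once, so the reported figure is an average over ten models trained on nine tenths of the data each.<br />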
<br />
<div align="center">Table 1: The information of the datasets in this study</div><br />
<br />
[[File: table1.png | center]]<br />
<br />
<br />
<div align="center">Table 2: The comparison of average classification accuracy of the five multiclassification algorithms. </div><br />
<br />
[[File: table2.png | center]]<br />
<br />
<br />
== Conclusion ==<br />
<br />
In this paper, the authors improve on the original BDT-SVM method, which considers only the distance between classes. The authors suggest also using the class variance and the variance between classes, and training the old classifier on the two classes with the maximum difference. <br />
<br />
Fewer classifiers need to be constructed, which keeps the algorithm simple; moreover, the results provide evidence of improved classification accuracy compared with the binary tree classification algorithm. <br />
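<br />
The saving in the number of classifiers can be checked with the standard counts for M classes: M(M-1)/2 for one-versus-one, M for one-versus-all, and M-1 for a binary decision tree whose leaves are single classes. The function name below is hypothetical, and the tree count includes only the classifier kept at each internal node.<br />
<br />
```python
def classifier_counts(m):
    """Number of binary classifiers each multiclass scheme stores for m classes."""
    return {
        "one-versus-one": m * (m - 1) // 2,  # one per pair of classes
        "one-versus-all": m,                 # one per class
        "binary decision tree": m - 1,       # one per internal node
    }

counts = classifier_counts(6)
print(counts)
```
<br />
The gap to one-versus-one widens quadratically as the number of classes grows.<br />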
<br />
The authors also discuss future considerations to further improve SVM algorithms, such as constructing the SVM-based decision tree using the distance between each centre and the root node. <br />
<br />
== References == <br />
<br />
[1] S. Y. Xia, H. Pan, and L. Z. Jin, “Multi-class SVM method based on a non-balanced binary tree,” Computer Engineering and Applications, vol. 45, no. 17, pp. 167–169, 2009.<br />
<br />
[2] H. Yu and C. K. Mao, “Automatic three-way decision clustering algorithm based on k-means,” Computer Applications, vol. 36, no. 8, pp. 2061–2065, 2016.<br />
<br />
[3] P. Kantavat, B. Kijsirikul, P. Songsiri, K.-I. Fukui, and M. Numao, “Efficient decision trees for multi-class support vector machines using entropy and generalization error estimation,” International Journal of Applied Mathematics and Computer Science, vol. 28, no. 4, pp. 705–717, 2018.</div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:table2.png&diff=50693File:table2.png2021-11-23T05:43:12Z<p>V23joshi: V23joshi uploaded a new version of File:table2.png</p>
<hr />
<div></div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:table1.png&diff=50692File:table1.png2021-11-23T05:43:02Z<p>V23joshi: V23joshi uploaded a new version of File:table1.png</p>
<hr />
<div></div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:table2.png&diff=50690File:table2.png2021-11-23T05:26:12Z<p>V23joshi: V23joshi uploaded a new version of File:table2.png</p>
<hr />
<div></div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:table1.png&diff=50689File:table1.png2021-11-23T05:25:47Z<p>V23joshi: V23joshi uploaded a new version of File:table1.png</p>
<hr />
<div></div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:formula.png&diff=50688File:formula.png2021-11-23T05:25:18Z<p>V23joshi: V23joshi uploaded a new version of File:formula.png</p>
<hr />
<div></div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:final.png&diff=50687File:final.png2021-11-23T05:25:06Z<p>V23joshi: </p>
<hr />
<div></div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:hyperplane.png&diff=50686File:hyperplane.png2021-11-23T05:24:50Z<p>V23joshi: </p>
<hr />
<div></div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F21&diff=50681stat441F212021-11-23T05:08:39Z<p>V23joshi: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F20-STAT 441/841 CM 763-Proposal| Project Proposal ]] ==<br />
<br />
<!--[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]--><br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="250pt"|Name <br />
|width="15pt"|Paper number <br />
|width="700pt"|Title<br />
|width="15pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|width="30pt"|Link to the video<br />
|-<br />
|Sep 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary] || [https://youtu.be/JWozRg_X-Vg?list=PLehuLRPyt1HzXDemu7K4ETcF0Ld_B5adG&t=539]<br />
|-<br />
|Week of Nov 16 || Ali Ghodsi || || || || ||<br />
|-<br />
|Week of Nov 22 || Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu|| || Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification || [http://proceedings.mlr.press/v139/bai21c/bai21c.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization Summary] ||<br />
|-<br />
|Week of Nov 29 || Kanika Chopra, Yush Rajcoomar || || Automatic Bank Fraud Detection Using Support Vector Machines || [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.5804&rep=rep1&type=pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Automatic_Bank_Fraud_Detection_Using_Support_Vector_Machines Summary] ||<br />
|-<br />
|Week of Nov 22 || Zeng Mingde, Lin Xiaoyu, Fan Joshua, Rao Chen Min || || Do Vision Transformers See Like Convolutional Neural Networks? || [https://proceedings.neurips.cc/paper/2021/file/652cf38361a209088302ba2b8b7f51e0-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Do_Vision_Transformers_See_Like_CNN Summary] ||<br />
|-<br />
|Week of Nov 22 || Justin D'Astous, Waqas Hamed, Stefan Vladusic, Ethan O'Farrell || || A Probabilistic Approach to Neural Network Pruning || [http://proceedings.mlr.press/v139/qian21a/qian21a.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Summary_of_A_Probabilistic_Approach_to_Neural_Network_Pruning Summary] ||<br />
|-<br />
|Week of Nov 22 || Cassandra Wong, Anastasiia Livochka, Maryam Yalsavar, David Evans || || Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification || [https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hou_Patch-Based_Convolutional_Neural_CVPR_2016_paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Patch_Based_Convolutional_Neural_Network_for_Whole_Slide_Tissue_Image_Classification Summary] ||<br />
|-<br />
|Week of Nov 29 || Jessie Man Wai Chin, Yi Lin Ooi, Yaqi Shi, Shwen Lyng Ngew || || CatBoost: unbiased boosting with categorical features || [https://proceedings.neurips.cc/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=CatBoost:_unbiased_boosting_with_categorical_features Summary] ||<br />
|-<br />
|Week of Nov 29 || Eric Anderson, Chengzhi Wang, Kai Zhong, YiJing Zhou || || || || ||<br />
|-<br />
|Week of Nov 29 || Ethan Cyrenne, Dieu Hoa Nguyen, Mary Jane Sin, Carolyn Wang || || || || ||<br />
|-<br />
|Week of Nov 29 || Bowen Zhang, Tyler Magnus Verhaar, Sam Senko || || Deep Double Descent: Where Bigger Models and More Data Hurt || [https://arxiv.org/pdf/1912.02292.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Double_Descent_Where_Bigger_Models_and_More_Data_Hurt Summary] ||<br />
|-<br />
|Week of Nov 29 || Chun Waan Loke, Peter Chong, Clarice Osmond, Zhilong Li|| || || || ||<br />
|-<br />
|Week of Nov 22 || Ann Gie Wong, Curtis Li, Hannah Kerr || || The Detection of Black Ice Accidents for Preventative Automated Vehicles Using Convolutional Neural Networks || [https://www.mdpi.com/2079-9292/9/12/2178/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=The_Detection_of_Black_Ice_Accidents_Using_CNNs Summary] ||<br />
|-<br />
|Week of Nov 22 || Yuwei Liu, Daniel Mao|| || Depthwise Convolution Is All You Need for Learning Multiple Visual Domains || [https://arxiv.org/abs/1902.00927 Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Depthwise_Convolution_Is_All_You_Need_for_Learning_Multiple_Visual_Domains Summary] ||<br />
|-<br />
|Week of Nov 29 || Lingshan Wang, Yifan Li, Ziyi Liu || || Deep Learning for Extreme Multi-label Text Classification || [https://dl.acm.org/doi/pdf/10.1145/3077136.3080834 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Learning_for_Extreme_Multi-label_Text_Classification Summary]||<br />
|-<br />
|Week of Nov 29 || Kar Lok Ng, Muhan (Iris) Li || || || || ||<br />
|-<br />
|Week of Nov 29 ||Kun Wang || || Convolutional neural network for diagnosis of viral pneumonia and COVID-19 alike diseases|| [https://doi-org.proxy.lib.uwaterloo.ca/10.1111/exsy.12705 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_neural_network_for_diagnosis_of_viral_pneumonia_and_COVID-19_alike_diseases Summary] ||<br />
|-<br />
|Week of Nov 29 ||Egemen Guray || || || || ||<br />
|-<br />
|Week of Nov 29 ||Bsodjahi || || Bayesian Network as a Decision Tool for Predicting ALS Disease || [https://www.mdpi.com/2076-3425/11/2/150/pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 ||Xin Yan, Yishu Duan, Xibei Di || || Predicting Hurricane Trajectories Using a Recurrent Neural Network || [https://arxiv.org/pdf/1802.02548.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Predicting_Hurricane_Trajectories_Using_a_Recurrent_Neural_Network Summary]||<br />
|-<br />
|Week of Nov 29 ||Ankitha Anugu, Yushan Chen, Yuying Huang || || A Game Theoretic Approach to Class-wise Selective Rationalization || [https://arxiv.org/pdf/1910.12853.pdf Paper] || ||<br />
|-<br />
|Week of Nov 29 ||Aavinash Syamala, Dilmeet Malhi, Sohan Islam, Vansh Joshi || || Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree || [https://www.hindawi.com/journals/sp/2021/5560465/ Paper] || ||</div>V23joshihttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=F21-STAT_441/841_CM_763-Proposal&diff=49980F21-STAT 441/841 CM 763-Proposal2021-10-08T17:14:47Z<p>V23joshi: </p>
<hr />
<div>Use this format (Don’t remove Project 0)<br />
<br />
Project # 0 Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Title: Making a String Telephone<br />
<br />
Description: We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).<br />
<br />
--------------------------------------------------------------------<br />
Project # 1 Group members:<br />
<br />
Feng, Jared<br />
<br />
Huang, Xipeng<br />
<br />
Xu, Mingwei<br />
<br />
Yu, Tingzhou<br />
<br />
Title: <br />
<br />
Description:<br />
--------------------------------------------------------------------<br />
Project # 2 Group members:<br />
<br />
Anderson, Eric<br />
<br />
Wang, Chengzhi<br />
<br />
Zhong, Kai<br />
<br />
Zhou, Yi Jing<br />
<br />
Title: Application of Neural Networks<br />
<br />
Description: Using neural networks to determine content/intent of emails.<br />
<br />
--------------------------------------------------------------------<br />
Project # 3 Group members:<br />
<br />
Chopra, Kanika<br />
<br />
Rajcoomar, Yush<br />
<br />
Title: Classification<br />
<br />
Description: We will be working on the alternate project that the professor will release on Sunday.<br />
<br />
--------------------------------------------------------------------<br />
Project # 4 Group members:<br />
<br />
Zhang, Bowen<br />
<br />
Li, Shaozhong<br />
<br />
Kerr, Hannah<br />
<br />
Wong, Ann Gie<br />
<br />
Title: Classification<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
Project # 5 Group members:<br />
<br />
Chin, Jessie Man Wai<br />
<br />
Ooi, Yi Lin<br />
<br />
Shi, Yaqi<br />
<br />
Ngew, Shwen Lyng<br />
<br />
Title: The Application of Classification in Accelerated Underwriting (Insurance)<br />
<br />
Description: Accelerated Underwriting (AUW), also called “express underwriting,” is a faster and easier process for people in good health to obtain life insurance. The traditional underwriting process is often painful for both customers and insurers. Customers have to complete several types of questionnaires and undergo medical tests involving blood, urine, saliva and other samples. Underwriters, on the other hand, have to manually go through every single policy to assess the risk of each applicant. AUW allows people who are deemed “healthy” to forgo medical exams. Since the onset of COVID-19, AUW has drawn more attention because traditional underwriting cannot be performed under stay-at-home orders. However, this places a burden on the insurance company to estimate risk accurately with fewer test results.<br />
<br />
This is where data science comes in. With different classification methods, we can address the underwriting process’ five pain points: labor, speed, efficiency, pricing and mortality. This allows us to better estimate risk and classify clients by whether they are eligible for accelerated underwriting. For the final project, we will use data from one of the leading US insurers to analyze how classification methods can identify clients eligible for AUW. We will use factors such as health data, medical history, family history and insurance history to determine eligibility.<br />
<br />
--------------------------------------------------------------------<br />
Project # 6 Group members:<br />
<br />
Wang, Carolyn<br />
<br />
Cyrenne, Ethan<br />
<br />
Nguyen, Dieu Hoa<br />
<br />
Sin, Mary Jane<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
Project # 7 Group members:<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Chatoor, Amanda<br />
<br />
Prathap Das, Sutej<br />
<br />
Title: PetFinder.my - Pawpularity Contest [https://www.kaggle.com/c/petfinder-pawpularity-score/overview]<br />
<br />
Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.<br />
<br />
--------------------------------------------------------------------<br />
Project # 8 Group members:<br />
<br />
Xu, Siming<br />
<br />
Yan, Xin<br />
<br />
Duan, Yishu<br />
<br />
Di, Xibei<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 9 Group members:<br />
<br />
Loke, Chun Waan<br />
<br />
Chong, Peter<br />
<br />
Osmond, Clarice<br />
<br />
Li, Zhilong<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 10 Group members:<br />
<br />
O'Farrell, Ethan<br />
<br />
D'Astous, Justin<br />
<br />
Hamed, Waqas<br />
<br />
Vladusic, Stefan<br />
<br />
Title: Pawpularity (Kaggle)<br />
<br />
Description: Predicting the popularity of animal photos based on photo metadata<br />
--------------------------------------------------------------------<br />
Project # 11 Group members:<br />
<br />
JunBin, Pan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 12 Group members:<br />
<br />
Ng, Kar Lok<br />
<br />
Li, Muhan (Iris)<br />
<br />
Wu, Mingze<br />
<br />
Title: NFL Health & Safety - Helmet Assignment competition (Kaggle Competition)<br />
<br />
Description: Assigning players to the helmets seen in footage of head collisions during football plays.<br />
--------------------------------------------------------------------<br />
Project # 13 Group members:<br />
<br />
Livochka, Anastasiia<br />
<br />
Wong, Cassandra<br />
<br />
Evans, David<br />
<br />
Yalsavar, Maryam<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 14 Group members:<br />
<br />
Syamala, Aavinash Reddy<br />
<br />
Zhu, Jigang<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 15 Group members:<br />
<br />
Zeng, Mingde<br />
<br />
Lin, Xiaoyu<br />
<br />
Fan, Joshua<br />
<br />
Rao, Chen Min<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 16 Group members:<br />
<br />
Huang, Yuying<br />
<br />
Anugu, Ankitha<br />
<br />
Dave, Meet Hemang<br />
<br />
Chen, Yushan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 17 Group members:<br />
<br />
Wang, Lingshan<br />
<br />
Liu, Ziyi<br />
<br />
Zheng, Hanxi<br />
<br />
Li, Yifan<br />
<br />
Title: Implement and Improve CNN in Multi-Class Text Classification<br />
<br />
Description: We will apply a Convolutional Neural Network (CNN) to classify real-world data, with the goal of building an efficient insurance contract classifier, and we will try to improve the CNN algorithm for text classification, using a real-world data set to support the work. Implementing the CNN will also let us analyze the efficiency and practicality of the algorithm.<br />
The dataset is composed of insurance contracts containing client and policy information. We will implement a multi-class classifier that sorts the information contained in each insurance contract into pre-determined subcategories (e.g., short-term renewable/long-term non-renewable). We will attempt to convert the complicated raw data into several formats (e.g., JSON, pandas data frames) and choose the most efficient raw data processing logic based on runtime and algorithm optimization.<br />
--------------------------------------------------------------------<br />
Project # 18 Group members:<br />
<br />
Malhi, Dilmeet<br />
<br />
Joshi, Vansh<br />
<br />
Title: Kaggle project: Brain Tumor Radiogenomic Classification<br />
<br />
Description: In this project, we will predict the genetic subtype of glioblastoma from MRI (magnetic resonance imaging) scans, training and testing our model to detect the presence of MGMT promoter methylation.<br />
--------------------------------------------------------------------</div>V23joshi