F21-STAT 441/841 CM 763-Proposal
Latest revision as of 08:48, 22 December 2021
Use this format (Don’t remove Project 0)
Project # 0 Group members:
Last name, First name
Last name, First name
Last name, First name
Last name, First name
Title: Making a String Telephone
Description: We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).
Project # 1 Group members:
Feng, Jared
Huang, Xipeng
Xu, Mingwei
Yu, Tingzhou
Title: Patch-based classification of lung cancer pathological images using convolutional neural networks
In this project, we explore the classification of lung cancer pathological images from a set of patients. The input images come from three tumor types (LUAD, LUSC, and MESO) and have been split into patches to reduce the computational cost. The classification task is decomposed into patch-wise classification and whole-image classification. We experiment with three neural networks for patch-wise classification and two classical machine learning models for patient classification. We also implement and study feature extraction techniques and sampling methods for training the neural networks. Our results show that a support vector machine (SVM) trained on the extracted feature vectors outperforms all other methods, achieving an accuracy of 67.86% when DenseNet-121 is used as the patch-wise model.
Our poster is here: https://www.dropbox.com/s/fu6vr2cxcbt4458/Stat_841_poster.pdf?dl=0
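The patch-then-aggregate pipeline described above can be sketched in a few lines. This is a minimal illustration, not the group's code: the patch size, the rule of dropping partial edge patches, and majority-vote aggregation are all assumptions, and majority voting is a simpler stand-in for the SVM-on-extracted-features aggregation the group reports.

```python
from collections import Counter


def split_into_patches(image, patch_size):
    """Split a 2D image (a list of equal-length rows) into non-overlapping
    square patches. Edge regions smaller than patch_size are dropped, which is
    an assumed convention; padding the image is a common alternative."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - patch_size + 1, patch_size):
        for c in range(0, cols - patch_size + 1, patch_size):
            patches.append([row[c:c + patch_size] for row in image[r:r + patch_size]])
    return patches


def aggregate_patch_predictions(patch_labels):
    """Image-level label by majority vote over patch-level labels.
    Ties go to the label that appears first in patch_labels."""
    return Counter(patch_labels).most_common(1)[0][0]


# Usage: a toy 4x4 "image" yields four 2x2 patches.
image = [[4 * r + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
print(len(patches), patches[0])                               # 4 [[0, 1], [4, 5]]
print(aggregate_patch_predictions(["LUAD", "MESO", "LUAD"]))  # LUAD
```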
Project # 2 Group members:
Anderson, Eric
Wang, Chengzhi
Zhong, Kai
Zhou, Yi Jing
Title: Clean-Label Targeted Poisons for an End-to-End Trained CNN on the MNIST Dataset
Description: Applying clean-label targeted data poisoning techniques to a CNN trained end-to-end on the MNIST dataset.
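One well-known way to craft clean-label targeted poisons is the feature-collision attack of Shafahi et al. (2018, "Poison Frogs!"): start from a base image of the poison class and perturb it until its features match those of a target test image, while a proximity penalty keeps it close to the base so its correct label still looks right. The sketch below runs that forward-backward splitting loop on plain Python vectors. The linear feature map `W` stands in for the CNN's feature extractor and is purely hypothetical, as are the step size and `beta`; whether the group uses this particular attack is an assumption.

```python
def matvec(W, x):
    """Compute W @ x for a matrix given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def feature_collision_poison(W, target, base, beta=0.1, lr=0.05, steps=500):
    """Approximately minimize ||W x - W target||^2 + (beta / 2) * ||x - base||^2.

    W is a hypothetical linear feature extractor standing in for a CNN."""
    feat_target = matvec(W, target)
    x = list(base)
    for _ in range(steps):
        # Forward (gradient) step on the feature-collision term:
        # grad = 2 * W^T (W x - W target)
        residual = [a - b for a, b in zip(matvec(W, x), feat_target)]
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(x))]
        x = [xj - lr * gj for xj, gj in zip(x, grad)]
        # Backward (proximal) step on the (beta / 2) * ||x - base||^2 term,
        # which pulls the poison back toward the base image.
        x = [(xj + lr * beta * bj) / (1 + lr * beta) for xj, bj in zip(x, base)]
    return x


# Usage with an identity feature map: the poison moves most of the way to the target.
poison = feature_collision_poison([[1.0, 0.0], [0.0, 1.0]], target=[1.0, 1.0], base=[0.0, 0.0])
print([round(p, 3) for p in poison])  # [0.952, 0.952]
```

The larger `beta` is, the closer the poison stays to the base image and the less completely its features collide with the target's.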
Project # 3 Group members:
Chopra, Kanika
Rajcoomar, Yush
Bhattacharya, Vaibhav
Title: Cancer Classification
Description: We will classify three tumour types based on pathological data.
Project # 4 Group members:
Li, Shao Zhong
Kerr, Hannah
Wong, Ann Gie
Title: Predicting "Pawpularity" of Pets with Image Regression
Description: We analyze raw images and metadata to predict the “Pawpularity” of pet photos. The goal is to help shelters and rescuers around the world improve the appeal of their pet profiles, so that more animals get adopted and find their “furever” homes faster.
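Since Pawpularity is a continuous score (from 1 to 100 in the Kaggle competition, which evaluates submissions by root mean squared error), a useful first step is a trivial baseline that any image or metadata model must beat. This sketch assumes that setup; the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_baseline(train_scores, n_test):
    """Predict the training-set mean Pawpularity for every test photo."""
    mean = sum(train_scores) / len(train_scores)
    return [mean] * n_test


def clip_scores(preds, low=1.0, high=100.0):
    """Keep regression outputs inside the valid score range."""
    return [min(max(p, low), high) for p in preds]


# Usage on made-up scores:
train = [20, 35, 50, 65]
preds = mean_baseline(train, 3)  # [42.5, 42.5, 42.5]
print(rmse([30, 42.5, 55], preds))
```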
Project # 5 Group members:
Chin, Jessie Man Wai
Ooi, Yi Lin
Shi, Yaqi
Ngew, Shwen Lyng
Title: The Application of Classification in Accelerated Underwriting (Insurance)
Description: Accelerated Underwriting (AUW), also called “express underwriting,” is a faster and easier process for people in good health to obtain life insurance. The traditional underwriting process is often painful for both customers and insurers. Customers have to complete several types of questionnaires and undergo medical tests involving blood, urine, saliva and other samples. Underwriters, on the other hand, have to go through every policy manually to assess the risk of each applicant. AUW allows people who are deemed “healthy” to forgo medical exams. Since COVID-19, AUW has become a more pressing topic, as traditional underwriting could not be performed under stay-at-home orders. However, this places a burden on insurance companies to estimate risk accurately from fewer test results.
This is where data science comes in. Classification methods can address five pain points of the underwriting process: labor, speed, efficiency, pricing and mortality. They allow us to better estimate risk and to classify clients by whether they are eligible for accelerated underwriting. For the final project, we use data from one of the leading US insurers to study how classification can determine which clients are eligible for AUW. We will use factors such as health data, medical history, family history and insurance history to determine eligibility.
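As one concrete example of such a classifier, the sketch below fits a logistic regression by batch gradient descent and turns its predicted probability into an eligible / not-eligible decision. The two binary risk flags and the 0.5 threshold are hypothetical; the insurer's actual features and the models the group compares are not specified here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by full-batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def eligible_for_auw(x, w, b, threshold=0.5):
    """True if the predicted probability of being eligible meets the threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


# Hypothetical features: [has_chronic_condition, smoker]; label 1 = eligible for AUW.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [1, 0, 0, 0]
w, b = train_logistic(X, y)
print([eligible_for_auw(x, w, b) for x in X])  # [True, False, False, False]
```

Logistic regression is only a starting point; its appeal for underwriting is that each weight can be read as the effect of one risk factor on the log-odds of eligibility.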
Project # 6 Group members:
Wang, Carolyn
Cyrenne, Ethan
Nguyen, Dieu Hoa
Sin, Mary Jane
Title: Pawpularity (PetFinder Kaggle Competition)
Description: Using pet photos and their metadata to predict the popularity of the photos, which is calculated from page view statistics and other metrics on the PetFinder website.
Project # 7 Group members:
Bhattacharya, Vaibhav
Chatoor, Amanda
Prathap Das, Sutej
Title: PetFinder.my - Pawpularity Contest (https://www.kaggle.com/c/petfinder-pawpularity-score/overview)
Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.
Project # 8 Group members:
Yan, Xin
Duan, Yishu
Di, Xibei
Title: The application of classification on company bankruptcy prediction
Description: If a company goes bankrupt, its employees lose their jobs, and it is hard for them to find another suitable job in a short period. An employee who loses their job to bankruptcy may have no income for some time. This can lead to several negative consequences, such as increased homelessness, as people cannot cover their living expenses, and increased crime rates as poverty rises. For the economy, if many companies go bankrupt at the same time, a large number of employees lose their jobs, leading to a higher unemployment rate. This can have a series of negative effects on the economy, including lost government tax revenue, since the unemployed have no income on which to pay income tax, and greater inequality in the income distribution.
Company bankruptcy therefore harms individuals, government, society, and the economy, which makes predicting it essential. The purpose of this project is to predict whether a company will go bankrupt.
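Because bankrupt firms are usually a small minority of any sample of companies, plain accuracy can be misleading for this task. The sketch below, assuming bankrupt = 1 as the positive class, computes precision, recall and F1 alongside accuracy; the 3-in-100 split in the usage is made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class.
    Undefined ratios (zero denominators) are reported as 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up sample: 3 bankrupt firms (1) among 100; the "model" never predicts bankruptcy.
y_true = [1] * 3 + [0] * 97
y_pred = [0] * 100
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.97, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

A useless model scores 97% accuracy here, while recall correctly shows that it catches none of the bankruptcies.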
Project # 9 Group members:
Loke, Chun Waan
Chong, Peter
Osmond, Clarice
Li, Zhilong
Title: Popularity of Shelter Pet Photo Prediction using Varied ML Techniques
Description: In this Kaggle competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos.
Project # 10 Group members:
O'Farrell, Ethan
D'Astous, Justin
Hamed, Waqas
Vladusic, Stefan
Title: Pawpularity (Kaggle)
Description: Predicting the popularity of animal photos based on photo metadata
Project # 11 Group members:
JunBin, Pan
Title: Learning from Normality: Two-Stage Method with Autoencoder and Boosting Trees for Unsupervised Anomaly Detection
Description: A new two-stage algorithm for unsupervised anomaly detection that combines an autoencoder with boosting trees.
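The project's algorithm is new and its details are not given here, so the following is only a sketch of the generic pattern the title suggests: a model of normal data (in the project, an autoencoder) scores each sample by reconstruction error, and the highest-scoring samples become pseudo-labelled anomalies that a second-stage model, such as boosting trees, could then be trained on. The `reconstruct` argument, the toy stand-in model, and the percentile threshold are all assumptions.

```python
import math


def reconstruction_errors(X, reconstruct):
    """Squared reconstruction error per sample; reconstruct maps a vector to a vector."""
    return [sum((a - r) ** 2 for a, r in zip(x, reconstruct(x))) for x in X]


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def pseudo_labels(X, reconstruct, q=95):
    """Stage 1: flag samples whose error exceeds the q-th percentile as anomalies (1)."""
    errors = reconstruction_errors(X, reconstruct)
    threshold = percentile(errors, q)
    return [1 if e > threshold else 0 for e in errors]


def to_origin(x):
    """Hypothetical stand-in for a trained autoencoder: it reconstructs every
    input as the 'normal' point 0.0, so error grows with distance from it."""
    return [0.0] * len(x)


X = [[0.1], [0.0], [-0.1], [0.2], [5.0]]
print(pseudo_labels(X, to_origin, q=80))  # [0, 0, 0, 0, 1]
# Stage 2 (not shown): train a boosted-tree classifier on (X, pseudo-labels).
```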
Project # 12 Group members:
Kar Lok, Ng
Muhan (Iris), Li
Title: NFL Health & Safety - Helmet Assignment
Description: Assigning the correct player to each helmet in video footage of head collisions during football plays.
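One way to frame helmet assignment is as a matching problem: given helmet boxes detected in a frame and candidate player boxes (for example, player positions projected into the frame from tracking data, an assumption here), choose the one-to-one assignment that maximizes total overlap. The brute-force search below is only practical for a handful of players; for a full roster a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice. Box format and function names are illustrative.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def assign_players(helmet_boxes, player_boxes):
    """Return, for each helmet, the index of its assigned player, choosing the
    one-to-one assignment with the largest total IoU (exhaustive search)."""
    if len(player_boxes) < len(helmet_boxes):
        raise ValueError("need at least as many player boxes as helmets")
    best_score, best = -1.0, None
    for perm in permutations(range(len(player_boxes)), len(helmet_boxes)):
        score = sum(iou(h, player_boxes[p]) for h, p in zip(helmet_boxes, perm))
        if score > best_score:
            best_score, best = score, perm
    return list(best)


helmets = [(0, 0, 2, 2), (10, 10, 12, 12)]
players = [(9, 9, 12, 12), (0, 0, 2, 3)]
print(assign_players(helmets, players))  # [1, 0]
```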
Project # 13 Group members:
Livochka, Anastasiia
Wong, Cassandra
Evans, David
Yalsavar, Maryam
Title: TBD
Description: TBD
Project # 14 Group Members:
Zeng, Mingde
Lin, Xiaoyu
Fan, Joshua
Rao, Chen Min
Title: Toxic Comment Classification, Kaggle
Description: Using Wikipedia comments labeled for toxicity to train a model that detects toxicity in comments.
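If the project uses the Jigsaw Toxic Comment Classification Challenge data from Kaggle (an assumption consistent with the description), each comment carries six binary labels (toxic, severe_toxic, obscene, threat, insult, identity_hate) and submissions are scored by the mean column-wise ROC AUC. The sketch below computes that metric from its rank interpretation; the quadratic pairwise loop is fine for illustration but slow on the full dataset.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative
    (ties count half). Requires at least one positive and one negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_columnwise_auc(label_columns, score_columns):
    """Average ROC AUC over label columns (e.g. one per toxicity type)."""
    aucs = [roc_auc(y, s) for y, s in zip(label_columns, score_columns)]
    return sum(aucs) / len(aucs)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```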
Project # 15 Group Members:
Huang, Yuying
Anugu, Ankitha
Chen, Yushan
Title: Implementation of the classification task between crop and weeds
Description: Our work will be based on the paper Crop and Weeds Classification for Precision Agriculture using Context-Independent Pixel-Wise Segmentation.
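Pixel-wise segmentation assigns a class to every pixel, so results are commonly reported per class with intersection over union (IoU). The sketch assumes three classes (0 = soil/background, 1 = crop, 2 = weed) and masks given as nested lists; it is not taken from the cited paper, whose exact evaluation protocol may differ.

```python
def per_class_iou(true_mask, pred_mask, classes):
    """IoU per class over all pixels; NaN for a class absent from both masks."""
    result = {}
    for c in classes:
        inter = union = 0
        for t_row, p_row in zip(true_mask, pred_mask):
            for t, p in zip(t_row, p_row):
                if t == c and p == c:
                    inter += 1
                if t == c or p == c:
                    union += 1
        result[c] = inter / union if union else float("nan")
    return result


truth = [[0, 1], [2, 2]]
pred = [[0, 1], [1, 2]]  # one weed pixel mislabelled as crop
print(per_class_iou(truth, pred, classes=[0, 1, 2]))  # {0: 1.0, 1: 0.5, 2: 0.5}
```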
Project # 16 Group Members:
Wang, Lingshan
Li, Yifan
Liu, Ziyi
Title: Implement and Improve CNN in Multi-Class Text Classification
Description: We are going to apply Bidirectional Encoder Representations from Transformers (BERT) to a real-world text classification problem, building an efficient classifier for case study interview materials, and to improve the algorithm for this setting. Implementing BERT also lets us analyze the algorithm's efficiency and practicality on an imbalanced dataset, at both the data input level and the modelling level.
The dataset consists of case study HTML files containing case information that can be classified into multiple industry categories. We will implement multi-class classification to break down the information in each case material into predetermined subcategories (e.g., behavioral questions, consulting questions, and questions on new business/market entry). We will process the complex raw data into several formats (e.g., HTML, JSON, and pandas DataFrames) and choose the most efficient processing logic based on runtime and algorithmic optimization.
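At the modelling level, one common remedy for class imbalance is to weight the cross-entropy loss by inverse class frequency so that rare categories count more per example. The sketch uses the "balanced" heuristic n_samples / (n_classes * count_c), the same formula scikit-learn's `compute_class_weight` applies with `class_weight="balanced"`; the category names are hypothetical. In a BERT fine-tuning loop, such weights would typically be passed to the loss, e.g. via the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss`.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weight n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Loss for one example: -weight[label] * log(predicted probability of label)."""
    return -weights[label] * math.log(probs[label])


# Hypothetical, imbalanced subcategory labels for case materials.
labels = ["behavioral"] * 6 + ["consulting"] * 3 + ["market_entry"]
weights = inverse_frequency_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})
# {'behavioral': 0.556, 'consulting': 1.111, 'market_entry': 3.333}
```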
Project # 17 Group members:
Malhi, Dilmeet
Joshi, Vansh
Syamala, Aavinash
Islam, Sohan
Title: Kaggle project: PetFinder.my - Pawpularity Contest
Description: In this competition, we will analyze raw images provided by PetFinder.my to predict the “Pawpularity” of pet photos.
Project # 18 Group members:
Yuwei, Liu
Daniel, Mao
Title: Sartorius - Cell Instance Segmentation (Kaggle) (https://www.kaggle.com/c/sartorius-cell-instance-segmentation)
Description: Detect and segment individual neuronal cells in microscopy images.
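Instance segmentation must separate each cell, not just mark cell pixels. A simple baseline is to take a binary foreground mask (from any segmentation model) and split it into 4-connected components, each treated as one cell instance. This is a simplification rather than the group's method: touching cells merge into a single component, which is exactly why dedicated instance-segmentation models exist.

```python
from collections import deque


def label_instances(mask):
    """4-connected component labelling of a binary mask.
    Returns (grid of instance ids with 0 = background, number of instances)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                # Breadth-first flood fill of the new component.
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels, next_id


mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 1]]
instances, count = label_instances(mask)
print(count)      # 3
print(instances)  # [[1, 1, 0, 0], [0, 0, 0, 2], [3, 0, 0, 2]]
```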
Project # 19 Group members:
Samuel, Senko
Tyler, Verhaar
Zhang, Bowen
Title: NBA Game Prediction
Description: We will build a win/loss classifier for NBA games using player and game data, also incorporating alternative data (e.g., sports betting data).
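If the betting data comes as American moneyline odds (an assumption; the data source is not specified), the odds can be converted into implied win probabilities and used directly as a classifier feature. Bookmakers' implied probabilities for the two teams sum to slightly more than 1 (the margin, or "vig"), so the sketch also normalizes them.

```python
def implied_probability(moneyline):
    """Implied win probability from American moneyline odds (non-zero)."""
    if moneyline < 0:
        # Favourite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win moneyline.
    return 100 / (moneyline + 100)


def no_vig_probabilities(home_line, away_line):
    """Rescale both teams' implied probabilities so they sum to 1."""
    p_home = implied_probability(home_line)
    p_away = implied_probability(away_line)
    total = p_home + p_away  # exceeds 1 by the bookmaker's margin
    return p_home / total, p_away / total


print(implied_probability(-150))         # 0.6
print(no_vig_probabilities(-110, -110))  # (0.5, 0.5)
```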
Project # 20 Group members:
Mitrache, Christian
Renggli, Aaron
Saini, Jessica
Mossman, Alexandra
Title: Classification and Deep Learning for Healthcare Provider Fraud Detection Analysis
Description: TBD
Project # 21 Group members:
Wang, Kun
Title: TBD
Description: TBD
Project # 22 Group members:
Guray, Egemen
Title: Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network
Description: I will build a system to classify road signs in the German Traffic Sign Dataset using a CNN.
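The core operation of the CNN is a 2D convolution (strictly, cross-correlation, as deep-learning libraries implement it) followed by a nonlinearity and pooling. The pure-Python sketch below shows one such layer on a single-channel image; the hand-picked vertical-edge kernel is hypothetical, since a trained network learns its kernels from the traffic-sign images.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a single-channel image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (a trailing odd row/column is dropped)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# A dark-to-bright vertical edge and a kernel that responds to it.
image = [[0, 0, 1, 1] for _ in range(4)]
feature_map = conv2d_valid(image, [[-1, 1], [-1, 1]])
print(feature_map)                     # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
print(max_pool2x2(relu(feature_map)))  # [[2]]
```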
Project # 23 Group members:
Bsodjahi
Title: Modeling the state of Pseudomonas aeruginosa bacteria through their gene expression activity
Description: Label Pseudomonas aeruginosa gene expression data through unsupervised learning (e.g., the EM algorithm), then model the bacterial state as a function of its gene expression.
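To make the EM step concrete, the sketch below fits a two-component, one-dimensional Gaussian mixture and labels each sample by its most likely component. Real expression profiles span thousands of genes, so treating a single expression value with two hypothetical bacterial states is a deliberate simplification; the initialization and the small variance floor are also assumptions.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iters=100):
    """Fit a 2-component 1D Gaussian mixture by EM; return means, variances,
    mixing weights and a hard label (0 or 1) per sample."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = [min(data), max(data)]  # spread-out initialization
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            # Small floor keeps a variance from collapsing to zero.
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk + 1e-6
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels


# Made-up expression values for one gene, drawn from two well-separated states.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights, labels = em_two_gaussians(data)
print(labels)                        # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print([round(m, 2) for m in means])  # [1.0, 5.0]
```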