F21-STAT 441/841 CM 763-Proposal

From statwiki

Latest revision as of 09:48, 22 December 2021

Use this format (Don’t remove Project 0)

Project # 0 Group members:

Last name, First name

Last name, First name

Last name, First name

Last name, First name

Title: Making a String Telephone

Description: We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).


Project # 1 Group members:

Feng, Jared

Huang, Xipeng

Xu, Mingwei

Yu, Tingzhou

Title: Patch-based classification of lung cancers pathological images using convolutional neural networks

Description: In this project, we explore the classification of lung cancer pathological images from patients. The input images belong to three tumor types (LUAD, LUSD, and MESO) and have been split into patches to reduce the computational cost. The classification task is decomposed into patch-wise and whole-image-wise classification. We experiment with three neural networks for patch-wise classification and two classical machine learning models for patient classification. We also implement and study feature extraction techniques and sampling methods for training the neural networks. Our results show that a support vector machine (SVM) on extracted feature vectors outperforms all other methods and achieves an accuracy of 67.86% based on the DenseNet-121 model for patch-wise classification.

Our poster is available here: https://www.dropbox.com/s/fu6vr2cxcbt4458/Stat_841_poster.pdf?dl=0
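The patch-wise and whole-image-wise decomposition described for Project 1 can be illustrated with a minimal sketch. The function names, the non-overlapping square patches, and the majority vote are assumptions made for illustration only; in particular, majority voting is a simple stand-in and not the SVM-on-feature-vectors approach the group reports.

```python
from collections import Counter


def split_into_patches(image, patch_size):
    """Split a 2-D image (list of equal-length rows) into non-overlapping square patches.

    Assumption of this sketch: edge regions that do not fill a whole patch are dropped.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


def majority_vote(patch_labels):
    """Aggregate patch-level labels into one image-level label by majority vote."""
    counts = Counter(patch_labels)
    return counts.most_common(1)[0][0]
```

In a full pipeline, a patch-level classifier would supply the labels passed to `majority_vote`; here they would simply be a list of strings such as `["LUAD", "MESO", "LUAD"]`.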


Project # 2 Group members:

Anderson, Eric

Wang, Chengzhi

Zhong, Kai

Zhou, Yi Jing

Title: Clean-Label Targeted Poisons for an End-to-End Trained CNN on the MNIST Dataset

Description: Applying data poisoning techniques to the MNIST Dataset


Project # 3 Group members:

Chopra, Kanika

Rajcoomar, Yush

Bhattacharya, Vaibhav

Title: Cancer Classification

Description: We will be classifying three tumour types based on pathological data.


Project # 4 Group members:

Li, Shao Zhong

Kerr, Hannah

Wong, Ann Gie

Title: Predicting "Pawpularity" of Pets with Image Regression

Description: Analyze raw images and metadata to predict the “Pawpularity” of pet photos. The predictions can help shelters and rescuers around the world improve the appeal of their pet profiles, so that more animals get adopted and find their "furever" homes faster.


Project # 5 Group members:

Chin, Jessie Man Wai

Ooi, Yi Lin

Shi, Yaqi

Ngew, Shwen Lyng

Title: The Application of Classification in Accelerated Underwriting (Insurance)

Description: Accelerated Underwriting (AUW), also called “express underwriting,” is a faster and easier process for people in good health to obtain life insurance. The traditional underwriting process is often painful for both customers and insurers. Customers have to complete different types of questionnaires and undergo different medical tests involving blood, urine, saliva and other medical results. Underwriters, on the other hand, have to manually go through every single policy to assess the risk of each applicant. AUW allows people who are deemed “healthy” to forgo medical exams. Since COVID-19, it has become a more pressing topic, as traditional underwriting cannot be performed under stay-at-home orders. However, this imposes a burden on the insurance company to estimate risk accurately with fewer test results.

This is where data science comes in. With different classification methods, we can address the underwriting process’ five pain points: labor, speed, efficiency, pricing and mortality. This allows us to better estimate risk and classify clients by whether they are eligible for accelerated underwriting. For the final project, we use data from one of the leading US insurers to analyze how clients can be classified for AUW. We will use factors such as health data, medical history, family history and insurance history to determine eligibility.


Project # 6 Group members:

Wang, Carolyn

Cyrenne, Ethan

Nguyen, Dieu Hoa

Sin, Mary Jane

Title: Pawpularity (PetFinder Kaggle Competition)

Description: Using images and metadata on the images to predict the popularity of pet photos, which is calculated based on page view statistics and other metrics from the PetFinder website.


Project # 7 Group members:

Bhattacharya, Vaibhav

Chatoor, Amanda

Prathap Das, Sutej

Title: PetFinder.my - Pawpularity Contest [https://www.kaggle.com/c/petfinder-pawpularity-score/overview]

Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.


Project # 8 Group members:

Yan, Xin

Duan, Yishu

Di, Xibei

Title: The application of classification on company bankruptcy prediction

Description: If a company goes bankrupt, all its employees lose their jobs, and it is hard for them to find another suitable job in a short period. For the individual, an employee who loses a job due to bankruptcy will have no income for a period of time. This may lead to several negative consequences: increased homelessness, as people do not have enough money to cover living expenses, and increased crime rates as poverty increases. For the economy, if many companies go bankrupt at the same time, a huge number of employees will lose their jobs, leading to a higher unemployment rate. This may cause a series of negative impacts on the economy: loss of government tax revenue, since the unemployed have no income and do not pay income taxes, and increased inequality in the income distribution.

Company bankruptcy therefore negatively affects the individual, government, society, and the economy, which makes predicting company bankruptcy essential. The purpose of the project is to predict whether a company will go bankrupt.


Project # 9 Group members:

Loke, Chun Waan

Chong, Peter

Osmond, Clarice

Li, Zhilong

Title: Popularity of Shelter Pet Photo Prediction using Varied ML Techniques

Description: In this Kaggle competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos.


Project # 10 Group members:

O'Farrell, Ethan

D'Astous, Justin

Hamed, Waqas

Vladusic, Stefan

Title: Pawpularity (Kaggle)

Description: Predicting the popularity of animal photos based on photo metadata


Project # 11 Group members:

JunBin, Pan

Title: Learning from Normality: Two-Stage Method with Autoencoder and Boosting Trees for Unsupervised Anomaly Detection

Description: New algorithm for unsupervised anomaly detection


Project # 12 Group members:

Kar Lok, Ng

Muhan (Iris), Li

Title: NFL Health & Safety - Helmet Assignment

Description: Assigning players to helmets in video footage of head collisions in football plays.


Project # 13 Group members:

Livochka, Anastasiia

Wong, Cassandra

Evans, David

Yalsavar, Maryam

Title: TBD

Description: TBD


Project # 14 Group members:

Zeng, Mingde

Lin, Xiaoyu

Fan, Joshua

Rao, Chen Min

Title: Toxic Comment Classification, Kaggle

Description: Using Wikipedia comments labeled for toxicity to train a model that detects toxicity in comments.


Project # 15 Group members:

Huang, Yuying

Anugu, Ankitha

Chen, Yushan

Title: Implementation of the classification task between crop and weeds

Description: Our work will be based on the paper “Crop and Weeds Classification for Precision Agriculture using Context-Independent Pixel-Wise Segmentation”.


Project # 16 Group members:

Wang, Lingshan

Li, Yifan

Liu, Ziyi

Title: Implement and Improve CNN in Multi-Class Text Classification

Description: We will apply Bidirectional Encoder Representations from Transformers (BERT) to classify real-world data, with the application of building an efficient classifier for case study interview materials, and aim to improve the algorithm in the context of text classification. Implementing BERT also allows us to analyze the efficiency and practicality of the algorithm when dealing with an imbalanced dataset, at both the data input level and the modelling level.

The dataset is composed of case study HTML files containing case information that can be classified into multiple industry categories. We will implement multi-class classification to break down the information in each case material into pre-determined subcategories (e.g., behavior questions, consulting questions, questions for new business/market entry, etc.). We will attempt to process the complicated data into several data types (e.g., HTML, JSON, pandas data frames, etc.) and choose the most efficient raw data processing logic based on runtime and algorithm optimization.


Project # 17 Group members:

Malhi, Dilmeet

Joshi, Vansh

Syamala, Aavinash

Islam, Sohan

Title: Kaggle project: PetFinder.my - Pawpularity Contest

Description: In this competition, we will analyze raw images provided by PetFinder.my to predict the “Pawpularity” of pet photos.


Project # 18 Group members:

Yuwei, Liu

Daniel, Mao

Title: Sartorius - Cell Instance Segmentation (Kaggle) [https://www.kaggle.com/c/sartorius-cell-instance-segmentation]

Description: Detect single neuronal cells in microscopy images


Project # 19 Group members:

Samuel, Senko

Tyler, Verhaar

Zhang, Bowen

Title: NBA Game Prediction

Description: We will build a win/loss classifier for NBA games using player and game data, also incorporating alternative data (e.g., sports betting data).


Project # 20 Group members:

Mitrache, Christian

Renggli, Aaron

Saini, Jessica

Mossman, Alexandra

Title: Classification and Deep Learning for Healthcare Provider Fraud Detection Analysis

Description: TBD


Project # 21 Group members:

Wang, Kun

Title: TBD

Description: TBD


Project # 22 Group members:

Guray, Egemen

Title: Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network

Description: I will build a system to predict road signs in the German Traffic Sign Dataset using a CNN.


Project # 23 Group members:

Bsodjahi

Title: Modeling Pseudomonas aeruginosa bacteria state through its genes expression activity

Description: Label Pseudomonas aeruginosa gene expression data through unsupervised learning (e.g., the EM algorithm) and then model the bacterial state as a function of its gene expression.
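As a rough illustration of the unsupervised labelling step mentioned for Project 23, the sketch below runs EM on a two-component, one-dimensional Gaussian mixture. The number of components, the one-dimensional input, the initialisation at the data extremes, and the function name are all assumptions made for this example; the proposal does not specify the actual model.

```python
import math


def em_two_gaussians(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM; return (means, labels)."""
    # Assumed deterministic initialisation: means at the data extremes.
    means = [min(values), max(values)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                weights[k]
                * math.exp(-((x - means[k]) ** 2) / (2 * variances[k]))
                / math.sqrt(2 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate mean, variance and weight of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue  # component received no mass; keep its old parameters
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the density finite
            weights[k] = nk / len(values)
    # Hard labels: assign each value to its most responsible component.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels
```

The hard labels returned here could then serve as the "state" variable in a downstream supervised model, which is one possible reading of the two-stage plan in the description.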