F21-STAT 940-Proposal: Difference between revisions

Revision as of 14:05, 8 October 2021

Use this format (Don’t remove Project 0)

Project # 0 Group members:

Abdelkareem, Youssef

Nasr, Islam

Huang, Xuanzhi


Title: Automatic Covid-19 Self-Test Supervision using Deep Learning

Description:

The current health regulations in Canada mandate that all arriving travelers who have to quarantine take a Covid-19 self-test at home. At present, the process involves a human agent on a video call who walks the traveler through the steps. The idea is to create a deep learning pipeline that takes a real-time video stream of the user and guides them through the process from start to end with no human intervention. We would be the first to try to automate such a process using deep learning.

The steps of the test are as follows: 
 - Pick up the swab
 - Place the swab in the nose up to a particular depth
 - Rotate it in place for a certain period of time
 - Return the swab to the required area
 - https://www.youtube.com/watch?v=jDIUFDMmBDo
Our pipeline will do the following steps: 
 - Use a real-time face detector (such as SSD-MobileNetV2) to detect a bounding box around the user's face
 - Use a real-time landmark detector to accurately detect the location of the nose
 - Use a novel model to classify the swab position (Inside Left Nose, Inside Right Nose, Outside Nose)
 - Once the swab is detected to be inside the nose, a marker on the swab (such as an ArUco marker) can be tracked automatically in real time using OpenCV. We will detect the location of the marker relative to the nose and make sure the swab is correctly placed and rotated for the required period.
 - This whole pipeline will be developed as a state machine: whenever the user violates the rule of a certain step, they go back to the corresponding step and repeat it.
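The state-machine idea in the last step can be sketched as follows. This is only a minimal illustration under stated assumptions, not the project's implementation: the state names, the `Observation` fields, and the rotation duration are hypothetical, and in a real pipeline the observations would come from the detectors listed above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Step(Enum):
    PICK_UP = auto()
    INSERT = auto()
    ROTATE = auto()
    RETURN = auto()
    DONE = auto()


@dataclass
class Observation:
    """Per-frame outputs of the (hypothetical) detectors."""
    swab_in_hand: bool
    swab_in_nose: bool
    swab_in_tray: bool
    timestamp: float  # seconds since the session started


class SelfTestSupervisor:
    def __init__(self, rotate_seconds: float = 10.0):  # duration is an assumption
        self.rotate_seconds = rotate_seconds
        self.step = Step.PICK_UP
        self._rotate_start = None

    def update(self, obs: Observation) -> Step:
        """Advance the state machine by one video frame."""
        if self.step is Step.PICK_UP:
            if obs.swab_in_hand:
                self.step = Step.INSERT
        elif self.step is Step.INSERT:
            if obs.swab_in_nose:
                self.step = Step.ROTATE
                self._rotate_start = obs.timestamp
        elif self.step is Step.ROTATE:
            if not obs.swab_in_nose:
                # Rule violated: the swab left the nose early, so repeat insertion.
                self.step = Step.INSERT
                self._rotate_start = None
            elif obs.timestamp - self._rotate_start >= self.rotate_seconds:
                self.step = Step.RETURN
        elif self.step is Step.RETURN:
            if obs.swab_in_tray:
                self.step = Step.DONE
        return self.step
```

Calling `update` once per frame keeps the per-frame cost of the supervision logic negligible, so the frame budget is spent on the detectors themselves.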
Challenges:
 - No dataset is available for training our novel model (which classifies whether the swab is inside the nose), so we will need to build our own using both synthetic and real data. We will build a small dataset (5000-6000 images) and rely on transfer learning to fine-tune an existing face attribute classification model (for attributes such as moustache and glasses) on it.
 - The whole pipeline will need to operate at a minimum of 30 FPS on CPU in order to match the FPS of a real-time video stream.



Project # 1 Group Members:

Feng, Jared

Salm, Veronica

Jones-McCormick, Taj

Sarangian, Varnan

Title: An Arsenal of Augmentation Strategies for Natural Language Processing

Description: Data augmentation is an effective technique for improving the generalization power of modern neural architectures, especially when they are trained on smaller datasets. Strategies for modifying training examples in domains such as computer vision and speech processing are typically straightforward; however, this becomes more challenging in text processing, where such generalized approaches are non-trivial. Unlike with images, where most geometric operations preserve the images' significant features, augmenting text at the syntactic level can reduce data quality, as it may produce noisy examples that are no longer human-readable.

The purpose of this project is to investigate data augmentation techniques in NLP that support training robust neural networks on smaller datasets. Ideally, this may act as an alternative to fine-tuning computationally demanding pretrained models. We will start with a survey of existing text augmentation techniques and evaluate them on several downstream tasks (using existing benchmark datasets and possibly novel ones). We will further attempt to adapt popular augmentation approaches from other domains (e.g. computer vision) to an NLP setting. This will include an empirical investigation into whether it is appropriate to augment at the syntactic (i.e. raw text) or semantic (i.e. encoded vectors) level.
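To make the syntactic level concrete, the sketch below shows two simple token-level operations, random swap and random deletion, applied to raw text. It is a hedged illustration only: these operations are chosen as examples, not taken from the proposal, and the function names and the example sentence are hypothetical. It also shows why such augmentation can yield noisy, less readable sentences.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


rng = random.Random(0)  # fixed seed so the augmentations are reproducible
sentence = "data augmentation can improve generalization on small datasets".split()
print(" ".join(random_swap(sentence, n_swaps=1, rng=rng)))
print(" ".join(random_deletion(sentence, p=0.2, rng=rng)))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible across runs, which matters when comparing strategies on small datasets.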


Project # 2 Group members:

Shi, Yuliang

Liang, Wei


Title: Ensemble Learning Of Neural Network On Chest X-ray Image for COVID-19 Patients

Description:

COVID-19 is a contagious disease that emerged in Wuhan, China in December 2019 and has been spreading rapidly worldwide, presenting unprecedented challenges. Infected people may have a wide range of symptoms such as fever, cough, and fatigue. Researchers are working tirelessly to better understand the mechanisms of death caused by COVID-19. However, whether COVID-19 can be detected automatically from chest x-ray images still requires further investigation. This motivates us to consider deep learning methods for classifying x-ray images of COVID-19 patients. In this study, we are going to use two datasets. The first dataset consists of x-ray images from confirmed positive cases of COVID-19, and the second dataset is posted on the Kaggle website. In total, there are 1493 images with 224x224 pixels in our dataset.

Purpose:

The objective of this project is to classify chest x-ray images into four categories: COVID-19, Pneumonia bacterial, Pneumonia viral and normal chest. This is an image classification problem, and we base our model on convolutional neural networks (CNNs), with the chest x-ray images as inputs and the disease categories as outputs. CNNs are powerful in computer vision and have achieved extensive success. A typical CNN architecture consists of input layers, convolutional layers, pooling layers and fully connected output layers. We will tailor our networks based on this classical CNN architecture and adopt several techniques for more accurate detection of COVID-19 from the x-ray images.

Challenges and Improvements:

We will develop and improve our CNN-based model in the following three areas:

First, we can use a pre-trained model for transfer learning to achieve higher precision. The last several layers will be fine-tuned to better fit our images. Several techniques will be applied to avoid overfitting, including regularization, batch normalization, and data augmentation.

Second, since many networks have been proposed for COVID-19 detection using x-ray images, we can ensemble them for lower variance and better performance.
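One common way to ensemble classifiers is soft voting, which averages the per-class probabilities of the individual networks. The sketch below illustrates that idea only; the proposal does not say which ensembling scheme will be used, and the probability values and function names here are made up for illustration.

```python
CLASSES = ["COVID-19", "Pneumonia bacterial", "Pneumonia viral", "Normal"]


def soft_vote(prob_vectors, weights=None):
    """Average per-class probabilities from several models (soft voting)."""
    n_models = len(prob_vectors)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weight per model
    n_classes = len(prob_vectors[0])
    avg = [0.0] * n_classes
    for w, probs in zip(weights, prob_vectors):
        for k in range(n_classes):
            avg[k] += w * probs[k]
    return avg


def predict(prob_vectors, weights=None):
    """Return the winning class label and the averaged probabilities."""
    avg = soft_vote(prob_vectors, weights)
    best = max(range(len(avg)), key=avg.__getitem__)
    return CLASSES[best], avg


# Made-up softmax outputs of three hypothetical models for one image.
model_outputs = [
    [0.60, 0.20, 0.15, 0.05],
    [0.30, 0.40, 0.20, 0.10],
    [0.55, 0.15, 0.20, 0.10],
]
label, avg = predict(model_outputs)
print(label, [round(p, 3) for p in avg])
```

Here the second model alone would predict bacterial pneumonia, but the averaged probabilities favour COVID-19; averaging over models in this way is what reduces the variance of the final prediction.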

Finally, the Vision Transformer (ViT) is a new but milestone technology in deep learning. The Transformer was first proposed for natural language processing and was later applied to computer vision with great success. We will investigate the application of this technology to the classification of chest x-ray images.

Link To Proposal: STAT 940 Proposal


Project # 3 Group Members:

Leung, Alice

Yalsavar, Maryam

Jamialahmadi, Benyamin


Title: MLP-Mixer: MLP Architecture for NLP

Description: Although transformers and other attention-based architectures are common components of most state-of-the-art models in both vision and Natural Language Processing (NLP), the recent paper “MLP-Mixer: An all-MLP Architecture for Vision” (https://arxiv.org/pdf/2105.01601.pdf) presented a model based entirely on multi-layer perceptrons (MLPs) that is competitive with these methods in both accuracy and computational cost in the field of vision. The paper claims that neither attention nor convolutions are necessary, and it supports this claim with well-established experiments and results. Despite these strong results, the application of this model to NLP has not yet been addressed. This project therefore aims to implement this model for NLP tasks. The overall outline and steps can be:

1. Paper and code assessment

2. Applying the model to a named entity recognition problem

3. Data preparation

4. Training, fine-tuning, and testing
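As a rough picture of how the model in the steps above might carry over to text, the sketch below applies one Mixer layer to a sequence of token embeddings, treating tokens the way the vision model treats image patches, and then adds a per-token tagging head for named entity recognition. It is a simplified NumPy illustration under assumptions: all sizes are arbitrary, the layer normalization omits learnable scale and bias, the input is random rather than real embeddings, and the tagging head is a hypothetical addition not described in the proposal.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each row to zero mean and unit variance (no learned scale/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mlp(x, w1, b1, w2, b2):
    """Two fully connected layers with a GELU in between."""
    return gelu(x @ w1 + b1) @ w2 + b2


def init_mlp(rng, dim, hidden):
    return (rng.normal(0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0, 0.02, (hidden, dim)), np.zeros(dim))


def mixer_layer(x, token_params, channel_params):
    """x: (seq_len, channels) token embeddings for one padded sentence."""
    # Token mixing: transpose so the MLP acts along the sequence axis.
    y = layer_norm(x).T                      # (channels, seq_len)
    x = x + mlp(y, *token_params).T          # skip connection
    # Channel mixing: the MLP acts on each token's feature vector.
    x = x + mlp(layer_norm(x), *channel_params)
    return x


rng = np.random.default_rng(0)
seq_len, channels, n_tags = 16, 32, 9        # all sizes are illustrative
x = rng.normal(size=(seq_len, channels))     # stand-in for embedded tokens
token_params = init_mlp(rng, seq_len, 64)
channel_params = init_mlp(rng, channels, 128)
h = mixer_layer(x, token_params, channel_params)
w_out = rng.normal(0, 0.02, (channels, n_tags))
tag_logits = layer_norm(h) @ w_out           # one row of tag scores per token
print(h.shape, tag_logits.shape)
```

Because the token-mixing weights have shape tied to `seq_len`, sentences would need to be padded or truncated to a fixed length, which is one of the practical questions when moving this architecture from fixed-size image grids to variable-length text.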