F21-STAT 940-Proposal: Difference between revisions

From statwiki
(46 intermediate revisions by 17 users not shown)
Project # 0 Group members:


Last name, First name
Abdelkareem, Youssef


Last name, First name
Nasr, Islam


Last name, First name
Huang, Xuanzhi


Last name, First name


Title: Making a String Telephone
Title: Automatic Covid-19 Self-Test Supervision using Deep Learning


Description: We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).
Description:  


Current health regulations in Canada require arriving travelers who must quarantine to take a Covid-19 self-test at home. At present, the process involves a human agent who walks the traveler through the steps over a video call. The idea is to create a deep learning pipeline that takes a real-time video stream of the user and guides them through the process from start to end with no human intervention. To our knowledge, we would be the first to try to automate such a process using deep learning.


'''The steps of the test are as follows:'''
  - Pick up the swab
  - Place the swab in the nose up to a particular depth
  - Rotate it in place for a certain period of time
  - Return the swab to the required area
  - Reference video: https://www.youtube.com/watch?v=jDIUFDMmBDo


'''Our pipeline will do the following steps:'''
  - Use a real-time face detector (such as SSD-MobileNetV2) to detect a bounding box around the user's face.
  - Use a real-time landmark detector to accurately detect the location of the nose.
  - Use a novel model to classify the swab position as one of: Inside Left Nostril, Inside Right Nostril, Outside Nose.
  - The swab will carry a marker (such as an ArUco marker) that can be tracked automatically in real time using OpenCV. Once the swab is detected to be inside the nose, we will compute the location of the marker relative to the nose and make sure the swab is correctly placed and rotated for the required period.
  - The whole pipeline will be developed as a state machine: whenever the user violates the rule of a certain step, they go back to the corresponding step and repeat it.
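
The state-machine idea in the last step can be sketched as follows. This is only an illustrative sketch, not the group's implementation: the step names, the boolean inputs (assumed to come from the detectors described above), and the value of <code>REQUIRED_ROTATION_FRAMES</code> are all hypothetical.

```python
from enum import Enum, auto


class Step(Enum):
    PICK_UP_SWAB = auto()
    INSERT_SWAB = auto()
    ROTATE_SWAB = auto()
    RETURN_SWAB = auto()
    DONE = auto()


# Hypothetical threshold: the proposal only says "a certain period of time".
REQUIRED_ROTATION_FRAMES = 3


class SelfTestSupervisor:
    """Tracks which step of the self-test the user is currently on."""

    def __init__(self):
        self.step = Step.PICK_UP_SWAB
        self.rotation_frames = 0

    def update(self, swab_in_hand, swab_in_nose, swab_rotating, swab_returned):
        """Advance the state machine using one frame of detector outputs."""
        if self.step is Step.PICK_UP_SWAB:
            if swab_in_hand:
                self.step = Step.INSERT_SWAB
        elif self.step is Step.INSERT_SWAB:
            if swab_in_nose:
                self.step = Step.ROTATE_SWAB
                self.rotation_frames = 0
        elif self.step is Step.ROTATE_SWAB:
            if not swab_in_nose:
                # Rule violated: the swab left the nose too early,
                # so the user is sent back to repeat the insertion step.
                self.step = Step.INSERT_SWAB
            elif swab_rotating:
                self.rotation_frames += 1
                if self.rotation_frames >= REQUIRED_ROTATION_FRAMES:
                    self.step = Step.RETURN_SWAB
        elif self.step is Step.RETURN_SWAB:
            if swab_returned:
                self.step = Step.DONE
        return self.step
```

Calling <code>update</code> once per video frame keeps the per-frame cost of the supervision logic negligible compared with the detectors themselves.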


Project # 1 Group members:
'''Challenges:'''
  - There is no available dataset for training our novel model (to classify whether the swab is inside the nose or not), so we will need to build our own dataset using both synthetic and real data. We will build a small dataset (5000-6000 images) and rely on transfer learning, fine-tuning an existing face attribute classification model (for attributes such as moustache and glasses) on it.
  - The whole pipeline will need to run at a minimum of 30 FPS on CPU in order to match the frame rate of a real-time video stream.


McWhannel, Pierre


Yan, Nicole


Hussein Salamah, Ahmed
--------------------------------------------------------------------


Title: Dense Retrieval for Conversational Information Seeking
Project # 1 Group Members:


Description:
Feng, Jared
One of the recognized problems in Information Retrieval (IR) is conversational search, which attracts much attention in the form of conversational assistants such as Alexa, Siri and Cortana. Meeting the user's information need is the ultimate goal of a conversational search system. In this setting, questions are asked sequentially, which imposes the multi-turn format of the Conversational Information Seeking (CIS) task. The TREC Conversational Assistance Track (CAsT) [3] is a multi-turn conversational search task that provides a large-scale reusable test collection for sequences of conversational queries. The response of such a conversational model is not a list of relevant documents; it is limited to brief response passages of 1 to 3 sentences.


[[File:Screen Shot 2020-10-09 at 1.33.00 PM.png | 300px | Example Queries in CAsT]]
Salm, Veronica


In [4], the authors focus on improving open-domain question answering by using dense representations for retrieval instead of traditional methods. They adopt a simple dual-encoder framework to construct a learnable retriever over large collections. We want to adopt this dense representation for the conversational model in the CAsT task and compare its performance with that of other approaches in the literature. Performance will be measured using graded relevance on a five-point scale: Fails to meet, Slightly meets, Moderately meets, Highly meets, and Fully meets.
Jones-McCormick, Taj


We aim to further improve our system performance by integrating the following techniques:
Sarangian, Varnan


• Paragraph-level pre-training tasks: ICT, BFS, and WLP [1]
'''Title:''' An Arsenal of Augmentation Strategies for Natural Language Processing


• ANCE training: periodically using checkpoints to encode documents, from which the strong negatives close to the relevant document would be used as next training negatives [5]
'''Description:''' Data augmentation is an effective technique for improving the generalization power of modern neural architectures, especially when they are trained on smaller datasets. Strategies for modifying training examples in domains such as computer vision and speech processing are typically straightforward. In text processing, however, such generalized approaches are non-trivial. Unlike with images, where most geometric operations preserve the image's significant features, augmenting text at the syntactic level can reduce data quality, as it may produce noisy examples that are no longer human-readable.
 
In summary, this project is exploratory in nature, as we will try to use state-of-the-art Dense Passage Retrieval (DPR) techniques (based on BERT) [4, 6] in a question answering (QA) problem. Current first-stage retrieval approaches mainly rely on bag-of-words models. In this project, we hope to explore the feasibility of using state-of-the-art methods such as BERT. We will first compare how these perform on the TREC CAsT datasets [3] against the results retrieved using BM25. After these first points of comparison, we will explore one or more techniques designed to improve the performance of DPR [1, 5].
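
As a rough illustration of how dual-encoder retrieval ranks passages, the sketch below scores each passage by the inner product between a query vector and a passage vector. The vectors here are made-up toy values; in the actual system they would be produced by BERT-based query and passage encoders, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return (passage_index, score) pairs for the top_k highest-scoring passages."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]


# Toy embeddings standing in for encoder outputs (assumed values).
passages = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query = [0.0, 1.0]
ranked = rank_passages(query, passages, top_k=2)
```

Because passage vectors do not depend on the query, they can be encoded once offline, which is what makes this kind of retriever practical as a first stage over large collections.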
 
References
 
[1] Wei-Cheng Chang et al. Pre-training Tasks for Embedding-based Large-scale Retrieval. 2020. arXiv: 2002.03932 [cs.LG].
 
[2] Zhuyun Dai and Jamie Callan. Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval. 2019. arXiv: 1910.10687 [cs.IR].
 
[3] Jeffrey Dalton, Chenyan Xiong, and Jamie Callan. TREC CAsT 2019: The Conversational Assistance Track Overview. 2020. arXiv: 2003.13624 [cs.IR].
 
[4] Vladimir Karpukhin et al. Dense Passage Retrieval for Open-Domain Question Answering. 2020. arXiv: 2004.04906 [cs.CL].
 
[5] Lee Xiong et al. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. 2020. arXiv: 2007.00808 [cs.IR].
 
[6] Jingtao Zhan et al. RepBERT: Contextualized Text Embeddings for First-Stage Retrieval. 2020. arXiv: 2006.15498 [cs.IR].


The purpose of this project is to investigate data augmentation techniques in NLP that support training robust neural networks on smaller datasets. Ideally, this may act as an alternative to fine-tuning computationally demanding pretrained models. We will start with a survey of existing text augmentation techniques and evaluate them on several downstream tasks (using existing benchmark datasets and possibly novel ones). We will further attempt to adapt popular augmentation approaches from other domains (e.g. computer vision) to an NLP setting. This will include an empirical investigation into whether it is appropriate to augment at the syntactic (i.e. raw text) or semantic (i.e. encoded vectors) level.
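
To make "augmenting at the syntactic level" concrete, here is a minimal sketch of two simple token-level operations, random deletion and random swap. They are shown only as examples of raw-text augmentation, not as the strategies this project commits to; the function names and parameters are illustrative.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """Swap the positions of two randomly chosen tokens, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


rng = random.Random(0)  # fixed seed so the augmentations are reproducible
sentence = "data augmentation helps when training sets are small".split()
deleted = random_deletion(sentence, p=0.2, rng=rng)
swapped = random_swap(sentence, n_swaps=2, rng=rng)
```

Both operations can easily yield ungrammatical sentences, which is exactly the data-quality risk described above.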


--------------------------------------------------------------------


Project # 2 Group members:


Singh, Gursimran
Shi, Yuliang
 
Sharma, Govind
 
Chanana, Abhinav
 
Title: Quick Text Description using Headline Generation and Text To Image Conversion
 
Description: An automatic tool to generate short description based on long textual data is a useful mechanism to share quick information. Most of the current approaches involve summarizing the text using varied deep learning approaches from Transformers to different RNNs. For this project, instead of building a standard text summarizer, we aim to provide two separate utilities for generating a quick description of the text. First, we plan to develop a model that produces a headline for the long textual data, and second, we are intending to generate an image describing the text.
 
Headline Generation - Headline generation is a specific case of text summarization where the output is generally a combination of few words that gives an overall outcome from the text. In most cases, text summarization is an unsupervised learning problem. But, for the headline generation, we have the original headlines available in our training dataset that makes it a supervised learning task. We plan to experiment with different Recurrent Neural Networks like LSTMs and GRUs with varied architectures. For model evaluation, we are considering BERTScore using which we can compare the reference headline with the automatically generated headline from the model. We also aim to explore Attention and Transformer Networks for the text (headline) generation. We will make use of the currently available techniques mentioned in the various research papers but also try to develop our own architecture if the previous methods don't reveal reliable results on our dataset. Therefore, this task would primarily fit under the category of application of deep learning to a particular domain, but could also include some components of new algorithm design.
 
Text to Image Conversion - Generation or synthesis of images from a short text description is another very interesting application domain in deep learning. One approach for image generation is based on mapping image pixels to specific features as described by the discriminative feature representation of the text. Recurrent Neural Networks have been successfully used in learning such feature representations of text. This approach is difficult to generalize because the recognition of discriminative features for texts in different domains is not an easy task and it requires domain expertise. Different generative methods have been used including Variational Recurrent Auto-Encoders and its extension in Deep Recurrent Attention Writer (DRAW). We plan to experiment with Generative Adversarial Networks (GAN). Application of GANs on domain-specific datasets has been done but we aim to apply different variants of GANs on the Microsoft COCO dataset which has been used in other architectures. The analysis will be focusing on how well GANs are able to generalize when compared to other alternatives on the given dataset.
 
Scope - The above models will be trained independently on different datasets. Therefore, for a particular text, only one of the two functionalities will be available.
 
 
 
Project # 3 Group members:
 
Sikri, Gaurav
 
Bhatia, Jaskirat
 
Title: Not decided yet (Placeholder)
 
Description: Not decided yet :)
 
 
Project # 4 Group members:
 
Maleki, Danial
 
Rasoolijaberi, Maral
 
Title: Binary Deep Neural Network for the domain of Pathology
 
Description: Binary neural networks greatly reduce storage and computation, which makes them a promising technique for deploying deep models on resource-limited devices. However, binarization inevitably causes severe information loss, and its discontinuity makes the deep network harder to optimize. We want to investigate the possibility of using these networks in histopathology, where images are gigapixel-sized and such savings would be especially useful.
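
The optimization difficulty comes from the sign function, whose gradient is zero almost everywhere. One common workaround, shown here purely as an illustration and not necessarily the method this project will use, is the straight-through estimator: binarize in the forward pass, and in the backward pass let the gradient through unchanged wherever the real-valued weight lies within a clipping range. The function names and the default clipping value are assumptions.

```python
def binarize(weights):
    """Forward pass: map each real-valued weight to +1 or -1."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def straight_through_grad(weights, upstream_grad, clip=1.0):
    """Backward pass: copy the upstream gradient where |w| <= clip, else zero."""
    return [g if abs(w) <= clip else 0.0 for w, g in zip(weights, upstream_grad)]


latent = [-0.3, 0.0, 2.5]   # real-valued weights kept during training
binary = binarize(latent)   # weights actually used by the network
grads = straight_through_grad(latent, [0.1, 0.2, 0.3])
```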
 
 
Project # 5 Group members:
 
Jain, Abhinav
 
Bathla, Gautam
 
Title: lyft-motion-prediction-autonomous-vehicles(Kaggle)(Tentative)
 
Description: Autonomous vehicles (AVs) are expected to dramatically redefine the future of transportation. However, there are still significant engineering challenges to be solved before one can fully realize the benefits of self-driving cars. One such challenge is building models that reliably predict the movement of traffic agents around the AV, such as cars, cyclists, and pedestrians.
 
Comments: We are more inclined towards a 3-D object detection project. We are in the process of finding the right problem statement for it and if we are not successful, we will continue with the above Kaggle competition.
 
 
Project # 6 Group members:
 
You, Bowen
 
Avilez, Jose
 
Mahmoud, Mohammad
 
Wu, Mohan
 
Title: Deep Learning Models in Volatility Forecasting
 
Description: Price forecasting has become a very hot topic in the financial industry in recent years. We are however very interested in the volatility of such financial instruments. We propose a new deep learning architecture or model to predict volatility and apply our model to real life datasets of various financial products. We will analyze our results and compare them to more traditional methods.
 
 
Project # 7 Group members:
 
Chen, Meixi
 
Shen, Wenyu
 
Title: Through the Lens of Probability Theory: A Comparison Study of Bayesian Deep Learning Methods
 
Description: Deep neural networks are known as black-box models, but they can be made less mysterious by adopting a Bayesian approach. From a Bayesian perspective, one can assign uncertainty to the weights instead of relying on single point estimates, which allows for better interpretability of deep learning models. However, Bayesian deep learning (BDL) methods are often intractable due to the increased number of parameters, and they often do not perform as well. In this project, we will study different BDL methods, such as Bayesian CNNs using variational inference and the Laplace approximation, with applications to image classification, and we will try to propose improvements where possible.
 
 
Project # 8 Group members:
 
Avilez, Jose
 
Title: A functional universal approximation theorem
 
Description: In the seminal paper "Approximation by superpositions of a sigmoidal function", Cybenko gave a simple proof using elementary functional analysis that a certain class of functions, called discriminatory functions, serve as valid activation functions for universal neural approximators. The objective of our project is three-fold:
 
1) Prove a converse of Cybenko's Universal Approximation Theorem by means of the Stone-Weierstrass theorem
 
2) Provide examples and non-examples of Cybenko's discriminatory functions
 
3) Construct a neural network for functional data (i.e. data arising in function spaces) and prove a universal approximation theorem for Lp spaces.
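
For reference, the two statements from Cybenko's paper [1] that the objectives above build on can be written as follows, where I_n denotes the unit cube [0,1]^n and M(I_n) the space of finite signed regular Borel measures on it.

```latex
% Definition (discriminatory function):
% \sigma is discriminatory if, for \mu \in M(I_n),
\int_{I_n} \sigma\!\left(y^{\top} x + \theta\right) d\mu(x) = 0
\quad \text{for all } y \in \mathbb{R}^n,\ \theta \in \mathbb{R}
\;\Longrightarrow\; \mu = 0 .

% Theorem (universal approximation):
% if \sigma is continuous and discriminatory, then finite sums of the form
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(y_j^{\top} x + \theta_j\right)
% are dense in C(I_n): for every f \in C(I_n) and \varepsilon > 0
% there exists such a G with
\left| G(x) - f(x) \right| < \varepsilon \quad \text{for all } x \in I_n .
```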
 
References:
 
[1] Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4), 303-314.
 
[2] Folland, Gerald B. Real analysis: modern techniques and their applications. Vol. 40. John Wiley & Sons, 1999.
 
[3] Ramsay, J. O. (2004). Functional data analysis. Encyclopedia of Statistical Sciences, 4.
 
 
 
Project # 9 Group members:
 
Sikaroudi, Milad
 
Ashrafi Fashi, Parsa
 
Title: Domain Generalization with Model-Agnostic Semantic Features in Histopathology Images
 
Description: The performance of conventional deep neural networks tends to degrade in the presence of a domain shift, such as data gathered from different centers. In this study we will, for the first time, treat different anatomical sites as a domain shift, to see whether we can generalize to a low-shot anatomical site using data that is plentiful but comes from different anatomical sites. The hypothesis is that the retrieval statistics of a model trained using episodic domain generalization will not degrade as much as those of the baseline when there is a domain shift. We also hypothesize that episodic domain generalization will perform even better than pure meta-learning in the presence of domain shift.


Instead of fully supervised learning, we will work in a weakly supervised setting in which only the whole-slide diagnosis labels are used.
Liang, Wei
The questions we are going to address are:


1. How is the performance of a neural network impacted by introducing domain shift (anatomical sites)?


2. How can domain generalization improve generalization performance in the presence of domain shift when data is scarce for the target anatomical site: through a pure meta-learning approach, episodic domain generalization, or training a classifier on pre-trained features?
Title: Ensemble Learning Of Neural Network On Chest X-ray Image for COVID-19 Patients


'''Description:'''


Project # 10 Group members:
COVID-19 is a contagious disease that emerged in Wuhan, China in December 2019 and has been spreading rapidly worldwide, presenting unprecedented challenges. Infected people may have a wide range of symptoms such as fever, cough, and fatigue. Researchers are working tirelessly to better understand how COVID-19 causes death. However, whether COVID-19 can be detected automatically from chest x-ray images still requires further investigation. This motivates us to consider deep learning methods for classifying x-ray images of COVID-19 patients. In this study, we are going to use two datasets. The first consists of x-ray images from confirmed positive cases of COVID-19, and the second is posted on the Kaggle website. In total, there are 1493 images of 224x224 pixels in our dataset.


Torabian, Parsa
'''Purpose:'''


Ebrahimi Farsangi, Sina
The objective of this project is to classify chest x-ray images into four categories: COVID-19, bacterial pneumonia, viral pneumonia, and normal chest. This is an image classification problem, and we base our model on convolutional neural networks (CNNs), with the chest x-ray images as inputs and the disease categories as outputs. CNNs are powerful in computer vision and have achieved extensive success. A typical CNN architecture consists of input layers, convolutional layers, pooling layers and fully connected output layers. We will tailor our networks based on this classical CNN architecture and adopt some techniques for more accurate detection of COVID-19 from x-ray images.


Moayyedi, Arash
'''Challenges and Improvements:'''


Title: Meta-Learning Regularizers for Few-Shot Classification Models
We develop and improve our CNN based model in the following three domains:


Our project aims at exploring the effects of self-supervised pre-training on few-shot classification. We draw inspiration from the paper “When Does Self-supervision Improve Few-shot Learning?”[1] where the authors analyse the effects of using the Jigsaw puzzle[2] and rotation tasks as regularizers for training Prototypical Networks[3] and Model-Agnostic Meta-Learning (MAML)[4] networks.  
First, we can use a pre-trained model and apply transfer learning for higher precision. The last several layers will be fine-tuned to better fit the images. Several techniques will be applied to avoid overfitting, including regularization, batch normalization, and data augmentation.


The introduced paper analyzes the effects of regularizing meta-learning models using self-supervised loss, based on rotation and Jigsaw tasks. It is conventionally thought that one of the reasons MAML and other optimization based meta-learning algorithms work well is due to initializing a network into a task-generalizable state[5]. In this project, we will be looking at the effects of self-supervised pre-training, as presumably it will initialize the network into a better state than random, and potentially improve subsequent meta-learning. We will compare the effects of using self-supervised methods as pre-training, as regularization, and the combination of both.  The effects of other self-supervised learning tasks, such as discoloration and flipping, will be studied as well. We will also look at which combination of tasks, whether interlaced or applied sequentially, work better and complement one another. We will evaluate our final results on the Omniglot and Mini-Imagenet datasets. These improvements will later be compared with their application on other few-shot learning methods, including first-order MAML and Matching Networks.
Second, as many networks have been proposed for COVID-19 detection using x-ray images, we can ensemble them for lower variance and better performance.
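
One simple way to ensemble classifiers is to average their predicted class probabilities, sketched below. The per-model probabilities are made-up numbers, the class list follows the four categories named in the purpose statement, and the function name is hypothetical; the project may well choose a different ensembling scheme.

```python
CLASSES = ["COVID-19", "Pneumonia bacterial", "Pneumonia viral", "Normal"]


def ensemble_predict(prob_lists):
    """Average per-model class probabilities and return (label, averaged_probs)."""
    n_models = len(prob_lists)
    avg = [sum(p[i] for p in prob_lists) / n_models for i in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda i: avg[i])
    return CLASSES[best], avg


# Assumed softmax outputs of three different networks for one x-ray image.
model_probs = [
    [0.6, 0.2, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
]
label, avg_probs = ensemble_predict(model_probs)
```

Averaging lets a single over-confident or mistaken model (the second one here) be outvoted by the others, which is the variance reduction the ensemble is meant to provide.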


References:
Finally, the Vision Transformer (ViT) is a new but milestone technology in deep learning. The Transformer was first proposed for natural language processing, and was later applied to computer vision with great success. We are going to investigate the application of this new technology to the classification of chest x-ray images.


[1] https://arxiv.org/abs/1910.03560
Link To Proposal: [https://www.overleaf.com/read/tzqtxzykhggc STAT 940 Proposal]
--------------------------------------------------------------------


[2] https://arxiv.org/abs/1603.09246
Project # 3 Group Members:


[3] https://arxiv.org/abs/1703.05175
Leung, Alice


[4] https://arxiv.org/abs/1703.03400
Yalsavar, Maryam


[5] https://arxiv.org/abs/2003.11539
Jamialahmadi, Benyamin




Project # 11 Group Members:
'''Title:''' MLP-Mixer: MLP Architecture for NLP


Shikhar Sakhuja: s2sakhuj@uwaterloo.ca
'''Description:'''  Although transformers and other attention-based architectures are common components of most state-of-the-art models in both vision and Natural Language Processing (NLP), the recent paper “MLP-Mixer: An all-MLP Architecture for Vision” (https://arxiv.org/pdf/2105.01601.pdf) presents a model based entirely on multi-layer perceptrons (MLPs) that is competitive with these advanced methods in both accuracy and computational cost in the field of vision. The paper claims that neither attention nor convolutions are necessary, and supports this claim with well-established experiments and results. Despite these results, the application of this model to NLP has not yet been addressed. This project therefore works towards implementing this model for NLP tasks. The overall outline and steps can be:


Introduction:
1. Paper and code assessment


Controller Area Network (CAN bus) is a vehicle bus standard that allows Electronic Control Units (ECUs) within an automobile to communicate with each other without the need for a host computer. Modern automobiles might have up to 70 ECUs for various subsystems such as Engine, Transmission, Braking, etc. The ECUs exchange messages on the CAN bus and enable many modern vehicle capabilities such as automatic start/stop, electric park brakes, lane detection, collision avoidance, and more. Each message exchanged on the bus carries a 29-bit identifier. These 29 bits consist of a combination of Parameter Group Number (PGN), message priority, and the source address of the message. A parameter group can be, for example, engine temperature, which could include coolant temperature, fuel temperature, etc. The PGN itself is composed of fields such as the reserved bit, data page, and PDU format. Lastly, the source address maps the message to the ECU it originates from.
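
The fields described above can be extracted from a raw identifier with a few bit operations. The sketch below assumes the SAE J1939 layout of the 29-bit identifier (3-bit priority, 1 reserved bit, 1 data page bit, 8-bit PDU format, 8-bit PDU specific, 8-bit source address); the proposal does not state the layout explicitly, so this is an assumption, and the function name is illustrative.

```python
def decode_j1939_id(can_id):
    """Split a 29-bit identifier into priority, PGN and source address.

    Assumes the SAE J1939 bit layout (an assumption, not stated in the proposal).
    """
    priority = (can_id >> 26) & 0x7
    reserved = (can_id >> 25) & 0x1
    data_page = (can_id >> 24) & 0x1
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source_address = can_id & 0xFF

    pgn = (reserved << 17) | (data_page << 16) | (pdu_format << 8)
    destination = None
    if pdu_format < 240:
        # PDU1 (peer-to-peer): PDU specific is a destination address
        # and is not part of the PGN.
        destination = pdu_specific
    else:
        # PDU2 (broadcast): PDU specific is a group extension
        # and forms the low byte of the PGN.
        pgn |= pdu_specific

    return {
        "priority": priority,
        "pgn": pgn,
        "source_address": source_address,
        "destination_address": destination,
    }


fields = decode_j1939_id(0x18FEEE00)  # example identifier chosen for illustration
```

A decoder like this would turn each logged message into the (PGN, source address) labels that the prediction task needs.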
2. Applying the model to a named entity recognition problem


Goals:
3. Data preparation 


(1) This project aims to use messages exchanged on the CAN bus of a Challenger Truck, collected by the Embedded Systems Group at the University of Waterloo. The data exists in a temporal format, with a new message exchanged periodically. The goals of this project are twofold:
4. Training, fine-tuning, and testing


(2) Predicting the PGN and source address of message N exchanged on the bus, given messages 1 to N-1. We might also explore predicting attributes within the PGN.
'''Challenges:'''
Predicting the delay between messages N-1 and N, given the delay between each pair of consecutive messages leading up to message N-1.  
The challenges will be finding a good dataset to apply NER to, and figuring out how to adapt this architecture from a computer vision problem to an NLP problem.
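
As a rough picture of what adapting the architecture involves, the sketch below implements one simplified Mixer block in plain Python. It assumes that a fixed-length sequence of token embeddings (rows) takes the place of the image patches used in the vision paper. Biases and the learnable layer-norm scale and shift are omitted for brevity, and all function names are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(m, eps=1e-6):
    """Normalize each row (one token) across its channels."""
    out = []
    for row in m:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def mlp(m, w1, w2):
    """Two-layer perceptron with a GELU non-linearity, applied row-wise."""
    hidden = [[gelu(v) for v in row] for row in matmul(m, w1)]
    return matmul(hidden, w2)


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    """x has shape (tokens, channels); the output has the same shape."""
    # Token mixing: transpose so the MLP mixes information across tokens.
    mixed = transpose(mlp(transpose(layer_norm(x)), tok_w1, tok_w2))
    x = add(x, mixed)  # skip connection
    # Channel mixing: the MLP acts on each token's channels independently.
    return add(x, mlp(layer_norm(x), ch_w1, ch_w2))
```

Note that the token-mixing weights have a fixed size tied to the number of tokens, so variable-length sentences would need padding or truncation; this is one of the adaptation issues for NLP.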


Potential Approach:


For the first goal, we intend to experiment with RNN models along with Attention modules since they have shown promising results in text generation/prediction.
--------------------------------------------------------------------


The second goal is more of an investigative problem where we intend to use regression techniques powered by Neural Networks to predict delays between messages N-1 and N.
Project # 4 Group Members:


Mina Kebriaee




Title: '''Short-term Forecast for Solar Photovoltaic Farm Output'''
'''Description:'''


<pre>
Synopsis: Electric power generation at a solar photovoltaic (PV) farm is predicted for the near future based on historical data and weather features. A long short-term memory (LSTM) network is used to perform time-series prediction.


Project # 12 Group members:
Solar energy is becoming increasingly popular as a viable renewable energy source. It offers many environmental advantages; however, fluctuations with changing weather patterns have always been a problem. While solar may not entirely replace fossil fuels, it has its place in the power generation portfolio. Many electricity companies are looking to add solar energy into their mix of power generation and in order to do so they require accurate solar production forecasts. Errors in forecasting could lead to large expenses like excess fuel consumption, emergency purchases of electricity from competitors, and/or might force utility operators to perform load shedding which directly affects electricity end-users. The major challenge in solar energy generation is the intermittency of photovoltaic system power generation mainly due to weather conditions which are extremely nonlinear and thus, analytical model-based approaches lack the required flexibility and complexity to capture the underlying patterns and behavior.
Applied machine learning (ML) techniques can help improve prediction accuracy and effectively increase social benefits by using multi-dimensional feature-based networks that digest tens of relevant input data streams, including historical data and local weather conditions. In other words, ML-based tools are becoming indispensable for reducing the effect of uncertainty and energy costs in modern power systems and smart grids. The goal of this project is to implement a machine learning algorithm to predict the electric energy output of a solar PV farm based on local weather and temporal parameters.
</pre>
<pre>
This project consists of three main parts:


Hemati, Sobhan
1. Data: Pre-processing of the raw data files (input) and historical power generation data files (output) from the solar farm to get meaningful numeric values on an hourly basis.


Meaney, Cameron
2. Machine Learning: Designing and implementing a deep neural network (a Long Short-Term Memory (LSTM) model) to accurately forecast short-term photovoltaic solar power. This approach exploits the desirable properties of LSTM, which is a powerful tool for modeling dependency in data.


Title: Representation learning of gigapixel histopathology images using PointNet a permutation invariant neural network
3. Application/reporting: Apply the model and report back the findings, including the accuracy of the results using the available dataset.
</pre>
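
Before an LSTM can be trained on the hourly data from step 1, the series has to be cut into input windows and forecast targets. The sketch below shows one common way to do this; the window length, forecast horizon, and function name are assumptions, and the numbers are placeholder values rather than real farm output.

```python
def make_windows(series, n_lags, horizon=1):
    """Turn a time series into (input window, target) pairs.

    Each input holds n_lags consecutive values; the target is the value
    `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for start in range(len(series) - n_lags - horizon + 1):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags + horizon - 1])
    return inputs, targets


# Placeholder hourly readings (not real data).
hourly_output = [0, 1, 2, 3, 4, 5]
X, y = make_windows(hourly_output, n_lags=3)
```

With weather features, each element of a window would be a feature vector rather than a single number, but the slicing logic stays the same.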


Description:


In recent years, there has been significant growth in the amount of information available in digital pathology archives. This data is valuable because of its potential uses in research, education, and pathologic diagnosis. As a result, representation learning of histopathology whole slide images (WSIs) has attracted significant attention and become an active area of research. Unfortunately, scientific progress with these data has been difficult because of challenges inherent to the data itself. These challenges include highly complex textures of different tissue types, color variations caused by different stainings, and most notably, the size of the images, which are often larger than 50,000x50,000 pixels. Additionally, these images are multi-resolution, meaning that each WSI may contain images from different zoom levels, primarily 5X, 10X, 20X, and 40X. With the advent of deep learning, there is optimism that these challenges can be overcome. The main challenge in this approach is that the sheer size of the images makes it infeasible (or impossible) to obtain a vector representation for a WSI, which is a necessary step in order to leverage deep learning algorithms. In practice, this is often bypassed by considering smaller ‘patches’ of the WSI, a set of which is meant to represent the full WSI. This approach leads to a set representation for a WSI. However, unlike traditional image or sequence models, deep networks that process sets and learn permutation invariant representations from them are still a developing area of research. Recent attempts at this include Multi-instance Learning Schemes, Deep Sets, and Set Transformers. A particularly successful attempt at developing a deep neural network for set representation is called PointNet, which was developed for classification and segmentation of 3D objects and point clouds.
In PointNet, each set is represented using a set of (x,y,z) coordinates, and the network is designed to learn a permutation invariant global representation for each set and then use this representation for classification or segmentation.
--------------------------------------------------------------------


In this project, we first attempt to extend the PointNet network to a convolutional PointNet network that uses a set of image patches rather than (x,y,z) coordinates to learn the universal permutation invariant representation. Then, we attempt to improve the representational power of PointNet as a permutation invariant neural network. For the first part, the main challenge is that PointNet has been designed for processing sets of the same size, whereas in WSIs the size of the image, and therefore the number of patches, is not fixed. For this reason, we will need to develop an approach that enables CNN-PointNet to process sets of different sizes. One possible solution is to use fake members to standardize the set size and then remove the effect of these fake members in backpropagation using a masking scheme. For the second part, the PointNet network can be improved in many ways. For example, the rotation matrix used is not a real rotation matrix, as orthogonality is only encouraged through a regularization term. However, using a projected gradient technique and the closed-form solution for the nearest orthogonal matrix to a given matrix (the Orthogonal Procrustes Problem), we can keep the exact orthogonality constraint and obtain a real rotation matrix. This exact orthogonality is geometrically important because, otherwise, the transformation will likely corrupt the neighborhood structure of the points in each set. Furthermore, PointNet uses a very simple symmetric function (max pooling) as a set approximator, although more powerful symmetric functions, such as statistical moments, power-sum with a trainable parameter, and other set approximators, can be used. It would be interesting to see how more complicated symmetric functions can improve the representational power of PointNet to achieve more discriminative permutation invariant representations for each set (in this case, WSIs).
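
The padding-and-masking idea for variable-size sets can be illustrated with a masked max pooling over per-patch feature vectors. This is a forward-pass sketch only, with made-up feature values and a hypothetical function name; in an autodiff framework, entries that are skipped this way would also receive no gradient, which is the effect the masking scheme is after.

```python
def masked_max_pool(features, mask):
    """Max-pool over set members, ignoring padded ("fake") members.

    features: list of equal-length feature vectors, one per patch (padded).
    mask: list of 1 (real patch) or 0 (padding), same length as features.
    """
    dim = len(features[0])
    pooled = [float("-inf")] * dim
    for vec, keep in zip(features, mask):
        if not keep:
            continue  # fake members never influence the set representation
        pooled = [max(p, v) for p, v in zip(pooled, vec)]
    return pooled


# Two real patches plus one padded member with arbitrary large values.
patch_features = [[1.0, 5.0], [3.0, 2.0], [9.0, 9.0]]
patch_mask = [1, 1, 0]
set_repr = masked_max_pool(patch_features, patch_mask)
```

Reordering the patches (together with their mask entries) leaves the result unchanged, which is the permutation invariance the set representation requires.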
Project # 13 Group Members:

Syed Saad Naseem ssnaseem@uwaterloo.ca

Title: Text classification of topics related to COVID-19 on social media using deep learning

The COVID-19 pandemic has become a public health emergency and a critical socioeconomic issue worldwide. It is changing the way we live and do business. Social media is a rich source of data about public opinion on many topics, including COVID-19. I plan to use Reddit to build a dataset of COVID-19-related posts and comments. Because Reddit is divided into communities, the posts and comments are already clustered by community topic; for example, posts from a political subreddit will be about politics.

I plan to build a classifier that takes a given text and tells what the text is talking about, for example politics, studies, relationships, etc. The goals of this project are to:

• Scrape a dataset from different Reddit communities

• Train a deep learning model (CNN or RNN model) to classify a given text into the possible categories

• Test the model on social media posts about COVID-19


Project # 13 Group members:

Edwards, John

Title: Modified LeicaGAN On COCO Image Data Set

In [1] the authors present a novel text-to-image method called LeicaGAN. The model was trained and evaluated on the CUB bird [2] and Oxford-102 flower [3] data sets, and the authors reported favourable performance compared to benchmark models.

I envision two possible deliverables for my project:

First, to re-create the LeicaGAN model described in [1] and train it using the Common Objects in Context (COCO) data set [4]. The purpose is to evaluate how LeicaGAN will perform in a more diverse domain of images. LeicaGAN's source code uses PyTorch and is publicly available at <https://github.com/qiaott/LeicaGAN>. I would attempt to recreate it using TensorFlow.

Second, to modify the model's architecture in an attempt to improve its performance. One possible modification would be to change the aggregation method for merging the text and visual embeddings. Specifically, in the discussion section of their paper [1], the authors suggest continued exploration of efficient and diverse modules for this process. Additionally, their embedding networks are trained separately from the other model components. The authors believe they could alternatively be trained end-to-end to improve performance.

I foresee the first deliverable, rebuilding their network in TensorFlow, taking a large majority of my available time. If this proves to be unmanageable, I will instead build on their existing public PyTorch code base and focus on implementing a wider breadth of modifications to their network architecture, comparing the modified models' performance against the original model.

References:

[1] Qiao, Ting-ting et al. “Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge.” NeurIPS (2019).

[2] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 dataset. California Institute of Technology, 2011.

[3] M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Computer Vision, Graphics & Image Processing (ICVGIP), 2008.

[4] Lin, Tsung-Yi et al. “Microsoft COCO: Common Objects in Context.” ArXiv abs/1405.0312 (2014).

Revision as of 22:38, 8 October 2021

Use this format (Don’t remove Project 0)

Project # 0 Group members:

Abdelkareem, Youssef

Nasr, Islam

Huang, Xuanzhi


Title: Automatic Covid-19 Self-Test Supervision using Deep Learning

Description:

Current health regulations in Canada require all arriving travelers who must quarantine to take a Covid-19 self-test at home. At present, the process involves a human agent on a video call who walks the traveler through the steps. The idea is to create a deep learning pipeline that takes a real-time video stream of the user and guides them through the process from start to end with no human intervention. To our knowledge, we would be the first to try to automate such a process using deep learning.

The steps of the test are as follows: 
 - Pick up the swab
 - Place the swab in the nose up to a particular depth
 - Rotate it in place for a certain period of time
 - Return the swab to the required area
 - https://www.youtube.com/watch?v=jDIUFDMmBDo
Our pipeline will do the following steps: 
 - Use a real-time face detector (such as SSD-MobileNetV2) to detect a bounding box around the user's face
 - Use a real-time landmark detector to accurately detect the location of the nose.
 - Use a novel model to classify the swab position (Inside Left Nose, Inside Right Nose, Outside Nose)
 - Once the swab is detected to be inside the nose, track a marker on the swab (such as an ArUco marker) in real time using OpenCV. We will detect the location of the marker relative to the nose location and make sure the swab is correctly placed and rotated for the required period.
 - This whole pipeline will be developed as a state machine: whenever the user violates the rule of a certain step, they go back to the corresponding step and repeat it.
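The state machine described in the last step might look roughly like the sketch below. It is a minimal illustration, not the group's design: the step names, the event strings ('ok', 'violation', 'wait'), and the `FALLBACK` table deciding where a violation sends the user are all assumptions made for the example. In the real pipeline, the events would come from the detectors and the marker tracker.

```python
from enum import Enum


class Step(Enum):
    PICK_UP = 0
    INSERT = 1
    ROTATE = 2
    RETURN = 3
    DONE = 4


# Hypothetical fallback table: the step to repeat after a rule violation.
FALLBACK = {
    Step.PICK_UP: Step.PICK_UP,
    Step.INSERT: Step.INSERT,
    Step.ROTATE: Step.INSERT,  # swab left the nose mid-rotation: re-insert
    Step.RETURN: Step.RETURN,
}


def next_step(step, event):
    """Advance the test by one event.

    event is 'ok' (current step completed), 'violation' (rule broken),
    or 'wait' (nothing to report for this frame).
    """
    if step is Step.DONE or event == "wait":
        return step
    if event == "ok":
        return Step(step.value + 1)
    if event == "violation":
        return FALLBACK[step]
    raise ValueError(f"unknown event: {event!r}")


def run(events, start=Step.PICK_UP):
    """Feed a sequence of events through the state machine."""
    step = start
    for event in events:
        step = next_step(step, event)
    return step
```

For example, `run(["ok", "ok", "violation"])` ends in `Step.INSERT`: the user picked up and inserted the swab, then broke the rotation rule and was sent back to re-insert it.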
Challenges:
 - There is no available dataset for training our novel model (to classify whether the swab is inside the nose or not), so we will need to build our own dataset using both synthetic and real data. We will build a small dataset (5000-6000 images) and will rely on transfer learning to finetune an existing face attribute classification model (for attributes such as moustache and glasses) on our small dataset.
 - The whole pipeline will need to operate at a minimum of 30 FPS on CPU in order to match the FPS of a real-time video stream.



Project # 1 Group Members:

Feng, Jared

Salm, Veronica

Jones-McCormick, Taj

Sarangian, Varnan

Title: An Arsenal of Augmentation Strategies for Natural Language Processing

Description: Data augmentation is an effective technique for improving the generalization power of modern neural architectures, especially when they are trained on smaller datasets. Strategies for modifying training examples in domains such as computer vision and speech processing are typically straightforward; however, augmentation becomes more challenging in text processing, where such generalized approaches are non-trivial. Unlike with images, where most geometric operations preserve the images' significant features, augmenting text at the syntactic level can reduce data quality, as it may produce noisy examples that are no longer human-readable.

The purpose of this project is to investigate data augmentation techniques in NLP that support training robust neural networks on smaller datasets. Ideally, this may act as an alternative to finetuning computationally demanding pretrained models. We will start with a survey of existing text augmentation techniques and evaluate them on several downstream tasks (using existing benchmark datasets and possibly novel ones). We will further attempt to adapt popular augmentation approaches from other domains (e.g. computer vision) to an NLP setting. This will include an empirical investigation into whether it is appropriate to augment at the syntactic (i.e. raw text) or semantic (i.e. encoded vectors) level.
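As a concrete illustration of augmentation at the syntactic (raw text) level, the sketch below applies two simple token-level operations, random swap and random deletion. These particular operations are an assumption chosen for the example; the proposal does not name the techniques it will survey. The functions `random_swap` and `random_deletion` are hypothetical names.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    tokens = list(tokens)
    if not tokens:
        return tokens
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "data augmentation can help small text datasets".split()
    print(" ".join(random_swap(sentence, 2, rng)))
    print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Both operations can produce text that is no longer fluent, which is exactly the data-quality risk described above. Augmenting at the semantic level would instead perturb the encoded vectors.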


Project # 2 Group members:

Shi, Yuliang

Liang, Wei


Title: Ensemble Learning Of Neural Network On Chest X-ray Image for COVID-19 Patients

Description:

COVID-19 is a contagious disease that emerged in Wuhan, China in December 2019 and has been spreading rapidly worldwide, presenting unprecedented challenges. Infected people may have a wide range of symptoms such as fever, cough, and fatigue. Researchers are working tirelessly to better understand how COVID-19 causes death. However, whether COVID-19 can be detected automatically from chest x-ray images still requires further investigation. This motivates us to consider deep learning methods for classifying x-ray images of COVID-19 patients. In this study, we will use two datasets: the first consists of x-ray images from confirmed positive cases of COVID-19, and the second is posted on the Kaggle website. In total, there are 1493 images of 224x224 pixels in our dataset.

Purpose:

The objective of this project is to classify chest x-ray images into four categories: COVID-19, bacterial pneumonia, viral pneumonia, and normal chest. This is an image classification problem, and we base our model on convolutional neural networks (CNNs), with the chest x-ray images as inputs and the disease categories as outputs. CNNs are powerful in computer vision and have achieved extensive success. A typical CNN architecture consists of input layers, convolutional layers, pooling layers and fully connected output layers. We will tailor our networks based on this classical CNN architecture and adopt some techniques for more accurate detection of COVID-19 from the x-ray images.

Challenges and Improvements:

We will develop and improve our CNN-based model in the following three ways:

First, we can use a pre-trained model and apply transfer learning for higher precision. The last several layers will be refined to better fit the images. Several techniques will be applied to avoid overfitting, including regularization, batch normalization, and data augmentation.

Second, as many networks have been proposed for COVID-19 detection using x-ray images, we can ensemble them for lower variance and better performance.

Finally, the Vision Transformer (ViT) is a new but milestone technology in deep learning. The Transformer was first proposed for natural language processing. It was later applied to computer vision and achieved huge success. We are going to explore the application of this new technology to the classification of chest x-ray images.
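One simple way to ensemble several networks, as mentioned in the second point, is soft voting: average the class probabilities predicted by each model and take the most probable class. The sketch below assumes this particular scheme (the proposal does not say which ensembling method will be used) and uses made-up probabilities for illustration; `soft_vote` is a hypothetical helper name.

```python
import numpy as np

# Class order follows the four categories named in the proposal.
CLASSES = ["COVID-19", "bacterial pneumonia", "viral pneumonia", "normal"]


def soft_vote(prob_list):
    """Average per-model class probabilities and pick the top class.

    prob_list: list of (n_samples, n_classes) arrays, one per model.
    Returns (predicted class indices, averaged probabilities).
    """
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=1), mean_probs


if __name__ == "__main__":
    # Illustrative, made-up outputs of three models on two images.
    model_a = np.array([[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]])
    model_b = np.array([[0.4, 0.4, 0.1, 0.1], [0.1, 0.2, 0.6, 0.1]])
    model_c = np.array([[0.1, 0.6, 0.2, 0.1], [0.1, 0.3, 0.5, 0.1]])
    labels, _ = soft_vote([model_a, model_b, model_c])
    print([CLASSES[i] for i in labels])
```

Averaging probabilities rather than hard labels lets a confident model outweigh two uncertain ones, which is one reason an ensemble can have lower variance than its members.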

Link To Proposal: STAT 940 Proposal


Project # 3 Group Members:

Leung, Alice

Yalsavar, Maryam

Jamialahmadi, Benyamin


Title: MLP-Mixer: MLP Architecture for NLP

Description: Although transformers and other attention-based architectures are common components of most state-of-the-art models in both vision and Natural Language Processing (NLP), the recent paper “MLP-Mixer: An all-MLP Architecture for Vision” (https://arxiv.org/pdf/2105.01601.pdf) presented a model based entirely on multi-layer perceptrons (MLPs) that is competitive with these advanced methods in both accuracy and computational cost in the field of vision. The paper claims that neither attention nor convolutions are necessary, a claim supported by its well-established experiments and results. Despite these results, the application of this model to NLP has not yet been addressed. This project therefore aims to implement this model for NLP tasks. The overall outline and steps are:

1. Paper and code assessment

2. Applying the model to a named entity recognition problem

3. Data preparation

4. Training, fine-tuning, and testing

Challenges: The challenges will be finding a good data set for NER and figuring out how to adapt this architecture from a computer vision problem to an NLP problem.
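To make the adaptation question concrete, the sketch below runs one Mixer block on a sequence of token embeddings instead of image patches. It is a simplified NumPy forward pass under stated assumptions: biases and the learned layer-norm scale and shift are omitted, weights are passed in explicitly, and the idea of feeding token embeddings in place of patch embeddings is our illustration rather than something the paper specifies.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mlp(x, w1, w2):
    """Two-layer perceptron applied to the last axis of x."""
    return gelu(x @ w1) @ w2


def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    """One Mixer block on x of shape (seq_len, channels).

    tok_w1: (seq_len, d_tok), tok_w2: (d_tok, seq_len)   token mixing
    ch_w1: (channels, d_ch), ch_w2: (d_ch, channels)     channel mixing
    """
    # Token mixing: transpose so the MLP acts across sequence positions.
    x = x + mlp(layer_norm(x).T, tok_w1, tok_w2).T
    # Channel mixing: the MLP acts on each token's feature vector.
    x = x + mlp(layer_norm(x), ch_w1, ch_w2)
    return x
```

Because the token-mixing weights have a fixed `seq_len` dimension, variable-length sentences would have to be padded or truncated to a common length. For NER, a per-token linear layer on the block output could produce tag scores for each position.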



Project # 4 Group Members:

Mina Kebriaee


Title: Short-term Forecast for Solar Photovoltaic Farm Output

Description:

Synopsis: Electric power generation at a solar photovoltaic (PV) farm is predicted for the near future based on historical data and weather features. A long short-term memory (LSTM) network is used to perform time-series prediction.

Solar energy is becoming increasingly popular as a viable renewable energy source. It offers many environmental advantages; however, fluctuations with changing weather patterns have always been a problem. While solar may not entirely replace fossil fuels, it has its place in the power generation portfolio. Many electricity companies are looking to add solar energy to their mix of power generation, and to do so they require accurate solar production forecasts. Errors in forecasting could lead to large expenses such as excess fuel consumption and emergency purchases of electricity from competitors, and might force utility operators to perform load shedding, which directly affects electricity end-users. The major challenge in solar energy generation is the intermittency of photovoltaic power output, mainly due to weather conditions, which are highly nonlinear; as a result, analytical model-based approaches lack the flexibility and complexity required to capture the underlying patterns and behavior.

Applied machine learning (ML) techniques can help improve the prediction accuracy and effectively increase social benefits by using multi-dimensional feature-based networks that digest tens of relevant input data streams, including historical data and local weather conditions. In other words, ML-based tools are becoming indispensable for reducing the effect of uncertainty and energy costs in modern power systems and smart grids. The goal of this project is to implement a machine learning algorithm to predict the electric energy output at a solar PV farm based on local weather and temporal parameters.

This project consists of three main parts:

1. Data: Pre-processing of the raw data files (input) and historical power generation data files (output) from the solar farm to get meaningful numeric values on an hourly basis.

2. Machine Learning: Designing and implementing a deep neural network (a Long Short-Term Memory (LSTM) model) to accurately forecast short-term photovoltaic solar power. This approach exploits the desirable properties of LSTM, which is a powerful tool for modeling dependency in data.

3. Application/reporting: Apply the model and report back the findings, including the accuracy of the results using the available dataset.
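Between the data step and the LSTM step, the hourly records have to be turned into (input window, target) pairs. The sketch below shows one common way to do this with a sliding window. The function name `make_windows` and the parameters `lookback` and `horizon` are hypothetical, and the window lengths are placeholders rather than values from the proposal.

```python
def make_windows(features, targets, lookback, horizon=1):
    """Build supervised pairs for sequence forecasting.

    features: list of per-hour feature vectors (e.g. weather readings).
    targets: list of per-hour power outputs, same length as features.
    lookback: number of past hours fed to the model.
    horizon: how many hours after the window the target lies (1 = next hour).
    Returns (xs, ys), where xs[i] is a window of `lookback` feature vectors
    and ys[i] is the matching target.
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    xs, ys = [], []
    for end in range(lookback, len(features) - horizon + 1):
        xs.append(features[end - lookback:end])
        ys.append(targets[end + horizon - 1])
    return xs, ys


if __name__ == "__main__":
    hours = [[h] for h in range(6)]      # toy one-dimensional features
    power = [10 * h for h in range(6)]   # toy hourly output
    xs, ys = make_windows(hours, power, lookback=3)
    print(xs[0], ys[0])
```

Each window in `xs` has shape (lookback, n_features), which is the layout an LSTM layer typically expects for a single sample.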



Project # 5 Group Members:


Shervin Hakimi

Mehrshad Sadria


Title: Kidney Function Analysis Using Deep Neural Networks

Description:

The kidney is one of the most intriguing organs of the body. A single kidney alone has enough capacity for the body to function: it filters the blood and regulates fluids in the body. Because it is one of the most essential organs, illness in the kidney brings multiple problems, such as chronic kidney disease and renal cell carcinoma. In these cases, the kidney progressively loses its function, which would eventually lead to the person's death. Several factors can be used to assess the well-being of a kidney, for instance the number of glomeruli, cell size, and urine secretion. In this project, we aim to use clinical data focusing on kidneys to understand these diseases more thoroughly and, in the end, build a model that could assess the aforementioned clinical features.

Challenges:


In order to answer the question, we expect to face a few challenges, as follows:

Clinical data annotation is a time-consuming and tedious task that requires the knowledge of an expert in the field, and even then it might end up being inaccurate because of human error.

Cleaning biological data can be hard, since the data might have batch effects, missing values, alignment problems, and low image resolution.

Biological processes are complex and might require different types of datasets at different levels, such as genomics, epigenomics, proteomics, etc., in order to gain a good understanding of the system.


Project # 6 Group Members:

Edward Bangala

Maruf Sazed

Title: Maximizing Classification Accuracy for a Fixed Model via Generative Adversarial Networks


Description:

In machine learning, models are trained on the training set and inference is done on the test set. To avoid overfitting, a validation set is used. In this approach, several models are used (with different architectures or through hyperparameter tuning) to identify the best possible model based on the validation set error. That is, the data set is kept fixed, and the models are changed. Even though different data augmentation techniques may be applied to the training set, a model selection process is still needed. In this project, we want to explore whether we can fix a model and find the subset of the training data set that could result in greater accuracy. We will use a Generative Adversarial Network (GAN) to reconstruct images from the test data set. If the model is able to learn the distribution of the test set images, then the discriminator might be able to identify the images from the training set that are similar to the test set. The idea is borrowed from Andrew Ng, who recently launched a competition where he asked the participants to identify the best data set given a fixed model. The idea of using GANs to solve this problem is our own. However, it is well known that GANs are difficult to train, and there is no certainty that a subset of the images would perform better than the full set of training images.
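Once a discriminator has been trained, the subset-selection step could be as simple as ranking training images by their discriminator score and keeping the top fraction. The sketch below assumes exactly that. The scores are taken as given (training the GAN is out of scope here), and `select_subset` and `keep_fraction` are hypothetical names, not part of the proposal.

```python
import math


def select_subset(scores, keep_fraction):
    """Return indices of the training images with the highest scores.

    scores: one discriminator score per training image, where a higher
        score is assumed to mean "looks more like the test distribution".
    keep_fraction: fraction of the training set to keep, in (0, 1].
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, math.ceil(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    toy_scores = [0.9, 0.1, 0.8, 0.3]  # made-up scores for four images
    print(select_subset(toy_scores, 0.5))
```

The fixed model would then be retrained on the selected indices and compared against training on the full set, with `keep_fraction` treated as a quantity to tune on the validation set.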