http://wiki.math.uwaterloo.ca/statwiki/api.php?action=feedcontributions&user=Kl4ng&feedformat=atomstatwiki - User contributions [US]2024-03-29T07:40:11ZUser contributionsMediaWiki 1.41.0http://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F21&diff=51000stat441F212021-11-26T21:41:53Z<p>Kl4ng: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F20-STAT 441/841 CM 763-Proposal| Project Proposal ]] ==<br />
<br />
<!--[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]--><br />
<br />
=Paper presentation=<br />
{| class="wikitable" border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="250pt"|Name <br />
|width="15pt"|Paper number <br />
|width="700pt"|Title<br />
|width="15pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|width="30pt"|Link to the video<br />
|-<br />
|Sep 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary] || [https://youtu.be/JWozRg_X-Vg?list=PLehuLRPyt1HzXDemu7K4ETcF0Ld_B5adG&t=539]<br />
|-<br />
|Week of Nov 16 || Ali Ghodsi || || || || ||<br />
|-<br />
|Week of Nov 22 || Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu|| || Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification || [http://proceedings.mlr.press/v139/bai21c/bai21c.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization Summary] ||<br />
|-<br />
|Week of Nov 29 || Kanika Chopra, Yush Rajcoomar || || Automatic Bank Fraud Detection Using Support Vector Machines || [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.5804&rep=rep1&type=pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Automatic_Bank_Fraud_Detection_Using_Support_Vector_Machines Summary] ||<br />
|-<br />
|Week of Nov 22 || Zeng Mingde, Lin Xiaoyu, Fan Joshua, Rao Chen Min || || Do Vision Transformers See Like Convolutional Neural Networks? || [https://proceedings.neurips.cc/paper/2021/file/652cf38361a209088302ba2b8b7f51e0-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Do_Vision_Transformers_See_Like_CNN Summary] ||<br />
|-<br />
|Week of Nov 22 || Justin D'Astous, Waqas Hamed, Stefan Vladusic, Ethan O'Farrell || || A Probabilistic Approach to Neural Network Pruning || [http://proceedings.mlr.press/v139/qian21a/qian21a.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Summary_of_A_Probabilistic_Approach_to_Neural_Network_Pruning Summary] ||<br />
|-<br />
|Week of Nov 22 || Cassandra Wong, Anastasiia Livochka, Maryam Yalsavar, David Evans || || Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification || [https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hou_Patch-Based_Convolutional_Neural_CVPR_2016_paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Patch_Based_Convolutional_Neural_Network_for_Whole_Slide_Tissue_Image_Classification Summary] ||<br />
|-<br />
|Week of Nov 29 || Jessie Man Wai Chin, Yi Lin Ooi, Yaqi Shi, Shwen Lyng Ngew || || CatBoost: unbiased boosting with categorical features || [https://proceedings.neurips.cc/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=CatBoost:_unbiased_boosting_with_categorical_features Summary] ||<br />
|-<br />
|Week of Nov 29 || Eric Anderson, Chengzhi Wang, Kai Zhong, YiJing Zhou || || Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks || [https://arxiv.org/pdf/1804.00792.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Poison_Frogs_Neural_Networks Summary] ||<br />
|-<br />
|Week of Nov 29 || Ethan Cyrenne, Dieu Hoa Nguyen, Mary Jane Sin, Carolyn Wang || || Deep Residual Learning for Image Recognition || [https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || ||<br />
|-<br />
|Week of Nov 29 || Bowen Zhang, Tyler Magnus Verhaar, Sam Senko || || Deep Double Descent: Where Bigger Models and More Data Hurt || [https://arxiv.org/pdf/1912.02292.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Double_Descent_Where_Bigger_Models_and_More_Data_Hurt Summary] ||<br />
|-<br />
|Week of Nov 29 || Chun Waan Loke, Peter Chong, Clarice Osmond, Zhilong Li|| || XGBoost: A Scalable Tree Boosting System || [https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 || Ann Gie Wong, Curtis Li, Hannah Kerr || || The Detection of Black Ice Accidents for Preventative Automated Vehicles Using Convolutional Neural Networks || [https://www.mdpi.com/2079-9292/9/12/2178/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=The_Detection_of_Black_Ice_Accidents_Using_CNNs&fbclid=IwAR0K4YdnL_hdRnOktmJn8BI6-Ra3oitjJof0YwluZgUP1LVFHK5jyiBZkvQ Summary] ||<br />
|-<br />
|Week of Nov 22 || Yuwei Liu, Daniel Mao|| || Depthwise Convolution Is All You Need for Learning Multiple Visual Domains || [https://arxiv.org/abs/1902.00927 Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Depthwise_Convolution_Is_All_You_Need_for_Learning_Multiple_Visual_Domains Summary] ||<br />
|-<br />
|Week of Nov 29 || Lingshan Wang, Yifan Li, Ziyi Liu || || Deep Learning for Extreme Multi-label Text Classification || [https://dl.acm.org/doi/pdf/10.1145/3077136.3080834 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Learning_for_Extreme_Multi-label_Text_Classification Summary]||<br />
|-<br />
|Week of Nov 29 || Kar Lok Ng, Muhan (Iris) Li || || Robust Imitation Learning from Noisy Demonstrations || [http://proceedings.mlr.press/v130/tangkaratt21a/tangkaratt21a.pdf Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Robust_Imitation_Learning_from_Noisy_Demonstrations Summary] ||<br />
|-<br />
|Week of Nov 29 ||Kun Wang || || Convolutional neural network for diagnosis of viral pneumonia and COVID-19 alike diseases|| [https://doi-org.proxy.lib.uwaterloo.ca/10.1111/exsy.12705 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_neural_network_for_diagnosis_of_viral_pneumonia_and_COVID-19_alike_diseases Summary] ||<br />
|-<br />
|Week of Nov 29 ||Egemen Guray || || Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network || [https://www.researchgate.net/publication/344399165_Traffic_Sign_Recognition_System_TSRS_SVM_and_Convolutional_Neural_Network Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Traffic_Sign_Recognition_System_(TSRS):_SVM_and_Convolutional_Neural_Network Summary] ||<br />
|-<br />
|Week of Nov 29 ||Bsodjahi || || Bayesian Network as a Decision Tool for Predicting ALS Disease ||[https://www.mdpi.com/2076-3425/11/2/150/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Bayesian_Network_as_a_Decision_Tool_for_Predicting_ALS_Disease Summary]||<br />
|-<br />
|Week of Nov 29 ||Xin Yan, Yishu Duan, Xibei Di || || Predicting Hurricane Trajectories Using a Recurrent Neural Network || [https://arxiv.org/pdf/1802.02548.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Predicting_Hurricane_Trajectories_Using_a_Recurrent_Neural_Network Summary]||<br />
|-<br />
|Week of Nov 29 ||Ankitha Anugu, Yushan Chen, Yuying Huang || || A Game Theoretic Approach to Class-wise Selective Rationalization || [https://arxiv.org/pdf/1910.12853.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=A_Game_Theoretic_Approach_to_Class-wise_Selective_Rationalization#How_does_CAR_work_intuitively Summary]||<br />
|-<br />
|Week of Nov 29 ||Aavinash Syamala, Dilmeet Malhi, Sohan Islam, Vansh Joshi || || Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree || [https://www.hindawi.com/journals/sp/2021/5560465/ Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Research_on_Multiple_Classification_Based_on_Improved_SVM_Algorithm_for_Balanced_Binary_Decision_Tree Summary]||<br />
|-<br />
|Week of Nov 29 ||Christian Mitrache, Alexandra Mossman, Jessica Saini, Aaron Renggli|| || U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging|| [https://proceedings.neurips.cc/paper/2019/file/57bafb2c2dfeefba931bb03a835b1fa9-Paper.pdf?fbclid=IwAR1dZpx9vU1pSPTSm_nwk6uBU7TYJ2HNTrsqjaH-9ZycE_PFpFjJoHg1zhQ Paper]||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=U-Time:A_Fully_Convolutional_Network_for_Time_Series_Segmentation_Applied_to_Sleep_Staging_Summary Summary]||</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Robust_Imitation_Learning_from_Noisy_Demonstrations&diff=50999Robust Imitation Learning from Noisy Demonstrations2021-11-26T21:41:09Z<p>Kl4ng: Created page with "== Presented by == Kar Lok Ng, Muhan (Iris) Li == Introduction == In Imitation Learning (IL), an agent (such as a neural network) aims to learn a policy from demonstrations..."</p>
<hr />
<div>== Presented by == <br />
Kar Lok Ng, Muhan (Iris) Li<br />
<br />
== Introduction ==<br />
In Imitation Learning (IL), an agent (such as a neural network) aims to learn a policy from demonstrations of desired behaviour, so that it can make the desired decisions when presented with new situations. It differs from traditional Reinforcement Learning (RL) in that it makes no assumption about the nature of a reward function. IL methods assume that the demonstrations fed to the algorithm are optimal (or near-optimal). This is a significant weakness: such methods are highly susceptible to poor-quality data (i.e., they are not robust). Intuitively, the agent cannot learn the optimal policy effectively when it is fed low-quality demonstrations. A robust IL method is therefore desired, one that makes good decisions even when presented with noisy data. <br />
Established methods for handling noisy data in IL have limitations. One proposed solution requires the noisy demonstrations to be ranked by their performance relative to each other. A similar method requires labelling each demonstration with a score giving the probability that it comes from an expert (a “good” demonstration). Both methods require extra data preprocessing that may not be feasible. A third method does not require these labels but instead assumes that noisy demonstrations are generated by a Gaussian distribution; this strict assumption limits the method's usability. <br />
This paper therefore proposes a new method for IL from noisy demonstrations, called Robust IL with Co-pseudo-labelling (RIL-Co). It requires neither additional labels nor assumptions about the noise distribution. <br />
<br />
== Model Architecture ==<br />
Building on IL, the paper considers a scenario in which the given demonstrations are replaced by a mixture of expert and non-expert demonstrations. In standard IL, the expert policy to be learned satisfies<br />
[[File:eqn1_KL.PNG|center]]<br />
where <math>\rho_E</math> is the state-action density of the expert policy <math>\pi_E</math>, with state <math>s \in \mathcal{S}</math> and action <math>a \in \mathcal{A}</math> under the discrete-time MDP <math>\mathcal{M} = (\mathcal{S}, \mathcal{A}, \rho_T(s'|s,a), \rho_1(s_1), r(s,a), \gamma)</math>.<br />
<br />
The new approach instead assumes that we are given a dataset of state-action samples drawn from a noisy state-action density,<br />
[[File:eqn2_KL.PNG|center]]<br />
where <math>\rho'</math> is a mixture of the expert and non-expert state-action densities:<br />
<math>\rho'(s,a) = \alpha \rho_E(s,a) + (1-\alpha)\rho_N(s,a)</math><br />
<br />
Following the typical assumption in learning from noisy data, the mixing coefficient <math>\alpha</math> is chosen between 0.5 and 1. Here <math>\rho_N</math> is the state-action density of a non-expert policy <math>\pi_N</math>.<br />
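<br />
The noisy dataset can be pictured as sampling from this mixture: with probability <math>\alpha</math> a state-action sample comes from the expert density <math>\rho_E</math>, otherwise from the non-expert density <math>\rho_N</math>. A minimal sketch, with toy one-dimensional Gaussians standing in for the two densities (all names and parameters here are illustrative, not from the paper):<br />

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_noisy_demos(n, alpha=0.7):
    """Draw n samples from rho'(x) = alpha*rho_E(x) + (1 - alpha)*rho_N(x).

    Toy stand-ins: rho_E is N(0, 1) and rho_N is N(3, 1).
    """
    from_expert = rng.random(n) < alpha        # pick the mixture component
    expert = rng.normal(0.0, 1.0, size=n)      # samples from rho_E
    nonexpert = rng.normal(3.0, 1.0, size=n)   # samples from rho_N
    return np.where(from_expert, expert, nonexpert), from_expert

samples, is_expert = sample_noisy_demos(100_000, alpha=0.7)
print(samples.shape)   # (100000,)
# The empirical expert fraction is close to alpha = 0.7.
```

The learner, of course, never sees the component labels; the point of RIL-Co is to cope without them.<br />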
<br />
===Imitation Learning via Risk Optimization===<br />
Under the assumption of the mixture state-action density<br />
<math>\rho_{\pi}(x) = \kappa(\pi) \rho_E(x) + (1- \kappa(\pi))\rho_N(x),</math><br />
where <math>\rho_{\pi}(x)</math>, <math>\rho_{E}(x)</math>, and <math>\rho_{N}(x)</math> are the state-action densities of the learning, expert, and non-expert policies, respectively.<br />
<br />
The paper proposes to perform IL by solving the risk optimization problem,<br />
<math>\max_{\pi} \min_{g} \mathcal{R}(g;\rho',\rho_{\pi}^{\lambda},l_{sym}) \qquad (1)</math><br />
where <math>\mathcal{R}</math> is the balanced risk; <math>\rho_{\pi}^{\lambda}</math> is a mixture density; <math>\lambda</math> is a hyper-parameter; <math>\pi</math> is a policy to be learned by maximizing the risk; <math>g</math> is a classifier to be learned by minimizing the risk; and <math>l_{sym}</math> is a symmetric loss.<br />
<br />
Following Charoenphakdee et al. (2019), the paper also establishes a lemma stating that a minimizer <math>g^*</math> of <math>\mathcal{R}(g;\rho',\rho_{\pi}^{\lambda},l_{sym})</math> is identical to a minimizer of <math>\mathcal{R}(g;\rho_E,\rho_N,l_{sym})</math>. By this lemma, it is proved that the maximizer of the risk optimization in equation (1) is the expert policy.<br />
<br />
The paper thus shows that robust IL can be achieved by optimizing the risk in equation (1). More importantly, this result indicates that robust IL is achievable without knowing or estimating the mixing coefficient <math>\alpha</math>.<br />
<br />
<br />
===Co-pseudo-labeling for Risk Optimization===<br />
To make the optimization in equation (1) tractable, the paper suggests approximately drawing samples from <math>\rho_N(x)</math> by co-pseudo-labeling; the idea is to estimate the expectation over <math>\rho_N</math>. The authors first introduce plain pseudo-labeling to form the empirical risk for equation (1) in their setting. However, pseudo-labeling alone leads to classifier over-confidence caused by incorrectly predicted labels during training. The authors therefore propose co-pseudo-labeling, which combines pseudo-labeling with co-training; the resulting method is Robust IL with Co-pseudo-labeling (RIL-Co). For the hyper-parameter <math>\lambda</math>, they show that an appropriate value satisfies <math>0.5 \le \lambda < 1</math>; to avoid over-weighting the pseudo-labels in the risk, they use <math>\lambda=0.5</math>. Regarding the choice of symmetric loss, the authors emphasize that any symmetric loss can be used to learn the expert policy with RIL-Co. As Figure 1 shows, a loss can become symmetric after normalization.<br />
[[File:fig1_KL.png|center]]<br />
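<br />
Here a symmetric loss is one for which <math>l(z) + l(-z)</math> is constant for all <math>z</math>. A quick numerical check of this property (a sketch: the sigmoid and unhinged losses are standard symmetric examples, while the logistic loss is not symmetric):<br />

```python
import numpy as np

def sigmoid_loss(z):
    # l(z) = 1 / (1 + e^z); l(z) + l(-z) = 1 for every z
    return 1.0 / (1.0 + np.exp(z))

def unhinged_loss(z):
    # l(z) = 1 - z; l(z) + l(-z) = 2 for every z
    return 1.0 - z

def logistic_loss(z):
    # l(z) = log(1 + e^(-z)); the sum l(z) + l(-z) varies with z
    return np.log1p(np.exp(-z))

z = np.linspace(-5.0, 5.0, 101)
print(np.allclose(sigmoid_loss(z) + sigmoid_loss(-z), 1.0))    # True
print(np.allclose(unhinged_loss(z) + unhinged_loss(-z), 2.0))  # True
print(np.allclose(logistic_loss(z) + logistic_loss(-z),
                  2.0 * logistic_loss(0.0)))                   # False
```

The failure of the logistic loss to satisfy this property is consistent with the ablation result later in this summary, where swapping in a logistic loss hurts robustness.<br />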
<br />
The RIL-Co algorithm is described step by step in Figure 2.<br />
[[File:fig2_KL.png|center]]<br />
<br />
== Methodology and Results == <br />
<br />
[[File:fig3_KL.PNG|center]]<br />
<br />
The RIL-Co model with Average-Precision (AP) loss is benchmarked against other established models: Behavioural Cloning (BC), Forward Inverse Reinforcement Learning (FAIRL), Variational Imitation Learning with Diverse-quality demonstrations (VILD), and three variants of Generative Adversarial Imitation Learning (GAIL) with logistic, unhinged, and AP loss functions. All models share the same structure of two hidden layers with 64 hyperbolic tangent units. The policy networks are trained by trust region policy gradient (rather than the stochastic gradient methods we have seen in our courses), and the classifiers are trained by Adam with a gradient penalty regularization coefficient of 10. <br />
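<br />
The shared network shape described above (two hidden layers of 64 hyperbolic tangent units) can be sketched as a plain forward pass. This is an illustrative NumPy stand-in, not the authors' implementation; the input dimension of 17 is a hypothetical state-plus-action size:<br />

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden=64, out_dim=1):
    """Weights for a network with two hidden layers of `hidden` units."""
    sizes = [in_dim, hidden, hidden, out_dim]
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with tanh on the hidden layers, linear output."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

params = init_mlp(in_dim=17)                      # hypothetical input size
out = forward(params, rng.normal(size=(32, 17)))  # batch of 32 state-action pairs
print(out.shape)                                  # (32, 1)
```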
<br />
The training task is locomotion, with four simulated environments: HalfCheetah, Hopper, Walker2d, and Ant. To generate the demonstrations, a regular reinforcement learning agent is first trained with the true, known reward function. The best-performing policy snapshot is then used to generate 10,000 “expert” state-action samples, and 5 other policy snapshots are used to collect 10,000 “non-expert” state-action samples. The two sets of state-action samples are then mixed at noise rates of 0, 0.1, 0.2, 0.3, and 0.4 (e.g., the dataset consisting of 10,000 expert samples and 7,500 non-expert samples corresponds to a noise rate of 0.4). <br />
<br />
The models are judged on their effectiveness by the cumulative reward. In the experiments, RIL-Co performed better than the rest in high-noise scenarios (noise rates of 0.2, 0.3, and 0.4), while in low-noise scenarios it performed comparably to the best alternative. GAIL with AP loss performs better than RIL-Co in low-noise scenarios; the authors conjecture that this is due to co-pseudo-labelling adding extra bias. They propose a fix: vary the hyperparameter <math>\lambda</math> from 0, which is equivalent to performing GAIL, to 0.5 as learning progresses. <br />
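<br />
The proposed fix amounts to a schedule on <math>\lambda</math>: start at 0 (equivalent to plain GAIL) and anneal toward 0.5 as learning progresses. A minimal sketch, assuming a linear schedule (the exact annealing shape is an assumption here, not specified in this summary):<br />

```python
def lambda_schedule(step, total_steps, lam_final=0.5):
    """Anneal lambda linearly from 0 (plain GAIL) to lam_final (full RIL-Co)."""
    frac = min(step / total_steps, 1.0)
    return lam_final * frac

print(lambda_schedule(0, 1000))     # 0.0 -> behaves like GAIL early on
print(lambda_schedule(500, 1000))   # 0.25
print(lambda_schedule(1000, 1000))  # 0.5 -> full co-pseudo-labelling weight
```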
<br />
VILD performs poorly even with small amounts of noisy data (rate 0.1). The authors believe that, because VILD makes a strict Gaussian noise assumption while the data here is not generated under any noise assumption, VILD cannot accurately estimate the noise distribution and therefore performs poorly. BC also performs poorly, as expected, since BC assumes the demonstrations fed to the model come from experts. <br />
The authors also observe that RIL-Co needs fewer transition samples, and thus learns faster, than the other methods in this test. RIL-Co is therefore more data-efficient, a useful property for a model to have.<br />
<br />
[[File:fig4_KL.PNG|center]]<br />
<br />
An ablation study was conducted, in which parts of the model (such as the loss function) are swapped out to observe how the model behaves under the change. The AP loss was replaced with a logistic loss to gauge how important a symmetric loss function is to the model. The original AP loss outperformed the logistic loss, indicating that the symmetric loss is important for the model's robustness. <br />
<br />
[[File:fig5_KL.PNG|center]]<br />
<br />
Another aspect tested was the type of noise presented to the model. The RIL-Co model with AP loss was given a noisy dataset generated with Gaussian noise. As expected, VILD performed much better here, since the data now matches its strict Gaussian noise assumption. RIL-Co still achieved performance comparable to VILD given enough transitions, despite making no noise assumption in its formulation. This suggests that RIL-Co performs well under different noise distributions. <br />
<br />
== Conclusion ==<br />
A new method for IL from noisy demonstrations is presented that is more robust than other established methods. Further investigation could examine how well the method performs on non-simulated data. <br />
<br />
== References ==</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:eqn2_KL.PNG&diff=50997File:eqn2 KL.PNG2021-11-26T21:05:01Z<p>Kl4ng: Equation2, Kar Lok/Muhan(Iris)</p>
<hr />
<div>Equation2, Kar Lok/Muhan(Iris)</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:eqn1_KL.PNG&diff=50996File:eqn1 KL.PNG2021-11-26T21:01:39Z<p>Kl4ng: Kl4ng uploaded a new version of File:eqn1 KL.PNG</p>
<hr />
<div>Equation1, Kar Lok/Muhan(Iris)</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:eqn1_KL.PNG&diff=50995File:eqn1 KL.PNG2021-11-26T20:58:47Z<p>Kl4ng: Equation1, Kar Lok/Muhan(Iris)</p>
<hr />
<div>Equation1, Kar Lok/Muhan(Iris)</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:fig5_KL.PNG&diff=50994File:fig5 KL.PNG2021-11-26T20:49:07Z<p>Kl4ng: Figure 5, Kar Lok/Muhan(Iris)</p>
<hr />
<div>Figure 5, Kar Lok/Muhan(Iris)</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:fig4_KL.PNG&diff=50993File:fig4 KL.PNG2021-11-26T20:47:57Z<p>Kl4ng: Figure 4, Kar Lok/Muhan(Iris)</p>
<hr />
<div>Figure 4, Kar Lok/Muhan(Iris)</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:fig3_KL.PNG&diff=50992File:fig3 KL.PNG2021-11-26T20:44:05Z<p>Kl4ng: Figure 3, Kar Lok/Muhan(Iris)</p>
<hr />
<div>Figure 3, Kar Lok/Muhan(Iris)</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:fig2_KL.png&diff=50991File:fig2 KL.png2021-11-26T20:39:23Z<p>Kl4ng: Figure 2, Kar Lok/Muhan (Iris)</p>
<hr />
<div>Figure 2, Kar Lok/Muhan (Iris)</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:fig1_KL.png&diff=50990File:fig1 KL.png2021-11-26T20:38:58Z<p>Kl4ng: Figure 1, Kar Lok/Muhan(Iris)</p>
<hr />
<div>Figure 1, Kar Lok/Muhan(Iris)</div>Kl4nghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F21&diff=50153stat441F212021-11-12T02:56:10Z<p>Kl4ng: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F20-STAT 441/841 CM 763-Proposal| Project Proposal ]] ==<br />
<br />
<!--[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]--><br />
<br />
=Paper presentation=<br />
{| class="wikitable"<br />
<br />
{| border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="250pt"|Name <br />
|width="15pt"|Paper number <br />
|width="700pt"|Title<br />
|width="15pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|width="30pt"|Link to the video<br />
|-<br />
|Sep 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary] || [https://youtu.be/JWozRg_X-Vg?list=PLehuLRPyt1HzXDemu7K4ETcF0Ld_B5adG&t=539]<br />
|-<br />
|Week of Nov 16 || Ali Ghodsi || || || || ||<br />
|-<br />
|Week of Nov 22 || Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu|| || Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification || [http://proceedings.mlr.press/v139/bai21c/bai21c.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization Summary] ||<br />
|-<br />
|Week of Nov 22 || Kanika Chopra, Yush Rajcoomar || || Automatic Bank Fraud Detection Using Support Vector Machines || [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.5804&rep=rep1&type=pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Automatic_Bank_Fraud_Detection_Using_Support_Vector_Machines Summary] ||<br />
|-<br />
|Week of Nov 22 || Zeng Mingde, Lin Xiaoyu, Fan Joshua, Rao Chen Min || || || || ||<br />
|-<br />
|Week of Nov 22 || Justin D'Astous, Waqas Hamed, Stefan Vladusic, Ethan O'Farrell || || A Probabilistic Approach to Neural Network Pruning || [http://proceedings.mlr.press/v139/qian21a/qian21a.pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 || Cassandra Wong, Anastasiia Livochka, Maryam Yalsavar, David Evans || || Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification || [https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hou_Patch-Based_Convolutional_Neural_CVPR_2016_paper.pdf Paper] || ||<br />
|-<br />
|Week of Nov 29 || Jessie Man Wai Chin, Yi Lin Ooi, Yaqi Shi, Shwen Lyng Ngew || || || || ||<br />
|-<br />
|Week of Nov 29 || Eric Anderson, Chengzhi Wang, Kai Zhong, YiJing Zhou || || || || ||<br />
|-<br />
|Week of Nov 29 || Ethan Cyrenne, Dieu Hoa Nguyen, Mary Jane Sin, Carolyn Wang || || || || ||<br />
|-<br />
|Week of Nov 29 || Chun Waan Loke, Peter Chong, Clarice Osmond, Zhilong Li|| || || || ||<br />
|-<br />
|Week of Nov 22 || Ann Gie Wong, Curtis Li, Hannah Kerr || || The Detection of Black Ice Accidents for Preventative Automated Vehicles Using Convolutional Neural Networks || [https://www.mdpi.com/2079-9292/9/12/2178/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=The_Detection_of_Black_Ice_Accidents_Using_CNNs&fbclid=IwAR0K4YdnL_hdRnOktmJn8BI6-Ra3oitjJof0YwluZgUP1LVFHK5jyiBZkvQ Summary] ||<br />
|-<br />
|Week of Nov 22 || Yuwei Liu, Daniel Mao|| || Another Look At Distance-Weighted Discrimination || [http://users.stat.umn.edu/~wang3660/papers/kerndwd.pdf Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Another_look_at_distance-weighted_discrimination Summary] ||<br />
|-<br />
|Week of Nov 22 || Lingshan Wang, Yifan Li, Ziyi Liu || || Understanding Convolutional Neural Networks for Text Classification || [https://arxiv.org/pdf/1809.08037.pdf Paper] || ||<br />
|-<br />
|Week of Nov 29 || Kar Lok Ng, Muhan (Iris) Li || || || || ||<br />
|-</div>Kl4ng
http://wiki.math.uwaterloo.ca/statwiki/index.php?title=F21-STAT_441/841_CM_763-Proposal&diff=49967
F21-STAT 441/841 CM 763-Proposal (2021-10-07T17:29:20Z)
<p>Kl4ng: Addition of Project #12</p>
<hr />
<div>Use this format (Don’t remove Project 0)<br />
<br />
Project # 0 Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Title: Making a String Telephone<br />
<br />
Description: We use paper cups and string to make a string telephone, then talk with friends while learning about sound waves. (Explain your project in one or two paragraphs.)<br />
<br />
--------------------------------------------------------------------<br />
Project # 1 Group members:<br />
<br />
Feng, Jared<br />
<br />
Huang, Xipeng<br />
<br />
Xu, Mingwei<br />
<br />
Yu, Tingzhou<br />
<br />
Title: <br />
<br />
Description:<br />
--------------------------------------------------------------------<br />
Project # 2 Group members:<br />
<br />
Anderson, Eric<br />
<br />
Wang, Chengzhi<br />
<br />
Zhong, Kai<br />
<br />
Zhou, Yi Jing<br />
<br />
Title: Application of Neural Networks<br />
<br />
Description: To be filled in before Oct 8th.<br />
<br />
--------------------------------------------------------------------<br />
Project # 3 Group members:<br />
<br />
Chopra, Kanika<br />
<br />
Rajcoomar, Yush<br />
<br />
Title: TBD <br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
Project # 4 Group members:<br />
<br />
Zhang, Bowen<br />
<br />
Li, Shaozhong<br />
<br />
Kerr, Hannah<br />
<br />
Wong, Ann Gie<br />
<br />
Title: Classification<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
Project # 5 Group members:<br />
<br />
Chin, Jessie Man Wai<br />
<br />
Ooi, Yi Lin<br />
<br />
Shi, Yaqi<br />
<br />
Ngew, Shwen Lyng<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
Project # 6 Group members:<br />
<br />
Wang, Carolyn<br />
<br />
Cyrenne, Ethan<br />
<br />
Nguyen, Dieu Hoa<br />
<br />
Sin, Mary Jane<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
Project # 7 Group members:<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Chatoor, Amanda<br />
<br />
Prathap Das, Sutej<br />
<br />
Title: PetFinder.my - Pawpularity Contest [https://www.kaggle.com/c/petfinder-pawpularity-score/overview]<br />
<br />
Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.<br />
<br />
--------------------------------------------------------------------<br />
Project # 8 Group members:<br />
<br />
Xu, Siming<br />
<br />
Yan, Xin<br />
<br />
Duan, Yishu<br />
<br />
Di, Xibei<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 9 Group members:<br />
<br />
Loke, Chun Waan<br />
<br />
Chong, Peter<br />
<br />
Osmond, Clarice<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 10 Group members:<br />
<br />
O'Farrell, Ethan<br />
<br />
D'Astous, Justin<br />
<br />
Hamed, Waqas<br />
<br />
Vladusic, Stefan<br />
<br />
Title: Pawpularity (Kaggle)<br />
<br />
Description: Predicting the popularity of animal photos from photo metadata.<br />
--------------------------------------------------------------------<br />
Project # 11 Group members:<br />
<br />
JunBin, Pan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 12 Group members:<br />
<br />
Kar Lok, Ng<br />
<br />
Muhan (Iris), Li<br />
<br />
Title: NFL Health & Safety - Helmet Assignment (Kaggle competition)<br />
<br />
Description: Assigning helmets to the players wearing them in video footage of head collisions during football plays.<br />
--------------------------------------------------------------------</div>
Kl4ng