F21-STAT 441/841 CM 763-Proposal (statwiki, revision of 2021-12-22 by T229yu)
<hr />
<div>Use this format (Don’t remove Project 0)<br />
<br />
Project # 0 Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Title: Making a String Telephone<br />
<br />
Description: In this science project, we use paper cups to make a string telephone and talk with friends while learning about sound waves. (Explain your project in one or two paragraphs).<br />
<br />
--------------------------------------------------------------------<br />
Project # 1 Group members:<br />
<br />
Feng, Jared<br />
<br />
Huang, Xipeng<br />
<br />
Xu, Mingwei<br />
<br />
Yu, Tingzhou<br />
<br />
Title: Patch-based classification of lung cancer pathological images using convolutional neural networks<br />
<br />
Description: In this project, we explore the classification of lung cancer pathological images. The input images come from three tumor types (LUAD, LUSD, and MESO), and each image has been split into patches to reduce the computational burden. The classification task is decomposed into a patch-level and a whole-image-level stage. We experiment with three neural networks for patch-wise classification and two classical machine learning models for patient-level classification. Feature-extraction techniques and sampling methods for training the neural networks are also implemented and studied. Our results show that a support vector machine (SVM) on feature vectors extracted from the DenseNet-121 model outperforms all other methods, achieving a patch-wise accuracy of 67.86%.<br />
<br />
Our poster is [https://www.dropbox.com/s/fu6vr2cxcbt4458/Stat_841_poster.pdf?dl=0 here].<br />
--------------------------------------------------------------------<br />
Project # 2 Group members:<br />
<br />
Anderson, Eric<br />
<br />
Wang, Chengzhi<br />
<br />
Zhong, Kai<br />
<br />
Zhou, Yi Jing<br />
<br />
Title: Clean-Label Targeted Poisons for an End-to-End Trained CNN on the MNIST Dataset<br />
<br />
Description: Applying data poisoning techniques to the MNIST Dataset<br />
<br />
--------------------------------------------------------------------<br />
Project # 3 Group members:<br />
<br />
Chopra, Kanika<br />
<br />
Rajcoomar, Yush<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Title: Cancer Classification<br />
<br />
Description: We will be classifying three tumour types based on pathological data. <br />
<br />
--------------------------------------------------------------------<br />
Project # 4 Group members:<br />
<br />
Li, Shao Zhong<br />
<br />
Kerr, Hannah <br />
<br />
Wong, Ann Gie<br />
<br />
Title: Predicting "Pawpularity" of Pets with Image Regression<br />
<br />
Description: Analyze raw images and metadata to predict the “Pawpularity” of pet photos, helping shelters and rescuers around the world improve the appeal of their pet profiles so that more animals get adopted and find their "furever" homes faster.<br />
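A simple metadata-only baseline for this regression task could look like the sketch below. This is an assumed setup, not the group's model: the column names follow the public Kaggle Pawpularity metadata (binary photo attributes), but the data here is randomly generated.

```python
# Hypothetical Pawpularity baseline: regress the 1-100 score on binary
# photo-metadata columns with a random forest. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
meta_cols = ["Eyes", "Face", "Near", "Blur", "Group", "Human"]  # subset of the real columns
X = rng.integers(0, 2, size=(500, len(meta_cols))).astype(float)
y = rng.uniform(1, 100, size=500)  # Pawpularity scores range from 1 to 100

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
print(f"RMSE on random stand-in data: {rmse:.1f}")
```

An image-based model (e.g. a CNN regressor) would replace or augment the metadata features.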
<br />
--------------------------------------------------------------------<br />
Project # 5 Group members:<br />
<br />
Chin, Jessie Man Wai<br />
<br />
Ooi, Yi Lin<br />
<br />
Shi, Yaqi<br />
<br />
Ngew, Shwen Lyng<br />
<br />
Title: The Application of Classification in Accelerated Underwriting (Insurance)<br />
<br />
Description: Accelerated Underwriting (AUW), also called “express underwriting,” is a faster and easier process for people in good health to obtain life insurance. The traditional underwriting process is often painful for both customers and insurers. Customers have to complete several questionnaires and provide a range of medical tests involving blood, urine, saliva, and other results; underwriters, on the other hand, have to go through every single policy manually to assess the risk of each applicant. AUW allows people who are deemed “healthy” to forgo medical exams. Since COVID-19, it has become an even more pressing topic, as traditional underwriting could not be performed under stay-at-home orders. However, this puts the burden on the insurance company of estimating risk accurately from fewer test results. <br />
<br />
This is where data science comes in. With classification methods, we can address the underwriting process’ five pain points: labor, speed, efficiency, pricing, and mortality. This allows us to better estimate risk and classify clients by whether they are eligible for accelerated underwriting. For the final project, we use data from one of the leading US insurers to analyze how we can classify clients for AUW. We will use factors such as health data, medical history, family history, and insurance history to determine eligibility.<br />
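The eligibility-classification idea can be illustrated schematically. The insurer's data and features are not public, so everything below is synthetic and hypothetical: a logistic-regression classifier over made-up health/history-style features standing in for the real underwriting variables.

```python
# Hypothetical AUW eligibility sketch: logistic regression on synthetic
# health/history features. Feature names and the labeling rule are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Invented features: age, BMI, number of prior policies, family-history flag
X = np.column_stack([
    rng.uniform(20, 70, 1000),   # age
    rng.uniform(17, 40, 1000),   # BMI
    rng.integers(0, 5, 1000),    # prior policies
    rng.integers(0, 2, 1000),    # family-history flag
])
# Synthetic rule standing in for the true eligibility label
y = ((X[:, 0] < 50) & (X[:, 1] < 30)).astype(int)

clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
print("predicted eligible share:", clf.predict(X).mean())
```

In practice the model would be trained on the insurer's labeled historical decisions, and calibration of the predicted probabilities would matter as much as raw accuracy.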
<br />
--------------------------------------------------------------------<br />
Project # 6 Group members:<br />
<br />
Wang, Carolyn<br />
<br />
Cyrenne, Ethan<br />
<br />
Nguyen, Dieu Hoa<br />
<br />
Sin, Mary Jane<br />
<br />
Title: Pawpularity (PetFinder Kaggle Competition)<br />
<br />
Description: Using images and metadata on the images to predict the popularity of pet photos, which is calculated based on page view statistics and other metrics from the PetFinder website.<br />
<br />
--------------------------------------------------------------------<br />
Project # 7 Group members:<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Chatoor, Amanda<br />
<br />
Prathap Das, Sutej<br />
<br />
Title: PetFinder.my - Pawpularity Contest [https://www.kaggle.com/c/petfinder-pawpularity-score/overview]<br />
<br />
Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.<br />
<br />
--------------------------------------------------------------------<br />
Project # 8 Group members:<br />
<br />
Yan, Xin<br />
<br />
Duan, Yishu<br />
<br />
Di, Xibei<br />
<br />
Title: The application of classification on company bankruptcy prediction<br />
<br />
Description: If a company goes bankrupt, all its employees lose their jobs, and it is hard for them to find another suitable job in a short period. For the individual, an employee who loses a job to bankruptcy has no income for a period of time. This can lead to several negative consequences: increased homelessness, as people cannot cover living expenses, and increased crime rates as poverty rises. For the economy, if many companies go bankrupt at the same time, a huge number of employees lose their jobs, raising the unemployment rate. This can have a series of negative impacts on the economy: lost government tax revenue, since the unemployed have no income and pay no income tax, and increased inequality in the income distribution. <br />
<br />
Company bankruptcy therefore negatively affects individuals, the government, society, and the economy, which makes bankruptcy prediction essential. The purpose of this project is to predict whether a company will go bankrupt.<br />
--------------------------------------------------------------------<br />
Project # 9 Group members:<br />
<br />
Loke, Chun Waan<br />
<br />
Chong, Peter<br />
<br />
Osmond, Clarice<br />
<br />
Li, Zhilong<br />
<br />
Title: Popularity of Shelter Pet Photo Prediction using Varied ML Techniques<br />
<br />
Description: In this Kaggle competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 10 Group members:<br />
<br />
O'Farrell, Ethan<br />
<br />
D'Astous, Justin<br />
<br />
Hamed, Waqas<br />
<br />
Vladusic, Stefan<br />
<br />
Title: Pawpularity (Kaggle)<br />
<br />
Description: Predicting the popularity of animal photos based on photo metadata<br />
--------------------------------------------------------------------<br />
Project # 11 Group members:<br />
<br />
JunBin, Pan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 12 Group members:<br />
<br />
Kar Lok, Ng<br />
<br />
Muhan (Iris), Li<br />
<br />
Wu, Mingze<br />
<br />
Title: NFL Health & Safety - Helmet Assignment competition (Kaggle Competition)<br />
<br />
Description: Assigning each helmet to the correct player in video footage of helmet impacts during football plays.<br />
--------------------------------------------------------------------<br />
Project # 13 Group members:<br />
<br />
Livochka, Anastasiia<br />
<br />
Wong, Cassandra<br />
<br />
Evans, David<br />
<br />
Yalsavar, Maryam<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 14 Group Members:<br />
<br />
Zeng, Mingde<br />
<br />
Lin, Xiaoyu<br />
<br />
Fan, Joshua<br />
<br />
Rao, Chen Min<br />
<br />
Title: Toxic Comment Classification, Kaggle<br />
<br />
Description: Using Wikipedia comments labeled for toxicity to train a model that detects toxicity in comments.<br />
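A standard starting point for this task is a bag-of-words baseline before moving to deeper models. The sketch below is an assumed baseline, not necessarily this group's approach: TF-IDF features plus logistic regression, with invented example comments in place of the real labeled Wikipedia data.

```python
# Assumed toxicity baseline: TF-IDF + logistic regression.
# The comments and labels below are invented illustrations, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "Thanks for fixing the citation, looks great now.",
    "You are an idiot and your edits are garbage.",
    "I reverted the change, see the talk page for details.",
    "Nobody wants you here, get lost.",
]
labels = [0, 1, 0, 1]  # 1 = toxic

model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(comments, labels)
print(model.predict(["Please stop vandalizing articles."]))
```

The actual Kaggle task is multi-label (several toxicity subtypes per comment), which would replace the single binary label with one classifier or output per subtype.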
--------------------------------------------------------------------<br />
Project # 15 Group Members:<br />
<br />
Huang, Yuying<br />
<br />
Anugu, Ankitha<br />
<br />
Chen, Yushan<br />
<br />
Title: Implementation of the classification task between crop and weeds<br />
<br />
Description: Our work will be based on the paper ''Crop and Weeds Classification for Precision Agriculture using Context-Independent Pixel-Wise Segmentation''.<br />
--------------------------------------------------------------------<br />
Project # 16 Group Members:<br />
<br />
Wang, Lingshan<br />
<br />
Li, Yifan<br />
<br />
Liu, Ziyi<br />
<br />
Title: Implement and Improve CNN in Multi-Class Text Classification<br />
<br />
Description: We are going to apply Bidirectional Encoder Representations from Transformers (BERT) to classify real-world data (building an efficient classifier for case-study interview materials) and improve the algorithm in the context of text classification, supported by a real-world dataset. Implementing BERT lets us analyze the efficiency and practicality of the algorithm when dealing with imbalanced datasets at both the data-input and modelling levels.<br />
The dataset is composed of case-study HTML files whose case information can be classified into multiple industry categories. We will implement a multi-class classification that breaks the information in each case material into pre-determined subcategories (e.g., behaviour questions, consulting questions, questions on new business/market entry). We will process the raw data into several representations (e.g., HTML, JSON, pandas data frames) and choose the most efficient raw-data processing logic based on runtime and algorithm optimization.<br />
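The two pieces the proposal combines can be shown schematically in plain PyTorch. This is an assumed sketch only: the real project would load a pretrained BERT (e.g. via Hugging Face transformers), whereas here random tensors stand in for BERT's pooled embeddings so the example stays self-contained. It shows a multi-class head on 768-d pooled features and inverse-frequency class weights in the loss, one common way to handle label imbalance.

```python
# Schematic only: a multi-class head over BERT-style pooled embeddings,
# with class-weighted cross-entropy for imbalanced labels.
import torch
import torch.nn as nn

NUM_CLASSES = 4   # e.g. behaviour / consulting / market-entry / other (hypothetical)
HIDDEN = 768      # BERT-base hidden size

head = nn.Sequential(nn.Dropout(0.1), nn.Linear(HIDDEN, NUM_CLASSES))

# Random stand-in for BERT's pooled [CLS] embeddings of 8 documents
pooled = torch.randn(8, HIDDEN)
labels = torch.tensor([0, 0, 0, 0, 1, 2, 3, 0])  # imbalanced toward class 0

# Inverse-frequency class weights down-weight the majority class in the loss
counts = torch.bincount(labels, minlength=NUM_CLASSES).float()
weights = counts.sum() / (NUM_CLASSES * counts.clamp(min=1))
logits = head(pooled)
loss = nn.CrossEntropyLoss(weight=weights)(logits, labels)
print(float(loss))
```

Fine-tuning would backpropagate this loss through both the head and the (here omitted) BERT encoder.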
--------------------------------------------------------------------<br />
Project # 17 Group members:<br />
<br />
Malhi, Dilmeet<br />
<br />
Joshi, Vansh<br />
<br />
Syamala, Aavinash <br />
<br />
Islam, Sohan<br />
<br />
Title: Kaggle project: PetFinder.my - Pawpularity Contest<br />
<br />
Description: In this competition, we will analyze raw images provided by PetFinder.my to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 18 Group members:<br />
<br />
Yuwei, Liu<br />
<br />
Daniel, Mao<br />
<br />
Title: Sartorius - Cell Instance Segmentation (Kaggle) [https://www.kaggle.com/c/sartorius-cell-instance-segmentation]<br />
<br />
Description: Detect single neuronal cells in microscopy images<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project #19 Group members:<br />
<br />
Samuel, Senko<br />
<br />
Tyler, Verhaar<br />
<br />
Zhang, Bowen<br />
<br />
Title: NBA Game Prediction<br />
<br />
Description: We will build a win/loss classifier for NBA games using player and game data, also incorporating alternative data (e.g., sports betting data).<br />
<br />
-------------------------------------------------------------------<br />
<br />
Project #20 Group members:<br />
<br />
Mitrache, Christian<br />
<br />
Renggli, Aaron<br />
<br />
Saini, Jessica<br />
<br />
Mossman, Alexandra<br />
<br />
Title: Classification and Deep Learning for Healthcare Provider Fraud Detection Analysis<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 21 Group members:<br />
<br />
Wang, Kun<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 22 Group members:<br />
<br />
Guray, Egemen<br />
<br />
Title: Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network<br />
<br />
Description: I will build a recognition system for road signs in the German Traffic Sign Dataset using a CNN.<br />
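A CNN for this dataset could be structured as in the sketch below. The architecture is an illustrative guess, not the author's actual network; the only dataset-specific fact used is that the German Traffic Sign benchmark has 43 sign classes, and the input size of 32x32 is an assumption.

```python
# Illustrative (not the author's) CNN for 43-class traffic-sign recognition.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 128), nn.ReLU(),
    nn.Linear(128, 43),  # one logit per sign class
)

batch = torch.randn(4, 3, 32, 32)  # stand-in for normalized sign crops
logits = model(batch)
print(logits.shape)  # torch.Size([4, 43])
```

Training would minimize cross-entropy over the 43 classes; the SVM mentioned in the title could then be compared against the CNN on the same features.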
--------------------------------------------------------------------<br />
<br />
Project # 23 Group members:<br />
<br />
Bsodjahi<br />
<br />
Title: Modeling the Pseudomonas aeruginosa bacterial state through its gene expression activity<br />
<br />
Description: Label Pseudomonas aeruginosa gene expression data through unsupervised learning (e.g., the EM algorithm), then model the bacterial state as a function of its gene expression.</div>
<hr />
<div>Use this format (Don’t remove Project 0)<br />
<br />
Project # 0 Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Title: Making a String Telephone<br />
<br />
Description: We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).<br />
<br />
--------------------------------------------------------------------<br />
Project # 1 Group members:<br />
<br />
Feng, Jared<br />
<br />
Huang, Xipeng<br />
<br />
Xu, Mingwei<br />
<br />
Yu, Tingzhou<br />
<br />
Title: Patch-based classification of lung cancers pathological images using convolutional neural networks<br />
<br />
In this project, we explore the classification problem of lung cancer pathological images of some patients. The input images are from three categories of tumor types (LUAD, LUSD, and MESO), and the images have been split into patches in order to reduce the computational difficulty. The classification task is decomposed into patch level and whole image level. We experiment with three neural networks for patch-wise classification, and two classical machine learning models for patient classification. Techniques of feature extraction and sampling methods for training neural networks are also implemented and studied. Our results show that XGBoost on extracted feature vectors outperforms all other methods and achieves an accuracy of 67.86% based on DenseNet-121 model for patch-wise classification.<br />
<br />
Our poster is [https://www.dropbox.com/s/fu6vr2cxcbt4458/Stat_841_poster.pdf?dl=0 here].<br />
--------------------------------------------------------------------<br />
Project # 2 Group members:<br />
<br />
Anderson, Eric<br />
<br />
Wang, Chengzhi<br />
<br />
Zhong, Kai<br />
<br />
Zhou, Yi Jing<br />
<br />
Title: Data Poison Attacks<br />
<br />
Description: Attempting to create a successful data poisoning attack<br />
<br />
--------------------------------------------------------------------<br />
Project # 3 Group members:<br />
<br />
Chopra, Kanika<br />
<br />
Rajcoomar, Yush<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Title: Cancer Classification<br />
<br />
Description: We will be classifying three tumour types based on pathological data. <br />
<br />
--------------------------------------------------------------------<br />
Project # 4 Group members:<br />
<br />
Li, Shao Zhong<br />
<br />
Kerr, Hannah <br />
<br />
Wong, Ann Gie<br />
<br />
Title: Predicting "Pawpularity" of Pets with Image Regression<br />
<br />
Description: Analyze raw images and metadata to predict the “Pawpularity” of pet photos to help guide shelters and rescuers around the world improve the appeal of their pet profiles, so that more animals can get adopted and animals can find their "furever" home faster.<br />
<br />
--------------------------------------------------------------------<br />
Project # 5 Group members:<br />
<br />
Chin, Jessie Man Wai<br />
<br />
Ooi, Yi Lin<br />
<br />
Shi, Yaqi<br />
<br />
Ngew, Shwen Lyng<br />
<br />
Title: The Application of Classification in Accelerated Underwriting (Insurance)<br />
<br />
Description: Accelerated Underwriting (AUW), also called “express underwriting,” is a faster and easier process for people with good health condition to obtain life insurance. The traditional underwriting process is often painful for both customers and insurers. From the customer's perspective, they have to complete different types of questionnaires and provide different medical tests involving blood, urine, saliva and other medical results. Underwriters on the other hand have to manually go through every single policy to access the risk of each applicant. AUW allows people, who are deemed “healthy” to forgo medical exams. Since COVID-19, it has become a more concerning topic as traditional underwriting cannot be performed due to the stay-at-home order. However, this imposes a burden on the insurance company to better estimate the risk associated with less testing results. <br />
<br />
This is where data science kicks in. With different classification methods, we can address the underwriting process’ five pain points: labor, speed, efficiency, pricing and mortality. This allows us to better estimate the risk and classify the clients for whether they are eligible for accelerated underwriting. For the final project, we use the data from one of the leading US insurers to analyze how we can classify our clients for AUW using the method of classification. We will be using factors such as health data, medical history, family history as well as insurance history to determine the eligibility.<br />
<br />
--------------------------------------------------------------------<br />
Project # 6 Group members:<br />
<br />
Wang, Carolyn<br />
<br />
Cyrenne, Ethan<br />
<br />
Nguyen, Dieu Hoa<br />
<br />
Sin, Mary Jane<br />
<br />
Title: Pawpularity (PetFinder Kaggle Competition)<br />
<br />
Description: Using images and metadata on the images to predict the popularity of pet photos, which is calculated based on page view statistics and other metrics from the PetFinder website.<br />
<br />
--------------------------------------------------------------------<br />
Project # 7 Group members:<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Chatoor, Amanda<br />
<br />
Prathap Das, Sutej<br />
<br />
Title: PetFinder.my - Pawpularity Contest [https://www.kaggle.com/c/petfinder-pawpularity-score/overview]<br />
<br />
Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.<br />
<br />
--------------------------------------------------------------------<br />
Project # 8 Group members:<br />
<br />
Yan, Xin<br />
<br />
Duan, Yishu<br />
<br />
Di, Xibei<br />
<br />
Title: The application of classification on company bankruptcy prediction<br />
<br />
Description: If a company goes bankrupt, all its employees will lose their jobs, and it is hard for them to find another suitable job in a short period. For the individual, the employee who loses the job due to bankruptcy will have no income for a period of time. This may lead to several negative consequences: increased homelessness as people do not have enough money to cover living expenses and increased crime rates as poverty increases. For the economy, if many companies go bankrupt at the same time, a huge number of employees will lose jobs, leading to a higher unemployment rate. This may cause a series of negative impact on the economy: loss of government tax revenue since the unemployed has no income and they do not need to pay the income taxes and increased inequality in the income distribution. <br />
<br />
Therefore, it can be seen that company bankruptcy negatively influences the individual, government, society, and the economy, this makes the prediction on company bankruptcy extremely essential. The purpose of the project is to predict whether a company will go bankrupt.<br />
--------------------------------------------------------------------<br />
Project # 9 Group members:<br />
<br />
Loke, Chun Waan<br />
<br />
Chong, Peter<br />
<br />
Osmond, Clarice<br />
<br />
Li, Zhilong<br />
<br />
Title: Popularity of Shelter Pet Photo Prediction using Varied ML Techniques<br />
<br />
Description: In this Kaggle competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 10 Group members:<br />
<br />
O'Farrell, Ethan<br />
<br />
D'Astous, Justin<br />
<br />
Hamed, Waqas<br />
<br />
Vladusic, Stefan<br />
<br />
Title: Pawpularity (Kaggle)<br />
<br />
Description: Predicting the popularity of animal photos based on photo metadata<br />
--------------------------------------------------------------------<br />
Project # 11 Group members:<br />
<br />
JunBin, Pan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 12 Group members:<br />
<br />
Kar Lok, Ng<br />
<br />
Muhan (Iris), Li<br />
<br />
Wu, Mingze<br />
<br />
Title: NFL Health & Safety - Helmet Assignment competition (Kaggle Competition)<br />
<br />
Description: Assigning players to the helmet in a given footage of head collision in football play.<br />
--------------------------------------------------------------------<br />
Project # 13 Group members:<br />
<br />
Livochka, Anastasiia<br />
<br />
Wong, Cassandra<br />
<br />
Evans, David<br />
<br />
Yalsavar, Maryam<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 14 Group Members:<br />
<br />
Zeng, Mingde<br />
<br />
Lin, Xiaoyu<br />
<br />
Fan, Joshua<br />
<br />
Rao, Chen Min<br />
<br />
Title: Toxic Comment Classification, Kaggle<br />
<br />
Description: Using Wikipedia comments labeled for toxicity to train a model that detects toxicity in comments.<br />
--------------------------------------------------------------------<br />
Project # 15 Group Members:<br />
<br />
Huang, Yuying<br />
<br />
Anugu, Ankitha<br />
<br />
Chen, Yushan<br />
<br />
Title: Implementation of the classification task between crop and weeds<br />
<br />
Description: Our work will be based on the paper ''Crop and Weeds Classification for Precision Agriculture using Context-Independent Pixel-Wise Segmentation''.<br />
--------------------------------------------------------------------<br />
Project # 16 Group Members:<br />
<br />
Wang, Lingshan<br />
<br />
Li, Yifan<br />
<br />
Liu, Ziyi<br />
<br />
Title: Implement and Improve CNN in Multi-Class Text Classification<br />
<br />
Description: We are going to apply Bidirectional Encoder Representations from Transformers (BERT) to classify real-world data (application to build an efficient case study interview materials classifier) and improve it algorithm-wise in the context of text classification, being supported with real-world data set. With the implementation of BERT, it allows us to further analyze the efficiency and practicality of the algorithm when dealing with imbalanced dataset in the data input level and modelling level.<br />
The dataset is composed of case study HTML files containing case information that can be classified into multiple industry categories. We will implement a multi-class classification to break down the information contained in each case material into some pre-determined subcategories (eg, behavior questions, consulting questions, questions for new business/market entry, etc.). We will attempt to process the complicated data into several data types(e.g. HTML, JSON, pandas data frames, etc.) and choose the most efficient raw data processing logic based on runtime and algorithm optimization.<br />
--------------------------------------------------------------------<br />
Project # 17 Group members:<br />
<br />
Malhi, Dilmeet<br />
<br />
Joshi, Vansh<br />
<br />
Syamala, Aavinash <br />
<br />
Islam, Sohan<br />
<br />
Title: Kaggle project: PetFinder.my - Pawpularity Contest<br />
<br />
Description: In this competition, we will analyze raw images provided by PetFinder.my to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 18 Group members:<br />
<br />
Yuwei, Liu<br />
<br />
Daniel, Mao<br />
<br />
Title: Sartorius - Cell Instance Segmentation (Kaggle) [https://www.kaggle.com/c/sartorius-cell-instance-segmentation]<br />
<br />
Description: Detect single neuronal cells in microscopy images<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project #19 Group members:<br />
<br />
Samuel, Senko<br />
<br />
Tyler, Verhaar<br />
<br />
Zhang, Bowen<br />
<br />
Title: NBA Game Prediction<br />
<br />
Description: We will build a win/loss classifier for NBA games using player and game data and also incorporating alternative data (ex. sports betting data).<br />
<br />
-------------------------------------------------------------------<br />
<br />
Project #20 Group members:<br />
<br />
Mitrache, Christian<br />
<br />
Renggli, Aaron<br />
<br />
Saini, Jessica<br />
<br />
Mossman, Alexandra<br />
<br />
Title: Classification and Deep Learning for Healthcare Provider Fraud Detection Analysis<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 21 Group members:<br />
<br />
Wang, Kun<br />
<br />
Title: TBD<br />
<br />
Description : TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 22 Group members:<br />
<br />
Guray, Egemen<br />
<br />
Title: Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network<br />
<br />
Description : I will build a prediction system to predict road signs in the German Traffic Sign Dataset using CNN.<br />
--------------------------------------------------------------------<br />
<br />
Project # 23 Group members:<br />
<br />
Bsodjahi<br />
<br />
Title: Modeling Pseudomonas aeruginosa bacteria state through its genes expression activity<br />
<br />
Description : Label Pseudomonas aeruginosa gene expression data through unsupervised learning (eg., EM algorithm) and then model the bacterial state as function of its genes expression</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=F21-STAT_441/841_CM_763-Proposal&diff=51238F21-STAT 441/841 CM 763-Proposal2021-12-21T20:58:54Z<p>T229yu: </p>
<hr />
<div>Use this format (Don’t remove Project 0)<br />
<br />
Project # 0 Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Title: Making a String Telephone<br />
<br />
Description: We use paper cups to make a string phone and talk with friends while learning about sound waves with this science project. (Explain your project in one or two paragraphs).<br />
<br />
--------------------------------------------------------------------<br />
Project # 1 Group members:<br />
<br />
Feng, Jared<br />
<br />
Huang, Xipeng<br />
<br />
Xu, Mingwei<br />
<br />
Yu, Tingzhou<br />
<br />
Title: Patch-based classification of lung cancers pathological images using convolutional neural networks<br />
<br />
In this project, we explore the classification problem of lung cancer pathological images of some patients. The input images are from three categories of tumor types (LUAD, LUSD, and MESO), and the images have been split into patches in order to reducing the computational difficulty. The classification task is decomposed into patch level and whole image level. We experiment with three neural networks for patch-wise classification, and two classical machine learning models for patient classification. Techniques of feature extraction and sampling methods for training neural networks are also implemented and studied. Our results show that XGBoost on extracted feature vectors outperforms all other methods and achieves the accuracy of 67.86\% based on DenseNet-121 model for patch-wise classification.<br />
<br />
Our poster is here.<br />
--------------------------------------------------------------------<br />
Project # 2 Group members:<br />
<br />
Anderson, Eric<br />
<br />
Wang, Chengzhi<br />
<br />
Zhong, Kai<br />
<br />
Zhou, Yi Jing<br />
<br />
Title: Data Poison Attacks<br />
<br />
Description: Attempting to create a successful data poisoning attack<br />
<br />
--------------------------------------------------------------------<br />
Project # 3 Group members:<br />
<br />
Chopra, Kanika<br />
<br />
Rajcoomar, Yush<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Title: Cancer Classification<br />
<br />
Description: We will be classifying three tumour types based on pathological data. <br />
<br />
--------------------------------------------------------------------<br />
Project # 4 Group members:<br />
<br />
Li, Shao Zhong<br />
<br />
Kerr, Hannah <br />
<br />
Wong, Ann Gie<br />
<br />
Title: Predicting "Pawpularity" of Pets with Image Regression<br />
<br />
Description: Analyze raw images and metadata to predict the “Pawpularity” of pet photos to help guide shelters and rescuers around the world improve the appeal of their pet profiles, so that more animals can get adopted and animals can find their "furever" home faster.<br />
<br />
--------------------------------------------------------------------<br />
Project # 5 Group members:<br />
<br />
Chin, Jessie Man Wai<br />
<br />
Ooi, Yi Lin<br />
<br />
Shi, Yaqi<br />
<br />
Ngew, Shwen Lyng<br />
<br />
Title: The Application of Classification in Accelerated Underwriting (Insurance)<br />
<br />
Description: Accelerated Underwriting (AUW), also called “express underwriting,” is a faster and easier process for people with good health condition to obtain life insurance. The traditional underwriting process is often painful for both customers and insurers. From the customer's perspective, they have to complete different types of questionnaires and provide different medical tests involving blood, urine, saliva and other medical results. Underwriters on the other hand have to manually go through every single policy to access the risk of each applicant. AUW allows people, who are deemed “healthy” to forgo medical exams. Since COVID-19, it has become a more concerning topic as traditional underwriting cannot be performed due to the stay-at-home order. However, this imposes a burden on the insurance company to better estimate the risk associated with less testing results. <br />
<br />
This is where data science comes in. With different classification methods, we can address the underwriting process's five pain points: labor, speed, efficiency, pricing, and mortality. This allows us to better estimate risk and classify clients by whether they are eligible for accelerated underwriting. For the final project, we use data from one of the leading US insurers to analyze how we can classify clients for AUW. We will use factors such as health data, medical history, family history, and insurance history to determine eligibility.<br />
<br />
--------------------------------------------------------------------<br />
Project # 6 Group members:<br />
<br />
Wang, Carolyn<br />
<br />
Cyrenne, Ethan<br />
<br />
Nguyen, Dieu Hoa<br />
<br />
Sin, Mary Jane<br />
<br />
Title: Pawpularity (PetFinder Kaggle Competition)<br />
<br />
Description: Using images and metadata on the images to predict the popularity of pet photos, which is calculated based on page view statistics and other metrics from the PetFinder website.<br />
<br />
--------------------------------------------------------------------<br />
Project # 7 Group members:<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Chatoor, Amanda<br />
<br />
Prathap Das, Sutej<br />
<br />
Title: PetFinder.my - Pawpularity Contest [https://www.kaggle.com/c/petfinder-pawpularity-score/overview]<br />
<br />
Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.<br />
<br />
--------------------------------------------------------------------<br />
Project # 8 Group members:<br />
<br />
Yan, Xin<br />
<br />
Duan, Yishu<br />
<br />
Di, Xibei<br />
<br />
Title: The application of classification on company bankruptcy prediction<br />
<br />
Description: If a company goes bankrupt, all of its employees lose their jobs, and it is hard for them to find another suitable job in a short period. For the individual, losing a job to bankruptcy means having no income for a period of time. This can lead to several negative consequences: increased homelessness, as people cannot cover their living expenses, and increased crime rates as poverty rises. For the economy, if many companies go bankrupt at the same time, a huge number of employees lose their jobs, leading to a higher unemployment rate. This can have a series of negative impacts on the economy: lost government tax revenue, since the unemployed have no income and pay no income tax, and increased inequality in the income distribution. <br />
<br />
Therefore, company bankruptcy negatively affects individuals, governments, society, and the economy, which makes predicting company bankruptcy extremely important. The purpose of this project is to predict whether a company will go bankrupt.<br />
--------------------------------------------------------------------<br />
Project # 9 Group members:<br />
<br />
Loke, Chun Waan<br />
<br />
Chong, Peter<br />
<br />
Osmond, Clarice<br />
<br />
Li, Zhilong<br />
<br />
Title: Popularity of Shelter Pet Photo Prediction using Varied ML Techniques<br />
<br />
Description: In this Kaggle competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 10 Group members:<br />
<br />
O'Farrell, Ethan<br />
<br />
D'Astous, Justin<br />
<br />
Hamed, Waqas<br />
<br />
Vladusic, Stefan<br />
<br />
Title: Pawpularity (Kaggle)<br />
<br />
Description: Predicting the popularity of animal photos based on photo metadata<br />
--------------------------------------------------------------------<br />
Project # 11 Group members:<br />
<br />
JunBin, Pan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 12 Group members:<br />
<br />
Kar Lok, Ng<br />
<br />
Muhan (Iris), Li<br />
<br />
Wu, Mingze<br />
<br />
Title: NFL Health & Safety - Helmet Assignment competition (Kaggle Competition)<br />
<br />
Description: Assigning players to helmets in footage of head collisions during football plays.<br />
--------------------------------------------------------------------<br />
Project # 13 Group members:<br />
<br />
Livochka, Anastasiia<br />
<br />
Wong, Cassandra<br />
<br />
Evans, David<br />
<br />
Yalsavar, Maryam<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 14 Group Members:<br />
<br />
Zeng, Mingde<br />
<br />
Lin, Xiaoyu<br />
<br />
Fan, Joshua<br />
<br />
Rao, Chen Min<br />
<br />
Title: Toxic Comment Classification, Kaggle<br />
<br />
Description: Using Wikipedia comments labeled for toxicity to train a model that detects toxicity in comments.<br />
--------------------------------------------------------------------<br />
Project # 15 Group Members:<br />
<br />
Huang, Yuying<br />
<br />
Anugu, Ankitha<br />
<br />
Chen, Yushan<br />
<br />
Title: Implementation of the classification task between crop and weeds<br />
<br />
Description: Our work will be based on the paper ''Crop and Weeds Classification for Precision Agriculture using Context-Independent Pixel-Wise Segmentation''.<br />
--------------------------------------------------------------------<br />
Project # 16 Group Members:<br />
<br />
Wang, Lingshan<br />
<br />
Li, Yifan<br />
<br />
Liu, Ziyi<br />
<br />
Title: Implement and Improve CNN in Multi-Class Text Classification<br />
<br />
Description: We are going to apply Bidirectional Encoder Representations from Transformers (BERT) to classify real-world data (specifically, to build an efficient classifier of case study interview materials) and to improve the algorithm in the context of text classification, supported by a real-world data set. Implementing BERT allows us to further analyze the efficiency and practicality of the algorithm when dealing with an imbalanced dataset, both at the data-input level and at the modelling level.<br />
The dataset is composed of case study HTML files containing case information that can be classified into multiple industry categories. We will implement multi-class classification to break down the information contained in each case material into pre-determined subcategories (e.g., behavior questions, consulting questions, questions on new business/market entry, etc.). We will process the raw data into several formats (e.g., HTML, JSON, pandas data frames) and choose the most efficient raw data processing logic based on runtime and algorithm optimization.<br />
--------------------------------------------------------------------<br />
Project # 17 Group members:<br />
<br />
Malhi, Dilmeet<br />
<br />
Joshi, Vansh<br />
<br />
Syamala, Aavinash <br />
<br />
Islam, Sohan<br />
<br />
Title: Kaggle project: PetFinder.my - Pawpularity Contest<br />
<br />
Description: In this competition, we will analyze raw images provided by PetFinder.my to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 18 Group members:<br />
<br />
Yuwei, Liu<br />
<br />
Daniel, Mao<br />
<br />
Title: Sartorius - Cell Instance Segmentation (Kaggle) [https://www.kaggle.com/c/sartorius-cell-instance-segmentation]<br />
<br />
Description: Detect single neuronal cells in microscopy images<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project #19 Group members:<br />
<br />
Samuel, Senko<br />
<br />
Tyler, Verhaar<br />
<br />
Zhang, Bowen<br />
<br />
Title: NBA Game Prediction<br />
<br />
Description: We will build a win/loss classifier for NBA games using player and game data, also incorporating alternative data (e.g., sports betting data).<br />
<br />
-------------------------------------------------------------------<br />
<br />
Project #20 Group members:<br />
<br />
Mitrache, Christian<br />
<br />
Renggli, Aaron<br />
<br />
Saini, Jessica<br />
<br />
Mossman, Alexandra<br />
<br />
Title: Classification and Deep Learning for Healthcare Provider Fraud Detection Analysis<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 21 Group members:<br />
<br />
Wang, Kun<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 22 Group members:<br />
<br />
Guray, Egemen<br />
<br />
Title: Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network<br />
<br />
Description: I will build a prediction system to recognize road signs in the German Traffic Sign Dataset using a CNN.<br />
--------------------------------------------------------------------<br />
<br />
Project # 23 Group members:<br />
<br />
Bsodjahi<br />
<br />
Title: Modeling Pseudomonas aeruginosa bacteria state through its genes expression activity<br />
<br />
Description: Label Pseudomonas aeruginosa gene expression data through unsupervised learning (e.g., the EM algorithm) and then model the bacterial state as a function of its gene expression</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50253Don't Just Blame Over-parametrization2021-11-14T04:05:05Z<p>T229yu: /* Conclusion */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is an ICML 2021 paper by Yu Bai, Song Mei, Huan Wang, and Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks achieve high accuracy. However, the predicted top probability (confidence) often does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has an average predicted top probability of 87%, while its actual test accuracy is only 72% (Guo et al., 2017). To address this issue, a growing body of work aims to improve the '''calibration''' of models, which can reduce over-confidence while preserving (or even improving) accuracy (Ovadia et al., 2019).<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression for a long time. Recalibration algorithms fix this by adjusting the output of a well-trained model, and date back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017) show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve on these approaches (Kull et al., 2017; 2019; Ding et al., 2020; Rahimi et al., 2020; Zhang et al., 2020).<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al., 2016; Malinin et al., 2019; Wen et al., 2020; Tran et al., 2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Maddox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhoti et al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling are less calibrated than reported, and propose a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020) and Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A closely related theoretical result is that of Liu et al. (2019), which shows that the calibration error of any classifier is upper bounded by the square root of its excess logistic loss over the Bayes classifier. This result can be translated into an <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas the paper's main result implies <math>\Theta(d/n)</math> calibration error in the high-dimensional limiting regime (with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari, 2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013; Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur & Candès, 2019; Candès et al., 2020). The paper's analysis builds on the characterization of unregularized convex risk minimization problems (including logistic regression) derived in Sur & Candès (2019).<br />
<br />
== Motivation ==<br />
A natural question is why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, as in deep neural networks (Mukhoti et al., 2020). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic causes of over-confidence. This paper concludes that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture ==<br />
<br />
'''Data distribution''': Consider <math>\textbf{X} \sim N(0, I_d)</math> and <math>P(Y = 1|\textbf{X} = x) = \sigma(\textbf{w}_*^Tx)</math>, where <math>\textbf{w}_*\in \mathbb{R}^d </math> is the ground-truth coefficient vector.<br />
<br />
'''Model''': with the above data as input, minimize the binary cross-entropy loss: <center><math><br />
\hat{\textbf{w}} = \text{argmin}_{\textbf{w}} L(\textbf{w}) = \frac{1}{n}\sum^n_{i=1}[\log(1+\exp(\textbf{w}^T\textbf{x}_i)) - y_i\textbf{w}^T\textbf{x}_i],</math></center><br />
where <math>\sigma(z) = \frac{1}{1+e^{-z}}</math>.<br />
<br />
Logistic regression is well-specified when the data are generated by the model itself: with <math>\{y_i\}</math> generated from a logistic model with coefficient <math>\textbf{w}_*</math>, the population loss <math>\mathbb{E}[L(\textbf{w})]</math> is always minimized at <math>\textbf{w}_*</math> (see [Hastie et al., '09]).<br />
<br />
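As a concrete illustration, the setup above can be simulated directly. The following is a minimal sketch (not the authors' code; the sizes n and d, the learning rate, and the iteration count are arbitrary choices for illustration) that draws data from the stated model and minimizes the binary cross-entropy loss by plain gradient descent:<br />

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Data model: X ~ N(0, I_d), P(Y = 1 | X = x) = sigmoid(w_*^T x).
n, d = 2000, 20
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = (rng.random(n) < sigmoid(X @ w_star)).astype(float)

def loss(w):
    # Average binary cross-entropy: mean of log(1 + exp(w^T x_i)) - y_i * w^T x_i.
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

# Unregularized ERM via plain gradient descent (no penalty term).
w_hat = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ w_hat)
    w_hat -= 0.5 * (X.T @ (p - y)) / n  # gradient of the average loss

print(loss(np.zeros(d)), loss(w_hat))  # the fitted loss is smaller than at w = 0
```

With d fixed and n large, the fitted <math>\hat{\textbf{w}}</math> converges to <math>\textbf{w}_*</math>; the paper studies the proportional regime <math>n \propto d</math>, where it does not.<br />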
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the CIFAR10 dataset. <br />
<br />
Two activations are used in the simulation: the well-specified under-parametrized logistic regression, and a general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. "Calibration curves" were plotted for both activations: the x-axis is the predicted probability p, and the y-axis is the empirical probability of the label being one among examples receiving that prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results. First, logistic regression is over-confident at all <math>d/n</math>. Second, over-confidence is more severe as <math>d/n</math> increases, suggesting that the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, the theoretical prediction closely matches the simulation, further confirming the theory.<br />
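This behavior can be reproduced in a small self-contained simulation (a sketch, not the paper's experiment code; the sizes n = 5000 and d = 250, the signal strength, and the binning are illustrative choices). Binning held-out predictions with p > 0.5 and comparing each bin's average predicted probability with the empirical frequency of the label being one gives a numerical calibration curve:<br />

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Well-specified logistic data with a moderate ratio d/n = 0.05.
n, d = 5000, 250
w_star = rng.standard_normal(d)
w_star *= 3.0 / np.linalg.norm(w_star)  # fix the signal strength ||w_*|| = 3
X = rng.standard_normal((n, d))
y = (rng.random(n) < sigmoid(X @ w_star)).astype(float)

# Unregularized logistic ERM via plain gradient descent.
w_hat = np.zeros(d)
for _ in range(1500):
    w_hat -= 1.0 * (X.T @ (sigmoid(X @ w_hat) - y)) / n

# Calibration curve on fresh data: bin the predicted probability p and
# compare it with the empirical frequency of y = 1 inside each bin.
m = 20000
X_te = rng.standard_normal((m, d))
y_te = (rng.random(m) < sigmoid(X_te @ w_star)).astype(float)
p_hat = sigmoid(X_te @ w_hat)
bins = np.linspace(0.5, 1.0, 6)
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (p_hat >= lo) & (p_hat < hi)
    if mask.any():
        print(f"predicted [{lo:.1f}, {hi:.1f}): mean p = {p_hat[mask].mean():.3f}, "
              f"empirical P(y=1) = {y_te[mask].mean():.3f}")
```

If the theory's prediction carries over at this <math>d/n</math>, each bin's empirical frequency should sit below its mean predicted probability, i.e. the fitted model is over-confident, matching the simulated calibration curves in the figure above.<br />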
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested on the CIFAR10 dataset by running multi-class logistic regression on its first five classes. The authors performed logistic regression on two kinds of labels: the true labels, and pseudo-labels generated from a multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that logistic regression is over-confident on both kinds of labels, with the over-confidence more severe on the pseudo-labels than on the true labels. This suggests that the inherent over-confidence of logistic regression may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit where <math> n, d \to \infty</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Moreover, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over- and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation, and a well-specified empirical risk minimization (ERM) problem is solved with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: \mathbb{R} \to [0,1]</math> that is concave for all <math> z > 0 </math> yields a classifier that is over-confident at every confidence level.<br />
<br />
3. Another perhaps surprising implication is that '''over-confidence is not universal''': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and a class of general binary classification problems. The authors show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> when <math>n/d</math> is large, and establish sufficient conditions for the over- or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
== References ==<br />
* <sup>[https://ieeexplore.ieee.org/document/8683376 [1]]</sup> ''A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights'', Xiaoyi Mai, Zhenyu Liao, R. Couillet. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019.</div>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, and Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks achieve high accuracy. However, the predicted top probability (confidence) often fails to reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has an average predicted top probability of 87%, while its actual test accuracy is only 72% (Guo et al., 2017). To address this issue, a growing body of work aims to improve the '''calibration''' of models, which can reduce over-confidence while preserving (or even improving) accuracy (Ovadia et al., 2019).<br />
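Mis-calibration of this kind is commonly quantified by the expected calibration error (ECE): predictions are binned by confidence, and the per-bin |confidence − accuracy| gaps are averaged, weighted by bin size. A minimal sketch (an illustration, not the paper's code):<br />

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; average |confidence - accuracy|,
    weighted by the fraction of samples falling in each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n, ece = len(confidences), 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.sum() / n * gap
    return ece

# Toy version of the WideResNet example above: 90% average confidence
# but only 70% accuracy yields an ECE of about 0.2.
conf = np.full(1000, 0.9)
correct = ((np.arange(1000) % 10) < 7).astype(float)  # exactly 70% correct
print(expected_calibration_error(conf, correct))      # ~0.2
```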
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have long observed and dealt with the over-confidence of logistic regression. Recalibration algorithms fix this by adjusting the output of a well-trained model, and date back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing to mitigate the over-confidence of logistic regression. Guo et al. (2017) show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve on these approaches (Kull et al., 2017; 2019; Ding et al., 2020; Rahimi et al., 2020; Zhang et al., 2020).<br />
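Temperature scaling learns a single scalar <math>T</math> by minimizing the negative log-likelihood of the rescaled logits <math>z/T</math> on a held-out set. A minimal binary-case sketch (the data and factor of 2 here are hypothetical, chosen only to make the recovered temperature predictable):<br />

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Learn a scalar T > 0 minimizing the NLL of sigmoid(logits / T)."""
    def nll(t):
        z = logits / t
        return np.mean(np.logaddexp(0.0, z) - labels * z)
    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x

# Hypothetical validation set whose logits are inflated by a factor of 2,
# so the recovered temperature should come out close to 2.
rng = np.random.default_rng(1)
true_z = rng.normal(scale=1.5, size=20000)
labels = (rng.random(20000) < 1.0 / (1.0 + np.exp(-true_z))).astype(float)
T = fit_temperature(2.0 * true_z, labels)
print(f"fitted temperature: {T:.2f}")  # close to 2
```

Dividing the logits by the fitted <math>T > 1</math> softens the predicted probabilities without changing the predicted class.<br />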
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al., 2016; Malinin et al., 2019; Wen et al., 2020; Tran et al., 2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Maddox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhoti et al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling are less calibrated than reported, and propose a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020) and Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A theoretical result related to this paper is (Liu et al., 2019), which shows that the calibration error of any classifier is upper bounded by the square root of its excess logistic loss over the Bayes classifier. This result translates to an <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas the paper's main result implies <math>\Theta(d/n)</math> calibration error in the high-dimensional limiting regime (with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari, 2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013; Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur & Candès, 2019; Candès et al., 2020). The paper's analysis builds on the characterization of unregularized convex risk minimization problems (including logistic regression) derived in Sur & Candès (2019).<br />
<br />
== Motivation ==<br />
A natural question is why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, as in deep neural networks (Mukhoti et al., 2020). However, it is so far unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. This paper concludes that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture ==<br />
<br />
'''Data distribution''': Consider <math>\textbf{X} \sim N(0, I_d)</math> and <math>P(Y = 1|\textbf{X} = x) = \sigma(\textbf{w}_*^Tx)</math>, where <math>\textbf{w}_*\in \mathbb{R}^d </math> is the ground-truth coefficient vector.<br />
<br />
'''Model''': with the above data as input, minimize the binary cross-entropy loss<br />
<center><math>\hat{\textbf{w}} = \text{argmin}_{\textbf{w}} L(\textbf{w}) = \frac{1}{n}\sum^n_{i=1}[\log(1+\exp(\textbf{w}^T\textbf{x}_i)) - y_i\textbf{w}^T\textbf{x}_i],</math></center><br />
where <math>\sigma(z) = \frac{1}{1+e^{-z}}</math> is the sigmoid function.<br />
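This setting is simple enough to simulate directly. The sketch below (not the authors' code; the dimensions, step size, and iteration count are arbitrary choices) draws data from the model above, fits unregularized logistic regression by gradient descent, and compares the average confidence in the predicted class with the test accuracy:<br />

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 100                          # d/n = 0.1, the under-parametrized regime
w_star = rng.normal(0.0, 1.0, d) / np.sqrt(d)   # ground truth, ||w_*|| ~ 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(m):
    X = rng.normal(0.0, 1.0, (m, d))      # X ~ N(0, I_d)
    y = (rng.random(m) < sigmoid(X @ w_star)).astype(float)
    return X, y

X, y = sample(n)

# Unregularized logistic regression: gradient descent on the cross-entropy loss.
w = np.zeros(d)
for _ in range(2000):
    w -= 0.5 * (X.T @ (sigmoid(X @ w) - y) / n)

# On fresh data, the average confidence in the predicted class exceeds the
# actual accuracy: the over-confidence the paper quantifies as Theta(d/n).
X_test, y_test = sample(50_000)
p = sigmoid(X_test @ w)
conf = np.maximum(p, 1.0 - p).mean()      # average predicted top probability
acc = ((p > 0.5) == (y_test == 1.0)).mean()
```

On fresh samples the average top probability exceeds the accuracy by a few percent, even though the model family contains the truth and <math>n > d</math>.<br />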
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theory: the first was based on simulation, and the second used the CIFAR10 dataset. <br />
<br />
Two activations are used in the simulation: well-specified under-parametrized logistic regression, and a general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. "Calibration curves" were plotted for both activations: the x-axis is the predicted confidence <math>p</math>, and the y-axis is the actual probability of the label given that prediction.<br />
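Such a calibration curve can be computed by binning predictions by their confidence and recording, per bin, the empirical frequency of the positive label. A generic sketch (not the authors' plotting code; the binning is a standard choice):<br />

```python
import numpy as np

def calibration_curve(p, y, n_bins=10):
    """Per-bin (mean predicted probability, empirical frequency of y = 1)."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize assigns each prediction to a bin; clip keeps p = 1.0 in the last bin
    bin_idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    points = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            points.append((p[mask].mean(), y[mask].mean()))
    return points  # points below the diagonal y = x indicate over-confidence
```

A perfectly calibrated predictor traces the diagonal; the over-confident regimes described below produce curves lying beneath it.<br />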
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results. First, logistic regression is over-confident at all <math>d/n</math>. Second, over-confidence is more severe as <math>d/n</math> increases, suggesting that the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, the theoretical prediction closely matches the simulation, further confirming the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested on CIFAR10 by running multi-class logistic regression on its first five classes. The authors performed logistic regression on two kinds of labels: the true labels, and pseudo-labels generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that logistic regression is over-confident on both kinds of labels, with the over-confidence more severe on the pseudo-labels than on the true labels. This suggests that the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit where <math> n,d\to \infty</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Moreover, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over- and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation and a well-specified empirical risk minimization (ERM) problem is solved with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: \mathbb{R}\to[0,1]</math> that is concave at all <math> z > 0 </math> will yield a classifier that is over-confident at any confidence level.<br />
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and of a class of general binary classification problems. The authors show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> when <math>n/d </math> is large, and establish sufficient conditions for the over- or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
==References==<br />
* <sup>[https://ieeexplore.ieee.org/document/8683376 [1]]</sup> X. Mai, Z. Liao, and R. Couillet, "A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, "Optimization method based extreme learning machine for classification," ''Neurocomputing'', vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>T229yu
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
A natural question is why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, for instance in deep neural networks (Mukhoti et al., 2020). However, it has so far been unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. The conclusion of this paper is that over-confidence is not just a result of over-parametrization, but is more inherent.<br />
<br />
== Model Architecture ==<br />
<br />
Consider binary classification problems: we observe <math>n</math> data points <math>\{(\mathbf{x}_i, y_i)\}_{i=1}^{n} \overset{iid}{\sim} P</math> for some distribution <math>P</math> on <math>\mathbb{R}^d \times \{0,1\}</math>. In the well-specified setting, the label follows <math>\mathbb{P}(y_i = 1 \mid \mathbf{x}_i) = \sigma(\langle \mathbf{x}_i, \mathbf{w}_\star \rangle)</math> for some activation (link) function <math>\sigma</math>; logistic regression corresponds to <math>\sigma(z) = 1/(1+e^{-z})</math>.<br />
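To make the setting concrete, the following sketch simulates the well-specified logistic model and fits unregularized logistic regression by Newton's method, then compares the mean predicted confidence with the test accuracy. The dimensions, signal strength, and random seed are arbitrary choices for illustration, not the paper's experimental settings.<br />

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n, d = 2500, 250                       # d/n = 0.1, under-parametrized
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)       # unit signal strength (arbitrary choice)

def sample(m):
    X = rng.normal(size=(m, d))                        # Gaussian inputs
    y = (rng.random(m) < sigmoid(X @ w_star)).astype(float)
    return X, y

def fit_logistic_mle(X, y, iters=20):
    """Unregularized logistic MLE via Newton's method; the tiny ridge term
    is only for numerical stability of the linear solve."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)
        hess = (X * (p * (1 - p))[:, None]).T @ X / len(y) + 1e-8 * np.eye(X.shape[1])
        w -= np.linalg.solve(hess, grad)
    return w

X, y = sample(n)
w_hat = fit_logistic_mle(X, y)

# On fresh data, compare the mean predicted confidence with the accuracy of
# the predicted class: over-confidence means the former exceeds the latter.
X_test, y_test = sample(50000)
p = sigmoid(X_test @ w_hat)
confidence = np.maximum(p, 1.0 - p).mean()
accuracy = ((p > 0.5) == (y_test > 0.5)).mean()
print(f"mean confidence {confidence:.3f} vs test accuracy {accuracy:.3f}")
```

In this regime the fitted weights are systematically inflated relative to <math>\mathbf{w}_\star</math>, which is the mechanism behind the over-confidence that the paper quantifies.<br />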
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theory: the first was based on simulation, and the second used the CIFAR-10 dataset. <br />
<br />
Two activations are used in the simulation: the well-specified, under-parametrized logistic regression, and a general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. "Calibration curves" were plotted for both activations: the x-axis is the predicted probability <math>p</math>, and the y-axis is the actual probability of the label conditioned on the model predicting <math>p</math>.<br />
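A calibration curve of this kind is straightforward to compute by binning predictions. The sketch below uses an invented toy predictor (it reports <math>\sigma(z)</math> while the true label probability is <math>\sigma(z/2)</math>) purely to show the binning logic and what an over-confident curve looks like; it is not the paper's simulation.<br />

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def calibration_curve(p, y, n_bins=10):
    """For each confidence bin, return the mean predicted probability and the
    empirical frequency of the positive label among points in that bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    pred, actual = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            pred.append(p[mask].mean())
            actual.append(y[mask].mean())
    return np.array(pred), np.array(actual)

# Toy over-confident predictor: it reports sigmoid(z) although the true
# label probability is sigmoid(z / 2), so the curve bends toward 0.5.
rng = np.random.default_rng(0)
z = rng.normal(0.0, 2.0, size=100000)
p = sigmoid(z)
y = (rng.random(z.size) < sigmoid(z / 2.0)).astype(float)
pred, actual = calibration_curve(p, y)
for q, a in zip(pred, actual):
    print(f"predicted {q:.2f} -> actual {a:.2f}")
```

For a perfectly calibrated model the two columns would agree; here the actual frequency is pulled toward 0.5 in every bin, the signature of over-confidence.<br />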
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results. First, logistic regression is over-confident at every value of <math>d/n</math>. Second, over-confidence is more severe as <math>d/n</math> increases, suggesting that the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, the theoretical prediction closely matches the simulation, further confirming the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested on the CIFAR-10 dataset by running multi-class logistic regression on its first five classes. The authors performed logistic regression on two kinds of labels: the true labels, and pseudo-labels generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that logistic regression is over-confident on both kinds of labels, with the over-confidence more severe on the pseudo-labels than on the true labels. This suggests that the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''Well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit where <math> n, d \to \infty</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Moreover, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
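For concreteness, the overall calibration error referred to here measures the expected gap between the predicted probability and the true conditional probability of the label given that prediction. In one standard form (the paper's exact definition may use a slightly different norm or restriction to a confidence range):<br />

```latex
\mathrm{CE}(\hat{f}) \;=\; \mathbb{E}_{p = \hat{f}(\mathbf{x})}
\Big[\, \big|\, \mathbb{P}\big( y = 1 \,\big|\, \hat{f}(\mathbf{x}) = p \big) - p \,\big| \,\Big]
```

where <math>\hat{f}(\mathbf{x})</math> is the model's predicted probability of the positive class; the result above says that for the logistic MLE this quantity is of order <math>d/n</math>, always in the over-confident direction.<br />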
<br />
2. The authors identify '''sufficient conditions for over- and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation and a well-specified empirical risk minimization (ERM) problem is solved with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math>\sigma: \mathbb{R} \to [0,1]</math> that is concave at all <math> z > 0 </math> yields a classifier that is over-confident at any confidence level.<br />
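As a sanity check (a small illustrative sketch, not from the paper), one can verify that the logistic sigmoid itself satisfies these conditions: it is symmetric (<math>\sigma(-z) = 1 - \sigma(z)</math>), monotone (<math>\sigma'(z) = \sigma(z)(1-\sigma(z)) > 0</math>), and concave for <math>z > 0</math>, since <math>\sigma''(z) = \sigma(z)(1-\sigma(z))(1-2\sigma(z)) < 0</math> whenever <math>\sigma(z) > 1/2</math>. This is exactly why logistic regression falls on the over-confident side.<br />

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(1e-3, 10.0, 2000)                       # grid over z > 0
s = sigmoid(z)
second_deriv = s * (1 - s) * (1 - 2 * s)                # closed form for sigma''
assert np.all(second_deriv < 0)                          # concave at all z > 0
assert np.allclose(sigmoid(-z), 1 - sigmoid(z))          # symmetric
assert np.all(s * (1 - s) > 0)                           # monotone (sigma' > 0)
print("sigmoid is symmetric, monotone, and concave on z > 0")
```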
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and of a class of general binary classification problems. The authors show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> when <math>n/d</math> is large, and establish sufficient conditions for the over- or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
==References==<br />
* <sup>[https://ieeexplore.ieee.org/document/8683376 [1]]</sup> X. Mai, Z. Liao, and R. Couillet, ''A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights,'' ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, ''Optimization method based extreme learning machine for classification,'' Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
There is a typical question that why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture ==<br />
<br />
Consider binary classification problems: observe <math> n <\math> data points <math> {(\textbf{x}_{i}, y_{i}}_{i=1}^{n} iid \sim P <\math> for some distribution <math>P<\math> on <math>\mathbb{R}^d\times [0,1]<\math><br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>d/n</math>. Second, over-confidence is more severe when <math>d/n</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''Well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit where <math> n, d \to \infty</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Moreover, the overall calibration error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
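The overall calibration error can be estimated from finite samples with a binned, ECE-style statistic. The sketch below is a hedged illustration (the paper's CE is a population quantity; the binning scheme and the toy over-confident predictor are assumptions, not the authors' code):<br />

```python
import numpy as np

def binned_calibration_error(p_hat, y, n_bins=15):
    """Binned estimate of calibration error: a weighted average of
    |mean predicted probability - observed frequency| over confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p_hat, edges) - 1, 0, n_bins - 1)
    ce, n = 0.0, len(p_hat)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ce += mask.sum() / n * abs(p_hat[mask].mean() - y[mask].mean())
    return ce

# Toy over-confident predictor: reported confidence exceeds the true probability.
rng = np.random.default_rng(2)
p_true = rng.uniform(0.5, 0.9, 100_000)
p_hat = np.clip(p_true + 0.05, 0.0, 1.0)   # systematically over-confident by 0.05
y = (rng.random(100_000) < p_true).astype(float)
ce = binned_calibration_error(p_hat, y)
```

Here the estimate recovers roughly the injected 0.05 over-confidence gap, while the calibrated predictions score near zero.<br />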
<br />
2. The authors identify '''sufficient conditions for over- and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation and they solve a well-specified empirical risk minimization (ERM) problem with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: \mathbb{R} \to [0,1]</math> that is concave at all <math> z > 0 </math> will yield a classifier that is over-confident at any confidence level.<br />
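As a quick numerical sanity check (not an experiment from the paper), one can verify that the logistic sigmoid satisfies all three parts of this sufficient condition — symmetry, monotonicity, and concavity at all <math> z > 0 </math> — consistent with the conclusion that logistic regression is over-confident:<br />

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(1e-3, 10.0, 2000)   # grid over z > 0
s = sigmoid(z)

# Symmetry: sigma(-z) = 1 - sigma(z).
assert np.allclose(sigmoid(-z), 1.0 - s)

# Monotonicity: first differences are strictly positive.
assert np.all(np.diff(s) > 0)

# Concavity on z > 0: sigma''(z) = sigma(1 - sigma)(1 - 2*sigma) <= 0,
# since sigma(z) > 1/2 for z > 0.
assert np.all(s * (1.0 - s) * (1.0 - 2.0 * s) <= 0)
```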
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and of a class of general binary classification problems. The authors show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> when <math>n/d </math> is large, and establish sufficient conditions for the over- or under-confidence of unregularized ERM in general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
==References==<br />
* <sup>[https://ieeexplore.ieee.org/document/8683376 [1]]</sup> X. Mai, Z. Liao, and R. Couillet, ''A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights'', ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019.<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup>G.-B. Huang, X.Ding, and H.Zhou, ''Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50239Don't Just Blame Over-parametrization2021-11-14T03:55:54Z<p>T229yu: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
There is a typical question that why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture ==<br />
<br />
Consider binary classification problems: observe <math> n <\math> data points <math> {(\textbf{x}_{i}, y_{i}}_{i=1}^{n} iid \sim\mathbf{P}<\math> for some distribution <math>\mathbf{P}<\math> on <math>\mathbb{R}^d\times [0,1]<\math><br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>d/n</math>. Second, over-confidence is more severe when <math>d/n</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit of <math> n,d\to ∞</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Also, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over-and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation, and they solve a well-specified empirical risk minimization (ERM) problem with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: R→[0,1]</math> that is concave at all <math> z >0 </math> will yield a classifier that is over-confident at any confidence level.<br />
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and a class of general binary classification problems. They show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> as <math>n /d </math> is large,and establish sufficient conditions for the over-or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
==References==<br />
* <sup>[https://ieeexplore.ieee.org/document/8683376 [1]]</sup>''A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights'', Xiaoyi Mai, Zhenyu Liao, R. Couillet. Published 1 May 2019. Computer Science, Mathematics. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup>G.-B. Huang, X.Ding, and H.Zhou, ''Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50238Don't Just Blame Over-parametrization2021-11-14T03:55:34Z<p>T229yu: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
There is a typical question that why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture ==<br />
<br />
Consider binary classification problems: observe <math> n <\math> data points <math> {(\textbf{x}_{i}, y_{i}}_{i=1}^{n} \stackrel{\text { iid }}{\sim}\mathbf{P}<\math> for some distribution <math>\mathbf{P}<\math> on <math>\mathbb{R}^d\times [0,1]<\math><br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>d/n</math>. Second, over-confidence is more severe when <math>d/n</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit of <math> n,d\to ∞</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Also, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over-and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation, and they solve a well-specified empirical risk minimization (ERM) problem with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: R→[0,1]</math> that is concave at all <math> z >0 </math> will yield a classifier that is over-confident at any confidence level.<br />
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and a class of general binary classification problems. They show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> as <math>n /d </math> is large,and establish sufficient conditions for the over-or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
==References==<br />
* <sup>[https://ieeexplore.ieee.org/document/8683376 [1]]</sup>''A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights'', Xiaoyi Mai, Zhenyu Liao, R. Couillet. Published 1 May 2019. Computer Science, Mathematics. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup>G.-B. Huang, X.Ding, and H.Zhou, ''Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50237Don't Just Blame Over-parametrization2021-11-14T03:54:49Z<p>T229yu: /* Model Architecture */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
There is a typical question that why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture ==<br />
<br />
Consider binary classification problems: observe <math> n <\math> data points <math>\left\{\left(\textbf{x}_{i}, y_{i}\right)\right\}_{i=1}^{n} \stackrel{\text { iid }}{\sim}\mathbf{P}<\math> for some distribution <math>\mathbf{P}<\math> on <math>\mathbb{R}^d\times [0,1]<\math><br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>d/n</math>. Second, over-confidence is more severe when <math>d/n</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit of <math> n,d\to ∞</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Also, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over-and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation, and they solve a well-specified empirical risk minimization (ERM) problem with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: R→[0,1]</math> that is concave at all <math> z >0 </math> will yield a classifier that is over-confident at any confidence level.<br />
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and a class of general binary classification problems. They show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> as <math>n /d </math> is large,and establish sufficient conditions for the over-or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
==References==<br />
* <sup>[https://ieeexplore.ieee.org/document/8683376 [1]]</sup>''A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights'', Xiaoyi Mai, Zhenyu Liao, R. Couillet. Published 1 May 2019. Computer Science, Mathematics. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)<br />
<br />
* <sup>[https://www.sciencedirect.com/science/article/pii/S0925231210002225 [2]]</sup> G.-B. Huang, X. Ding, and H. Zhou, ''Optimization method based extreme learning machine for classification'', Neurocomputing, vol. 74, no. 1-3, pp. 155-163, Dec. 2010.</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50228Don't Just Blame Over-parametrization2021-11-13T18:33:27Z<p>T229yu: /* References */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is an ICML 2021 paper by Yu Bai, Song Mei, Huan Wang, and Caiming Xiong.<br />
<br />
Machine learning models such as deep neural networks achieve high accuracy, but the predicted top probability (confidence) often fails to reflect the actual accuracy: such models tend to be '''over-confident'''. For example, a WideResNet-32 trained on CIFAR100 has an average predicted top probability of 87%, while its actual test accuracy is only 72% (Guo et al., 2017). To address this issue, a growing line of work aims to improve the '''calibration''' of models, which can reduce over-confidence while preserving (or even improving) accuracy (Ovadia et al., 2019).<br />
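<br />
This confidence/accuracy mismatch is commonly quantified by the expected calibration error (ECE). As a minimal illustrative sketch (not from the paper), the standard binned ECE estimator can be written as:<br />
<br />

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Binned ECE: average |accuracy - confidence| over confidence bins,
    weighted by the fraction of samples in each bin."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            ece += m.mean() * abs(correct[m].mean() - conf[m].mean())
    return ece

# Synthetic check: calibrated predictions give ECE near 0, while
# predictions that are 15 points over-confident give ECE near 0.15.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=20_000)
calibrated = (rng.random(20_000) < conf).astype(int)
overconf = (rng.random(20_000) < conf - 0.15).astype(int)

ece_cal = expected_calibration_error(conf, calibrated)
ece_over = expected_calibration_error(conf, overconf)
print(ece_cal, ece_over)
```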
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression for a long time. Recalibration algorithms fix this by adjusting the output of a well-trained model, and date back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan, 2001), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing to mitigate the over-confidence of logistic regression. Guo et al. (2017) show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve on these approaches (Kull et al., 2017; 2019; Ding et al., 2020; Rahimi et al., 2020; Zhang et al., 2020).<br />
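<br />
To illustrate the temperature-scaling idea, the sketch below (hypothetical binary-case code; Guo et al. (2017) fit the analogous scalar for multi-class logits) learns a single temperature <math>T</math> by minimizing held-out negative log-likelihood:<br />
<br />

```python
import numpy as np

def nll(T, logits, y):
    # Negative log-likelihood of binary labels under sigmoid(logits / T).
    p = 1.0 / (1.0 + np.exp(-logits / T))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_temperature(logits, y, grid=np.linspace(0.1, 10.0, 500)):
    # The problem is one-dimensional, so a simple grid search suffices here.
    return min(grid, key=lambda T: nll(T, logits, y))

# Synthetic over-confident model: labels follow sigmoid(logits / 2),
# but the model reports the raw logits; the fitted temperature is near 2.
rng = np.random.default_rng(0)
logits = rng.normal(0.0, 4.0, size=10_000)
y = (rng.random(10_000) < 1.0 / (1.0 + np.exp(-logits / 2.0))).astype(int)

T = fit_temperature(logits, y)
print(T)
```

Dividing the logits by the fitted <math>T</math> leaves the predicted class unchanged but softens (or sharpens) the reported probabilities.<br />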
<br />
Another line of work improves calibration by aggregating the probabilistic predictions of multiple models, using either an ensemble of models (Lakshminarayanan et al., 2016; Malinin et al., 2019; Wen et al., 2020; Tran et al., 2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Maddox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhoti et al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling are less calibrated than reported, and propose a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020) and Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A closely related theoretical result is that of Liu et al. (2019), which shows that the calibration error of any classifier is upper bounded by the square root of its excess logistic loss over the Bayes classifier. This result translates into an <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas the paper's main result implies <math>\Theta(d/n)</math> calibration error in its high-dimensional limiting regime (with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari, 2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013; Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur & Candès, 2019; Candès et al., 2020). The paper's analysis builds on the characterization of unregularized convex risk minimization problems (including logistic regression) derived in Sur & Candès (2019).<br />
<br />
== Motivation ==<br />
A natural question is why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, as in deep neural networks (Mukhoti et al., 2020). However, it is so far unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons for over-confidence. The conclusion of this paper is that over-confidence is not just a result of over-parametrization, but is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theory: the first was based on simulation, and the second used the CIFAR10 dataset. <br />
<br />
Two activations are used in the simulation: well-specified under-parametrized logistic regression, and a general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is the predicted probability p, and the y-axis is the average true probability of the label given that prediction.<br />
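Such calibration curves are typically estimated by binning. A minimal sketch (our illustration, not the authors' code) that bins predictions and averages the labels within each bin:<br />

```python
import numpy as np

def calibration_curve(p_pred, y, n_bins=10):
    """For each non-empty bin of predicted probabilities, return the mean
    prediction (x-axis) and the empirical label frequency (y-axis)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p_pred, edges) - 1, 0, n_bins - 1)
    xs, ys = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            xs.append(p_pred[mask].mean())
            ys.append(y[mask].mean())
    return np.array(xs), np.array(ys)

# Sanity check: on perfectly calibrated synthetic data the curve hugs
# the diagonal, i.e. xs[b] is close to ys[b] in every bin.
rng = np.random.default_rng(1)
p = rng.random(200_000)
y = (rng.random(200_000) < p).astype(float)
xs, ys = calibration_curve(p, y)
```

An over-confident model shows up as a curve lying below the diagonal for p > 0.5 (predicted confidence exceeds the empirical frequency).<br />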
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results. First, logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe as <math>\kappa</math> increases, suggesting that the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, the theoretical prediction closely matches the simulation, further confirming the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested on the CIFAR10 dataset by running multi-class logistic regression on its first five classes. The authors performed logistic regression on two kinds of labels: the true labels, and pseudo-labels generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that logistic regression is over-confident on both kinds of labels, with the over-confidence more severe on the pseudo-labels than on the true labels. This suggests that the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit where <math> n, d \to \infty</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Also, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over- and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation and a well-specified empirical risk minimization (ERM) problem is solved with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: \mathbb{R} \to [0,1]</math> that is concave at all <math> z > 0 </math> will yield a classifier that is over-confident at any confidence level.<br />
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and a class of general binary classification problems. The authors show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> when <math>n/d </math> is large, and establish sufficient conditions for the over- or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
</div>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
There is a typical question that why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe when <math>\kappa</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit of <math> n,d\to ∞</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Also, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over-and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation, and they solve a well-specified empirical risk minimization (ERM) problem with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: R→[0,1]</math> that is concave at all <math> z >0 </math> will yield a classifier that is over-confident at any confidence level.<br />
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and a class of general binary classification problems. They show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> as <math>n /d </math> is large,and establish sufficient conditions for the over-or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
Their work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
==References==</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50197Don't Just Blame Over-parametrization2021-11-13T09:29:46Z<p>T229yu: /* Critiques */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
There is a typical question that why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe when <math>\kappa</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit of <math> n,d\to ∞</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Also, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over-and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation, and they solve a well-specified empirical risk minimization (ERM) problem with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: R→[0,1]</math> that is concave at all <math> z >0 </math> will yield a classifier that is over-confident at any confidence level.<br />
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and a class of general binary classification problems. They show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> as <math>n /d </math> is large,and establish sufficient conditions for the over-or under-confidence of unregularized ERM for general binary classification. Their results reveal that <br />
<br />
(1) Over-confidence is not just a result of over-parametrization; <br />
<br />
(2) Over-confidence is a common mode but not universal. <br />
<br />
We believe our work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), or theoretical studies of calibration on nonlinear models.<br />
<br />
==References==</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50196Don't Just Blame Over-parametrization2021-11-13T09:29:27Z<p>T229yu: /* Critiques */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
There is a typical question that why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe when <math>\kappa</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. '''The well-specified logistic regression is inherently over-confident''': <br />
<br />
Conditioned on the model predicting <math> p > 0.5</math>, the actual probability of the label being one is lower by an amount of <math> \Theta (d/n)</math>, in the limit of <math> n,d\to ∞</math> proportionally and <math> n/d </math> is large. In other words, the calibration error is always in the over-confident direction. Also, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)</math> in this limiting regime.<br />
<br />
2. The authors identify '''sufficient conditions for over-and under-confidence in general binary classification problems''', where the data is generated from an arbitrary nonlinear activation, and they solve a well-specified empirical risk minimization (ERM) problem with a suitable loss function. Their conditions imply that any symmetric, monotone activation <math> \sigma: R→[0,1]</math> that is concave at all <math> z >0 </math> will yield a classifier that is over-confident at any confidence level.<br />
<br />
3. Another perhaps surprising implication is that ''over-confidence is not universal'': <br />
<br />
They prove that there exists an activation function for which under-confidence can happen for a certain range of confidence levels.<br />
<br />
== Critiques ==<br />
<br />
This paper provides a precise theoretical study of the calibration error of logistic regression and of a class of general binary classification problems. The authors show that logistic regression is inherently over-confident by <math> \Theta (d/n) </math> when <math> n/d </math> is large, and establish sufficient conditions for the over- or under-confidence of unregularized ERM for general binary classification. Their results reveal that (1) over-confidence is not just a result of over-parametrization, and (2) over-confidence is common but not universal. The work opens up a number of future questions, such as the interplay between calibration and model training (or regularization), and theoretical studies of calibration for nonlinear models.<br />
<br />
==References==</div>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks achieve high accuracy, but the predicted top probability (confidence) often does not reflect the actual accuracy: the models tend to be '''over-confident'''. For example, a WideResNet-32 on CIFAR-100 has an average predicted top probability of 87%, while its actual test accuracy is only 72% (Guo et al., 2017). To address this issue, a growing body of work aims to improve the '''calibration''' of models, which can reduce over-confidence while preserving (or even improving) accuracy (Ovadia et al., 2019).<br />
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\Theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari,2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013;Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur &Cand`es, 2019; Cand`es et al., 2020). This paper analysis builds on the characterization for unregularized convex risk minimization problems (including logistic regression) derived in Sur& Cand`es (2019).<br />
<br />
== Motivation ==<br />
There is a typical question that why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe when <math>\kappa</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion ==<br />
1. The well-specified logistic regression is inherently over-confident: Conditioned on the model predicting <math> p >0.5<\math>, the actual probability of the label being one is lower by an amount ofΘ(d/n), in the limit of <math> n,d\to \infity<\math> proportionally and <math> n/d <\math> is large. In other words, the calibration error is always in the over-confident direction. Also, the overall Calibration Error (CE) of the logistic model is <math> \Theta (d/n)<\math> in this limiting regime.<br />
<br />
== Critiques ==<br />
<br />
==References==</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50190Don't Just Blame Over-parametrization2021-11-13T09:21:47Z<p>T229yu: /* Conclusion */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks achieve high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 trained on CIFAR100 has an average predicted top probability of 87%, while its actual test accuracy is only 72% (Guo et al., 2017). To address this issue, a growing body of work aims to improve the '''calibration''' of models, which can reduce over-confidence while preserving (or even improving) accuracy (Ovadia et al., 2019).<br />
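The confidence-accuracy gap quoted above can be measured directly. The following is a minimal sketch (the function name and the equal-width binning scheme are illustrative choices, not from the paper) of the average-confidence-minus-accuracy gap and a simple binned expected calibration error (ECE):

```python
import numpy as np

def confidence_accuracy_gap(probs, labels, n_bins=10):
    """Average confidence minus accuracy, and a simple binned
    expected calibration error (ECE), for top-label predictions."""
    conf = probs.max(axis=1)            # predicted top probability
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)

    gap = conf.mean() - correct.mean()  # > 0 means over-confident

    # Binned ECE: weighted average of |confidence - accuracy| per bin.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return gap, ece
```

On the WideResNet example above, the gap would be roughly 0.87 - 0.72 = 0.15.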
<br />
== Previous Work ==<br />
1. '''Algorithms for model calibration''' <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model, and date back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan, 2001), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing to mitigate over-confidence in logistic regression. Guo et al. (2017) show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve on these approaches (Kull et al., 2017; 2019; Ding et al., 2020; Rahimi et al., 2020; Zhang et al., 2020).<br />
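Temperature scaling as described by Guo et al. (2017) learns a single scalar <math>T</math> on a validation set; a minimal sketch might look as follows (the grid search over <math>T</math> is a simplification used here instead of gradient-based fitting, and the function names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.25, 5.0, 96)):
    """Pick the temperature T minimizing validation NLL of
    softmax(logits / T); T > 1 softens over-confident predictions."""
    def nll(T):
        p = softmax(logits / T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)
```

For an over-confident model (validation accuracy below the average confidence), the fitted <math>T</math> comes out above 1, flattening the predicted probabilities.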
<br />
Another line of work improves calibration by aggregating the probabilistic predictions of multiple models, using either an ensemble of models (Lakshminarayanan et al., 2016; Malinin et al., 2019; Wen et al., 2020; Tran et al., 2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Maddox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhoti et al., 2020; Liu et al., 2020).<br />
<br />
2. '''Theoretical analysis of calibration'''<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling are less calibrated than reported, and propose a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020) and Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A theoretical result related to this paper is that of Liu et al. (2019), which shows that the calibration error of any classifier is upper bounded by the square root of its excess logistic loss over the Bayes classifier. This result can be translated into an <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas the paper's main result implies <math>\Theta(d/n)</math> calibration error in the high-dimensional limiting regime (with input distribution assumptions).<br />
<br />
3. '''High-dimensional behaviors of empirical risk minimization'''<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari, 2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013; Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur & Candès, 2019; Candès et al., 2020). The paper's analysis builds on the characterization of unregularized convex risk minimization problems (including logistic regression) derived in Sur & Candès (2019).<br />
<br />
== Motivation ==<br />
A natural question is why such over-confidence happens for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, for example in deep neural networks (Mukhoti et al., 2020). However, it is so far unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. This paper concludes that over-confidence is not just a result of over-parametrization but is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe when <math>\kappa</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion == <br />
== Critiques ==<br />
<br />
==References==</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50186Don't Just Blame Over-parametrization2021-11-13T09:12:25Z<p>T229yu: /* Previous Work */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is an ICML 2021 paper by Yu Bai, Song Mei, Huan Wang, and Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks achieve high accuracy. However, the predicted top probability (the confidence) often does not reflect the actual accuracy of the model: models tend to be '''over-confident'''. For example, a WideResNet-32 trained on CIFAR-100 has an average predicted top probability of 87%, while its actual test accuracy is only 72% (Guo et al., 2017). To address this issue, a growing body of work improves the '''calibration''' of models, which reduces over-confidence while preserving (or even improving) accuracy (Ovadia et al., 2019).<br />
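<br />
The notion of calibration error used throughout can be made concrete with a small sketch. The following is an illustrative computation of the expected calibration error (ECE) via equal-width confidence binning; the function name and toy data are ours, not the paper's. For a model like the one above, predicting 87% confidence while being correct 72% of the time, the ECE comes out to 0.15.<br />
<br />
```python
# Illustrative sketch: expected calibration error (ECE) with equal-width
# confidence bins. The function name and toy data below are ours.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |accuracy - confidence| over bins, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidences of exactly 1.0 fall into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)
    return ece

# An over-confident model: 87% average confidence, 72% accuracy.
confs = [0.87] * 100
correct = [1] * 72 + [0] * 28
print(round(expected_calibration_error(confs, correct), 2))  # → 0.15
```
<br />
A perfectly calibrated model (e.g. 70% confidence and 70% accuracy within each bin) would have an ECE of zero under this sketch.<br />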
<br />
== Previous Work ==<br />
1. Algorithms for model calibration. <br />
<br />
Practitioners have long observed and dealt with the over-confidence of logistic regression. Recalibration algorithms fix this by adjusting the output of an already-trained model, and date back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan, 2001), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing to mitigate the over-confidence of logistic regression. Guo et al. (2017) show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve on these approaches (Kull et al., 2017; 2019; Ding et al., 2020; Rahimi et al., 2020; Zhang et al., 2020).<br />
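<br />
As a hedged illustration of temperature scaling in the binary case (a simplified sketch, not Guo et al.'s implementation, which optimizes the temperature by gradient descent on a validation set): a single scalar <math>T > 0</math> rescales the logits <math>z \mapsto z/T</math>, with <math>T</math> chosen to minimize the negative log-likelihood on held-out data. A fitted <math>T > 1</math> indicates the raw logits were over-confident.<br />
<br />
```python
import math

# Illustrative sketch of binary temperature scaling (coarse grid search
# over T; Guo et al. (2017) instead optimize T by gradient descent on a
# validation set). All names and data here are ours, not the paper's.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(logits, labels, T):
    """Average negative log-likelihood of labels under logits / T."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z / T)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)

def fit_temperature(logits, labels):
    """Pick T in (0, 10] minimizing held-out NLL (grid of step 0.25)."""
    candidates = [0.25 * k for k in range(1, 41)]
    return min(candidates, key=lambda T: nll(logits, labels, T))

# Over-confident validation logits: magnitudes imply ~98% confidence,
# but the labels agree with the logits' sign only 80% of the time.
val_logits = [4.0, 4.0, 4.0, 4.0, 4.0, -4.0, -4.0, -4.0, -4.0, -4.0]
val_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
T = fit_temperature(val_logits, val_labels)
# The fitted T exceeds 1: the logits must be "cooled" toward 0.5
# to match the observed accuracy.
```
<br />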
<br />
Another line of work improves calibration by aggregating the probabilistic predictions of multiple models, using either an ensemble of models (Lakshminarayanan et al., 2016; Malinin et al., 2019; Wen et al., 2020; Tran et al., 2020) or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Maddox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhoti et al., 2020; Liu et al., 2020).<br />
<br />
2. Theoretical analysis of calibration.<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling are less calibrated than reported, and propose a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020) and Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A theoretical result related to this paper is that of Liu et al. (2019), which shows that the calibration error of any classifier is upper bounded by the square root of its excess logistic loss over the Bayes classifier. That result translates to an <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas the paper's main result implies <math>\Theta(d/n)</math> calibration error in the high-dimensional limiting regime (under input distribution assumptions).<br />
<br />
3. High-dimensional behaviors of empirical risk minimization.<br />
<br />
There is a rapidly growing literature on limiting characterizations of convex-optimization-based estimators in the <math> n\propto d </math> regime (Donoho et al., 2009; Bayati & Montanari, 2011; El Karoui et al., 2013; Karoui, 2013; Stojnic, 2013; Thrampoulidis et al., 2015; 2018; Mai et al., 2019; Sur & Candès, 2019; Candès et al., 2020). The paper's analysis builds on the characterization of unregularized convex risk minimization problems (including logistic regression) derived in Sur & Candès (2019).<br />
<br />
== Motivation ==<br />
A natural question is why such over-confidence happens for vanilla-trained models in the first place. One common understanding is that over-confidence is a result of over-parametrization, e.g. in deep neural networks (Mukhoti et al., 2020). However, it has so far been unclear whether over-parametrization is the only reason, or whether there are other, more intrinsic causes of over-confidence. The paper's conclusion is that over-confidence is not just a result of over-parametrization, but is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test their theoretical results: the first based on simulation, and the second on the CIFAR-10 dataset. <br />
<br />
Two activations were used in the simulation: well-specified under-parametrized logistic regression, and general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. "Calibration curves" were plotted for both activations: the x-axis is the predicted probability p, and the y-axis is the average true probability given that prediction.<br />
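<br />
The well-specified logistic-regression part of the simulation can be sketched as follows (a hedged re-creation, not the authors' code: the input scaling, the signal strength <math>\|\beta^\ast\| = 2</math>, and the plain gradient-descent optimizer are our assumptions). For fresh points whose predicted probability falls in a confidence band, the average true probability is compared with the average prediction; over-confidence means the former is smaller.<br />
<br />
```python
import numpy as np

# Illustrative re-creation of the simulation setting (our sketch, not the
# authors' code; input scaling, signal strength, and optimizer are
# assumptions). Data come from a well-specified logistic model with
# Gaussian inputs, with dimension-to-sample ratio kappa = d / n.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_mle(X, y, lr=0.1, steps=3000):
    """Unregularized logistic MLE via plain gradient descent."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (y - sigmoid(X @ beta)) / len(y)
        beta += lr * grad
    return beta

def calibration_in_band(d=20, kappa=0.1, band=(0.7, 0.8), seed=0):
    """Average predicted vs. true probability among fresh points whose
    prediction falls inside the given confidence band."""
    rng = np.random.default_rng(seed)
    n = int(d / kappa)
    beta_star = rng.normal(size=d)
    beta_star *= 2.0 / np.linalg.norm(beta_star)  # assumed signal strength
    X = rng.normal(size=(n, d))
    y = (rng.random(n) < sigmoid(X @ beta_star)).astype(float)
    beta_hat = fit_logistic_mle(X, y)

    X_test = rng.normal(size=(20000, d))
    p_hat = sigmoid(X_test @ beta_hat)    # model's predicted probability
    p_true = sigmoid(X_test @ beta_star)  # actual P(y = 1 | x)
    mask = (p_hat > band[0]) & (p_hat < band[1])
    return p_hat[mask].mean(), p_true[mask].mean()

pred, true = calibration_in_band()
# Over-confidence: the average true probability in the band falls below
# the average predicted probability, even though the model is
# well-specified and under-parametrized (d < n).
```
<br />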
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results. First, logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence becomes more severe as <math>\kappa</math> increases, suggesting that the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, the theoretical prediction closely matches the simulation, further confirming the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested on the CIFAR-10 dataset, by running multi-class logistic regression on its first five classes. The authors performed logistic regression on two kinds of labels: the true labels, and pseudo-labels generated from the multi-class logistic (softmax) model. <br />
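<br />
The pseudo-label construction can be sketched as below (our reading of the setup; whether the labels are sampled from the softmax distribution or taken as its argmax is not stated here, so sampling is assumed, since it makes the labels realizable by the softmax model by construction).<br />
<br />
```python
import numpy as np

# Illustrative sketch of pseudo-label generation: given class
# probabilities from an already-fitted softmax model, draw each
# pseudo-label from that predicted distribution. The toy probabilities
# below are made up.

def sample_pseudo_labels(probs, seed=0):
    """Draw one class index per row from the model's class probabilities."""
    rng = np.random.default_rng(seed)
    return np.array([rng.choice(len(p), p=p) for p in probs])

# Toy predicted probabilities over 3 classes for 4 inputs.
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.25, 0.25, 0.50],
    [1.00, 0.00, 0.00],
])
pseudo = sample_pseudo_labels(probs)  # one class index per row
```
<br />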
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that logistic regression is over-confident on both kinds of labels, with the over-confidence more severe on the pseudo-labels than on the true labels. This suggests that the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not realizable by the model.<br />
<br />
==Conclusion == <br />
== Critiques ==<br />
<br />
==References==</div>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. Algorithms for model calibration. <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020), or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2.Theoretical analysis of calibration.<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies <math>\theta(d/n)</math> calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
== Motivation ==<br />
There is a typical question is that why does such over-confidence happen for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe when <math>\kappa</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion == <br />
== Critiques ==<br />
<br />
==References==</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50184Don't Just Blame Over-parametrization2021-11-13T09:09:48Z<p>T229yu: /* Previous Work */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. Algorithms for model calibration. <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020), or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2.Theoretical analysis of calibration.<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})</math> upper bound for well-specified logistic regression, whereas our main result implies Θ(d/n)calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
== Motivation ==<br />
There is a typical question is that why does such over-confidence happen for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe when <math>\kappa</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
==Conclusion == <br />
== Critiques ==<br />
<br />
==References==</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50183Don't Just Blame Over-parametrization2021-11-13T09:09:17Z<p>T229yu: /* Previous Work */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. Algorithms for model calibration. <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020), or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2.Theoretical analysis of calibration.<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to a <math> O(\sqrt{d/n})<math> upper bound for well-specified logistic regression, whereas our main result implies Θ(d/n)calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
== Motivation ==<br />
There is a typical question is that why does such over-confidence happen for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
There are two activations used in the simulation: well-specified under-parametrized logistic regression as well as general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. The “calibration curves” were plotted for both activations: the x-axis is p, the y-axis is the average probability given the prediction.<br />
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results: First, the logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe when <math>\kappa</math> increases, suggests the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, theoretical prediction closely matches the simulation, further confirms the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested using dataset CIFAR10 by running multi-class logistic regression on the first five classes on it. The author performed logistic regression on two kinds of labels: true label and pseudo-label generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that the logistic regression is over-confident on both labels, where the over-confidence is more severe on the pseudo-labels than the true labels. This suggests the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not necessarily realizable by the model.<br />
<br />
<br />
==Conclusion == <br />
== Critiques ==<br />
<br />
==References==</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization&diff=50182Don't Just Blame Over-parametrization2021-11-13T09:08:11Z<p>T229yu: /* Previous Work */</p>
<hr />
<div>== Presented by == <br />
<br />
Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu<br />
== Introduction ==<br />
''Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification'' is a paper from ICML 2021 written by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. <br />
<br />
Machine learning models such as deep neural networks have high accuracy. However, the predicted top probability (confidence) does not reflect the actual accuracy of the model, which tends to be '''over-confident'''. For example, a WideResNet 32 on CIFAR100 has on average a predicted top probability of 87%, while the actual test accuracy is only 72% in (Guo et al., 17'). To address this issue, more and more researchers work on improving the '''calibration''' of models, which can reduce the over-confidence and preserve (or even improve) the accuracy in (Ovadia et al., 19').<br />
<br />
== Previous Work ==<br />
1. Algorithms for model calibration. <br />
<br />
Practitioners have observed and dealt with the over-confidence of logistic regression long ago. Recalibration algorithms fix this by adjusting the output of a well-trained model and dates back to the classical methods of Platt scaling (Platt et al., 1999), histogram binning (Zadrozny & Elkan), and isotonic regression (Zadrozny & Elkan, 2002). Platt et al. (1999) also use a particular kind of label smoothing as a way of mitigating the over-confidence in logistic regression. Guo et al. (2017)show that temperature scaling, a simple method that learns a rescaling factor for the logits, is a competitive method for calibrating neural networks. A number of recent recalibration methods further improve the performances over these approaches (Kull et al., 2017; 2019; Ding et al., 2020;Rahimi et al., 2020; Zhang et al., 2020)<br />
<br />
Another line of work improves calibration by aggregating the probabilistic predictions over multiple models, using either an ensemble of models (Lakshminarayanan et al.,2016; Malinin et al., 2019; Wen et al., 2020; Tran et al.,2020), or randomized predictions such as Bayesian neural networks (Gal & Ghahramani, 2016; Gal et al., 2017; Mad-dox et al., 2019; Dusenberry et al., 2020). Finally, there are techniques for improving the calibration of a single neural network during training (Thulasidasan et al., 2019; Mukhotiet al., 2020; Liu et al., 2020).<br />
<br />
2.Theoretical analysis of calibration.<br />
<br />
Kumar et al. (2019) show that continuous rescaling methods such as temperature scaling is less calibrated than reported, and proposed a method that combines temperature scaling and histogram binning. Gupta et al. (2020) study the relationship between calibration and other notions of uncertainty such as confidence intervals. Shabat et al. (2020); Jung et al. (2020) study the sample complexity of estimating the multicalibration error (group calibration). A related theoretical result to ours is (Liu et al., 2019) which shows that the calibration error of any classifier is upper bounded by its square root excess logistic loss over the Bayes classifier. This result can be translated to aO(√d/n)upper bound for well-specified logistic regression, whereas our main result implies Θ(d/n)calibration error in our high-dimensional limiting regime(with input distribution assumptions).<br />
<br />
== Motivation ==<br />
There is a typical question is that why does such over-confidence happen for vanilla trained models. One common understanding is that over-confidence is a result of over-parametrization, such as deep neural networks in (Mukhoti et al., 20'). However, so far it is unclear whether over-parametrization is the only reason, or whether there are other intrinsic reasons leading to over-confidence. In this paper, the conclusion is that over-confidence is not just a result of over-parametrization and is more inherent.<br />
<br />
== Model Architecture == <br />
<br />
== Experiments ==<br />
<br />
The authors conducted two experiments to test the theories: the first was based on simulation, and the second used the data CIFAR10. <br />
<br />
Two activations were used in the simulation: well-specified under-parametrized logistic regression, and general convex ERM with the under-confident activation <math>\sigma_{underconf}</math>. "Calibration curves" were plotted for both activations: the x-axis is the predicted confidence <math>p</math>, and the y-axis is the average true-label probability among points with that prediction.<br />
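A calibration curve of this kind can be computed by binning predictions by confidence and comparing each bin's average confidence with its empirical accuracy; points below the diagonal indicate over-confidence. A minimal sketch on synthetic, deliberately over-confident predictions (not the paper's simulation):

```python
import numpy as np

def calibration_curve(p_pred, y_true, n_bins=10):
    """Return (average confidence, empirical frequency) per non-empty bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p_pred, edges) - 1, 0, n_bins - 1)
    conf, freq = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            conf.append(p_pred[mask].mean())   # x-axis: predicted confidence
            freq.append(y_true[mask].mean())   # y-axis: empirical accuracy
    return np.array(conf), np.array(freq)

rng = np.random.default_rng(0)
p = rng.uniform(0.5, 1.0, size=5000)
# Synthetic over-confident predictor: true frequency lags the prediction p
y = (rng.uniform(size=5000) < 0.5 + 0.8 * (p - 0.5)).astype(float)
conf, freq = calibration_curve(p, y)
```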
<br />
[[File:simulation.png|700px|thumb|center]]<br />
<br />
The figure above shows four main results. First, logistic regression is over-confident at all <math>\kappa</math>. Second, over-confidence is more severe as <math>\kappa</math> increases, suggesting that the conclusion of the theory holds more broadly than its assumptions. Third, <math>\sigma_{underconf}</math> leads to under-confidence for <math>p \in (0.5, 0.51)</math>, which verifies Theorem 2 and Corollary 3. Finally, the theoretical prediction closely matches the simulation, further confirming the theory.<br />
<br />
The generality of the theory beyond the Gaussian input assumption and the binary classification setting was further tested on CIFAR10 by running multi-class logistic regression on its first five classes. The authors performed logistic regression on two kinds of labels: the true labels, and pseudo-labels generated from the multi-class logistic (softmax) model. <br />
<br />
[[File:DJB_CIFAR10.png|700px|thumb|center]]<br />
<br />
The figure above indicates that logistic regression is over-confident on both kinds of labels, with the over-confidence more severe on the pseudo-labels than on the true labels. This suggests that the result that logistic regression is inherently over-confident may hold more broadly for other under-parametrized problems without strong assumptions on the input distribution, or even when the labels are not realizable by the model.<br />
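The pseudo-label experiment described above can be mimicked as follows: fit a softmax model on the true labels, sample new labels from its predicted distribution (so the labels are realizable by the model class by construction), then refit. A small self-contained numpy sketch, not the authors' code; the data dimensions, learning rate, and step count are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, k, lr=0.5, steps=300):
    """Multinomial logistic regression fit by plain gradient descent."""
    n, d = X.shape
    W = np.zeros((d, k))
    Y = np.eye(k)[y]                      # one-hot labels
    for _ in range(steps):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / n       # gradient of mean cross-entropy
    return W

rng = np.random.default_rng(0)
n, d, k = 1000, 5, 5
X = rng.normal(size=(n, d))
W_star = rng.normal(size=(d, k))          # planted ground-truth weights
y_true = np.array([rng.choice(k, p=p) for p in softmax(X @ W_star)])

W = fit_softmax(X, y_true, k)             # fit on the true labels
P = softmax(X @ W)
# Pseudo-labels sampled from the fitted model, hence realizable by it
y_pseudo = np.array([rng.choice(k, p=p) for p in P])
W2 = fit_softmax(X, y_pseudo, k)          # refit on the pseudo-labels
```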
<br />
<br />
==Conclusion == <br />
== Critiques ==<br />
<br />
==References==</div>T229yuhttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F21&diff=50107stat441F212021-11-10T03:59:14Z<p>T229yu: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F20-STAT 441/841 CM 763-Proposal| Project Proposal ]] ==<br />
<br />
<!--[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]--><br />
<br />
=Paper presentation=<br />
{| class="wikitable" border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="250pt"|Name <br />
|width="15pt"|Paper number <br />
|width="700pt"|Title<br />
|width="15pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|width="30pt"|Link to the video<br />
|-<br />
|Sep 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary] || [https://youtu.be/JWozRg_X-Vg?list=PLehuLRPyt1HzXDemu7K4ETcF0Ld_B5adG&t=539]<br />
|-<br />
|Week of Nov 16 || Ali Ghodsi || || || || ||<br />
|-<br />
|Week of Nov 22 || Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu|| || Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification || [http://proceedings.mlr.press/v139/bai21c/bai21c.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization Summary] ||<br />
|-<br />
|Week of Nov 16 || Kanika Chopra, Yush Rajcoomar || || Automatic Bank Fraud Detection Using Support Vector Machines || [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.5804&rep=rep1&type=pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 || Zeng Mingde, Lin Xiaoyu, Fan Joshua, Rao Chen Min || || || || ||<br />
|-<br />
|Week of Nov 22 || Justin D'Astous, Waqas Hamed, Stefan Vladusic, Ethan O'Farrell || || A Probabilistic Approach to Neural Network Pruning || [http://proceedings.mlr.press/v139/qian21a/qian21a.pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 || Cassandra Wong, Anastasiia Livochka, Maryam Yalsavar, David Evans || || Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification || [https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hou_Patch-Based_Convolutional_Neural_CVPR_2016_paper.pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 || Jessie Man Wai Chin, Yi Lin Ooi, Yaqi Shi, Shwen Lyng Ngew || || || || ||<br />
|-<br />
|Week of Nov 22 || Eric Anderson, Chengzhi Wang, Kai Zhong, YiJing Zhou || || || || ||<br />
|-<br />
|Week of Nov 29 || Ethan Cyrenne, Dieu Hoa Nguyen, Mary Jane Sin, Carolyn Wang || || || || ||<br />
|-<br />
|Week of Nov 22 || Ann Gie Wong, Curtis Li, Hannah Kerr || || The Detection of Black Ice Accidents for Preventative Automated Vehicles Using Convolutional Neural Networks || [https://www.mdpi.com/2079-9292/9/12/2178/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=The_Detection_of_Black_Ice_Accidents_Using_CNNs&fbclid=IwAR0K4YdnL_hdRnOktmJn8BI6-Ra3oitjJof0YwluZgUP1LVFHK5jyiBZkvQ Summary] ||<br />
|-<br />
|Week of Nov 22 || Yuwei Liu, Daniel Mao|| || Another Look At Distance-Weighted Discrimination || [http://users.stat.umn.edu/~wang3660/papers/kerndwd.pdf Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Another_look_at_distance-weighted_discrimination Summary] ||<br />
|-<br />
|Week of Nov 22 || Lingshan Wang, Yifan Li, Ziyi Liu || || Understanding Convolutional Neural Networks for Text Classification || [https://arxiv.org/pdf/1809.08037.pdf Paper] || ||<br />
|-</div>T229yu