http://wiki.math.uwaterloo.ca/statwiki/api.php?action=feedcontributions&user=Wlchong&feedformat=atom
statwiki - User contributions [US] - retrieved 2023-10-03T01:00:03Z - MediaWiki 1.28.3
http://wiki.math.uwaterloo.ca/statwiki/index.php?title=F21-STAT_441/841_CM_763-Proposal&diff=51237
F21-STAT 441/841 CM 763-Proposal - revision of 2021-12-21T19:23:28Z<p>Wlchong: </p>
<hr />
<div>Use this format (Don’t remove Project 0)<br />
<br />
Project # 0 Group members:<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Last name, First name<br />
<br />
Title: Making a String Telephone<br />
<br />
Description: In this science project, we use paper cups and string to make a string telephone and talk with friends while learning about sound waves. (Explain your project in one or two paragraphs.)<br />
<br />
--------------------------------------------------------------------<br />
Project # 1 Group members:<br />
<br />
Feng, Jared<br />
<br />
Huang, Xipeng<br />
<br />
Xu, Mingwei<br />
<br />
Yu, Tingzhou<br />
<br />
Title: Patch-based classification of lung cancers pathological images using convolutional neural networks<br />
<br />
Description: In this project, we explore the classification of lung cancer pathological images. The input images come from three tumor types (LUAD, LUSD, and MESO) and have been split into patches. We will follow the paper ''Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification''.<br />
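The patch-based pipeline can be sketched without any deep-learning framework: split a slide into fixed-size patches, score each patch with a classifier (a stand-in for the CNN here), then aggregate patch-level class probabilities into a slide-level decision. A minimal pure-Python sketch; the function names and the averaging rule are illustrative assumptions, not the paper's exact aggregation:<br />

```python
def extract_patches(image, size):
    """Split a 2-D image (list of lists) into non-overlapping size x size patches."""
    h, w = len(image), len(image[0])
    return [
        [row[x:x + size] for row in image[y:y + size]]
        for y in range(0, h - size + 1, size)
        for x in range(0, w - size + 1, size)
    ]

def aggregate(patch_probs):
    """Average per-class probabilities over all patches; return the argmax class."""
    n_classes = len(patch_probs[0])
    means = [sum(p[c] for p in patch_probs) / len(patch_probs) for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__)
```

For example, a 4x4 slide with `size=2` yields four patches, and averaging the (hypothetical) patch probabilities `[[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]` picks class 1.<br />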
--------------------------------------------------------------------<br />
Project # 2 Group members:<br />
<br />
Anderson, Eric<br />
<br />
Wang, Chengzhi<br />
<br />
Zhong, Kai<br />
<br />
Zhou, Yi Jing<br />
<br />
Title: Data Poison Attacks<br />
<br />
Description: Attempting to create a successful data poisoning attack<br />
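For intuition, the simplest data poisoning attack is label flipping: relabel a few training points near the decision boundary and watch a downstream classifier degrade. A toy 1-D sketch against a nearest-centroid classifier (all data and the attack rule are hypothetical, not taken from any particular paper):<br />

```python
def fit_centroids(train):
    """Nearest-centroid classifier on 1-D points: store the mean of each class."""
    cents = {}
    for lbl in (0, 1):
        pts = [x for x, l in train if l == lbl]
        cents[lbl] = sum(pts) / len(pts)
    return cents

def predict(cents, x):
    """Predict the class whose centroid is closest to x."""
    return min((0, 1), key=lambda l: abs(x - cents[l]))

def accuracy(cents, test_set):
    return sum(predict(cents, x) == y for x, y in test_set) / len(test_set)

# Toy 1-D data: class 0 clusters low, class 1 clusters high.
clean = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
         (11, 1), (12, 1), (13, 1), (14, 1), (15, 1)]
test_set = [(4, 0), (7, 0), (12, 1), (14, 1)]

# Label-flipping poison: relabel the class-0 points nearest the boundary as class 1,
# which drags the class-1 centroid toward class-0 territory.
poison = [(x, 1 if (y == 0 and x >= 4) else y) for x, y in clean]
```

On this toy data the clean classifier scores 100% on the test set, while the poisoned one misclassifies the borderline point.<br />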
<br />
--------------------------------------------------------------------<br />
Project # 3 Group members:<br />
<br />
Chopra, Kanika<br />
<br />
Rajcoomar, Yush<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Title: Cancer Classification<br />
<br />
Description: We will be classifying three tumour types based on pathological data. <br />
<br />
--------------------------------------------------------------------<br />
Project # 4 Group members:<br />
<br />
Li, Shao Zhong<br />
<br />
Kerr, Hannah <br />
<br />
Wong, Ann Gie<br />
<br />
Title: Predicting "Pawpularity" of Pets with Image Regression<br />
<br />
Description: Analyze raw images and metadata to predict the “Pawpularity” of pet photos, helping shelters and rescuers around the world improve the appeal of their pet profiles so that more animals get adopted and find their "furever" homes faster.<br />
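The contest is scored by root mean squared error (RMSE) between predicted and true Pawpularity scores, so a constant baseline that predicts the training mean is a natural first benchmark. A minimal sketch (the scores below are made up):<br />

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted Pawpularity scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Constant mean-score baseline on hypothetical training scores.
train_scores = [30, 40, 50, 60, 80]
baseline = sum(train_scores) / len(train_scores)
```

Any image-regression model only earns its keep if it beats the RMSE of this constant baseline.<br />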
<br />
--------------------------------------------------------------------<br />
Project # 5 Group members:<br />
<br />
Chin, Jessie Man Wai<br />
<br />
Ooi, Yi Lin<br />
<br />
Shi, Yaqi<br />
<br />
Ngew, Shwen Lyng<br />
<br />
Title: The Application of Classification in Accelerated Underwriting (Insurance)<br />
<br />
Description: Accelerated Underwriting (AUW), also called “express underwriting,” is a faster and easier process for people in good health to obtain life insurance. The traditional underwriting process is often painful for both customers and insurers. From the customer's perspective, they have to complete several types of questionnaires and provide various medical tests involving blood, urine, saliva, and other results. Underwriters, on the other hand, have to manually go through every single policy to assess the risk of each applicant. AUW allows people who are deemed “healthy” to forgo medical exams. Since COVID-19, the topic has become more pressing, as traditional underwriting could not be performed under stay-at-home orders. However, this places a burden on the insurance company to estimate risk accurately with fewer test results. <br />
<br />
This is where data science comes in. With different classification methods, we can address the underwriting process's five pain points: labor, speed, efficiency, pricing, and mortality. This allows us to better estimate risk and classify clients by whether they are eligible for accelerated underwriting. For the final project, we use data from one of the leading US insurers to analyze how clients can be classified for AUW. We will use factors such as health data, medical history, family history, and insurance history to determine eligibility.<br />
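As a sketch of the modelling step, AUW eligibility can be framed as binary classification over tabular applicant features. Below is a from-scratch logistic regression trained by stochastic gradient descent; the features, scaling, and data are hypothetical illustrations, not the insurer's actual variables:<br />

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=4000):
    """Fit logistic regression with per-sample (stochastic) gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_label(w, b, xi):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5)

# Hypothetical applicant features (age/100, BMI/50, smoker flag); label 1 = eligible.
X = [(0.25, 0.44, 0), (0.30, 0.46, 0), (0.55, 0.60, 1), (0.62, 0.66, 1)]
y = [1, 1, 0, 0]
```

On this tiny separable example the fitted model recovers all training labels; in practice the team would use regularized, library-backed classifiers and proper validation.<br />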
<br />
--------------------------------------------------------------------<br />
Project # 6 Group members:<br />
<br />
Wang, Carolyn<br />
<br />
Cyrenne, Ethan<br />
<br />
Nguyen, Dieu Hoa<br />
<br />
Sin, Mary Jane<br />
<br />
Title: Pawpularity (PetFinder Kaggle Competition)<br />
<br />
Description: Using images and their metadata to predict the popularity of pet photos, which is calculated from page-view statistics and other metrics on the PetFinder website.<br />
<br />
--------------------------------------------------------------------<br />
Project # 7 Group members:<br />
<br />
Bhattacharya, Vaibhav<br />
<br />
Chatoor, Amanda<br />
<br />
Prathap Das, Sutej<br />
<br />
Title: PetFinder.my - Pawpularity Contest [https://www.kaggle.com/c/petfinder-pawpularity-score/overview]<br />
<br />
Description: In this competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos. We'll train and test our model on PetFinder.my's thousands of pet profiles.<br />
<br />
--------------------------------------------------------------------<br />
Project # 8 Group members:<br />
<br />
Yan, Xin<br />
<br />
Duan, Yishu<br />
<br />
Di, Xibei<br />
<br />
Title: The application of classification on company bankruptcy prediction<br />
<br />
Description: If a company goes bankrupt, all of its employees lose their jobs, and it is hard for them to find another suitable job in a short period. For the individual, an employee who loses a job to bankruptcy has no income for a period of time. This can have several negative consequences: increased homelessness, as people cannot cover their living expenses, and increased crime rates as poverty rises. For the economy, if many companies go bankrupt at the same time, a huge number of employees lose their jobs, raising the unemployment rate. This can have a series of negative impacts: lost government tax revenue, since the unemployed have no income and pay no income tax, and increased inequality in the income distribution. <br />
<br />
Therefore, company bankruptcy negatively affects individuals, the government, society, and the economy, which makes bankruptcy prediction essential. The purpose of this project is to predict whether a company will go bankrupt.<br />
--------------------------------------------------------------------<br />
Project # 9 Group members:<br />
<br />
Loke, Chun Waan<br />
<br />
Chong, Peter<br />
<br />
Osmond, Clarice<br />
<br />
Li, Zhilong<br />
<br />
Title: Popularity of Shelter Pet Photo Prediction using Varied ML Techniques<br />
<br />
Description: In this Kaggle competition, we will analyze raw images and metadata to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 10 Group members:<br />
<br />
O'Farrell, Ethan<br />
<br />
D'Astous, Justin<br />
<br />
Hamed, Waqas<br />
<br />
Vladusic, Stefan<br />
<br />
Title: Pawpularity (Kaggle)<br />
<br />
Description: Predicting the popularity of animal photos based on photo metadata<br />
--------------------------------------------------------------------<br />
Project # 11 Group members:<br />
<br />
JunBin, Pan<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 12 Group members:<br />
<br />
Kar Lok, Ng<br />
<br />
Muhan (Iris), Li<br />
<br />
Wu, Mingze<br />
<br />
Title: NFL Health & Safety - Helmet Assignment competition (Kaggle Competition)<br />
<br />
Description: Assigning each helmet detected in football game footage to the correct player, to help study head collisions.<br />
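One ingredient of helmet assignment is matching detected helmet boxes to candidate player boxes; intersection-over-union (IoU) plus greedy one-to-one assignment is the simplest version (the real competition also exploits tracking data across frames; all names and boxes below are hypothetical):<br />

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def greedy_assign(helmets, players):
    """Match each helmet box to the unused player box with the highest positive IoU."""
    pairs, used = [], set()
    for h_id, hbox in helmets.items():
        best = max((p for p in players if p not in used),
                   key=lambda p: iou(hbox, players[p]), default=None)
        if best is not None and iou(hbox, players[best]) > 0:
            pairs.append((h_id, best))
            used.add(best)
    return pairs
```

A production matcher would typically use the Hungarian algorithm instead of greedy matching, but the IoU cost is the same.<br />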
--------------------------------------------------------------------<br />
Project # 13 Group members:<br />
<br />
Livochka, Anastasiia<br />
<br />
Wong, Cassandra<br />
<br />
Evans, David<br />
<br />
Yalsavar, Maryam<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
--------------------------------------------------------------------<br />
Project # 14 Group Members:<br />
<br />
Zeng, Mingde<br />
<br />
Lin, Xiaoyu<br />
<br />
Fan, Joshua<br />
<br />
Rao, Chen Min<br />
<br />
Title: Toxic Comment Classification, Kaggle<br />
<br />
Description: Using Wikipedia comments labeled for toxicity to train a model that detects toxicity in comments.<br />
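A classical baseline for this task is a bag-of-words multinomial naive Bayes classifier with Laplace smoothing. A self-contained toy sketch (the corpus below is invented for illustration, not the actual Jigsaw/Wikipedia data):<br />

```python
import math
from collections import Counter

def train_nb(docs):
    """Fit multinomial naive Bayes: per-class word counts and document counts."""
    counts = {0: Counter(), 1: Counter()}
    doc_totals = Counter()
    for text, label in docs:
        counts[label].update(text.lower().split())
        doc_totals[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, doc_totals, vocab

def classify(model, text):
    """Return the label with the higher log-posterior, using Laplace smoothing."""
    counts, doc_totals, vocab = model
    scores = {}
    for label in (0, 1):
        lp = math.log(doc_totals[label] / sum(doc_totals.values()))
        n = sum(counts[label].values())
        for w in text.lower().split():
            lp += math.log((counts[label][w] + 1) / (n + len(vocab)))
        scores[label] = lp
    return max(scores, key=scores.get)

# Hypothetical toy corpus; label 1 = toxic, 0 = civil.
corpus = [("you are an idiot", 1), ("shut up idiot", 1), ("you stupid fool", 1),
          ("thanks for the edit", 0), ("great article thanks", 0), ("nice work on this", 0)]
model = train_nb(corpus)
```

The smoothing term `+1` keeps unseen words from zeroing out a class's probability; real submissions would move on to TF-IDF or transformer features.<br />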
--------------------------------------------------------------------<br />
Project # 15 Group Members:<br />
<br />
Huang, Yuying<br />
<br />
Anugu, Ankitha<br />
<br />
Chen, Yushan<br />
<br />
Title: Implementation of the classification task between crop and weeds<br />
<br />
Description: Our work will be based on the paper ''Crop and Weeds Classification for Precision Agriculture using Context-Independent Pixel-Wise Segmentation''.<br />
--------------------------------------------------------------------<br />
Project # 16 Group Members:<br />
<br />
Wang, Lingshan<br />
<br />
Li, Yifan<br />
<br />
Liu, Ziyi<br />
<br />
Title: Implement and Improve CNN in Multi-Class Text Classification<br />
<br />
Description: We will apply Bidirectional Encoder Representations from Transformers (BERT) to classify real-world data (building an efficient classifier for case-study interview materials) and improve the algorithm in the context of text classification, supported by a real-world data set. Implementing BERT also allows us to analyze the efficiency and practicality of the algorithm when dealing with an imbalanced dataset at both the data-input and modelling levels.<br />
The dataset is composed of case-study HTML files containing case information that can be classified into multiple industry categories. We will implement multi-class classification to break the information in each case down into pre-determined subcategories (e.g., behavioural questions, consulting questions, questions on new business/market entry). We will process the raw data into several formats (e.g., HTML, JSON, pandas data frames) and choose the most efficient raw-data processing logic based on runtime and algorithm optimization.<br />
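One standard remedy for class imbalance at the modelling level is inverse-frequency class weighting (the same heuristic as scikit-learn's `class_weight="balanced"`). A sketch with hypothetical category labels:<br />

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

# A skewed (hypothetical) label set for case-study categories.
labels = ["consulting"] * 8 + ["market_entry"] + ["behavioural"]
```

The rare classes receive proportionally larger weights in the loss, counteracting the majority class's dominance during training.<br />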
--------------------------------------------------------------------<br />
Project # 17 Group members:<br />
<br />
Malhi, Dilmeet<br />
<br />
Joshi, Vansh<br />
<br />
Syamala, Aavinash <br />
<br />
Islam, Sohan<br />
<br />
Title: Kaggle project: PetFinder.my - Pawpularity Contest<br />
<br />
Description: In this competition, we will analyze raw images provided by PetFinder.my to predict the “Pawpularity” of pet photos.<br />
--------------------------------------------------------------------<br />
<br />
Project # 18 Group members:<br />
<br />
Yuwei, Liu<br />
<br />
Daniel, Mao<br />
<br />
Title: Sartorius - Cell Instance Segmentation (Kaggle) [https://www.kaggle.com/c/sartorius-cell-instance-segmentation]<br />
<br />
Description: Detect single neuronal cells in microscopy images<br />
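Once a model outputs a binary cell mask, the individual instances still have to be separated; connected-component labeling is the simplest post-processing step (real entries use stronger instance-segmentation methods such as Mask R-CNN or Cellpose). A pure-Python sketch with 4-connectivity:<br />

```python
def count_cells(mask):
    """Count connected components (4-connectivity) in a binary mask (list of lists)."""
    h, w = len(mask), len(mask[0])
    seen, count = set(), 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and (i, j) not in seen:
                count += 1
                stack = [(i, j)]  # iterative flood fill over this component
                while stack:
                    y, x = stack.pop()
                    if (y, x) in seen or not (0 <= y < h and 0 <= x < w) or not mask[y][x]:
                        continue
                    seen.add((y, x))
                    stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return count
```

This counts separated blobs but cannot split touching cells, which is exactly why the competition is an instance-segmentation (not semantic-segmentation) problem.<br />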
<br />
--------------------------------------------------------------------<br />
<br />
Project #19 Group members:<br />
<br />
Samuel, Senko<br />
<br />
Tyler, Verhaar<br />
<br />
Zhang, Bowen<br />
<br />
Title: NBA Game Prediction<br />
<br />
Description: We will build a win/loss classifier for NBA games using player and game data, also incorporating alternative data (e.g., sports betting data).<br />
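Betting data can be turned into a model feature by converting American moneyline odds into the bookmaker's implied win probability. A small helper (a sketch; a real feature pipeline would also remove the bookmaker's margin from the two-sided line):<br />

```python
def implied_prob(moneyline):
    """Convert American moneyline odds to the bookmaker's implied win probability."""
    if moneyline > 0:
        # Underdog: e.g. +200 pays 200 on a 100 stake.
        return 100 / (moneyline + 100)
    # Favourite: e.g. -150 requires a 150 stake to win 100.
    return -moneyline / (-moneyline + 100)
```

For example, a -150 favourite carries an implied win probability of 0.60, which can be fed to the classifier alongside player and game statistics.<br />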
<br />
-------------------------------------------------------------------<br />
<br />
Project #20 Group members:<br />
<br />
Mitrache, Christian<br />
<br />
Renggli, Aaron<br />
<br />
Saini, Jessica<br />
<br />
Mossman, Alexandra<br />
<br />
Title: Classification and Deep Learning for Healthcare Provider Fraud Detection Analysis<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 21 Group members:<br />
<br />
Wang, Kun<br />
<br />
Title: TBD<br />
<br />
Description: TBD<br />
<br />
--------------------------------------------------------------------<br />
<br />
Project # 22 Group members:<br />
<br />
Guray, Egemen<br />
<br />
Title: Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network<br />
<br />
Description: I will build a system to recognize road signs in the German Traffic Sign Dataset using a CNN.<br />
--------------------------------------------------------------------<br />
<br />
Project # 23 Group members:<br />
<br />
Bsodjahi<br />
<br />
Title: Modeling the state of the bacterium Pseudomonas aeruginosa through its gene expression activity<br />
<br />
Description: Label Pseudomonas aeruginosa gene expression data through unsupervised learning (e.g., the EM algorithm), then model the bacterial state as a function of its gene expression activity</div>Wlchong
http://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F21&diff=51160
stat441F21 - revision of 2021-11-29T21:31:12Z<p>Wlchong: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F20-STAT 441/841 CM 763-Proposal| Project Proposal ]] ==<br />
<br />
<!--[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]--><br />
<br />
=Paper presentation=<br />
{| class="wikitable" border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="250pt"|Name <br />
|width="15pt"|Paper number <br />
|width="700pt"|Title<br />
|width="15pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|width="30pt"|Link to the video<br />
|-<br />
|Sep 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary] || [https://youtu.be/JWozRg_X-Vg?list=PLehuLRPyt1HzXDemu7K4ETcF0Ld_B5adG&t=539]<br />
|-<br />
|Week of Nov 16 || Ali Ghodsi || || || || ||<br />
|-<br />
|Week of Nov 22 || Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu|| || Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification || [http://proceedings.mlr.press/v139/bai21c/bai21c.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization Summary] ||<br />
|-<br />
|Week of Nov 29 || Kanika Chopra, Yush Rajcoomar || || Automatic Bank Fraud Detection Using Support Vector Machines || [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.5804&rep=rep1&type=pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Automatic_Bank_Fraud_Detection_Using_Support_Vector_Machines Summary] ||<br />
|-<br />
|Week of Nov 22 || Zeng Mingde, Lin Xiaoyu, Fan Joshua, Rao Chen Min || || Do Vision Transformers See Like Convolutional Neural Networks? || [https://proceedings.neurips.cc/paper/2021/file/652cf38361a209088302ba2b8b7f51e0-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Do_Vision_Transformers_See_Like_CNN Summary] ||<br />
|-<br />
|Week of Nov 22 || Justin D'Astous, Waqas Hamed, Stefan Vladusic, Ethan O'Farrell || || A Probabilistic Approach to Neural Network Pruning || [http://proceedings.mlr.press/v139/qian21a/qian21a.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Summary_of_A_Probabilistic_Approach_to_Neural_Network_Pruning Summary] ||<br />
|-<br />
|Week of Nov 22 || Cassandra Wong, Anastasiia Livochka, Maryam Yalsavar, David Evans || || Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification || [https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hou_Patch-Based_Convolutional_Neural_CVPR_2016_paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Patch_Based_Convolutional_Neural_Network_for_Whole_Slide_Tissue_Image_Classification Summary] ||<br />
|-<br />
|Week of Nov 29 || Jessie Man Wai Chin, Yi Lin Ooi, Yaqi Shi, Shwen Lyng Ngew || || CatBoost: unbiased boosting with categorical features || [https://proceedings.neurips.cc/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=CatBoost:_unbiased_boosting_with_categorical_features Summary] ||<br />
|-<br />
|Week of Nov 29 || Eric Anderson, Chengzhi Wang, Kai Zhong, YiJing Zhou || || Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks || [https://arxiv.org/pdf/1804.00792.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Poison_Frogs_Neural_Networks Summary] ||<br />
|-<br />
|Week of Nov 29 || Ethan Cyrenne, Dieu Hoa Nguyen, Mary Jane Sin, Carolyn Wang || || Deep Residual Learning for Image Recognition || [https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf Paper] || ||<br />
|-<br />
|Week of Nov 29 || Bowen Zhang, Tyler Magnus Verhaar, Sam Senko || || Deep Double Descent: Where Bigger Models and More Data Hurt || [https://arxiv.org/pdf/1912.02292.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Double_Descent_Where_Bigger_Models_and_More_Data_Hurt Summary] ||<br />
|-<br />
|Week of Nov 29 || Chun Waan Loke, Peter Chong, Clarice Osmond, Zhilong Li|| || XGBoost: A Scalable Tree Boosting System || [https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=XGBoost Summary] ||<br />
|-<br />
|Week of Nov 22 || Ann Gie Wong, Curtis Li, Hannah Kerr || || The Detection of Black Ice Accidents for Preventative Automated Vehicles Using Convolutional Neural Networks || [https://www.mdpi.com/2079-9292/9/12/2178/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=The_Detection_of_Black_Ice_Accidents_Using_CNNs&fbclid=IwAR0K4YdnL_hdRnOktmJn8BI6-Ra3oitjJof0YwluZgUP1LVFHK5jyiBZkvQ Summary] ||<br />
|-<br />
|Week of Nov 22 || Yuwei Liu, Daniel Mao|| || Depthwise Convolution Is All You Need for Learning Multiple Visual Domains || [https://arxiv.org/abs/1902.00927 Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Depthwise_Convolution_Is_All_You_Need_for_Learning_Multiple_Visual_Domains Summary] ||<br />
|-<br />
|Week of Nov 29 || Lingshan Wang, Yifan Li, Ziyi Liu || || Deep Learning for Extreme Multi-label Text Classification || [https://dl.acm.org/doi/pdf/10.1145/3077136.3080834 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Learning_for_Extreme_Multi-label_Text_Classification Summary]||<br />
|-<br />
|Week of Nov 29 || Kar Lok Ng, Muhan (Iris) Li || || Robust Imitation Learning from Noisy Demonstrations || [http://proceedings.mlr.press/v130/tangkaratt21a/tangkaratt21a.pdf Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Robust_Imitation_Learning_from_Noisy_Demonstrations Summary] ||<br />
|-<br />
|Week of Nov 29 ||Kun Wang || || Convolutional neural network for diagnosis of viral pneumonia and COVID-19 alike diseases|| [https://doi-org.proxy.lib.uwaterloo.ca/10.1111/exsy.12705 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_neural_network_for_diagnosis_of_viral_pneumonia_and_COVID-19_alike_diseases Summary] ||<br />
|-<br />
|Week of Nov 29 ||Egemen Guray || || Traffic Sign Recognition System (TSRS): SVM and Convolutional Neural Network || [https://www.researchgate.net/publication/344399165_Traffic_Sign_Recognition_System_TSRS_SVM_and_Convolutional_Neural_Network Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Traffic_Sign_Recognition_System_(TSRS):_SVM_and_Convolutional_Neural_Network Summary] ||<br />
|-<br />
|Week of Nov 29 ||Bsodjahi || || Bayesian Network as a Decision Tool for Predicting ALS Disease ||[https://www.mdpi.com/2076-3425/11/2/150/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Bayesian_Network_as_a_Decision_Tool_for_Predicting_ALS_Disease Summary]||<br />
|-<br />
|Week of Nov 29 ||Xin Yan, Yishu Duan, Xibei Di || || Predicting Hurricane Trajectories Using a Recurrent Neural Network || [https://arxiv.org/pdf/1802.02548.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Predicting_Hurricane_Trajectories_Using_a_Recurrent_Neural_Network Summary]||<br />
|-<br />
|Week of Nov 29 ||Ankitha Anugu, Yushan Chen, Yuying Huang || || A Game Theoretic Approach to Class-wise Selective Rationalization || [https://arxiv.org/pdf/1910.12853.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=A_Game_Theoretic_Approach_to_Class-wise_Selective_Rationalization#How_does_CAR_work_intuitively Summary]||<br />
|-<br />
|Week of Nov 29 ||Aavinash Syamala, Dilmeet Malhi, Sohan Islam, Vansh Joshi || || Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree || [https://www.hindawi.com/journals/sp/2021/5560465/ Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Research_on_Multiple_Classification_Based_on_Improved_SVM_Algorithm_for_Balanced_Binary_Decision_Tree Summary]||<br />
|-<br />
|Week of Nov 29 ||Christian Mitrache, Alexandra Mossman, Jessica Saini, Aaron Renggli|| || U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging|| [https://proceedings.neurips.cc/paper/2019/file/57bafb2c2dfeefba931bb03a835b1fa9-Paper.pdf?fbclid=IwAR1dZpx9vU1pSPTSm_nwk6uBU7TYJ2HNTrsqjaH-9ZycE_PFpFjJoHg1zhQ Paper]||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=U-Time:A_Fully_Convolutional_Network_for_Time_Series_Segmentation_Applied_to_Sleep_Staging_Summary Summary]||<br />
|-<br />
|Week of Nov 29 ||Junbin Pan|| || Wide & Deep Learning for Recommender Systems || [https://arxiv.org/pdf/1606.07792v1.pdf Paper] || [Summary]||</div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=XGBoost&diff=50767XGBoost2021-11-24T04:04:50Z<p>Wlchong: /* Distributed Experiment */</p>
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
=== Column Block for Parallel Learning ===<br />
The most time-consuming part of tree learning is getting the data into sorted order. To overcome this problem, XGBoost stores the data in blocks; each block is sorted in parallel, and the results are then merged together.<br />
<br />
# For the exact greedy algorithm, XGBoost stores the entire dataset in a single block<br />
#:* Original sparsity-aware algorithm costs: <math>O(Kd\|x\|_0 \log n)</math><br />
#:* Tree boosting on the block structure costs: <math>O(Kd\|x\|_0 + \|x\|_0 \log n)</math><br />
#:* Hence, the block structure saves an additional <math>\log n</math> factor<br />
# For approximate algorithms, XGBoost stores the dataset in multiple blocks<br />
#:* Original algorithm with binary search costs: <math>O(Kd\|x\|_0 \log q)</math><br />
#:* With the block structure, the cost is: <math>O(Kd\|x\|_0 + \|x\|_0 \log B)</math><br />
#:* Hence, the block structure saves an additional <math>\log q</math> factor<br />
<br />
K = total number of trees<br /><br />
d = maximum depth of the tree<br /><br />
<math>\|x\|_0</math> = number of non-missing entries in the training data<br /><br />
n = number of rows in the dataset<br /><br />
q = number of proposal candidates in the dataset, usually between 32 and 100<br /><br />
B = maximum number of rows in each block<br />
<br />
[[File: columnblock.png|800px|thumb|center]]<br />
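The log-factor saving above can be sanity-checked numerically. A minimal pure-Python sketch (the sizes chosen for K, d, <math>\|x\|_0</math>, and n are hypothetical, picked only to make the ratio visible):

```python
import math

def cost_presorted(K, d, nnz, n):
    """Original sparsity-aware algorithm: O(K * d * ||x||_0 * log n)."""
    return K * d * nnz * math.log2(n)

def cost_block(K, d, nnz, n):
    """Block structure: sort once up front, then O(K * d * ||x||_0 + ||x||_0 * log n)."""
    return K * d * nnz + nnz * math.log2(n)

# Hypothetical sizes: 100 trees of depth 8, 1M rows, ~10 non-missing entries per row
K, d, nnz, n = 100, 8, 10_000_000, 1_000_000
speedup = cost_presorted(K, d, nnz, n) / cost_block(K, d, nnz, n)
print(round(speedup, 1))  # approaches log2(n) ≈ 19.9 as the K*d term dominates
```

With these sizes the ratio is close to <math>\log n</math>, matching the claim that the block structure removes a <math>\log n</math> factor from the per-split work.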
<br />
=== Cache-aware Access ===<br />
The proposed block structure optimizes the computation complexity but requires indirect fetches of gradient statistics by row index. XGBoost optimizes the process by using the following methods.<br />
<br />
# For the exact greedy algorithm, XGBoost uses a cache-aware prefetching algorithm<br />
#:* It stores the gradients and Hessians in the cache to speed up calculations<br />
#:* It runs twice as fast as the naive method when the dataset is large<br />
# For approximate algorithms, XGBoost chooses a specific block size<br />
#:* Choosing a small block size results in inefficient parallelization<br />
#:* Choosing a large block size results in cache misses<br />
#:* Various choices of block size are compared and the results are shown in Figure 9<br />
#:* XGBoost chose <math>2^{16}</math> as the block size<br />
<br />
[[File: blocksize.png|400px|thumb|center]]<br />
<br />
=== Blocks for Out-of-core Computation ===<br />
When the dataset is too large for the cache and main memory, XGBoost utilizes disk space as well. Since reading from and writing to disk is slow, XGBoost optimizes the process using the following two methods.<br />
<br />
# Block Compression<br />
#:* Blocks are compressed by columns<br />
#:* Although decompression takes time, it is still faster than reading uncompressed data from disk<br />
# Block Sharding<br />
#:* If multiple disks are available, data is split into those disks<br />
#:* When the CPU needs data, all the disks can be read at the same time<br />
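In the xgboost Python package, this out-of-core mode is exposed through external-memory data loading. A hedged usage sketch (the file names are hypothetical, and the exact cache-file URI syntax varies across xgboost versions, so treat this as illustrative rather than definitive):

```python
import xgboost as xgb

# The "#train.cache" suffix asks XGBoost to build on-disk cache blocks next to
# the data, so training can stream from disk instead of holding it all in RAM.
dtrain = xgb.DMatrix("train.libsvm?format=libsvm#train.cache")

params = {"objective": "binary:logistic", "max_depth": 8, "eta": 0.1}
booster = xgb.train(params, dtrain, num_boost_round=10)
```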
<br />
== End To End Evaluations ==<br />
=== System Implementation ===<br />
XGBoost is a portable and reusable open-source package. It supports:<br />
* weighted classification<br />
* various objective functions (ranking & user-defined)<br />
* popular languages (Python, R, Julia)<br />
* data science pipelines (scikit-learn)<br />
* big data stacks (Flink, Spark)<br />
* cloud platforms (Tianchi)<br />
<br />
=== Dataset and Setup ===<br />
<br />
A summary of the 4 datasets used in the experiments is given in Table 2.<br />
<br />
[[File: table2data.png|500px|thumb|center]]<br />
<br />
The first and last datasets were used specifically to evaluate the impact of the sparsity-aware algorithm and the scaling property of the system in the out-of-core and distributed settings, respectively. In all of the experiments, common settings are applied: maximum depth equal to 8, shrinkage equal to 0.1, and no column subsampling.<br />
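These common settings map directly onto XGBoost's parameter names. As a config fragment (parameter names per the xgboost library; the number of boosting rounds is not stated in this summary, so it is omitted):

```python
# Common experimental settings used across the paper's evaluations.
params = {
    "max_depth": 8,           # maximum tree depth
    "eta": 0.1,               # shrinkage (learning rate)
    "colsample_bytree": 1.0,  # no column subsampling
}
```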
<br />
=== Classification ===<br />
A subset (n = 1M) of the Higgs boson dataset was used for classification. The results are shown in Table 3.<br />
<br />
[[File: classification.png|500px|thumb|center]]<br />
<br />
* R’s GBM builds a tree quickly by using a greedy approach that expands only one branch of the tree, at the cost of lower accuracy<br />
* XGBoost and scikit-learn have better performance than R’s GBM<br />
* XGBoost runs more than 10x faster than scikit-learn when learning a full tree<br />
* Column subsampling gives slightly worse performance, possibly because this dataset has only a few important features<br />
<br />
=== Learning to Rank ===<br />
The results of comparing XGBoost with pGBRT are shown in Table 4.<br />
<br />
[[File: rank.png|500px|thumb|center]]<br />
<br />
* XGBoost runs the exact greedy algorithm, while pGBRT only supports an approximate algorithm<br />
* XGBoost runs faster<br />
* Column subsampling reduces running time and gives higher performance, possibly because it prevents overfitting<br />
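The ranking setup can be sketched as a parameter fragment (the objective name "rank:pairwise" is the xgboost library's pairwise ranking objective; the 0.5 column-subsample ratio is an assumption for illustration, since the summary only says that column subsampling helped):

```python
# Hypothetical learning-to-rank configuration for XGBoost.
rank_params = {
    "objective": "rank:pairwise",  # pairwise ranking objective
    "max_depth": 8,
    "eta": 0.1,
    "colsample_bytree": 0.5,       # column subsampling (assumed ratio)
}
```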
<br />
=== Out-of-core Experiment ===<br />
The extremely large Criteo dataset was used to evaluate the out-of-core setting. The results are shown in the figure below.<br />
<br />
[[File: outofcore.png|350px|thumb|center]]<br />
<br />
* Compression speeds up the computation by a factor of 3<br />
* Sharding across 2 disks gives a further 2x speedup<br />
<br />
=== Distributed Experiment ===<br />
A YARN cluster on EC2 with m3.2xlarge machines was used to evaluate the distributed setting. XGBoost was compared with Spark MLlib and H2O.<br />
<br />
* XGBoost can switch to the out-of-core setting when memory runs out, while the baseline systems need to store the data in RAM<br />
* XGBoost runs faster than the baseline systems<br />
* The baseline systems are only able to handle a subset of the data with the given resources, while XGBoost scales smoothly to all 1.7 billion rows<br />
* XGBoost’s performance scales linearly with more machines<br />
* XGBoost is able to handle all 1.7 billion rows with only 4 machines<br />
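A modern distributed run can be sketched with xgboost's Dask interface (the paper itself used a YARN cluster, so this is a substitute; the scheduler address and data are hypothetical, and the code assumes a running Dask cluster):

```python
# Hypothetical sketch: distributed XGBoost training via xgboost.dask.
from dask.distributed import Client
from dask import array as da
import xgboost as xgb

client = Client("scheduler-address:8786")  # connect to an existing Dask cluster

# Synthetic partitioned data standing in for a large real dataset.
X = da.random.random((10_000_000, 20), chunks=(100_000, 20))
y = da.random.randint(0, 2, size=10_000_000, chunks=100_000)

dtrain = xgb.dask.DaskDMatrix(client, X, y)
result = xgb.dask.train(client,
                        {"objective": "binary:logistic", "max_depth": 8, "eta": 0.1},
                        dtrain, num_boost_round=10)
booster = result["booster"]  # trained model, usable like a single-node Booster
```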
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the kdd cup competition: an outsider’s perspective. <br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The netflix prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceeding of 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceeding of 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. n. Candela. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904. 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with mapreduce. Proceeding of VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th international conference on World wide web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
=== Column Block for Parallel Learning ===<br />
The most time consuming part of tree learning is to get the data into sorted order. To overcome this problem, XGBoost stores the data in blocks where each block performs the sorting operation in parallel. The results are then merged together.<br />
<br />
# For the exact greedy algorithm, XGBoost stores the entire dataset in a single block<br />
#:* Original spase aware algorithm costs: <math>O(Kd∥x∥_0 log(n))</math><br />
#:* Tree boosting on block structure costs: <math>O(Kd∥x∥_0 + ∥x∥_0 log(n))</math><br />
#:* Hence, block structure helps to save an additional <math>log(n)</math> factor<br />
# For approximate algorithms, XGBoost stores the dataset in multiple blocks<br />
#:* Original algorithm with binary search costs: <math>O(Kd∥x∥_0 log(q))</math><br />
#:* With block structure costs: <math>O(Kd∥x∥_0 + ∥x∥_0 log(B))</math><br />
#:* Hence, block structure helps to save an additional <math>log(q)</math> factor <br />
<br />
K = total number of trees<br /><br />
d = maximum depth of the tree<br /><br />
<math>∥x∥_0</math> = number of non-missing entries in the training data<br /><br />
n = number of rows in the dataset<br /><br />
q = number of proposal candidates in the dataset, usually between 32 to 100<br /><br />
B = maximum number of rows in each block<br />
<br />
[[File: columnblock.png|800px|thumb|center]]<br />
<br />
=== Cache-aware Access ===<br />
The proposed block structure optimizes the computation complexity but requires indirect fetches of gradient statistics by row index. XGBoost optimizes the process by using the following methods.<br />
<br />
# For the exact greedy algorithm, XGBoost uses a cache-aware prefetching algorithm<br />
#:* It stores Gradient and Hessians in the cache to make calculations fast<br />
#:* It runs twice as fast as the naive method when the dataset is large<br />
# For approximate algorithms, XGBoost chooses a specific block size<br />
#:* Choosing a small block size results in inefficient parallelization<br />
#:* Choosing a large block size results in cache misses<br />
#:* Various choices of block size are compared and the results are shown in Figure 9<br />
#:* XGBoost chose <math>2^{16}</math> as the block size<br />
<br />
[[File: blocksize.png|400px|thumb|center]]<br />
<br />
=== Blocks for Out-of-core Computation ===<br />
When the dataset is too large for the cache and main memory, XGBoost utilizes disk spaces as well. Since reading and writing data to the disks are slow, XGBoost optimizes the process by using the following two methods.<br />
<br />
# Block Compression<br />
#:* Blocks are compressed by columns<br />
#:* Although decompressing takes time, it is still faster than reading from the disks<br />
# Block Sharding<br />
#:* If multiple disks are available, data is split into those disks<br />
#:* When the CPU needs data, all the disks can be read at the same time<br />
<br />
== End To End Evaluations ==<br />
=== System Implementation ===<br />
XGBoost is a portable and reusable open-source package. It supports:<br />
* various weighted classification<br />
* various objective functions (rank & user defined)<br />
* popular languages (python, R, Julia)<br />
* data science pipelines (scikit-learn)<br />
* big data stacks (Flink, Spark)<br />
* cloud platform (Tianchi8)<br />
<br />
=== Dataset and Setup ===<br />
<br />
A summary of the 4 datasets used in the experiments is in table 2.<br />
<br />
[[File: table2data.png|500px|thumb|center]]<br />
<br />
The first and last datasets were used specifically to evaluate the impact of sparsity-aware algorithm and the scaling property of the system in the out-of-core and the distributed settings respectively. In all of the experiments, common settings such as maximum depth equal to 8, shrinkage equal to 0.1 and no column subsampling are applied.<br />
<br />
=== Classification ===<br />
A subset (n = 1M) of Higgs Boson dataset was used for classification. The results are shown in Table 3.<br />
<br />
[[File: classification.png|500px|thumb|center]]<br />
<br />
* R’s GBM creates a tree fast by using greedy approach that only expands one branch of a tree but with the cost of lower accuracy<br />
* XGBoost and scikit-learn have better performance than R’s GBM<br />
* XGBoost runs more than 10x faster than scikit-learn in learning a full tree<br />
* Column subsamples give slightly worse performance possibly due to a few important features in this dataset.<br />
<br />
=== Learning to Rank ===<br />
The results in comparing XGBoost with pGBRT are shown in Table 4.<br />
<br />
[[File: rank.png|500px|thumb|center]]<br />
<br />
* XGBoost runs exact greedy algorithm while pGBRT only supports an approximate algorithm<br />
* XGBoost runs faster<br />
* Subsampling column reduces running time and gives higher performance possibly due to preventing overfitting<br />
<br />
=== Out-of-core Experiment ===<br />
An extremely large Criteo dataset was used to evaluate the out-of-core setting. The results are shown in the figure below.<br />
<br />
[[File: outofcore.png|350px|thumb|center]]<br />
<br />
* Compression speed up the computation by a factor of 3<br />
* Sharding into 2 disks further gives 2x speed<br />
<br />
=== Distributed Experiment ===<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the kdd cup competition: an outsider’s perspective. <br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The netflix prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Maching Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceeding of 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceeding of 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. n. Candela. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. Mcrank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904. 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with mapreduce. Proceeding of VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th international conference on World wide web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=XGBoost&diff=50765XGBoost2021-11-24T03:43:32Z<p>Wlchong: /* Classification */</p>
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
=== Column Block for Parallel Learning ===<br />
The most time consuming part of tree learning is to get the data into sorted order. To overcome this problem, XGBoost stores the data in blocks where each block performs the sorting operation in parallel. The results are then merged together.<br />
<br />
# For the exact greedy algorithm, XGBoost stores the entire dataset in a single block<br />
#:* Original spase aware algorithm costs: <math>O(Kd∥x∥_0 log(n))</math><br />
#:* Tree boosting on block structure costs: <math>O(Kd∥x∥_0 + ∥x∥_0 log(n))</math><br />
#:* Hence, block structure helps to save an additional <math>log(n)</math> factor<br />
# For approximate algorithms, XGBoost stores the dataset in multiple blocks<br />
#:* Original algorithm with binary search costs: <math>O(Kd∥x∥_0 log(q))</math><br />
#:* With block structure costs: <math>O(Kd∥x∥_0 + ∥x∥_0 log(B))</math><br />
#:* Hence, block structure helps to save an additional <math>log(q)</math> factor <br />
<br />
K = total number of trees<br /><br />
d = maximum depth of the tree<br /><br />
<math>∥x∥_0</math> = number of non-missing entries in the training data<br /><br />
n = number of rows in the dataset<br /><br />
q = number of proposal candidates in the dataset, usually between 32 to 100<br /><br />
B = maximum number of rows in each block<br />
<br />
[[File: columnblock.png|800px|thumb|center]]<br />
<br />
=== Cache-aware Access ===<br />
The proposed block structure optimizes the computation complexity but requires indirect fetches of gradient statistics by row index. XGBoost optimizes the process by using the following methods.<br />
<br />
# For the exact greedy algorithm, XGBoost uses a cache-aware prefetching algorithm<br />
#:* It stores Gradient and Hessians in the cache to make calculations fast<br />
#:* It runs twice as fast as the naive method when the dataset is large<br />
# For approximate algorithms, XGBoost chooses a specific block size<br />
#:* Choosing a small block size results in inefficient parallelization<br />
#:* Choosing a large block size results in cache misses<br />
#:* Various choices of block size are compared and the results are shown in Figure 9<br />
#:* XGBoost chose <math>2^{16}</math> as the block size<br />
<br />
[[File: blocksize.png|400px|thumb|center]]<br />
<br />
=== Blocks for Out-of-core Computation ===<br />
When the dataset is too large for the cache and main memory, XGBoost utilizes disk spaces as well. Since reading and writing data to the disks are slow, XGBoost optimizes the process by using the following two methods.<br />
<br />
# Block Compression<br />
#:* Blocks are compressed by columns<br />
#:* Although decompressing takes time, it is still faster than reading from the disks<br />
# Block Sharding<br />
#:* If multiple disks are available, data is split into those disks<br />
#:* When the CPU needs data, all the disks can be read at the same time<br />
<br />
== End To End Evaluations ==<br />
=== System Implementation ===<br />
XGBoost is a portable and reusable open-source package. It supports:<br />
* various weighted classification<br />
* various objective functions (rank & user defined)<br />
* popular languages (python, R, Julia)<br />
* data science pipelines (scikit-learn)<br />
* big data stacks (Flink, Spark)<br />
* cloud platform (Tianchi8)<br />
<br />
=== Dataset and Setup ===<br />
<br />
A summary of the 4 datasets used in the experiments is in table 2.<br />
<br />
[[File: table2data.png|500px|thumb|center]]<br />
<br />
The first and last datasets were used specifically to evaluate the impact of sparsity-aware algorithm and the scaling property of the system in the out-of-core and the distributed settings respectively. In all of the experiments, common settings such as maximum depth equal to 8, shrinkage equal to 0.1 and no column subsampling are applied.<br />
<br />
=== Classification ===<br />
A subset (n = 1M) of Higgs Boson dataset was used for classification. The results are shown in Table 3.<br />
<br />
[[File: classification.png|500px|thumb|center]]<br />
<br />
* R’s GBM creates a tree fast by using greedy approach that only expands one branch of a tree but with the cost of lower accuracy<br />
* XGBoost and scikit-learn have better performance than R’s GBM<br />
* XGBoost runs more than 10x faster than scikit-learn in learning a full tree<br />
* Column subsamples give slightly worse performance possibly due to a few important features in this dataset.<br />
<br />
=== Learning to Rank ===<br />
The results in comparing XGBoost with pGBRT are shown in Table 4.<br />
<br />
[[File: rank.png|500px|thumb|center]]<br />
<br />
* XGBoost runs exact greedy algorithm while pGBRT only supports an approximate algorithm<br />
* XGBoost runs faster<br />
* Subsampling column reduces running time and gives higher performance possibly due to preventing overfitting<br />
<br />
=== Out-of-core Experiment ===<br />
<br />
=== Distributed Experiment ===<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the kdd cup competition: an outsider’s perspective. <br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The netflix prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Maching Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceeding of 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceeding of 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. n. Candela. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904, 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with MapReduce. Proceedings of the VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th International Conference on World Wide Web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09, 2009.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchong
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
=== Column Block for Parallel Learning ===<br />
The most time-consuming part of tree learning is getting the data into sorted order. To reduce this cost, XGBoost stores the data in in-memory blocks, with each column sorted by feature value. The data needs to be sorted only once, before training; the blocks can be sorted in parallel and the results merged together.<br />
<br />
# For the exact greedy algorithm, XGBoost stores the entire dataset in a single block<br />
#:* Original sparsity-aware algorithm costs: <math>O(Kd\|x\|_0 \log n)</math><br />
#:* Tree boosting on the block structure costs: <math>O(Kd\|x\|_0 + \|x\|_0 \log n)</math><br />
#:* Hence, the block structure saves an additional <math>\log n</math> factor<br />
# For approximate algorithms, XGBoost stores the dataset in multiple blocks<br />
#:* Original algorithm with binary search costs: <math>O(Kd\|x\|_0 \log q)</math><br />
#:* With the block structure, the cost is: <math>O(Kd\|x\|_0 + \|x\|_0 \log B)</math><br />
#:* Hence, the block structure saves an additional <math>\log q</math> factor<br />
<br />
K = total number of trees<br /><br />
d = maximum depth of the tree<br /><br />
<math>\|x\|_0</math> = number of non-missing entries in the training data<br /><br />
n = number of rows in the dataset<br /><br />
q = number of proposal candidates in the dataset, usually between 32 and 100<br /><br />
B = maximum number of rows in each block<br />
<br />
[[File: columnblock.png|800px|thumb|center]]<br />
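The saving from pre-sorted blocks can be illustrated with a short sketch. This is not XGBoost's actual implementation (which is in C++); it is a minimal Python illustration, assuming the standard second-order split gain with L2 regularization <math>\lambda</math>: once a feature column is sorted, the best split for that feature is found in a single linear scan that accumulates gradient and Hessian sums, with no per-node re-sorting.<br />

```python
# Minimal sketch: exact greedy split finding on one pre-sorted feature column.
# The helper name and signature are illustrative, not XGBoost's API.

def best_split_on_sorted_column(sorted_vals, grad, hess, lam=1.0):
    """sorted_vals: feature values in ascending order.
    grad, hess: per-example gradient/Hessian, aligned with sorted_vals.
    lam: L2 regularization term in the gain formula."""
    G, H = sum(grad), sum(hess)
    gl = hl = 0.0
    best_gain, best_thresh = 0.0, None
    for i in range(len(sorted_vals) - 1):
        gl += grad[i]          # move example i into the left partition
        hl += hess[i]
        # A split is only possible between two distinct feature values.
        if sorted_vals[i] == sorted_vals[i + 1]:
            continue
        gr, hr = G - gl, H - hl
        gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)
        if gain > best_gain:
            best_gain = gain
            best_thresh = (sorted_vals[i] + sorted_vals[i + 1]) / 2
    return best_gain, best_thresh
```

Because the column is already sorted inside the block, this scan is <math>O(n)</math> per feature; without the block structure, each node would pay an extra sort.<br />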
<br />
=== Cache-aware Access ===<br />
The block structure optimizes the computational complexity, but it requires indirect fetches of gradient statistics by row index, which is a non-contiguous memory access pattern. XGBoost mitigates this with the following methods.<br />
<br />
# For the exact greedy algorithm, XGBoost uses a cache-aware prefetching algorithm<br />
#:* Each thread prefetches gradient and Hessian statistics into an internal buffer, so accumulation reads from cache rather than from scattered memory locations<br />
#:* It runs roughly twice as fast as the naive implementation when the dataset is large<br />
# For approximate algorithms, XGBoost chooses a specific block size<br />
#:* Choosing a block size that is too small results in inefficient parallelization<br />
#:* Choosing a block size that is too large results in cache misses<br />
#:* Various choices of block size are compared and the results are shown in Figure 9<br />
#:* XGBoost uses <math>2^{16}</math> examples per block to balance these two effects<br />
<br />
[[File: blocksize.png|400px|thumb|center]]<br />
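The prefetching idea can be sketched in a few lines. This is only an illustration of the access pattern, under the assumption that a contiguous buffer stands in for the thread-local prefetch buffer; the actual speedup appears in compiled code, not in Python, where both variants compute the same sum.<br />

```python
# Naive accumulation: gather gradient statistics through an indirect row
# index (the sorted-feature order), a cache-unfriendly access pattern.
def accumulate_naive(row_index, grad):
    total = 0.0
    for r in row_index:
        total += grad[r]
    return total

# Cache-aware pattern: first copy a mini-batch of statistics into a small
# contiguous buffer, then accumulate sequentially from that buffer.
def accumulate_prefetched(row_index, grad, batch=8):
    total = 0.0
    for start in range(0, len(row_index), batch):
        buf = [grad[r] for r in row_index[start:start + batch]]  # prefetch
        for g in buf:                                            # sequential read
            total += g
    return total
```

Both functions return the same result; the point is that the second touches memory in short contiguous runs, which is what the cache-aware prefetching exploits.<br />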
<br />
=== Blocks for Out-of-core Computation ===<br />
When the dataset is too large to fit in the cache and main memory, XGBoost utilizes disk space as well. Since reading from and writing to disk is slow, XGBoost optimizes the process with the following two methods.<br />
<br />
# Block Compression<br />
#:* Blocks are compressed by columns<br />
#:* Although decompression takes time, it is still faster than reading uncompressed data from disk<br />
# Block Sharding<br />
#:* If multiple disks are available, the data is sharded across them<br />
#:* When the CPU needs data, all the disks can be read concurrently<br />
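Column-wise block compression can be sketched with the standard library; here zlib stands in for XGBoost's actual compression scheme (an assumption for illustration). Each column is compressed independently, so a loader thread can decompress one block while the next is still being read from disk.<br />

```python
import struct
import zlib

# Illustrative helpers (not XGBoost's API): one column of a block is packed
# as 32-bit floats and compressed as an independent unit.

def compress_column(values):
    raw = struct.pack(f"{len(values)}f", *values)
    return zlib.compress(raw)

def decompress_column(blob):
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"{len(raw) // 4}f", raw))
```

The trade-off stated above is visible here: decompression costs CPU time, but the compressed block is much smaller than the raw bytes, so less data crosses the slow disk interface.<br />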
<br />
== End To End Evaluations ==<br />
=== System Implementation ===<br />
XGBoost is a portable and reusable open-source package. It supports:<br />
* weighted classification<br />
* various objective functions, including ranking and user-defined objectives<br />
* popular languages (Python, R, Julia)<br />
* data science pipelines (scikit-learn)<br />
* big data stacks (Flink, Spark)<br />
* cloud platforms (Tianchi)<br />
<br />
=== Dataset and Setup ===<br />
<br />
=== Classification ===<br />
<br />
=== Learning to Rank ===<br />
<br />
=== Out-of-core Experiment ===<br />
<br />
=== Distributed Experiment ===<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the KDD Cup competition: an outsider’s perspective. <br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The Netflix Prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From RankNet to LambdaRank to LambdaMART: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceedings of the 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceedings of the 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Quiñonero Candela. Practical lessons from predicting clicks on ads at Facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust LogitBoost and adaptive base class (ABC) LogitBoost. In Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904, 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in Apache Spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. PLANET: Massively parallel learning of tree ensembles with MapReduce. Proceedings of the VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th International Conference on World Wide Web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09, 2009.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=XGBoost&diff=50748XGBoost2021-11-23T21:51:33Z<p>Wlchong: /* Cache-aware Access */</p>
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
=== Column Block for Parallel Learning ===<br />
<br />
=== Cache-aware Access ===<br />
The proposed block structure reduces the computational complexity of split finding, but it requires indirect fetches of gradient statistics by row index; because consecutive row indices do not sit in consecutive memory locations, this access pattern causes cache misses. XGBoost mitigates the problem with the following methods.<br />
<br />
# For the exact greedy algorithm, XGBoost uses a cache-aware prefetching algorithm<br />
#:* Each thread first gathers the gradient and Hessian statistics into a small internal buffer, then accumulates them in a tight loop over contiguous memory<br />
#:* This runs about twice as fast as the naive implementation when the dataset is large<br />
# For the approximate algorithms, XGBoost chooses a specific block size<br />
#:* Too small a block size leads to inefficient parallelization<br />
#:* Too large a block size leads to cache misses<br />
#:* Various choices of block size are compared and the results are shown in Figure 9<br />
#:* XGBoost uses a block size of <math>2^{16}</math> examples per block<br />
<br />
[[File: blocksize.png|500px|thumb|center]]<br />
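The cache-aware accumulation idea can be sketched in pure Python (an illustrative toy with our own function names, not XGBoost's actual C++ implementation): statistics are first gathered into a small contiguous buffer in mini-batches, so the accumulation loop reads contiguous memory instead of chasing row indices.<br />

```python
# Toy illustration of cache-aware accumulation (our own sketch; XGBoost's
# real implementation does this in optimized C++).

def naive_accumulate(sorted_rows, grad, hess):
    # Indirect access: each step fetches grad/hess by row index, so the
    # reads jump around memory in whatever order the feature sort imposes.
    g_sum, h_sum = 0.0, 0.0
    for r in sorted_rows:
        g_sum += grad[r]
        h_sum += hess[r]
    return g_sum, h_sum

def prefetch_accumulate(sorted_rows, grad, hess, batch=8):
    # Cache-aware variant: gather the statistics for a mini-batch of rows
    # into a small contiguous buffer first, then accumulate from the buffer.
    g_sum, h_sum = 0.0, 0.0
    for i in range(0, len(sorted_rows), batch):
        buf = [(grad[r], hess[r]) for r in sorted_rows[i:i + batch]]
        for g, h in buf:
            g_sum += g
            h_sum += h
    return g_sum, h_sum

grad = [0.1 * i for i in range(100)]   # first-order gradient statistics
hess = [1.0] * 100                     # second-order (Hessian) statistics
rows = list(range(99, -1, -1))         # row indices in feature-sorted order
assert naive_accumulate(rows, grad, hess) == prefetch_accumulate(rows, grad, hess)
```

In the real system the buffer fill overlaps with computation via hardware prefetching; the toy above only illustrates the change in access pattern.<br />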
<br />
=== Blocks for Out-of-core Computation ===<br />
When the dataset does not fit in the cache and main memory, XGBoost uses disk space as well. Since reading from and writing to disk is slow, XGBoost optimizes the process with the following two methods.<br />
<br />
# Block Compression<br />
#:* Blocks are compressed by column before being written to disk and decompressed on the fly when read back<br />
#:* Although decompression takes CPU time, it is still cheaper than reading the uncompressed data from disk<br />
# Block Sharding<br />
#:* If multiple disks are available, the data is sharded across them<br />
#:* When the CPU needs data, all the disks can be read in parallel, increasing throughput<br />
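Block compression can be sketched with Python's standard zlib (a minimal illustration; the serialization format and function names here are our own, not XGBoost's actual on-disk layout):<br />

```python
import zlib

# Toy illustration of column-block compression for out-of-core learning
# (our own sketch; the serialization format is hypothetical, not XGBoost's
# actual on-disk layout).

def write_block(column_values):
    # Serialize a column of integer values and compress it before it
    # would be written to disk.
    raw = b"".join(v.to_bytes(4, "little") for v in column_values)
    return zlib.compress(raw)

def read_block(compressed):
    # Decompress on the fly when the block is read back; the CPU time
    # spent here is traded for the larger cost of reading uncompressed
    # data from disk.
    raw = zlib.decompress(compressed)
    return [int.from_bytes(raw[i:i + 4], "little")
            for i in range(0, len(raw), 4)]

col = list(range(1000))
blob = write_block(col)
assert read_block(blob) == col          # round-trip is lossless
assert len(blob) < 4 * len(col)         # compressed block is smaller
```

Decompression can run in a separate prefetch thread, so CPU work overlaps with disk reads.<br />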
<br />
== End To End Evaluations ==<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the KDD Cup competition: an outsider’s perspective.<br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The Netflix Prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceedings of the 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceedings of the 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. n. Candela. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. Mcrank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904. 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with mapreduce. Proceedings of the VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th International Conference on World Wide Web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchong
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
=== Column Block for Parallel Learning ===<br />
<br />
=== Cache-aware Access ===<br />
The proposed block structure optimizes the computation complexity but requires indirect fetches of gradient statistics by row index. XGBoost optimizes the process by using the following methods.<br />
<br />
# For the exact greedy algorithm, XGBoost uses a cache-aware prefetching algorithm<br />
#:* It stores Gradient and Hessians in the cache to make calculations fast<br />
#:* It runs twice as fast as the naive method when the dataset is large<br />
# For approximate algorithms, XGBoost chooses a specific block size<br />
#:* Choosing a small block size results in inefficient parallelization<br />
#:* Choosing a large block size results in cache misses<br />
#:* Various choices of block size are compared and the results are shown in Figure 9<br />
#:* XGBoost chose 2^16 as the block size<br />
<br />
[[File: blocksize.png|500px|thumb|center]]<br />
<br />
=== Blocks for Out-of-core Computation ===<br />
When the dataset is too large for the cache and main memory, XGBoost utilizes disk spaces as well. Since reading and writing data to the disks are slow, XGBoost optimizes the process by using the following two methods.<br />
<br />
# Block Compression<br />
#:* Blocks are compressed by columns<br />
#:* Although decompressing takes time, it is still faster than reading from the disks<br />
# Block Sharding<br />
#:* If multiple disks are available, data is split into those disks<br />
#:* When the CPU needs data, all the disks can be read at the same time<br />
<br />
== End To End Evaluations ==<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the kdd cup competition: an outsider’s perspective. <br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The netflix prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Maching Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceeding of 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceeding of 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. n. Candela. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. Mcrank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904. 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with mapreduce. Proceeding of VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th international conference on World wide web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=XGBoost&diff=50746XGBoost2021-11-23T21:48:04Z<p>Wlchong: /* Cache-aware Access */</p>
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
=== Column Block for Parallel Learning ===<br />
<br />
=== Cache-aware Access ===<br />
The proposed block structure optimizes the computation complexity but requires indirect fetches of gradient statistics by row index. XGBoost optimizes the process by using the following methods.<br />
<br />
# For the exact greedy algorithm, XGBoost uses a cache-aware prefetching algorithm<br />
#:* It stores Gradient and Hessians in the cache to make calculations fast<br />
#:* It runs twice as fast as the naive method when the dataset is large<br />
# For approximate algorithms, XGBoost chooses a specific block size<br />
#:* Choosing a small block size results in inefficient parallelization<br />
#:* Choosing a large block size results in cache misses<br />
#:* Various choices of block size are compared and the results are shown in Figure 9<br />
#:* XGBoost chose 2^16 as the block size<br />
<br />
[[File: blocksize.png|center]]<br />
<br />
=== Blocks for Out-of-core Computation ===<br />
When the dataset is too large for the cache and main memory, XGBoost utilizes disk spaces as well. Since reading and writing data to the disks are slow, XGBoost optimizes the process by using the following two methods.<br />
<br />
# Block Compression<br />
#:* Blocks are compressed by columns<br />
#:* Although decompressing takes time, it is still faster than reading from the disks<br />
# Block Sharding<br />
#:* If multiple disks are available, data is split into those disks<br />
#:* When the CPU needs data, all the disks can be read at the same time<br />
<br />
== End To End Evaluations ==<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the kdd cup competition: an outsider’s perspective. <br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The netflix prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Maching Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceeding of 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceeding of 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. n. Candela. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. Mcrank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904. 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with mapreduce. Proceeding of VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th international conference on World wide web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=File:blocksize.png&diff=50745File:blocksize.png2021-11-23T21:47:44Z<p>Wlchong: </p>
<hr />
<div></div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=XGBoost&diff=50741XGBoost2021-11-23T20:41:29Z<p>Wlchong: /* System Design */</p>
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
=== Column Block for Parallel Learning ===<br />
<br />
=== Cache-aware Access ===<br />
<br />
=== Blocks for Out-of-core Computation ===<br />
When the dataset is too large for the cache and main memory, XGBoost utilizes disk spaces as well. Since reading and writing data to the disks are slow, XGBoost optimizes the process by using the following two methods.<br />
<br />
# Block Compression<br />
#:* Blocks are compressed by columns<br />
#:* Although decompressing takes time, it is still faster than reading from the disks<br />
# Block Sharding<br />
#:* If multiple disks are available, data is split into those disks<br />
#:* When the CPU needs data, all the disks can be read at the same time<br />
<br />
== End To End Evaluations ==<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the kdd cup competition: an outsider’s perspective. <br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The netflix prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Maching Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceeding of 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceeding of 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. n. Candela. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. Mcrank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904. 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with mapreduce. Proceeding of VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th international conference on World wide web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=XGBoost&diff=50731XGBoost2021-11-23T20:05:36Z<p>Wlchong: </p>
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
== End To End Evaluations ==<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the kdd cup competition: an outsider’s perspective. <br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The netflix prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Maching Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceeding of 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceeding of 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Q. n. Candela. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. Mcrank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904. 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with mapreduce. Proceeding of VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th International Conference on World Wide Web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=XGBoost&diff=50729XGBoost2021-11-23T20:04:48Z<p>Wlchong: </p>
<hr />
<div>== Presented by == <br />
* Chun Waan Loke<br />
* Peter Chong<br />
* Clarice Osmond<br />
* Zhilong Li<br />
<br />
== Introduction ==<br />
<br />
== Tree Boosting In A Nutshell ==<br />
<br />
== Split Finding Algorithms ==<br />
<br />
== System Design ==<br />
<br />
== End To End Evaluations ==<br />
<br />
== Conclusion ==<br />
<br />
== References ==<br />
[1] R. Bekkerman. The present and the future of the KDD Cup competition: an outsider’s perspective.<br />
<br />
[2] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, New York, NY, USA, 2011. <br />
<br />
[3] J. Bennett and S. Lanning. The Netflix Prize. In Proceedings of the KDD Cup Workshop 2007, pages 3–6, New York, Aug. 2007.<br />
<br />
[4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, Oct. 2001.<br />
<br />
[5] C. Burges. From RankNet to LambdaRank to LambdaMART: An overview. Learning, 11:23–581, 2010.<br />
<br />
[6] O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge Overview. Journal of Machine Learning Research - W & CP, 14:1–24, 2011.<br />
<br />
[7] T. Chen, H. Li, Q. Yang, and Y. Yu. General functional matrix factorization using gradient boosting. In Proceedings of the 30th International Conference on Machine Learning (ICML’13), volume 1, pages 436–444, 2013.<br />
<br />
[8] T. Chen, S. Singh, B. Taskar, and C. Guestrin. Efficient second-order gradient boosting for conditional random fields. In Proceedings of the 18th Artificial Intelligence and Statistics Conference (AISTATS’15), volume 1, 2015.<br />
<br />
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.<br />
<br />
[10] J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.<br />
<br />
[11] J. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.<br />
<br />
[12] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.<br />
<br />
[13] J. H. Friedman and B. E. Popescu. Importance sampled learning ensembles, 2003.<br />
<br />
[14] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 58–66, 2001.<br />
<br />
[15] X. He, J. Pan, O. Jin, T. Xu, B. Liu, T. Xu, Y. Shi, A. Atallah, R. Herbrich, S. Bowers, and J. Quiñonero Candela. Practical lessons from predicting clicks on ads at Facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, ADKDD’14, 2014.<br />
<br />
[16] P. Li. Robust LogitBoost and adaptive base class (ABC) LogitBoost. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI’10), pages 302–311, 2010.<br />
<br />
[17] P. Li, Q. Wu, and C. J. Burges. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems 20, pages 897–904, 2008.<br />
<br />
[18] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar. MLlib: Machine learning in Apache Spark. Journal of Machine Learning Research, 17(34):1–7, 2016.<br />
<br />
[19] B. Panda, J. S. Herbach, S. Basu, and R. J. Bayardo. PLANET: Massively parallel learning of tree ensembles with MapReduce. Proceedings of the VLDB Endowment, 2(2):1426–1437, Aug. 2009.<br />
<br />
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.<br />
<br />
[21] G. Ridgeway. Generalized Boosted Models: A guide to the gbm package.<br />
<br />
[22] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In Proceedings of the 20th International Conference on World Wide Web, pages 387–396. ACM, 2011.<br />
<br />
[23] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09.<br />
<br />
[24] Q. Zhang and W. Wang. A fast algorithm for approximate quantiles in high speed data streams. In Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007.<br />
<br />
[25] T. Zhang and R. Johnson. Learning nonlinear functions using regularized greedy forest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 2014.</div>Wlchonghttp://wiki.math.uwaterloo.ca/statwiki/index.php?title=stat441F21&diff=50707stat441F212021-11-23T17:30:16Z<p>Wlchong: /* Paper presentation */</p>
<hr />
<div><br />
<br />
== [[F21-STAT 441/841 CM 763-Proposal| Project Proposal ]] ==<br />
<br />
<!--[https://goo.gl/forms/apurag4dr9kSR76X2 Your feedback on presentations]--><br />
<br />
=Paper presentation=<br />
{| class="wikitable" border="1" cellpadding="3"<br />
|-<br />
|width="60pt"|Date<br />
|width="250pt"|Name <br />
|width="15pt"|Paper number <br />
|width="700pt"|Title<br />
|width="15pt"|Link to the paper<br />
|width="30pt"|Link to the summary<br />
|width="30pt"|Link to the video<br />
|-<br />
|Sep 15 (example)||Ri Wang || ||Sequence to sequence learning with neural networks.||[http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Going_Deeper_with_Convolutions Summary] || [https://youtu.be/JWozRg_X-Vg?list=PLehuLRPyt1HzXDemu7K4ETcF0Ld_B5adG&t=539]<br />
|-<br />
|Week of Nov 16 || Ali Ghodsi || || || || ||<br />
|-<br />
|Week of Nov 22 || Jared Feng, Xipeng Huang, Mingwei Xu, Tingzhou Yu|| || Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification || [http://proceedings.mlr.press/v139/bai21c/bai21c.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Don%27t_Just_Blame_Over-parametrization Summary] ||<br />
|-<br />
|Week of Nov 29 || Kanika Chopra, Yush Rajcoomar || || Automatic Bank Fraud Detection Using Support Vector Machines || [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.5804&rep=rep1&type=pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Automatic_Bank_Fraud_Detection_Using_Support_Vector_Machines Summary] ||<br />
|-<br />
|Week of Nov 22 || Zeng Mingde, Lin Xiaoyu, Fan Joshua, Rao Chen Min || || Do Vision Transformers See Like Convolutional Neural Networks? || [https://proceedings.neurips.cc/paper/2021/file/652cf38361a209088302ba2b8b7f51e0-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Do_Vision_Transformers_See_Like_CNN Summary] ||<br />
|-<br />
|Week of Nov 22 || Justin D'Astous, Waqas Hamed, Stefan Vladusic, Ethan O'Farrell || || A Probabilistic Approach to Neural Network Pruning || [http://proceedings.mlr.press/v139/qian21a/qian21a.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Summary_of_A_Probabilistic_Approach_to_Neural_Network_Pruning Summary] ||<br />
|-<br />
|Week of Nov 22 || Cassandra Wong, Anastasiia Livochka, Maryam Yalsavar, David Evans || || Patch-Based Convolutional Neural Network for Whole Slide Tissue Image Classification || [https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Hou_Patch-Based_Convolutional_Neural_CVPR_2016_paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Patch_Based_Convolutional_Neural_Network_for_Whole_Slide_Tissue_Image_Classification Summary] ||<br />
|-<br />
|Week of Nov 29 || Jessie Man Wai Chin, Yi Lin Ooi, Yaqi Shi, Shwen Lyng Ngew || || CatBoost: unbiased boosting with categorical features || [https://proceedings.neurips.cc/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=CatBoost:_unbiased_boosting_with_categorical_features Summary] ||<br />
|-<br />
|Week of Nov 29 || Eric Anderson, Chengzhi Wang, Kai Zhong, YiJing Zhou || || || || ||<br />
|-<br />
|Week of Nov 29 || Ethan Cyrenne, Dieu Hoa Nguyen, Mary Jane Sin, Carolyn Wang || || || || ||<br />
|-<br />
|Week of Nov 29 || Bowen Zhang, Tyler Magnus Verhaar, Sam Senko || || Deep Double Descent: Where Bigger Models and More Data Hurt || [https://arxiv.org/pdf/1912.02292.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Double_Descent_Where_Bigger_Models_and_More_Data_Hurt Summary] ||<br />
|-<br />
|Week of Nov 29 || Chun Waan Loke, Peter Chong, Clarice Osmond, Zhilong Li|| || XGBoost: A Scalable Tree Boosting System || [https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 || Ann Gie Wong, Curtis Li, Hannah Kerr || || The Detection of Black Ice Accidents for Preventative Automated Vehicles Using Convolutional Neural Networks || [https://www.mdpi.com/2079-9292/9/12/2178/htm Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=The_Detection_of_Black_Ice_Accidents_Using_CNNs&fbclid=IwAR0K4YdnL_hdRnOktmJn8BI6-Ra3oitjJof0YwluZgUP1LVFHK5jyiBZkvQ Summary] ||<br />
|-<br />
|Week of Nov 22 || Yuwei Liu, Daniel Mao|| || Depthwise Convolution Is All You Need for Learning Multiple Visual Domains || [https://arxiv.org/abs/1902.00927 Paper] ||[https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Depthwise_Convolution_Is_All_You_Need_for_Learning_Multiple_Visual_Domains Summary] ||<br />
|-<br />
|Week of Nov 29 || Lingshan Wang, Yifan Li, Ziyi Liu || || Deep Learning for Extreme Multi-label Text Classification || [https://dl.acm.org/doi/pdf/10.1145/3077136.3080834 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Deep_Learning_for_Extreme_Multi-label_Text_Classification Summary]||<br />
|-<br />
|Week of Nov 29 || Kar Lok Ng, Muhan (Iris) Li || || || || ||<br />
|-<br />
|Week of Nov 29 ||Kun Wang || || Convolutional neural network for diagnosis of viral pneumonia and COVID-19 alike diseases|| [https://doi-org.proxy.lib.uwaterloo.ca/10.1111/exsy.12705 Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Convolutional_neural_network_for_diagnosis_of_viral_pneumonia_and_COVID-19_alike_diseases Summary] ||<br />
|-<br />
|Week of Nov 29 ||Egemen Guray || || || || ||<br />
|-<br />
|Week of Nov 29 ||Bsodjahi || || Bayesian Network as a Decision Tool for Predicting ALS Disease || [https://www.mdpi.com/2076-3425/11/2/150/pdf Paper] || ||<br />
|-<br />
|Week of Nov 22 ||Xin Yan, Yishu Duan, Xibei Di || || Predicting Hurricane Trajectories Using a Recurrent Neural Network || [https://arxiv.org/pdf/1802.02548.pdf Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Predicting_Hurricane_Trajectories_Using_a_Recurrent_Neural_Network Summary]||<br />
|-<br />
|Week of Nov 29 ||Ankitha Anugu, Yushan Chen, Yuying Huang || || A Game Theoretic Approach to Class-wise Selective Rationalization || [https://arxiv.org/pdf/1910.12853.pdf Paper] || ||<br />
|-<br />
|Week of Nov 29 ||Aavinash Syamala, Dilmeet Malhi, Sohan Islam, Vansh Joshi || || Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree || [https://www.hindawi.com/journals/sp/2021/5560465/ Paper] || [https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Research_on_Multiple_Classification_Based_on_Improved_SVM_Algorithm_for_Balanced_Binary_Decision_Tree Summary]||</div>Wlchong