IPBoost

From statwiki

Revision as of 20:24, 15 November 2020

== Presented by ==

Casey De Vera, Solaiman Jawad

== Introduction ==

Boosting (see Wikipedia) is an important (and by now standard) technique in classification that combines several “low accuracy” learners, so-called base learners, into a “high accuracy” learner, a so-called boosted learner. Since the pioneering AdaBoost approach of [FS], there has been extensive work in recent decades on boosting procedures and on analyses of their limitations. In a nutshell, boosting procedures are typically iterative schemes that roughly work as follows:

for <math> t = 1, \cdots, T </math> do the following:

# Train a learner <math>\mu_t</math> from a given class of base learners on the data distribution <math>\mathcal{D}_t</math>.
# Evaluate the performance of <math>\mu_t</math> by computing its loss.
# Push the weight of the data distribution <math>\mathcal{D}_t</math> towards the misclassified examples, yielding <math>\mathcal{D}_{t+1}</math>.

Finally, the learners are combined by some form of voting (e.g., soft or hard voting, averaging, or thresholding). A close inspection of most (but not all) boosting procedures reveals that they solve an underlying convex optimization problem over a convex loss function by means of coordinate gradient descent. Boosting schemes of this type are often referred to as convex potential boosters. These procedures can achieve exceptional performance on many data sets if the data is correctly labeled. In fact, in theory, provided the class of base learners is rich enough, a perfect strong learner can be constructed that has accuracy 1; however, such a learner might not necessarily generalize well. Boosted learners can generate quite complicated decision boundaries, much more complicated than those of the base learners. An example from Paul van der Laken’s blog (extreme gradient boosting gif by Ryan Holbrook) illustrates this: data is generated online according to some process whose optimal decision boundary is represented by a dotted line, and XGBoost is used to learn a classifier.
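The generic loop above can be sketched in code. The following is a minimal AdaBoost-style illustration with decision stumps as base learners on a toy 1-D data set; the stump learner and the exponential reweighting are illustrative choices for a convex potential booster, not the IPBoost procedure this article is about.

```python
import numpy as np

# Toy 1-D data: label is +1 iff x > 0.5, so the optimal boundary is at 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=200)
y = np.where(X > 0.5, 1, -1)

def best_stump(X, y, w):
    """Base learner: decision stump minimizing the weighted error under w."""
    best = None
    for thr in np.unique(X):
        for sign in (1, -1):
            pred = np.where(X > thr, sign, -sign)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

T = 10
w = np.full(len(X), 1.0 / len(X))               # D_1: uniform distribution
stumps, alphas = [], []
for t in range(T):
    err, thr, sign = best_stump(X, y, w)        # 1. train mu_t on D_t
    err = max(err, 1e-10)                       # 2. evaluate its loss
    alpha = 0.5 * np.log((1 - err) / err)
    pred = np.where(X > thr, sign, -sign)
    w *= np.exp(-alpha * y * pred)              # 3. push weight toward the
    w /= w.sum()                                #    misclassified, giving D_{t+1}
    stumps.append((thr, sign))
    alphas.append(alpha)

# Combine the learners by weighted (soft) voting.
def boosted_predict(x):
    score = sum(a * (s if x > thr else -s) for a, (thr, s) in zip(alphas, stumps))
    return 1 if score > 0 else -1

acc = np.mean([boosted_predict(x) for x in X] == y)  # perfect on this stump-separable data
```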


== Motivation ==

== IPBoost: Boosting via Integer Programming ==

=== Integer Program Formulation ===
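This subsection is currently empty. A plausible sketch of the margin-based integer program behind IPBoost-style boosting is given below; the symbols and exact constraints here are a reconstruction from the general boosting-as-optimization literature, not taken from this summary, and the paper's actual model may differ.

<math>
\begin{align}
\min_{\lambda,\, z} \quad & \sum_{i=1}^{N} z_i \\
\text{s.t.} \quad & \sum_{j=1}^{L} \eta_{ij}\,\lambda_j + (1+\rho)\, z_i \;\ge\; \rho, \qquad i = 1, \dots, N,\\
& \sum_{j=1}^{L} \lambda_j = 1, \quad \lambda \ge 0, \quad z \in \{0,1\}^{N},
\end{align}
</math>

where <math>\eta_{ij}</math> is the margin of base learner <math>j</math> on example <math>i</math> (positive if correctly classified), <math>\lambda</math> are the voting weights, and the binary variable <math>z_i</math> indicates that example <math>i</math> is allowed to violate the margin <math>\rho</math>; the objective counts (and minimizes) such violations directly rather than through a convex surrogate loss.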

=== Solution using branch and price procedure ===

=== Algorithm ===

== Results and Performance ==

== References ==