uncovering Shared Structures in Multiclass Classification

From statwiki
Revision as of 12:38, 22 November 2010 by Y24Sun (talk | contribs) (→‎Formulation)
Jump to navigation Jump to search

Introduction

In their paper Uncovering Shared Structures in Multiclass Classification, the authors Amit et al. wrote about how hidden structure can be utilized to improve the accuracy in multiclass classification. This notion is often called learning-to-learn or interclass transfer (Thrun, 1996).

The uncovering of such hidden structure was accomplished by a mechanism that learns the underlying characteristics that are shared between the target classes. The benefits of finding such common characteristics was demonstrated in the context of large margin multiclass linear classifiers.

Accurate classification of an instance when there is a large number of target classes is a major challenge in many areas such as object recognition, face identification, textual topic classification, and phoneme recognition.

Usually, when there is a large number of classes, the classes are strongly related to each other and have some underlying common characteristics. As a simple example, even though there are many different kinds of mammals, many of them share common characteristics such as having a striped texture. If such true underlying characteristics that are common to the many different classes can be found, then the effective complexity of the multiclass problem can be significantly reduced.

Simultaneously learning the underlying structure shared by classes is a challenging optimization task. The usual goal of past heuristics was to extract powerful non-linear hidden characteristics. However, this goal was usually not convex, and it often resulted in local minima. In this paper, the authors instead modeled the shared characteristics between classes as linear transformations of the input space, so their model was a linear mapping of the shared features of classes that was followed by a multiclass linear classifier. Their model was not only learned in a convex manner, it also significantly improved the accuracy of multiclass linear classifiers.


Formulation

The ultimate goal of multiclass classification is to learn a mapping [math]\displaystyle{ \,H : \mathcal{X} \mapsto \mathcal{Y} }[/math] from instances in [math]\displaystyle{ \,\mathcal{X} }[/math] to labels in [math]\displaystyle{ \,\mathcal{Y} = \{1, \dots , k\} }[/math].

In this paper, the authors considered a linear classifiers over [math]\displaystyle{ \,\mathcal{X} = R^{n} }[/math] that are parametrized by a vector of weights [math]\displaystyle{ \,W_y \in R^{n} }[/math] for each class [math]\displaystyle{ \,y \in \mathcal{Y} }[/math]. These linear models were of the form:

[math]\displaystyle{ \,H_W(x) = \arg\max_{y \in \mathcal{Y}} W_{y}^{t} x \;\;\; (1) }[/math]