Multi-Task Feature Learning

Introduction

It has been shown both empirically and theoretically that learning multiple related tasks simultaneously often significantly improves performance compared with learning each task independently. Learning related tasks jointly is especially beneficial when only a few data points are available per task, because the benefit comes from pooling data across many related tasks. One way tasks can be related is by sharing a common underlying representation; for example, people make product choices (e.g. of books, music CDs, etc.) using a common set of features describing these products. In this paper, the authors explore a way to learn a low-dimensional representation that is shared across multiple related tasks.

Learning sparse multi-task representations

Let [math]\displaystyle{ \,R }[/math] be the set of real numbers and [math]\displaystyle{ \,R_{+} }[/math] ([math]\displaystyle{ \,R_{++} }[/math]) the subset of non-negative (positive) ones. Let [math]\displaystyle{ \,T }[/math] be the number of tasks and denote the set of task indices by [math]\displaystyle{ \,N_T := \{1,\dots,T\} }[/math]. For each task [math]\displaystyle{ \,t \in N_T }[/math], we have [math]\displaystyle{ \,m }[/math] input/output examples [math]\displaystyle{ \,(x_{t1}, y_{t1}),\dots,(x_{tm}, y_{tm}) \in R^d \times R }[/math]. The goal is then to estimate [math]\displaystyle{ \,T }[/math] functions [math]\displaystyle{ \,f_t : R^d \to R, t \in N_T }[/math], that approximate the training data well and are also statistically predictive for new data.
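
To make the notation concrete, the following is a minimal sketch of how the data for the T tasks might be laid out in NumPy. The sizes, the synthetic data, and the helper names (fit_independently, predict) are hypothetical and chosen only for illustration. The sketch also assumes each f_t is linear and fits every task separately by ordinary least squares; this is the "learn each task independently" baseline mentioned in the introduction, not the authors' multi-task method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T tasks, m examples per task, inputs in R^d.
T, m, d = 4, 10, 3

# X[t, i] plays the role of x_{ti} in R^d; Y[t, i] plays the role of y_{ti} in R.
X = rng.normal(size=(T, m, d))
W_true = rng.normal(size=(T, d))  # one synthetic weight vector per task (assumption)
Y = np.einsum("tmd,td->tm", X, W_true) + 0.1 * rng.normal(size=(T, m))


def fit_independently(X, Y):
    """Fit each task on its own by least squares (assumes linear f_t).

    Returns an array of shape (T, d) whose row t parameterizes
    f_t(x) = <w_t, x>. No information is shared between tasks.
    """
    return np.stack(
        [np.linalg.lstsq(X[t], Y[t], rcond=None)[0] for t in range(X.shape[0])]
    )


def predict(W, t, x):
    """Evaluate the estimated function f_t at a new input x in R^d."""
    return float(W[t] @ x)


W_hat = fit_independently(X, Y)
y_new = predict(W_hat, t=0, x=np.ones(d))
```

Here each row of W_hat is estimated from only the m examples of its own task, which is exactly the setting where pooling data across related tasks is expected to help.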