Task Understanding from Confusing Multi-task Data

From statwiki
Revision as of 18:05, 15 November 2020 by Q8song (talk | contribs) (→‎Results)


Presented By aslkdfj;awekrf

1. Introduction



2. Related Work



Experiment

Setup

Three data sets are used to compare Confusing Supervised Learning (CSL) with existing methods: one function regression task and two image classification tasks.

Function Regression: The function regression data comes in the form of pairs [math]\displaystyle{ (x_i,y_i),i=1,...,m }[/math]. Unlike in a typical regression problem, however, the pairs are generated by multiple mapping functions [math]\displaystyle{ f_j(x),j=1,...,n }[/math], and the data does not say which function produced which pair. The goal is therefore both to recover the mapping functions [math]\displaystyle{ f_j }[/math] and to determine which mapping function corresponds to each of the [math]\displaystyle{ m }[/math] observations.

Colorful-MNIST: The first image classification data set consists of MNIST digit images that have been colored. Each observation in this modified set consists of a colored image ([math]\displaystyle{ x_i }[/math]) and a label ([math]\displaystyle{ y_i }[/math]) that is either the color of the image or the digit it represents. The goal is to recover the classification task ("color" or "digit") for each observation and to construct a classifier for each of the two tasks.

Kaggle Fashion Product: This data set has more observations than the Colorful-MNIST data and consists of pictures of clothing items, each labelled with one of the item's “Gender”, “Category”, or “Color”.
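To make the structure of confusing data concrete, the following minimal sketch generates a function regression set of the kind described above. The three mapping functions, the sampling range, and the helper name make_confusing_data are assumptions chosen for illustration; the source does not state which functions were used in the experiment.

```python
import math
import random

# Hypothetical ground-truth mapping functions f_j (not taken from the paper).
MAPPING_FUNCTIONS = [math.sin, math.cos, lambda x: x * x]


def make_confusing_data(m, seed=0):
    """Return (data, task_ids): m (x, y) pairs and the hidden task index of each."""
    rng = random.Random(seed)
    data, task_ids = [], []
    for _ in range(m):
        x = rng.uniform(-1.0, 1.0)
        j = rng.randrange(len(MAPPING_FUNCTIONS))  # which f_j produced y
        data.append((x, MAPPING_FUNCTIONS[j](x)))
        task_ids.append(j)
    return data, task_ids


data, task_ids = make_confusing_data(m=5)
for (x, y), j in zip(data, task_ids):
    print(f"x={x:+.3f}  y={y:+.3f}  (hidden task {j})")
```

A learner would see only data; task_ids is kept here solely so that the recovered task assignment can be checked afterwards.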

Use of Pre-Trained CNN Feature Layers

In the Kaggle Fashion Product experiment, each of the three classifiers [math]\displaystyle{ f_j }[/math] consists of fully-connected layers attached to the feature-extraction layers of pre-trained convolutional neural networks (CNNs).

Metrics of Confusing Supervised Learning

Two accuracy measures are used to evaluate CSL and compare it with other methods: the accuracy of the task labelling and the accuracy of the learned mapping functions.

Label Assignment Accuracy: [math]\displaystyle{ \alpha_T(j) }[/math] is the fraction of observations on which the learned task-assignment function [math]\displaystyle{ h }[/math] agrees with the human task assignment [math]\displaystyle{ \tilde h }[/math] about whether the observation "is" or "is not" in task [math]\displaystyle{ j }[/math].

[math]\displaystyle{ \alpha_T(j) = \max_k\frac{1}{m}\sum_{i=1}^m I[h(x_i,y_i;f_k) = \tilde h(x_i,y_i;f_j)] }[/math]

The maximum over [math]\displaystyle{ k }[/math] is taken because the learned tasks come in no particular order, so each ground-truth task [math]\displaystyle{ j }[/math] must be matched to the learned task that agrees with it best.
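The label assignment accuracy can be sketched in a few lines. The 0/1 membership encoding of the task-assignment functions and the function name label_assignment_accuracy are assumptions made for this example, not part of the original method description.

```python
def label_assignment_accuracy(learned, human, j):
    """alpha_T(j): best agreement rate between any learned task k and ground-truth task j.

    learned[k][i] and human[j][i] are 0/1 flags saying whether observation i
    "is" in the given task (a hypothetical encoding of h and h-tilde).
    """
    m = len(human[j])
    return max(
        sum(int(a == b) for a, b in zip(row, human[j])) / m
        for row in learned
    )


# Ground truth: observations 0-2 belong to task 0, observations 3-5 to task 1.
human = [[1, 1, 1, 0, 0, 0],
         [0, 0, 0, 1, 1, 1]]
# The learned tasks come out in the opposite order, with observation 2 misassigned.
learned = [[0, 0, 1, 1, 1, 1],
           [1, 1, 0, 0, 0, 0]]

print(label_assignment_accuracy(learned, human, j=0))  # 5 of 6 observations agree
print(label_assignment_accuracy(learned, human, j=1))  # 5 of 6 observations agree
```

Because of the maximum over the learned tasks, the swapped ordering does not lower the score; only the single misassigned observation does.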

Mapping Function Accuracy: [math]\displaystyle{ \alpha_L(j) }[/math] again chooses [math]\displaystyle{ g_k }[/math], the learned mapping function that is closest to the ground truth of task [math]\displaystyle{ j }[/math], and measures its average accuracy, defined as one minus the relative error with respect to the ground-truth mapping function [math]\displaystyle{ f_j }[/math], across all [math]\displaystyle{ m }[/math] observations.

[math]\displaystyle{ \alpha_L(j) = \max_k\frac{1}{m}\sum_{i=1}^m \left(1-\dfrac{|g_k(x_i)-f_j(x_i)|}{|f_j(x_i)|}\right) }[/math]
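As a worked example of the mapping function accuracy, the sketch below evaluates the formula for one ground-truth function and two candidate learned functions. All functions, the inputs xs, and the name mapping_function_accuracy are hypothetical values chosen for illustration; the sketch also assumes that the ground-truth function is nonzero at every input, since the formula divides by its absolute value.

```python
def mapping_function_accuracy(learned_fns, true_fn, xs):
    """alpha_L(j): mean of (1 - relative error) for the best-matching learned function."""
    m = len(xs)
    return max(
        sum(1 - abs(g(x) - true_fn(x)) / abs(true_fn(x)) for x in xs) / m
        for g in learned_fns
    )


def true_fn(x):
    # Hypothetical ground-truth mapping function f_j.
    return 2.0 * x


def close_fn(x):
    # Hypothetical learned function that is 10% too large everywhere.
    return 2.2 * x


def far_fn(x):
    # Hypothetical learned function that fits a different task.
    return -1.0 * x


learned_fns = [far_fn, close_fn]
xs = [1.0, 2.0, 3.0, 4.0]

print(mapping_function_accuracy(learned_fns, true_fn, xs))  # approximately 0.9
```

The poorly matching function scores 1 - 1.5 = -0.5 on every input, so the maximum selects the close function and the result is about 0.9.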

Results

Given confusing data, CSL performs better than traditional supervised learning, Pseudo-Label (Lee, 2013), and SMiLE (Tan et al., 2017). CSL achieves [math]\displaystyle{ \alpha_L }[/math] scores of around 95%, whereas the other methods score under 50%. This supports the assertion that, when presented with confusing data, traditional methods learn only the mean of the ground-truth mapping functions.
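The claim about learning the mean can be illustrated with a toy example, which is not one of the paper's experiments. Two hypothetical linear tasks with slopes 2 and -1 label the same inputs, and a single least-squares line through the origin is fitted to the mixed data; the fitted slope lands on the average of the two task slopes and matches neither task.

```python
def f1(x):
    # Hypothetical ground-truth mapping function of task 1.
    return 2.0 * x


def f2(x):
    # Hypothetical ground-truth mapping function of task 2.
    return -1.0 * x


xs = [0.5, 1.0, 1.5, 2.0]
# Confusing data: every input appears once with each task's output, unlabelled.
data = [(x, f1(x)) for x in xs] + [(x, f2(x)) for x in xs]

# Closed-form least-squares slope for the single model y = w * x.
w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
mean_slope = (2.0 + -1.0) / 2

print(w, mean_slope)  # both are 0.5
```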

Application of Multi-label Learning