Latest revision as of 18:25, 15 November 2020
Task Understanding from Confusing Multi-task Data
Presented By aslkdfj;awekrf
1. Introduction
2. Related Work
Experiment
Setup
Three data sets are used to compare CSL to existing methods: one function regression task and two image classification tasks.
Function Regression: The function regression data comes in the form of [math]\displaystyle{ (x_i,y_i),i=1,...,m }[/math] pairs. Unlike typical regression problems, however, there are multiple mapping functions [math]\displaystyle{ f_j(x),j=1,...,n }[/math], so the goal is both to recover the mapping functions [math]\displaystyle{ f_j }[/math] and to determine which mapping function generated each of the [math]\displaystyle{ m }[/math] observations. Three scalar-input, scalar-valued functions that intersect one another at several points were chosen as the tasks.
Colorful-MNIST: The first image classification data set consists of MNIST digit images that have been colored. Each observation consists of a colored image ([math]\displaystyle{ x_i }[/math]) and a label ([math]\displaystyle{ y_i }[/math]) that is either the color of the digit or the digit itself. The goal is to recover the classification task ("color" or "digit") behind each observation and to construct a classifier for each of the two tasks.
Kaggle Fashion Product: This data set has more observations than the Colorful-MNIST data and consists of pictures of clothing items, each labelled with one of the “Gender”, “Category”, or “Color” of the item.
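The function regression setup described above can be sketched as follows. This is only an illustration under stated assumptions: the three functions in TASKS, the input range and the uniform task sampling are hypothetical choices, since the functions actually used in the experiment are not listed here.

```python
import math
import random

# Hypothetical ground-truth tasks f_1, f_2, f_3 (assumed for illustration only).
TASKS = [math.sin, math.cos, lambda x: 0.5 * x]


def make_confusing_data(m, seed=0):
    """Return m (x, y) pairs plus the hidden task index of each pair."""
    rng = random.Random(seed)
    samples, hidden_tasks = [], []
    for _ in range(m):
        x = rng.uniform(-3.0, 3.0)        # assumed input range
        j = rng.randrange(len(TASKS))     # task identity, never shown to the learner
        samples.append((x, TASKS[j](x)))
        hidden_tasks.append(j)
    return samples, hidden_tasks


samples, hidden_tasks = make_confusing_data(1000)
```

The learner only ever sees samples; hidden_tasks is kept solely so that the recovered task assignment can be scored afterwards.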
Use of Pre-Trained CNN Feature Layers
In the Kaggle Fashion Product experiment, each of the 3 classification algorithms [math]\displaystyle{ f_j }[/math] consists of fully-connected layers attached to feature-identifying layers from pre-trained Convolutional Neural Networks.
Metrics of Confusing Supervised Learning
There are two measures of accuracy used to evaluate and compare CSL to other methods, corresponding respectively to the accuracy of the task labelling and the accuracy of the learned mapping function.
Label Assignment Accuracy: [math]\displaystyle{ \alpha_T(j) }[/math] is the fraction of observations on which the learned deconfusing function [math]\displaystyle{ h }[/math] agrees with the human task assignment [math]\displaystyle{ \tilde h }[/math] on whether the observation "is" or "is not" in task [math]\displaystyle{ j }[/math].
$$ \alpha_T(j) = \max_k\frac{1}{m}\sum_{i=1}^m I\left[h(x_i,y_i;f_k)=\tilde h(x_i,y_i;f_j)\right]$$
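The max over [math]\displaystyle{ k }[/math] is taken because the indices of the learned tasks need not match those of the ground-truth tasks, so we must determine which learned task corresponds to ground-truth task [math]\displaystyle{ j }[/math].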
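A minimal sketch of how this score could be computed is given below. It assumes that both the learned and the human assignments are available as one task index per observation, so that "observation i is in task k" becomes the test learned[i] == k; the function name and the toy data are hypothetical.

```python
def label_assignment_accuracy(learned, truth, j, n_tasks):
    """alpha_T(j): best agreement between any learned task k and true task j."""
    m = len(truth)
    best = 0.0
    for k in range(n_tasks):
        # Count observations where "is in learned task k" matches "is in true task j".
        agree = sum((learned[i] == k) == (truth[i] == j) for i in range(m))
        best = max(best, agree / m)
    return best


# Toy example: learned indices are a permutation of the true ones,
# with a single mistake on the last observation.
truth = [0, 0, 1, 1, 2, 2]
learned = [2, 2, 0, 0, 1, 0]

alpha_0 = label_assignment_accuracy(learned, truth, j=0, n_tasks=3)
alpha_2 = label_assignment_accuracy(learned, truth, j=2, n_tasks=3)
```

In this toy data, true task 0 is matched perfectly by learned task 2, while true task 2 is best matched by learned task 1, which disagrees on one of the six observations.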
Mapping Function Accuracy: [math]\displaystyle{ \alpha_L(j) }[/math] again chooses [math]\displaystyle{ g_k }[/math], the learned mapping function that is closest to the ground-truth mapping function of task [math]\displaystyle{ j }[/math], [math]\displaystyle{ f_j }[/math], and measures its average accuracy (one minus the relative absolute error with respect to [math]\displaystyle{ f_j }[/math]) across all [math]\displaystyle{ m }[/math] observations.
$$ \alpha_L(j) = \max_k\frac{1}{m}\sum_{i=1}^m \left(1-\dfrac{|g_k(x_i)-f_j(x_i)|}{|f_j(x_i)|}\right)$$
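The formula above can be evaluated directly, as in the following sketch. The function name, the example functions and the evaluation points are hypothetical, and the code assumes that the ground-truth function is non-zero at every evaluation point so that the relative error is defined.

```python
def mapping_function_accuracy(learned_fns, f_j, xs):
    """alpha_L(j): best average (1 - relative absolute error) over learned g_k."""
    m = len(xs)
    return max(
        sum(1 - abs(g(x) - f_j(x)) / abs(f_j(x)) for x in xs) / m
        for g in learned_fns
    )


# Toy example: the first learned function is 10% off, the second is far off.
f_true = lambda x: 2.0 * x
learned_fns = [lambda x: 2.2 * x, lambda x: -1.0 * x]
xs = [1.0, 2.0, 4.0]

alpha_L = mapping_function_accuracy(learned_fns, f_true, xs)
```

Here the first learned function has a relative error of 0.1 at every point, so the score is approximately 0.9. Note that the score can be negative for a learned function whose relative error exceeds 1.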
Results
Given confusing data, CSL performs better than traditional supervised learning methods, Pseudo-Label (Lee, 2013), and SMiLE (Tan et al., 2017). This is demonstrated by CSL's [math]\displaystyle{ \alpha_L }[/math] scores of around 95%, compared to [math]\displaystyle{ \alpha_L }[/math] scores of under 50% for the other methods. This supports the assertion that traditional methods only learn the mean of all the ground-truth mapping functions when presented with confusing data.
Function Regression: A 5-shot warm-up was used so that the observations could be partitioned into the correct tasks.
Image Classification: Visualizations created through spectral embedding confirm the task labelling proficiency of the deconfusing neural network [math]\displaystyle{ h }[/math].
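The claim about learning the mean can be illustrated with a small sketch. It assumes a squared-error loss, equally frequent tasks and three hypothetical ground-truth functions, none of which are specified in this summary: at a fixed input, a single model trained by gradient descent on the confused labels converges to their mean rather than to any individual task.

```python
import math

x = 1.0
# Confused targets for the same input x, one per hypothetical task.
targets = [math.sin(x), math.cos(x), 0.5 * x]

pred = 0.0   # the single model's prediction at x
lr = 0.1     # assumed learning rate
for _ in range(500):
    # Gradient of the mean squared error with respect to the prediction.
    grad = sum(2 * (pred - y) for y in targets) / len(targets)
    pred -= lr * grad

mean_target = sum(targets) / len(targets)
```

After training, pred matches mean_target to numerical precision, which is generally not the value of any single ground-truth function at x.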
The classification and function prediction accuracy of CSL are comparable to supervised learning programs that have been given access to the ground-truth labels.
Application of Multi-label Learning
CSL also had better accuracy than traditional supervised learning methods, Pseudo-Label (Lee, 2013), and SMiLE (Tan et al., 2017) when presented with multi-labelled data [math]\displaystyle{ (x_i,y_i) }[/math], where [math]\displaystyle{ y_i }[/math] is an [math]\displaystyle{ n }[/math]-long vector containing the correct output for each task.