Evaluating Machine Accuracy on ImageNet
Presented by
Siyuan Xia, Jiaxiang Liu, Jiabao Dong, Yipeng Du
Introduction
ImageNet is the most influential dataset in machine learning, consisting of images with corresponding labels spanning 1,000 classes. This paper explores the causes of the performance differences between human experts and machine learning models, specifically CNNs, on ImageNet.
Firstly, some images may fall into multiple classes. As a result, mapping each image to strictly one label, as the top-1 metric does, can underestimate performance. The paper therefore reports both top-1 and top-5 accuracy; across models these two metrics are linearly correlated, whereas human labelers do not follow the same relationship.
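To make the two metrics concrete, below is a minimal sketch of how top-k accuracy is typically computed; the function name and the toy scores are illustrative and not taken from the paper (top-2 stands in for top-5 to keep the example small).

```python
import numpy as np

def top_k_accuracy(scores, labels, k=1):
    # scores: (n_examples, n_classes) model scores; labels: (n_examples,) true class ids.
    top_k = np.argsort(scores, axis=1)[:, -k:]     # indices of the k highest-scoring classes
    hits = (top_k == labels[:, None]).any(axis=1)  # is the true label among the top k?
    return hits.mean()

# Toy example: the second image's true class is the model's second guess,
# so top-1 misses it but top-2 counts it as correct.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.2, 0.3]])
labels = np.array([1, 2])
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 1.0
```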
Secondly, in contrast to the relatively uniform performance of models across classes, humans tend to achieve better performance on inanimate objects. Human labelers achieve overall accuracies similar to those of the models, which suggests that machines still have room for improvement on specific classes.
Lastly, the setup of drawing training and test sets from the same distribution may favour models over human labelers. That is, the multi-class prediction accuracy of models drops when the test set is drawn from a different distribution than the training set, such as ImageNetV2, whereas this distribution shift does not cause a problem for human labelers.
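One way to quantify such a drop is to evaluate the same model on both test sets and compare accuracies. The following is a minimal sketch of a standard PyTorch-style evaluation loop under that assumption; the model and data-loader names (imagenet_val_loader, imagenetv2_loader) are hypothetical placeholders, not artifacts from the paper.

```python
import torch

def accuracy(model, loader, device="cpu"):
    # Top-1 accuracy of `model` over a test DataLoader.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total

# Hypothetical loaders for the original and shifted test sets:
# acc_v1 = accuracy(model, imagenet_val_loader)
# acc_v2 = accuracy(model, imagenetv2_loader)
# print(f"accuracy drop under distribution shift: {acc_v1 - acc_v2:.3f}")
```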