Evaluating Machine Accuracy on ImageNet

From statwiki
Revision as of 00:26, 30 November 2020 by J52dong (talk | contribs)
Jump to: navigation, search

Presented by

Siyuan Xia, Jiaxiang Liu, Jiabao Dong, Yipeng Du


ImageNet is the most influential data set in machine learning with images and corresponding labels over 1000 classes. This paper intends to explore the causes for performance differences between human experts and machine learning models, more specifically, CNN, on ImageNet.

Firstly, some images may fall into multiple classes. As a result, it is possible to underestimate the performance if we map each image to strictly one label, which is what is being done in the top-1 metric. Therefore, we adopt both top-1 and top-5 metrics where the performances of models, unlike human labelers, are linearly correlated in both cases.

Secondly, in contrast to the uniform performance of models on classes, humans tend to achieve better performances on inanimate objects. Human labelers achieve similar overall accuracies as the models, which indicates spaces of improvements on specific classes for machines.

Lastly, the setup of drawing training and test sets from the same distribution may favour models over human labelers. That is, the accuracy of multi-class prediction from models drops when the testing set is drawn from a different distribution than the training set, ImageNetV2. But this shift in distribution does not cause a problem for human labelers.

Experiment Setup

Multi-label annotations

Human Accuracy Measurement Process

Main Results

Other Observations

Related Work

Conclusion and Future Work