Evaluating Machine Accuracy on ImageNet

From statwiki
Revision as of 01:23, 30 November 2020 by J52dong

Presented by

Siyuan Xia, Jiaxiang Liu, Jiabao Dong, Yipeng Du

Introduction

ImageNet is the most influential data set in machine learning, consisting of images with corresponding labels across 1000 classes. This paper explores the causes of the performance differences between human experts and machine learning models, specifically convolutional neural networks (CNNs), on ImageNet.

Firstly, some images may fall into multiple classes. As a result, performance can be underestimated if each image is mapped to strictly one label, which is what the top-1 metric does. Therefore, we adopt both the top-1 and top-5 metrics; the performance of models, unlike that of human labelers, is linearly correlated in both cases.
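To make the difference between the two metrics concrete, here is a minimal sketch of how top-1 and top-5 accuracy could be computed from per-class scores. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not taken from the paper, and the example uses 6 classes rather than ImageNet's 1000 to keep it readable.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of images whose true label is among the k highest-scoring classes.

    scores: list of per-image score lists (one score per class)
    labels: list of true class indices, one per image
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 3 images over 6 classes (not real model output).
scores = [
    [0.05, 0.60, 0.10, 0.10, 0.10, 0.05],  # true class 1 is ranked first
    [0.30, 0.05, 0.40, 0.10, 0.10, 0.05],  # true class 0 is ranked second
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true class 5 is ranked last
]
labels = [1, 0, 5]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(f"top-1: {top1:.3f}, top-5: {top5:.3f}")
```

In this toy case the second image counts as an error under top-1 but as correct under top-5, so top-1 accuracy is 1/3 while top-5 accuracy is 2/3. This is the kind of gap that can arise when an image plausibly belongs to more than one class.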

Secondly, in contrast to the uniform performance of models across classes, humans tend to perform better on inanimate objects. Human labelers achieve overall accuracies similar to those of the models, which indicates room for improvement on specific classes for machines.

Lastly, the setup of drawing training and test sets from the same distribution may favour models over human labelers. That is, the accuracy of multi-class prediction from models drops when the test set, ImageNetV2, is drawn from a different distribution than the training set. This shift in distribution, however, does not cause a problem for human labelers.