Extreme Multi-label Text Classification

From statwiki
Revision as of 09:13, 9 November 2020 by Mhwu

Presented By

Mohan Wu

Introduction

In this paper, the authors are interested in a class of problems called extreme classification. These problems involve training a classifier to assign the most relevant tags to a given text; the difficulty arises from the fact that the label set is so large that most models give poor results. The authors propose a new model called APLC-XLNet, which fine-tunes the generalized autoregressive pretrained model (XLNet) and uses Adaptive Probabilistic Label Clusters (APLC) to calculate the cross-entropy loss. This method takes advantage of unbalanced label distributions by forming label clusters, which reduces training time. The authors experimented on five different datasets and report results substantially better than those of existing state-of-the-art models.

Motivation

Extreme multi-label text classification (XMTC) has applications in many recent problems, such as estimating word representations for a large vocabulary [1], tagging Wikipedia with relevant labels [2], and providing product descriptions for search advertisements [3]. The authors are motivated by the shortcomings of traditional approaches to XMTC. For example, one such method of classifying text is the bag-of-words (BOW) approach, in which each text is represented by a vector of word frequencies. However, BOW does not consider the position of the words in the text and thus cannot capture context and semantics.
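The BOW limitation can be illustrated with a minimal sketch. The function name `bag_of_words` and the two example sentences are assumptions made here for illustration and do not come from the paper. The sketch builds count vectors for two sentences that use the same words in a different order and have opposite meanings.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return the count of each vocabulary word in `text` (hypothetical helper)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Two illustrative sentences: same words, different order, different meaning.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

# Shared vocabulary built from both sentences, sorted for a stable order.
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))

vec_a = bag_of_words(doc_a, vocabulary)
vec_b = bag_of_words(doc_b, vocabulary)

print(vocabulary)      # ['bit', 'dog', 'man', 'the']
print(vec_a)           # [1, 1, 1, 2]
print(vec_b)           # [1, 1, 1, 2]
print(vec_a == vec_b)  # True
```

Both sentences map to the same vector, so a classifier that sees only BOW features cannot tell them apart.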

References

Mikolov, T., Kombrink, S., Burget, L., Černocký, J., and Khudanpur, S. Extensions of recurrent neural network language model. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5528–5531. IEEE, 2011. [1]

Dekel, O. and Shamir, O. Multiclass-multilabel classification with more classes than examples. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 137–144, 2010. [2]

Jain, H., Prabhu, Y., and Varma, M. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 935–944. ACM, 2016. [3]