Is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction?

From statwiki
Revision as of 23:29, 14 November 2010 by Lishayu

Introduction

A now-standard method for analyzing discrete data such as documents is clustering, a form of unsupervised learning. A rich variety of methods exists, borrowing theory and algorithms from a broad spectrum of computer science: spectral methods, kd-trees, data-merging algorithms, and so on. All of these methods, however, share one significant drawback for typical applications such as document or image analysis: each item or document is assigned exclusively to one class. In practice, documents invariably mix a few topics, as is readily seen by inspecting the human-classified Reuters newswire, so the automated construction of topic hierarchies needs to reflect this. One alternative is to make clusters multifaceted, whereby a document is assigned to a number of clusters through a convex combination rather than uniquely to one cluster. This is an unsupervised version of the so-called multi-class classification task.
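
To make the contrast between exclusive and multifaceted assignment concrete, here is a minimal sketch in Python. The topic-word probabilities, the four-word vocabulary, and the helper name `mix` are all hypothetical and chosen only for illustration; the sketch is not the estimation procedure of multinomial PCA, only a picture of what a convex combination of clusters means.

```python
import numpy as np

# Hypothetical topic-word distributions over a 4-word vocabulary.
# Each row sums to 1; the numbers are illustrative, not taken from any data.
topics = np.array([
    [0.7, 0.2, 0.1, 0.0],   # topic 0
    [0.0, 0.1, 0.3, 0.6],   # topic 1
])

def mix(weights, topics):
    """Return the word distribution implied by a convex combination of topics."""
    weights = np.asarray(weights, dtype=float)
    # A convex combination needs non-negative weights that sum to 1.
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return weights @ topics

# Exclusive assignment: the document belongs to topic 0 only.
hard = mix([1.0, 0.0], topics)

# Multifaceted assignment: 60% topic 0 and 40% topic 1.
soft = mix([0.6, 0.4], topics)
```

Exclusive clustering is the special case in which the weight vector has a single entry equal to 1. The multifaceted case lets the weights spread over several clusters while the result remains a valid probability distribution over words.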