stat841f10: Difference between revisions

From statwiki
Jump to navigation Jump to search
Line 3: Line 3:


=== Classification ===
=== Classification ===
'''Statistical classification''', usually simply known as classification, refers to the [http://en.wikipedia.org/wiki/Supervised_learning]


With the rise of fields such as data-mining, bioinformatics, and machine learning, classification has becomes a fast-developing topic. In the age of information, vast amounts of data are generated constantly, and the goal of classification is to learn from data. Potential application areas include handwritten post codes recognition, medical diagnosis, face recognition, human language processing and so on.
With the rise of fields such as data-mining, bioinformatics, and machine learning, classification has becomes a fast-developing topic. In the age of information, vast amounts of data are generated constantly, and the goal of classification is to learn from data. Potential application areas include handwritten post codes recognition, medical diagnosis, face recognition, human language processing and so on.

Revision as of 16:24, 24 September 2010

Editor sign up

Classfication-2010.09.21

Classification

Statistical classification, usually simply known as classification, refers to the [1]

With the rise of fields such as data-mining, bioinformatics, and machine learning, classification has becomes a fast-developing topic. In the age of information, vast amounts of data are generated constantly, and the goal of classification is to learn from data. Potential application areas include handwritten post codes recognition, medical diagnosis, face recognition, human language processing and so on.

In classical statistics, classification techniques were developed to learn useful information using small data sets where there is usually not enough of data. When machine learning was developed after the application of computers to statistics, classification techniques were developed to work with very large data sets where there is usually too many data.

Error rate

Bayes Classifier

Bayesian vs. Frequentist

Linear and Quadratic Discriminant Analysis

Linear and Quadratic Discriminant Analysis cont'd - 2010.09.23

In the second lecture, Professor Ali Ghodsi recapitulates that by calculating the class posteriors [math]\displaystyle{ \Pr(Y=k|X=x) }[/math] we have optimal classification. He also shows that by assuming that the classes have common covariance matrix [math]\displaystyle{ \Sigma_{k}=\Sigma \forall k }[/math] the decision boundary between classes [math]\displaystyle{ k }[/math] and [math]\displaystyle{ l }[/math] is linear (LDA). However, if we do not assume same covariance between the two classes the decision boundary is quadratic function (QDA).

Some MATLAB samples are used to demonstrated LDA and QDA


LDA x QDA

Linear discriminant analysis[2] is a statistical method used to find the linear combination of features which best separate two or more classes of objects or events. It is widely applied in classifying diseases, positioning, product management, and marketing research.

Quadratic Discriminant Analysis[3], on the other had, aims to find the quadratic combination of features. It is more general than Linear discriminant analysis. Unlike LDA however, in QDA there is no assumption that the covariance of each of the classes is identical.

Summarizing LDA and QDA

We can summarize what we have learned so far into the following theorem.

Theorem:

Suppose that [math]\displaystyle{ \,Y \in \{1,\dots,k\} }[/math], if [math]\displaystyle{ \,f_k(x) = Pr(X=x|Y=k) }[/math] is Gaussian, the Bayes Classifier rule is

[math]\displaystyle{ \,h(X) = \arg\max_{k} \delta_k(x) }[/math]

where

[math]\displaystyle{ \,\delta_k = - \frac{1}{2}log(|\Sigma_k|) - \frac{1}{2}(x-\mu_k)^\top\Sigma_k^{-1}(x-\mu_k) + log (\pi_k) }[/math] (quadratic)
  • Note The decision boundary between classes [math]\displaystyle{ k }[/math] and [math]\displaystyle{ l }[/math] is quadratic in [math]\displaystyle{ x }[/math].

If the covariance of the Gaussians are the same, this becomes

[math]\displaystyle{ \,\delta_k = x^\top\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^\top\Sigma^{-1}\mu_k + log (\pi_k) }[/math] (linear)
  • Note [math]\displaystyle{ \,\arg\max_{k} \delta_k(x) }[/math]returns the set of k for which [math]\displaystyle{ \,\delta_k(x) }[/math] attains its largest value.