# Difference between revisions of "stat841f10"


## Revision as of 15:40, 24 September 2010

## Contents

- Classification - 2010.09.21
  - Error rate
  - Bayes Classifier
  - Bayesian vs. Frequentist
- Linear and Quadratic Discriminant Analysis
- Linear and Quadratic Discriminant Analysis cont'd - 2010.09.23

In the second lecture, Professor Ali Ghodsi recapitulates that calculating the class posteriors [math]\Pr(Y=k|X=x)[/math] yields the optimal classification. He also shows that if the classes are assumed to share a common covariance matrix, [math]\Sigma_{k}=\Sigma \; \forall k [/math], the decision boundary between classes [math]k[/math] and [math]l[/math] is linear (LDA). However, if we do not assume the same covariance between the classes, the decision boundary is a quadratic function of [math]x[/math] (QDA).

Some MATLAB examples were used to demonstrate LDA and QDA.

### LDA x QDA

Linear discriminant analysis[1] is a statistical method used to find the *linear combination* of features that best separates two or more classes of objects or events. It is widely applied in disease classification, positioning, product management, and marketing research.

Quadratic discriminant analysis[2], on the other hand, aims to find the *quadratic combination* of features that best separates the classes. It is more general than linear discriminant analysis: unlike LDA, QDA does not assume that the covariance matrix of each class is identical.

### Summarizing LDA and QDA

We can summarize what we have learned so far into the following theorem.

**Theorem**:

Suppose that [math]\,Y \in \{1,\dots,K\}[/math]. If [math]\,f_k(x) = \Pr(X=x|Y=k)[/math] is Gaussian, the Bayes classifier rule is

- [math]\,h(x) = \arg\max_{k} \delta_k(x)[/math]

where

- [math] \,\delta_k(x) = - \frac{1}{2}\log(|\Sigma_k|) - \frac{1}{2}(x-\mu_k)^\top\Sigma_k^{-1}(x-\mu_k) + \log(\pi_k) [/math] (quadratic)

**Note**: The decision boundary between classes [math]k[/math] and [math]l[/math] is quadratic in [math]x[/math].
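As a minimal numerical sketch of the quadratic rule above (the means, covariances, and priors here are made-up illustration values, not taken from the lecture), [math]\delta_k(x)[/math] can be evaluated directly and the class chosen by the arg max:

```python
import numpy as np

def qda_delta(x, mu, Sigma, pi):
    """Quadratic discriminant score delta_k(x) for one class:
    -1/2 log|Sigma_k| - 1/2 (x-mu_k)^T Sigma_k^{-1} (x-mu_k) + log pi_k."""
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.inv(Sigma) @ diff
            + np.log(pi))

# Hypothetical two-class problem with class-specific covariances.
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]
pis = [0.5, 0.5]

x = np.array([2.5, 2.8])            # a point near the second class mean
scores = [qda_delta(x, m, S, p) for m, S, p in zip(mus, Sigmas, pis)]
print(int(np.argmax(scores)))       # prints 1: x is assigned to class 1
```

Because the covariances differ between the classes, the resulting decision boundary in [math]x[/math] is quadratic rather than a straight line.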

If the covariance matrices of the Gaussians are the same ([math]\,\Sigma_k = \Sigma \; \forall k[/math]), this becomes

- [math] \,\delta_k(x) = x^\top\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^\top\Sigma^{-1}\mu_k + \log(\pi_k) [/math] (linear)

**Note**: [math]\,\arg\max_{k} \delta_k(x)[/math] returns the set of [math]k[/math] for which [math]\,\delta_k(x)[/math] attains its largest value.
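Under the shared-covariance assumption the score is linear in [math]x[/math], so the quadratic term drops out. A corresponding sketch (again with illustrative, made-up parameter values):

```python
import numpy as np

def lda_delta(x, mu, Sigma_inv, pi):
    """Linear discriminant score under a shared covariance matrix:
    x^T Sigma^{-1} mu_k - 1/2 mu_k^T Sigma^{-1} mu_k + log pi_k."""
    return x @ Sigma_inv @ mu - 0.5 * mu @ Sigma_inv @ mu + np.log(pi)

Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])      # one covariance shared by all classes
Sigma_inv = np.linalg.inv(Sigma)
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
pis = [0.5, 0.5]

x = np.array([0.4, 0.2])            # a point near the first class mean
scores = [lda_delta(x, m, Sigma_inv, p) for m, p in zip(mus, pis)]
print(int(np.argmax(scores)))       # prints 0: x is assigned to class 0
```

Note that only [math]x^\top\Sigma^{-1}\mu_k[/math] depends on [math]x[/math], and it does so linearly; this is why the decision boundary between any two classes is a hyperplane under LDA.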