f10 Stat841 digest


Classification - September 21, 2010

  • Classification is an area of supervised learning that systematically assigns previously unseen (unlabeled) data to a label, based on characteristics and attributes obtained from observation.
  • Classification is the prediction of a discrete random variable [math]\displaystyle{ \mathcal{Y} }[/math] from another random variable [math]\displaystyle{ \mathcal{X} }[/math], where [math]\displaystyle{ \mathcal{Y} }[/math] represents the label assigned to a new data input and [math]\displaystyle{ \mathcal{X} }[/math] represents the known feature values of the input. The classification rule used by a classifier has the form [math]\displaystyle{ \,h: \mathcal{X} \mapsto \mathcal{Y} }[/math].
  • The true error rate of a classification rule [math]\displaystyle{ \,h }[/math] is the probability that [math]\displaystyle{ \,h }[/math] misclassifies a new data input. The empirical error rate is the proportion of data inputs in the training set that [math]\displaystyle{ \,h }[/math] misclassifies. In practice the true error rate cannot be measured, so the empirical error rate is used as its estimate.
  • The Bayes classifier is a probabilistic classifier based on Bayes' theorem (its naive Bayes variant adds strong independence assumptions among the features). It has the advantage of requiring only a small amount of training data to estimate the parameters needed for classification. Under this classifier an input [math]\displaystyle{ \,x }[/math] is assigned to the class [math]\displaystyle{ \,y }[/math] with the largest posterior probability given [math]\displaystyle{ \,x }[/math].
  • The Bayes Classification Rule Optimality Theorem states that the Bayes classifier is optimal: its true error rate is less than or equal to that of any other classification rule.
  • The Bayes decision boundary is the boundary separating two classes [math]\displaystyle{ \,m, n }[/math], obtained by setting the posterior probabilities of the two classes equal: [math]\displaystyle{ \,D(h)=\{x: P(Y=m|X=x)=P(Y=n|X=x)\} }[/math].
  • Linear Discriminant Analysis (LDA) obtains the Bayes decision boundary between two classes under the assumption that both classes are generated from Gaussian distributions with the same covariance matrix.
  • PCA is an appropriate method when measurements have been obtained on a number of observed variables and one wishes to derive a smaller number of artificial variables (called principal components) that account for most of the variance in the observed variables. It is a powerful technique for dimensionality reduction, with applications in data visualization, data mining, and reducing the dimensionality of a data set; it is mostly used for data analysis and for building predictive models. A minimal sketch is given after this list.
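
As a rough illustration of the PCA bullet above (a minimal sketch written for this digest, not taken from the lecture; the synthetic data and variable names are assumptions), the following MATLAB code computes the principal components by eigendecomposition of the sample covariance matrix and projects the data onto the first component.

 % Minimal PCA sketch on synthetic 2-D data (illustrative only)
 n = 200;
 X = randn(n,2) * [2 0.8; 0 0.5];        % correlated synthetic data, n x d
 Xc = X - repmat(mean(X,1), n, 1);       % center each variable
 S = (Xc' * Xc) / (n - 1);               % sample covariance matrix, d x d
 [V, D] = eig(S);                        % columns of V are candidate directions
 [evals, idx] = sort(diag(D), 'descend');
 V = V(:, idx);                          % order directions by explained variance
 Z = Xc * V(:,1);                        % scores on the first principal component
 explained = evals / sum(evals);         % proportion of variance explained

Keeping only the leading columns of V gives the lower-dimensional representation that accounts for most of the observed variance.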

Linear and Quadratic Discriminant Analysis cont'd - September 23, 2010

In the second lecture, Professor Ali Ghodsi recapitulates that by calculating the class posteriors [math]\displaystyle{ \Pr(Y=k|X=x) }[/math] we obtain the optimal classification. He also shows that if the classes are assumed to share a common covariance matrix, [math]\displaystyle{ \Sigma_{k}=\Sigma \; \forall k }[/math], the decision boundary between classes [math]\displaystyle{ k }[/math] and [math]\displaystyle{ l }[/math] is linear (LDA). However, if we do not assume the same covariance matrix for the two classes, the decision boundary is a quadratic function (QDA).

The following MATLAB examples can be used to demonstrate LDA and QDA.
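
A minimal sketch along these lines (not the original course code; the synthetic Gaussian classes, equal priors, and variable names are illustrative assumptions) classifies a single test point with the LDA rule, which pools the class covariances, and with the QDA rule, which keeps a separate covariance matrix per class.

 % Minimal LDA/QDA sketch on two synthetic Gaussian classes (illustrative only)
 n = 100;
 X1 = randn(n,2) + repmat([2 2], n, 1);                   % class 1 sample
 X2 = randn(n,2) * [1 0.5; 0 1.5] - repmat([1 1], n, 1);  % class 2 sample
 mu1 = mean(X1,1)';  mu2 = mean(X2,1)';
 S1 = cov(X1);       S2 = cov(X2);
 Sp = ((n-1)*S1 + (n-1)*S2) / (2*n - 2);                  % pooled covariance (LDA)
 x = [0; 0];                                              % test point
 % LDA: linear discriminant scores, assuming equal class priors
 d1_lda = x' * (Sp \ mu1) - 0.5 * mu1' * (Sp \ mu1);
 d2_lda = x' * (Sp \ mu2) - 0.5 * mu2' * (Sp \ mu2);
 % QDA: quadratic discriminant scores with class-specific covariances
 d1_qda = -0.5*log(det(S1)) - 0.5*(x - mu1)' * (S1 \ (x - mu1));
 d2_qda = -0.5*log(det(S2)) - 0.5*(x - mu2)' * (S2 \ (x - mu2));
 lda_label = 1 + (d2_lda > d1_lda);                       % class with larger score
 qda_label = 1 + (d2_qda > d1_qda);

With a common (pooled) covariance the discriminant scores are linear in x, so the boundary is a line; with class-specific covariances the scores are quadratic in x, so the boundary is a quadratic curve.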


Fisher's (Linear) Discriminant Analysis (FDA) - Two Class Problem - October 5, 2010

This lecture introduces Fisher's linear discriminant analysis (FDA), a supervised dimensionality reduction method. FDA does not assume any particular distribution of the data; it reduces the dimensionality of the data by projecting the data onto a line. That is, given d-dimensional data, FDA projects it to a one-dimensional representation [math]\displaystyle{ z = \underline{w}^T \underline{x} }[/math], where [math]\displaystyle{ x \in \mathbb{R}^{d} }[/math] and [math]\displaystyle{ \underline{w} = \begin{bmatrix}w_1 \\ \vdots \\w_d \end{bmatrix} _{d \times 1} }[/math].
FDA derives a set of feature vectors by which high-dimensional data can be projected onto a low-dimensional feature space so as to maximize class separability. The lecture also clarifies basic FDA concepts such as Fisher's ratio, the ratio of the between-class scatter matrix to the within-class scatter matrix, and discusses the goals Fisher specified for his analysis before proceeding to their mathematical formulation.
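
As a complement to the two-class discussion, here is a minimal MATLAB sketch (written for this digest, not taken from the lecture; the synthetic classes and variable names are assumptions) of Fisher's discriminant: the projection direction is taken proportional to [math]\displaystyle{ \,S_W^{-1}(\mu_1 - \mu_2) }[/math], where [math]\displaystyle{ \,S_W }[/math] is the within-class scatter matrix.

 % Minimal two-class FDA sketch on synthetic data (illustrative only)
 n = 100;
 X1 = randn(n,2) + repmat([1 1], n, 1);      % class 1 sample, n x d
 X2 = randn(n,2) + repmat([-1 -1], n, 1);    % class 2 sample, n x d
 mu1 = mean(X1,1)';  mu2 = mean(X2,1)';
 SW = (X1 - repmat(mu1',n,1))' * (X1 - repmat(mu1',n,1)) + ...
      (X2 - repmat(mu2',n,1))' * (X2 - repmat(mu2',n,1));  % within-class scatter
 w = SW \ (mu1 - mu2);                       % Fisher direction, up to scale
 w = w / norm(w);
 z1 = X1 * w;  z2 = X2 * w;                  % one-dimensional projections z = w'*x

Projecting both classes onto [math]\displaystyle{ \,\underline{w} }[/math] gives the one-dimensional representation [math]\displaystyle{ z = \underline{w}^T \underline{x} }[/math] in which the class means are well separated relative to the within-class spread.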