f10 Stat841 digest

From statwiki
Jump to navigation Jump to search

Classification - September 21, 2010

  • Classification is an area of supervised learning that systematically assigns unlabeled novel data to their label through the characteristics and attributes obtained from observation.
  • Classification is the prediction of a discrete random variable [math]\displaystyle{ \mathcal{Y} }[/math] from another random variable [math]\displaystyle{ \mathcal{X} }[/math], where [math]\displaystyle{ \mathcal{Y} }[/math] represents the label assigned to a new data input and [math]\displaystyle{ \mathcal{X} }[/math] represents the known feature values of the input. The classification rule used by a classifier has the form [math]\displaystyle{ \,h: \mathcal{X} \mapsto \mathcal{Y} }[/math].
  • True error rate is the probability that the classification rule [math]\displaystyle{ \,h }[/math] does not correctly classify any data input. Empirical error rate is the frequency where the classification rule [math]\displaystyle{ \,h }[/math] does not correctly classify any data input in the training set. In experimental tasks true error cannot be measured and as a result the empirical error rate is used as its estimate.
  • Bayes Classifier is a probabilistic classifier by applying Bayes Theorem with strong (naive) independence assumptions. It has the advantage of requiring small training data to estimate the parameters needed for classification. Under this classifier an input [math]\displaystyle{ \,x }[/math] is classified to class [math]\displaystyle{ \,y }[/math] where the posterior probability for [math]\displaystyle{ \,y }[/math] is the largest for input [math]\displaystyle{ \,x }[/math].
  • Bayes Classification Rule Optimality Theorem states that Bayes classifier is the optimal classifier, in other words the true error rate of the Bayes classification rule will always be smaller or equal to any other classification rule
  • Bayes Decision Boundary is the hyperplane boundary that separates the two classes [math]\displaystyle{ \,m, n }[/math] obtained by setting the posterior probability for the two classes equal, [math]\displaystyle{ \,D(h)=\{x: P(Y=m|X=x)=P(Y=n|X=x)\} }[/math].
  • Linear Discriminant Analysis (LDA) for the Bayes classifier decision boundary between two classes makes the assumption that both are generated from Gaussian distribution and have the same covariance matrix.
  • PCA is an appropriate method when you have obtained measures on a number of observed variables and wish to develop a smaller number of artificial variables (called principal components) that will account for most of the variance in the observed variables. This is a powerful technique for dimensionally reduction. It has applications in data visualization, data mining, reducing the dimensionality of a data set and etc. It is mostly used for data analysis and for making predictive models.

Linear and Quadratic Discriminant Analysis cont'd - September 23, 2010

In the second lecture, Professor Ali Ghodsi recapitulates that by calculating the class posteriors [math]\displaystyle{ \Pr(Y=k|X=x) }[/math] we have optimal classification. He also shows that by assuming that the classes have common covariance matrix [math]\displaystyle{ \Sigma_{k}=\Sigma \forall k }[/math], the decision boundary between classes [math]\displaystyle{ k }[/math] and [math]\displaystyle{ l }[/math] is linear (LDA). However, if we do not assume same covariance between the two classes, the decision boundary is a quadratic function (QDA).

The following MATLAB examples can be used to demonstrated LDA and QDA.

Fisher's (Linear) Discriminant Analysis (FDA) - Two Class Problem - October 5, 2010

This lecture introduces Fisher's linear discrimination analysis (FDA), which is a supervised dimensionality reduction method. FDA does not assume any distribution of the data and it works by reducing the dimensionality of the data by projecting the data on a line. That is, given d-dimensional data FDA project it to one-dimensional representation by [math]\displaystyle{ z = \underline{w}^T \underline{x} }[/math] where [math]\displaystyle{ x \in \mathbb{R}^{d} }[/math] and [math]\displaystyle{ \underline{w} = \begin{bmatrix}w_1 \\ \vdots \\w_d \end{bmatrix} _{d \times 1} }[/math]
FDA derives a set of feature vectors by which high-dimensional data can be projected onto a low-dimensional feature space in the sense of maximizing class separability. Furthermore, the lecture clarifies a set of FDA basic concepts like Fisher’s ratio, ratio of between-class scatter matrix to within-class scatter matrix. It also discusses the goals specified by Fisher for his analysis then proceeding by mathematical formulation of these goals.

Fisher's (Linear) Discriminant Analysis (FDA) - Multi-Class Problem - October 7, 2010

This lecture describes a generalization of Fisher's discriminant analysis to more than 2 classes. For the multi-class, or [math]\displaystyle{ k }[/math]-class problem, we are trying to find a projection from a [math]\displaystyle{ d }[/math]-dimensional space to a [math]\displaystyle{ (k-1) }[/math]-dimensional space. Recall that for the [math]\displaystyle{ 2 }[/math]-class problem, the objective function was [math]\displaystyle{ \displaystyle \max \frac{(w^Ts_Bw)}{(w^Ts_ww)} }[/math] . In the [math]\displaystyle{ k }[/math]-class problem, [math]\displaystyle{ \mathbf{W} }[/math] is a [math]\displaystyle{ d\times (k-1) }[/math] transformation matrix, [math]\displaystyle{ \mathbf{W} =[\mathbf{w}_{1}, \mathbf{w}_{2},..., \mathbf{w}_{k-1}] }[/math], and the objective function becomes [math]\displaystyle{ \displaystyle \max \frac{Tr[\mathbf{W}^{T}\mathbf{S}_{B}\mathbf{W}]}{Tr[\mathbf{W}^{T}\mathbf{S}_{W}\mathbf{W}]} }[/math]

As in the [math]\displaystyle{ 2 }[/math]-class case, this is also a generalized eigenvalue problem, and the solution can be computed as the first [math]\displaystyle{ (k-1) }[/math] eigenvectors of [math]\displaystyle{ \mathbf{S}_{W}^{-1}\mathbf{S}_{B}, }[/math] i.e. [math]\displaystyle{ \mathbf{S}_{W}^{-1}\mathbf{S}_{B}\mathbf{w}_{i} =\lambda_{i}\mathbf{w}_{i} }[/math].


Linear and Logistic Regression - October 12, 2010

In this Lecture, Prof Ali Ghodsi reviews the LDA as a dimensionality reduction method and introduces 2 models for regression, linear and logistic regression.

Regression analysis is a general statistical technique for modelling and analyzing how a dependent variable changes according to changes in independent variables. In classification, we are interested in how a label, [math]\displaystyle{ \,y }[/math], changes according to changes in [math]\displaystyle{ \,X }[/math].

General information on linear regression can be found at the University of South Florida and this MIT lecture.