stat841: Difference between revisions
Revision as of 16:27, 30 September 2009
Scribe sign up
Course Note for Sept. 30th (Classification, by Liang Jiaxi)
1.
2. Classification
A 'classification rule' [math]\displaystyle{ \,h }[/math] is a function that maps a random vector [math]\displaystyle{ \,X }[/math] to a predicted value of a discrete random variable [math]\displaystyle{ \,Y }[/math]. Suppose we are given n pairs of data [math]\displaystyle{ \,(X_{1},Y_{1}), (X_{2},Y_{2}), \dots , (X_{n},Y_{n}) }[/math], where [math]\displaystyle{ \,X_{i}= (X_{i1}, X_{i2}, \dots , X_{id}) \in \mathcal{X} \subset \Re^{d} }[/math]
is a d-dimensional vector and [math]\displaystyle{ \,Y_{i} }[/math] takes values in a finite set [math]\displaystyle{ \, \mathcal{Y} }[/math]. From these data we set up a function [math]\displaystyle{ \,h: \mathcal{X} \to \mathcal{Y} }[/math]. Then, given a new vector [math]\displaystyle{ \,X }[/math], we predict the corresponding [math]\displaystyle{ \,Y }[/math] with the classification rule [math]\displaystyle{ \,h }[/math], that is, [math]\displaystyle{ \,\overline{Y}=h(X) }[/math].
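To make the definition concrete, here is a minimal Python sketch of one possible classification rule. The rule itself (predict label 1 when the coordinates of the input sum to a positive number) and the sample input are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
from typing import Sequence


def h(x: Sequence[float]) -> int:
    """Hypothetical classification rule h: X -> Y with Y = {0, 1}.

    Assumption for illustration only: predict 1 when the coordinates
    of the d-dimensional input sum to a positive number, else 0.
    """
    return 1 if sum(x) > 0 else 0


# A new 3-dimensional input vector (made-up values).
new_x = [0.5, -0.2, 1.0]

# The prediction for the new vector is simply h applied to it.
y_pred = h(new_x)
print(y_pred)
```

Any function that returns an element of the finite label set for every input vector is a valid classification rule in this sense; the definition does not say whether the rule is a good one.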
3. Error rate
Definition:
The 'true error rate' of a classifier [math]\displaystyle{ \,h }[/math] is defined as the probability that the value [math]\displaystyle{ \overline{Y} }[/math] predicted from [math]\displaystyle{ \,X }[/math] by [math]\displaystyle{ \,h }[/math] does not equal the actual [math]\displaystyle{ \,Y }[/math], namely, [math]\displaystyle{ \, L(h)=P(h(X) \neq Y) }[/math].
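When the joint distribution of the input and the label is known, the true error rate can be computed exactly by adding up the probability of every pair on which the classifier is wrong. The sketch below does this for a toy case; the joint probability table, the one-dimensional input, and the threshold rule are all hypothetical values made up for illustration.

```python
# Hypothetical joint distribution P(X = x, Y = y) for a one-dimensional
# input x in {0, 1, 2} and a label y in {0, 1}. The values are invented
# for this example and sum to 1.
joint_pmf = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.10, (1, 1): 0.20,
    (2, 0): 0.05, (2, 1): 0.25,
}


def h(x):
    """Hypothetical classifier: predict 1 when x >= 1, else 0."""
    return 1 if x >= 1 else 0


def true_error_rate(classifier, pmf):
    """Sum P(X = x, Y = y) over all pairs where classifier(x) != y."""
    return sum(p for (x, y), p in pmf.items() if classifier(x) != y)


error = true_error_rate(h, joint_pmf)
print(round(error, 2))
```

In practice the joint distribution is unknown, so this quantity cannot be computed directly and has to be estimated from data.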
The 'empirical error rate' (or 'training error rate') of a classifier [math]\displaystyle{ \,h }[/math] is defined as the fraction of the n predictions in which the value [math]\displaystyle{ \overline{Y} }[/math] predicted from [math]\displaystyle{ \,X }[/math] by [math]\displaystyle{ \,h }[/math] does not equal [math]\displaystyle{ \,Y }[/math]. The mathematical expression is as below:
[math]\displaystyle{ \, L_{h}= \frac{1}{n} \sum_{i=1}^{n} I(h(X_{i}) \neq Y_{i}) }[/math], where [math]\displaystyle{ \,I }[/math] is the indicator function, [math]\displaystyle{ \, I(h(X_{i}) \neq Y_{i})= \begin{cases} 1, & h(X_{i}) \neq Y_{i} \\ 0, & \text{otherwise} \end{cases} }[/math].
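The empirical error rate is just the fraction of training pairs that the classifier gets wrong, which can be sketched in a few lines of Python. The classifier and the four data pairs below are hypothetical and serve only to illustrate the computation.

```python
def empirical_error_rate(classifier, xs, ys):
    """Fraction of pairs (x_i, y_i) with classifier(x_i) != y_i."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    # Each misclassified pair contributes 1 (the indicator), others 0.
    mistakes = sum(1 for x, y in zip(xs, ys) if classifier(x) != y)
    return mistakes / len(xs)


def h(x):
    """Hypothetical classifier: predict 1 when the coordinates sum to > 0."""
    return 1 if sum(x) > 0 else 0


# Made-up training data: n = 4 two-dimensional inputs with labels in {0, 1}.
xs = [[1.0, 2.0], [-1.0, 0.5], [0.3, -0.1], [-2.0, -1.0]]
ys = [1, 1, 0, 0]

rate = empirical_error_rate(h, xs, ys)
print(rate)
```

Here the second and third pairs are misclassified, so two of the four indicators equal 1.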
4. Bayes Classifier
In particular, suppose the value range of [math]\displaystyle{ \,Y }[/math] is the index set (or label set) [math]\displaystyle{ \, \mathcal{Y}=\{0, 1\} }[/math]. Consider the probability [math]\displaystyle{ \,r(X)=P\{\} }[/math]