proposal Fall 2010: Difference between revisions

Revision as of 12:16, 24 October 2010

Project 1: Classifying New Data Points Using An Outlier Approach

By: Yongpeng Sun



Intuition:

In LDA, we assign a new data point to the class whose center is closest to it. It is also desirable, however, to assign a new data point to the class in which it is least of an outlier. To this end, a new data point should be closer to the center of its assigned class than to the center of any other class and, after suitable weighting, also closer to the directions of variation of its assigned class than to those of any other class.


Suppose there are two classes, 0 and 1, whose data points both have [math]\displaystyle{ \,d }[/math] dimensions, and a new data point is given. To assign the new data point to a class, we can proceed as follows:

Step 1: For each class, find its center and its [math]\displaystyle{ \,d }[/math] directions of variation.


Step 2: For the new data point and for each of the two classes, sum its distance to the class center and its distance to each direction of variation of that class, where each of the latter distances is weighted (multiplied) by the ratio of the amount of variation in that direction to the total variation of that class.


Step 3: Assign the new point to the class having the smaller of these two sums.
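
The three steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions the proposal does not spell out: the directions of variation are taken to be the eigenvectors of each class's sample covariance matrix, the "amount of variation" in a direction is its eigenvalue, distances are Euclidean, and the distance to a direction is measured to the line through the class center along that direction. The function names (fit_class, outlier_score, classify) are hypothetical.

```python
import numpy as np

def fit_class(X):
    """Step 1: center and d directions of variation of one class.

    X is an (n_samples, d) array. Directions are assumed to be the
    eigenvectors of the sample covariance (an assumption of this sketch).
    """
    center = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are unit directions
    weights = eigvals / eigvals.sum()        # share of the class's total variation
    return center, eigvecs, weights

def outlier_score(x, center, directions, weights):
    """Step 2: distance to the center plus weighted distances to each direction."""
    diff = x - center
    dist_center = np.linalg.norm(diff)
    proj = directions.T @ diff               # coordinate of diff along each direction
    # Distance from x to the line through the center along direction k
    # (Pythagoras; clipped at 0 to guard against rounding error).
    dist_lines = np.sqrt(np.maximum(dist_center**2 - proj**2, 0.0))
    return dist_center + float(weights @ dist_lines)

def classify(x, models):
    """Step 3: index of the class with the smallest sum."""
    scores = [outlier_score(x, *m) for m in models]
    return int(np.argmin(scores))

# Usage with two synthetic, well-separated classes (illustrative data only).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
X1 = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
models = [fit_class(X0), fit_class(X1)]
label = classify(np.array([0.5, -0.2]), models)
```

Because classify takes a list of fitted classes and returns the index of the smallest sum, the same sketch applies unchanged when there are more than two classes.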


These 3 steps generalize easily to more than 2 classes: to assign a new data point, we only need the sum described above for each class, and the point is assigned to the class with the smallest sum.


I would like to evaluate the effectiveness of this algorithm against LDA, QDA, and other classifiers using data sets from the UCI database ( http://archive.ics.uci.edu/ml/ ).