proposal Fall 2010

From statwiki
Revision as of 00:18, 29 October 2010 by Pfewzee (talk | contribs)
Jump to navigation Jump to search

Project 1 : Classifying New Data Points Using An Outlier Approach

By: Yongpeng Sun


Intuition:

In LDA, we assign a new data point to the class having the least distance to the center. At the same time however, it is desirable to assign a new data point to a class so that it is less of an outlier in that class as compared to every other class. To this end, compared to every other class, a new data point should be closer to the center of its assigned class and at the same time also, after suitable weighting has been done, be closer to the directions of variation of its assigned class.


Suppose there are two classes 0 and 1 both having [math]\displaystyle{ \,d }[/math] dimensions, and a new data point is given. To assign the new data point to a class, we can proceed using the following steps:

Step 1: For each class, find its center and its [math]\displaystyle{ \,d }[/math] directions of variation.


Step 2: For the new data point, with regard to each of the two classes, sum up the point's distance to the center and the point's distance to each of the [math]\displaystyle{ \,d }[/math] directions of variation weighted (multiplied) by the ratio of the amount of variation in that direction to the total variation in that class.


Step 3: Assign the new point to the class having the smaller of these two sums.


These 3 steps can be easily generalized to the case where the number of classes is more than 2 because, to assign a new data point to a class, we only need to know, with regard to each class, the sum as described above.


I would like to evaluate the effectiveness of my idea / algorithm as compared to LDA and QDA and other classifiers using data sets in the UCI database ( http://archive.ics.uci.edu/ml/ ).

Project 2: Apply Hadoop Map-Reduce to a Classification Method

By: Maia Hariri, Trevor Sabourin, and Johann Setiawan

Develop map-reduce processes that can properly classify large distributed data sets.

Potential projects:

1. Use Hadoop Map-Reduce to implement the Support Vector Machine (Kernel) classification algorithm.
2. Use Hadoop Map-Reduce to implement the LDA classification algorithm on a novel problem (e.g. forensic identification of handwriting.)


Project 3 : Hierarchical Locally Linear Classification

By: Pouria Fewzee

Extension of an intrinsic two-class classifier to a multi-class may be challenging, as the common approaches either remain some vague areas in the feature space, or are computationally inefficient. One may found linear classifier and support vector machines two well-known instances of intrinsic two-class classifiers, and the k-1 and k(k-1)/2-hyperplanes as two most common approaches for extension of their capabilities to multi-class tasks. The k-1 bothers from leaving vague areas in the feature space and even the k(k-1)/2 does not have this problem, it is not computationally efficient. Hierarchical classification is proposed as a solution. This not only improves the efficiency of the classifier, but also the suggested tree could provide the specialists with new outlooks in the field.

To build a general purpose classifier which adapts to different patterns, as much as demanded, is another purpose of this project. To realize this goal, locally linear classification is proposed. Performing the locality in classifier design is accomplished by means of utilizing a combination of fuzzy computation tools along with binary decision trees. Being local in sense of feature space, makes these kinds of classifiers supple in fitting to many diverse configurations.