proposal for STAT946 projects Fall 2010: Difference between revisions

From statwiki
Jump to navigation Jump to search
No edit summary
Line 1: Line 1:
==Project 1 : Sampling Landmarks Using a Convergence Approach ==
==Project 1 : Sampling Landmarks For Landmark MDS Using A Distances-From-The-Center Approach ==
</noinclude>
</noinclude>
===By: Yongpeng Sun===
===By: Yongpeng Sun===


<br>
<br>
Intuition:  
Intuition:
When there is a very large number of data n, and a very small portion of them totalling k is to be sampled as landmarks for landmark MDS, there are two common approaches:
::1. use simple random sampling (SRS) to sample the k landmarks


When we have a large number of data points and we wish to sample a small portion of them as landmarks, using simple random sampling (SRS) does not guarantee good results. The reason is that, when the sample is very small, SRS sometimes results in a sample that is a small cluster within the data space that would not be representative of all of the data. We would like our landmarks to be as representative of the data set as possible. Thus, we can, starting from the center of the data space, alternatively obtain landmarks that are as near as possible to the center and as far away as possible from the center, effectively removing each newly-sampled landmark by shrinking the distances matrix. During these steps, the closest point from the center and the furthest point from the center from the remaining data space gradually converge towards each other. This is because, during the steps, the distance between the nearest point from the center and the center will gradually increase while at the same time the distance between the furthest point from the center and the center will gradually decrease as the nearest point from the center and the furthest point from the center are effectively removed from the data space.


Before starting the procedure, we require the distances matrix containing the pair-wise distances between the data points. This matrix is constructed only once, and it is updated two times during each step.
::2. use K-means to find k clusters in the data, and select the points closest to the centers of these clusters as the k landmarks




The procedure for obtaining the landmarks can be described by the following steps:  
Neither of these approaches however, can guarantee excellent results. In the case of using SRS, sometimes the sampled landmarks comprise a small cluster in the dataset that does not represent the entire dataset well. In the case of using K-means, the set of resulting landmarks is not stable between different runs in that the final k centers depends upon what the initial k centers are. Thus, another approach is needed to meet the following criteria:
::1. guarantee that the k selected landmarks would represent the dataset well


::Step 1:  Find the point closet to the center, and add this point to the to-do queue and the landmarks. Update the distances matrix by removing the row and the column regarding this point.
::Step 2:  For the front-most point in the to-do queue, find the closest point and the furthest point from it, and add these two points to the to-do queue and the landmarks. Update the distances matrix by removing the row and the column regarding each of these two points. Remove the front-most point from the to-do-queue.             
 
::Step 3:  If the number of landmarks suffices, then stop. Otherwise, go back to step 2.


::2. guarantee that the k selected landmarks would be stable between different runs




Using 27-dimensional phoneme data, I would like to compare the efficiency of my method of sampling landmarks against the most common approach of sampling landmarks, which is by using simple random sampling (SRS).  
To formulate this approach, I observed that if, in each step, I start from the point closest to the center of the dataset and take the closest and furthest points as landmarks and remove them from the dataset before moving on to the next step, the final set of obtained landmarks could meet the two above desired criteria.


<noinclude>


If there are clusters in the dataset, then this approach should also be able to sample more landmarks from clusters as compared to from sparse areas in the dataset, because of the following two points:
::1. the nearest point from the center in one step and the next-nearest point from the center in the next step are more likely to be from the same cluster than not


( NOTE: STILL BEING EDITED )


::2. the furthest point from the center in one step and the next-furthest point from the center in the next step are more likely to be from the same cluster than not.
Before beginning the procedure of this approach, we require an n by n distances matrix that contains the pair-wise distances between the n data points. This matrix is updated (shrunk) during the procedure.
The overall procedure can be described in the following steps:
::Step 1: Find the center of the dataset, and find the point closest to this center.
::Step 2: If less than k landmarks has been obtained, then find the nearest point from the center, take it as a landmark, and remove it from the dataset by removing the row and column associated with it from the distances matrix. Otherwise, end the procedure.
::Step 3: If less than k landmarks has been obtained, then find the furthest point from the center, take it as a landmark, and remove it from the dataset by removing the row and column associated with it from the distances matrix. Otherwise, end the procedure.
::Step 4: Go back to step 2.
I would like to compare the effective of my approach to those of using SRS and K-means for sampling landmarks by applying all three approaches to 27-dimensional labelled phoneme data for sampling landmarks before using landmark MDS to reduce the dimensionality of the phoneme data before classifying the phonemes.


==Project 2 : Sparse LLE ==
==Project 2 : Sparse LLE ==

Revision as of 20:40, 30 October 2010

Project 1 : Sampling Landmarks For Landmark MDS Using A Distances-From-The-Center Approach

By: Yongpeng Sun


Intuition: When there is a very large number of data n, and a very small portion of them totalling k is to be sampled as landmarks for landmark MDS, there are two common approaches:

1. use simple random sampling (SRS) to sample the k landmarks


2. use K-means to find k clusters in the data, and select the points closest to the centers of these clusters as the k landmarks


Neither of these approaches however, can guarantee excellent results. In the case of using SRS, sometimes the sampled landmarks comprise a small cluster in the dataset that does not represent the entire dataset well. In the case of using K-means, the set of resulting landmarks is not stable between different runs in that the final k centers depends upon what the initial k centers are. Thus, another approach is needed to meet the following criteria:

1. guarantee that the k selected landmarks would represent the dataset well


2. guarantee that the k selected landmarks would be stable between different runs


To formulate this approach, I observed that if, in each step, I start from the point closest to the center of the dataset and take the closest and furthest points as landmarks and remove them from the dataset before moving on to the next step, the final set of obtained landmarks could meet the two above desired criteria.


If there are clusters in the dataset, then this approach should also be able to sample more landmarks from clusters as compared to from sparse areas in the dataset, because of the following two points:

1. the nearest point from the center in one step and the next-nearest point from the center in the next step are more likely to be from the same cluster than not


2. the furthest point from the center in one step and the next-furthest point from the center in the next step are more likely to be from the same cluster than not.


Before beginning the procedure of this approach, we require an n by n distances matrix that contains the pair-wise distances between the n data points. This matrix is updated (shrunk) during the procedure. The overall procedure can be described in the following steps:

Step 1: Find the center of the dataset, and find the point closest to this center.


Step 2: If less than k landmarks has been obtained, then find the nearest point from the center, take it as a landmark, and remove it from the dataset by removing the row and column associated with it from the distances matrix. Otherwise, end the procedure.


Step 3: If less than k landmarks has been obtained, then find the furthest point from the center, take it as a landmark, and remove it from the dataset by removing the row and column associated with it from the distances matrix. Otherwise, end the procedure.


Step 4: Go back to step 2.


I would like to compare the effective of my approach to those of using SRS and K-means for sampling landmarks by applying all three approaches to 27-dimensional labelled phoneme data for sampling landmarks before using landmark MDS to reduce the dimensionality of the phoneme data before classifying the phonemes.

Project 2 : Sparse LLE

By: Manda Winlaw

One drawback of using Local Linear Embedding (LLE) for dimensionality reduction is that it does not perform well if there are outliers or noise in the data. To overcome this limitation one possible approach is to require sparsity in the vector of reconstruction weights. The reconstruction weights for each data point [math]\displaystyle{ \displaystyle\overline{X}_i }[/math] are chosen to minimize the reconstruction errors measured by the following cost function,

[math]\displaystyle{ \displaystyle{E(W) = \sum_i |\overline{X}_i - \sum_j^{K} W_{ij}\overline{X}_j|^2} }[/math]

where the [math]\displaystyle{ \displaystyle\overline{X}_j, j = 1, \ldots, K }[/math] are the [math]\displaystyle{ \displaystyle{K} }[/math] nearest neighbors of [math]\displaystyle{ \displaystyle\overline{X}_i }[/math] and the weights, [math]\displaystyle{ \displaystyle{W_{ij}} }[/math], must satisfy [math]\displaystyle{ \displaystyle\sum_{j} W_{ij} = 1 }[/math].

From this cost function we can see that if some of the weights are zero due to a sparsity constraint then outliers may be eliminated from the reconstruction.

To impose the sparsity constraint we introduce an additional constraint into the optimization problem for each data point to get,

[math]\displaystyle{ \displaystyle{\min_{\overline{W}_i} |X_i - \sum_j W_{ij}X_j|^2} }[/math] such that [math]\displaystyle{ \; \sum_j W_{ij} = 1, \; P(\overline{W}_i) \leq c_1, }[/math]

where [math]\displaystyle{ \displaystyle{c_1} }[/math] is a constant and [math]\displaystyle{ P(\overline{W}_i) }[/math] is a convex penalty function of [math]\displaystyle{ \overline{W}_i }[/math] which can take different forms.

Of particular interest, will be [math]\displaystyle{ P(\overline{W}_i) = \left\| \overline{W}_i \right\|^2_2 }[/math] in which case the optimization problem is equivalent to ridge regression and the lasso penalty function, [math]\displaystyle{ P(\overline{W}_i) = \left\| \overline{W}_i \right\|^2_1 }[/math]. As noted in <ref>R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society (Series B), 58:267–288, 1996.</ref>, the lasso penalty function is a more appropriate penalty function for controlling sparsity than the [math]\displaystyle{ L_2 }[/math]-norm penalty function since the [math]\displaystyle{ L_2 }[/math]-norm penalty function does not actually give sparse solutions rather the solutions are scaled to give non-sparse low values. However, using the [math]\displaystyle{ L_2 }[/math]-norm penalty function results in a closed form solution for the weights. There is no closed form solution for the lasso penalty function which implies we must use a numerical optimization algorithm to solve this problem increasing the computational complexity of the LLE algorithm. My goal is examine both of these penalty functions as a means of improving the performance of the LLE algorithm for noisy data and to compare their performance in terms of accuracy and computational complexity.

References

<references />


Project 3 : Face Recognition Using LDA-Based Algorithms

By: LishaYu

The most successful solution to the problem of feature selection for face representation seems to be those appearance-based approaches, which generally operate directly on images or appearances of face object and process the images as two-dimensional holistic patterns.

However, most of traditional linear discriminant analysis (LDA)-based methods suffer from the disadvantage that their optimality criteria are not directly related to the classification ability of the obtained feature representation. Moreover, their classification accuracy is affected by the “small sample size” (SSS) problem that is often encountered in FR tasks.

This paper introduces a new algorithm that deals with both of the shortcomings in an efficient and cost manner – Direct Fractional - Step LDA (DF-LDA). The following subtopics are included:

1. Introduction of Direct Fractional - Step LDA (DF-LDA)
- Where are the Optimal Discriminant Features?
- Variation f D-LDA
- Rotation and Reorientation of D-LDA Subspace
2. Experimental Result

The new algorithm is compared, in terms of classification accuracy, to other commonly used FR methods on two face databases. Results indicate that the performance of the proposed method is overall superior to those of traditional FR approaches, such as the Eigenfaces, Fisherfaces, and D-LDA methods.

If time is allowed, LDA in small sample size (SSS) scenarios with application to face recognition will be included.