proposal for STAT946 projects

Use the following format for your proposal (maximum one page).

Project 1: Graph Realization and its Applications

By: Ali Ghodsi and Soroush Ghodsi

Write your proposal here

Project 2: Conformal Map in Classification of 3D Objects

By: Jiheng Wang, Zhiyue Huang and Saad Zaman

coming soon...

Project 3: LLE and noisy data

By: Ruchi Jiwrajka and Dennis Zhuang

Nonlinear dimensionality reduction (NLDR) aims to find low-dimensional representations of data lying on nonlinear manifolds in high-dimensional space. It has many applications in classification, clustering, data visualization, etc. In recent years, ISOMAP and LLE have emerged as two leading approaches to nonlinear dimensionality reduction: ISOMAP attempts to preserve the global geometric properties of the manifold, while LLE approximates its local geometry [1].

Here, we focus only on LLE. LLE has been successful in many applications because it handles large amounts of data, its optimization is global and non-iterative, and it has only one parameter to adjust (the number of neighbours). However, it assumes that the data are well sampled from a single smooth nonlinear manifold, and its performance can degrade when this assumption is violated [1] [2] [3] [4]: (a) the data lie on several disjoint manifolds; (b) the data are sparsely and unevenly sampled; (c) the data are contaminated by noise (outliers).
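
To make this sensitivity concrete, the following minimal sketch (our illustration, not part of the original proposal) runs scikit-learn's LLE implementation on a Swiss-roll data set with increasing levels of added Gaussian noise; the neighbourhood size and noise levels are arbitrary choices.

  # Minimal sketch: standard LLE has essentially one tuning parameter
  # (the number of neighbours), and its reconstruction error grows as
  # noise is added to the sampled manifold.
  import numpy as np
  from sklearn.datasets import make_swiss_roll
  from sklearn.manifold import LocallyLinearEmbedding

  X, _ = make_swiss_roll(n_samples=1000, random_state=0)
  rng = np.random.default_rng(0)
  for sigma in (0.0, 0.5, 1.0):                 # increasing contamination
      X_noisy = X + sigma * rng.normal(size=X.shape)
      lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
      Y = lle.fit_transform(X_noisy)            # 2-D embedding of noisy data
      print(f"sigma={sigma}: reconstruction error = {lle.reconstruction_error_:.4f}")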

Motivated by these weaknesses, many variants of LLE have been developed to improve its performance and make it robust to outliers. They can roughly be divided into three categories: (a) selecting the neighbours more appropriately [2] [5] [6] [7] [10]; (b) identifying the outliers [4] [8]; (c) dealing with the instability of the Gram matrix [3] [9].

In this project, we will first give a brief survey of methods that aim to improve LLE on poorly sampled and noise-contaminated data. The second step of LLE, a least-squares optimization that finds the weights reconstructing a point from its neighbours, can be ill-posed: it may require inverting a singular or near-singular Gram matrix G. Therefore, as suggested by Professor Ali Ghodsi, we would like to see whether the eigenvectors of the Gram matrix could also solve this optimization problem. An eigenvector satisfies the constraint [math]\displaystyle{ \mathbf {w^Tw}=1 }[/math], whereas the optimization problem requires the components of the weight vector to sum to one, that is [math]\displaystyle{ \mathbf {w^Te}=1 }[/math]; we therefore need to normalize the eigenvectors of G and check whether the normalized vector is still a good solution. Under the changed constraint, the optimal weights that best linearly reconstruct a point from its neighbours become the eigenvector of the Gram matrix corresponding to the smallest eigenvalue, since minimizing [math]\displaystyle{ \mathbf {w^TGw} }[/math] subject to [math]\displaystyle{ \mathbf {w^Tw}=1 }[/math] is a Rayleigh-quotient problem. Furthermore, we can take the optimal weight vector to be the eigenvector associated with the k-th smallest eigenvalue, or some linear combination of eigenvectors. We would like to investigate whether using eigenvectors of the Gram matrix helps LLE deal with noise-contaminated data.
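
The sketch below (our illustration; function names, the regularization constant, and the tolerance are not from the proposal) contrasts the two weight computations for a single point: the standard regularized least-squares solve, and the proposed eigenvector of the local Gram matrix rescaled so that its components sum to one.

  import numpy as np

  def lle_weights_least_squares(xi, neighbors, reg=1e-3):
      """Standard LLE weights: solve G w = e, then rescale so w sums to 1."""
      Z = neighbors - xi                          # shift neighbours to the origin
      G = Z @ Z.T                                 # local Gram matrix (k x k)
      G += reg * np.trace(G) * np.eye(len(G))     # regularize near-singular G
      w = np.linalg.solve(G, np.ones(len(G)))
      return w / w.sum()                          # enforce w^T e = 1

  def lle_weights_eigenvector(xi, neighbors, which=0):
      """Proposed variant: eigenvector of G for the (which+1)-th smallest
      eigenvalue, rescaled so its components sum to one."""
      Z = neighbors - xi
      G = Z @ Z.T
      eigvals, eigvecs = np.linalg.eigh(G)        # eigenvalues in ascending order
      v = eigvecs[:, which]                       # unit norm: v^T v = 1
      s = v.sum()
      if abs(s) < 1e-12:                          # rescaling fails if v^T e ~ 0
          raise ValueError("eigenvector nearly orthogonal to e")
      return v / s                                # now w^T e = 1 instead of w^T w = 1

Note that the rescaling breaks down when the chosen eigenvector is nearly orthogonal to e; whether the rescaled eigenvector remains a good reconstruction weight is exactly the question the project would examine.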

References

  1. Abdenour Hadid and Matti Pietikäinen, Efficient Locally Linear Embeddings of Imperfect Manifolds, MLDM 2003, LNAI 2734, pp. 188–201, 2003
  2. Yulin Zhang, Jian Zhuang, Sun’an Wang, Xiaohu Li, Local Linear Embedding in Dimensionality Reduction Based on Small World Principle, 2008 International Conference on Computer Science and Software Engineering
  3. Chenping Hou, Jing Wang, Yi Wu, Dongyun Yi, Local linear transformation embedding, Neurocomputing 72, pp. 2368–2378, 2009
  4. Hong Chang, Dit-Yan Yeung, Robust locally linear embedding, Pattern Recognition 39, pp. 1053–1065, 2006
  5. Guihua Wen and Lijun Jiang, Clustering-based Locally Linear Embedding, 2006 IEEE International Conference on Systems, Man, and Cybernetics
  6. Kanghua Hui, Chunheng Wang, 2008 International Conference on Pattern Recognition
  7. Jian Xiao, Zongtan Zhou, Dewen Hu, Junsong Yin, and Shuang Chen, Self-organized Locally Linear Embedding for Nonlinear Dimensionality Reduction, ICNC 2005, LNCS 3610, pp. 101–109, 2005
  8. Xianhua Zeng, Siwei Luo, Generalized Locally Linear Embedding Based on Local Reconstruction Similarity, 2008 International Conference on Fuzzy Systems and Knowledge Discovery
  9. Zhenyue Zhang, Jing Wang, MLLE: Modified Locally Linear Embedding Using Multiple Weights, 2006 Neural Information Processing Systems conference
  10. Yaozhang Pan, Shuzhi Sam Ge, Abdullah Al Mamun, Weighted locally linear embedding for dimension reduction, Pattern Recognition 42, pp. 798–811, 2009

Project 4: Reducing the dimension of financial time series

By: Yousef A. Sohrabi, Amir Memartoluie and Shu-tong Tse