Difference between revisions of "proposal for STAT946 projects"

Use the following format for your proposal (maximum one page)

Project 2 : Comformal Map in Classification of 3D Objects

By: Jiheng Wang, Zhiyue Huang and Saad Zaman

(tentative)

The classification of three dimensional surfaces is a fundamental issue in artificial intelligence and computer vision with many applications. However, 3D surface matching with noise, occlusion and clutter is a challenging problem.

There exist a class of algorithms in the area of dimensionality reduction. However, many of them have a particular drawback: they do not produce conformal map.

A conformal map is a low-dimensional embedding where the angles formed by three neighboring points in the original high dimensional dataset are equal to the angles between those same three points in the embedding.

Basically, we will follow the idea in the paper "Conformal Mapping by Computationally Efficient Methods" and mainly discuss its application in Classification of 3D objects.

References

1.Xianfeng Gu, Shing-Tung Yau, "Surface Classification Using Conformal Structures," iccv, vol. 1, pp.701, Ninth IEEE International Conference on Computer Vision (ICCV'03) - Volume 1, 2003

2.Sen Wang, Yang Wang, Miao Jin, Gu, X.D., Samaras, D.,Conformal Geometry and Its Applications on 3D Shape Matching, Recognition, and Stitching, Pattern Analysis and Machine Intelligence, IEEE Transactions on, page(s): 1209-1220, Volume: 29, Issue: 7, July 2007

Project 3: LLE and noisy data

By: Ruchi Jiwrajka and Dennis Zhuang

Nonlinear dimensionality reduction (NLDR) aims to find low dimension representation of data lying on nonlinear manifolds in high dimensional space. It has many applications in classification, clustering, data visualization etc. In the recent years, ISOMAP and LLE have emerged as two advancing alternatives for the problem of nonlinear dimensionality reduction. ISOMAP attempts to preserve the global geometric property of the manifold while LLE tries to approximate the local geometric property [1].

Here, we only focus on LLE. Although LLE is quite successful in many applications due to its capability of dealing with large amount of data and its global and non-iterative way to do optimization involving only one parameter (the number of neighbors) to adjust; it comes with a assumption that data is well sampled from a single smooth nonlinear manifold. It could have bad performance under certain circumstances where this assumption is not well observed [1] [2] [3] [4]: (a) data lies on several disjoint manifolds; (b) data is sparsely and unevenly sampled; (c) data is contaminated by noise (outlier).

Motivated by the weakness of LLE, many variants of it were developed to improve its performance and make it robust to outliers. These can roughly be divided into three categories: (a) appropriately selecting the neighbors [2] [5] [6] [7] [10]; (b) identifying the outliers [4] [8]; (c) dealing with the instability of the Gram matrix [3] [9].

In the project, we will do a brief survey for methods endeavouring to improve LLE over not well sampled and noisy contaminated data. As, the second step of LLE , which is a least square optimization to find the weights that reconstruct a point from its neighbours could be ill-defined, it might result in inverting a singular or near singular Gram Matrix (G). Therefore, as suggested by Professor Ali Ghodsi we would like to see if the eigenvectors of the Gram Matrix could also be the solution to the optimization problem. As the eigenvector would satisfy the constraint $\mathbf {w^Tw}=1$, but the optimization problem requires that components of the weight vector sum to one, that is $\mathbf {w^Te}=1$, there is a need to normalize the eigenvectors of G and see if the normalized vector is still a good solution to the problem. The changing of the constraint results in the optimal weights that best linearly reconstruct a point from its neighbours to be the eigenvector of the Gram matrix corresponding to the smallest eigenvalue. Furthermore, we can change the optimal weight vector to be eigenvector associated with k-th smallest eigenvalue or some linearly combination of them. We would like to investigate whether using the eigenvector of the Gram matrix would help LLE to deal with the noisy contaminated data.

References

1. Abdenour Hadid and Matti Pietikäinen, Efficient Locally Linear Embeddings of Imperfect Manifolds, MLDM 2003, LNAI 2734, pp. 188–201, 2003.
2. Yulin Zhang, Jian Zhuang, Sun’an Wang, Xiaohu Li, Local Linear Embedding in Dimensionality Reduction Based on Small World Principle, 2008 International Conference on Computer Science and Software Engineering
3. ChenpingHou, JingWang, YiWua, DongyunYi, Local linear transformation embedding, Neurocomputing 72, pp. 2368–2378, 2009
4. Hong Chang, Dit-YanYeung, Robust locally linear embedding, Pattern Recognition 39, pp. 1053 – 1065, 2006
5. Guihua Wen and Lijun Jiang, Clustering-based Locally Linear Embedding, 2006 IEEE International Conference on Systems, Man, and Cybernetics
6. Kanghua Hui, Chunheng Wang, 2008 International Conference on Pattern Recognition
7. Jian Xiao, Zongtan Zhou, Dewen Hu, Junsong Yin, and Shuang Chen, Self-organized Locally Linear Embedding for Nonlinear Dimensionality Reduction, ICNC 2005, LNCS 3610, pp. 101–109, 2005
8. Xianhua Zeng, Siwei Luo, Generalized Locally Linear Embedding Based on Local Reconstruction Similarity, 2008 International Conference on Fuzzy Systems and Knowledge Discovery
9. Zhenyue Zhang, Jing Wang, MLLE: Modified Locally Linear Embedding Using Multiple Weights, 2006 Neural Information Processing Systems conference
10. Yaozhang Pan,Shuzhi Sam Ge, Abdullah Al Mamun, Weighted locally linear embedding for dimension reduction, Pattern Recognition 42, pp. 798 – 811, 2009

Project 4: Reducing the dimension of financial time series

By: Yousef A. Sohrabi, Amir Memartoluie and Shu-tong Tse

Introduction

In our project, we apply recently developed dimensionality reduction techniques to analyze several types of financial time series, including but not limited to interest rates, exchange rates, stock indices. Because of its applied flavor, our project is heavily involved in implementation work: pre-processing real data, coding a number of algorithms and comparing the strengths and weaknesses of the algorithms in analyzing financial time series of various kind. As a remark, we are interested in this project because our research works are on computational finance.

Related Work

One of the earliest(in 1991) and most well-known application of dimensionality reduction techniques in a financial setting is the use of PCA to show that 97% of the movements in long term interest rates can be explained by three factors.<ref name="PCA"> R Litterman, J Scheinkman; Common factors affecting bond returns; Journal of Fixed Income, 1991</ref>. After this ground-breaking work in 1991, the analysis of financial time series have received a lot of interest in the academia, in the financial industry and in various regulatory bodies.

Despite the development of many advanced techniques for dimensionality reduction over the past two decades, we find almost no other technique other than PCA is used to analyze financial time series in the literature. It's all the more surprising given that PCA has been shown to be quite fruitless in analyzing short term interest rates, exchange rates and stock indices in a technical report published by the U.S. Federal Reserve in 1997<ref name="Fed"> Mico Loretan; Generating market risk scenarios using PCA; Federal Reserve Board, 1997</ref>. In fact few highly cited papers in this research area can be found after 1997. Nevertheless, a survey paper published in 2004<ref name="survey04">Dongsong Zhang, Lina Zhou; Discovering Golden Nuggets: Data Mining in Financial Application; IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 34, NO. 4, NOVEMBER 2004</ref> and a book chapter published in 2008<ref name="chapter08">Stephan K. Chalup and Andreas Mitschele, Kernel Methods in Finance; pp. 655--688, Handbook of Information Technology in Finance; Springer-Verlag, 2008.</ref> show that such applications (if not novel research work) have remained popular in recent years.

Our Proposal

Essentially we hope to improve the results of the 1997 Fed report by applying more advanced techniques for dimensionality reduction covered in this course. We will focus on Kernel PCA, MDS and ISOMAP, as suggested in the 2008 book chapter mentioned above.

The key steps in our project will be
2. Pre-process the financial time series as in the 1997 Fed report.
3. Implement PCA, Kernel PCA, MDS and ISOMAP
4. Apply the algorithms to various types of financial series
5. Compare the performance of the algorithms
6. Optional: extend the analysis to other financial time series, e.g. GDP, unemployment, etc.

As far as the literature indexed by google scholar is concerned, we are unaware that similar work has been published. (though we're much less sure if similar unpublished work has been done in the industry)

<references/>

Project 5: Extending NCA to remove parameters and support hierarchical classes

By: Lachlan Dufton

Neighbourhood Components Analysis provides a non-parametric learning method for finding a distance metric for KNN and for dimensionality reduction. However, using the algorithm still requires tuning of two parameters: the number of neighbours 'k' and the reduced dimensionality 'd'. For this project I will investigate extending NCA to incorporate optimisation of these parameters in the training procedure. The two parameters not only affect classification accuracy, but also classification time and this leads to a multiobjective optimisation problem. The original NCA paper did not provide timing data on training and classification, so I will test NCA timing requirements before and after the extensions, and compare these to other methods such as RCA, PCA and LDA.

Another extension to NCA that I propose is the support for hierarchical classes. When coupled with KNN and dimensionality reduction, this can be useful for creating Voronoi treemaps that retain some measure of ontological distance in the points. This will also allow the method to handle varying degrees of classification depth. For example, in the Gene Ontology project (www.geneontology.org), one gene may be known to be involved with Cell Division, whereas another is known to be involved with Cytokinesis (a part of the cell division process). NCA should be able to handle this hierarchical classification.

Project 6: Learning a distance metric to unfold data by preserving local distances

By: Pooyan Khajehpour

In the literature of metric learning, there are various techniques to learn a distance metric given some side information over data. For example, if the side information contains two sets of similar data pairs and dissimilar data pairs, then there is a technique to learn a distance metric by which the distances between similar points are minimized and meanwhile the distances between dissimilar points are maximized. There is another technique, which provides a distance metric preserving partial side information about some pairwise distances between data points.

In this research the benefits of the aforementioned techniques are combined together to use in dimensionality reduction of data. Assuming our data on a manifold, one can imagine the local distances are the same distances on the manifold, and use this information in the second metric learning technique. Moreover non-local distances can be considered as dissimilarity between data points in the side information of the first technique. We are able to combine and solve both problems together, and as a result, the manifold will be stretched while the local distances will be held constant, and therefore the structure of the manifold will be preserved. We believe stretching the manifold will reduce the dimension of the data over it, and we will study it by experiments.

Project 7 : Supervised Non-negative Matrix Factorization via Dependence Maximization

Introduction

Hilbert-Schmidt Independence Criterion (HSIC) is a criterion successfully used to measure dependence between two multivariate random variables $\,\!x$ and $\,\!y.$<ref> A. Gretton, O. Bousquet, A.J. Smola, and B. Sch¨olkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In S. Jain, H. U. Simon, and E. Tomita, editors, Proceedings Algorithmic Learning Theory, pages 63–77, Berlin, Germany, 2005a. Springer-Verlag.</ref><ref> A. Gretton, A. Smola, O. Bousquet, R. Herbrich, A. Belitski, M. Augath, Y. Murayama, J. Pauls, B. Sch¨olkopf, and N. Logothetis, Kernel constrained covariance for dependence measurement, AISTATS, vol. 10, 2005.</ref> Having a set of observations $(x_1,y_2 ),\cdots,(x_n,y_n )$ and two universal kernels $\,\!k$ and $\,\!l$, an empirical estimate of the HSIC is given by:
$\,\!Tr[HKHL]$
where $H_{n\times n}=I-n^{-1}$ is the centering matrix, $\,\!K(i,j)=k(x_i,y_j)$ and $\,\!L(i,j)=l(x_i,y_j)$.
There are many well-known algorithms which may be interpreted in light if this criterion, including Colored (Supervised) Maximum Variance Unfolding (MVU), Principle Component Analysis (PCA), Supervised PCA, etc. In fact, this method may be effectively used to produce supervised version of many previously known algorithms. That is because, in the above formulation, we may simply set matrix L to be a kernel matrix reflecting the similarity/dissimilarity relationships between the data points. However, two issues should be taken care of here: i) the objective function under consideration should be of the form of a trace function, as presented above. ii) The kernel matrices have to become centered via matrix H.

Our project

In our project, we aim to investigate if it is possible to use the above criterion to propose a supervised version of Non-negative Matrix Factorization (NMF) technique. NMF has turn out to be a very powerful method in different areas of data mining because of its exceptional features including sparsity of the resulting decompositions, meaningful biological outputs, etc. In fact, it is shown that many previously known techniques in the data mining area may be indeed thought of as a special case of NMF<ref> C. Ding, X. He, and H.D. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. Proc. SIAM Data Mining Conf, 2005.</ref>. However, in its original form, NMF is a purely unsupervised technique and thus cannot be trained to extract interesting features in different applications.
Based on this, it seems that proposing a supervised version of NMF while keeping its original advantages can be an interesting contribution to the field.

<references/>