# s13Stat946proposal

## Contents

- 1 Project 1: Dimension Reduction for NBA data
- 2 Project 2: Large-Scale Supervised Sparse Principal Component Analysis
- 3 Project 3: Efficient Group-Sparsed Matrix Decomposition
- 4 Project 4: Dimension reduction and metric learning for remote sensing data
- 5 Project 5: A dimensionality reduction technique for time series data

## Project 1: Dimension Reduction for NBA data

### By: Lu Xin and Jiaxi Liang

The National Basketball Association (NBA) is one of the biggest sports leagues in North America. Thanks to advanced data collection techniques, very detailed statistics are available for teams, players, and games throughout each season. One important goal of basketball data analysis is to evaluate the performance of teams, lineups, and players. Team performance can be read directly from the team rankings. However, the performance of lineups (5-player combinations) is not so obvious. Furthermore, it is complicated to evaluate individual players in terms of team play. For example, Kobe Bryant is certainly a great individual player, but is he a good team player? In this project, we apply dimension reduction approaches to such problems. The basic idea is to find low-dimensional representations of teams and lineups. We hope that the patterns of teams and lineups in the latent space will lead to interesting conclusions.

First, we will select dimension reduction approaches by applying them to team statistics. We may consider a broad range of approaches: linear and nonlinear, supervised and unsupervised. Since the overall performance of the teams (their ranking) is known, we can choose the methods whose visualizations agree with conclusions drawn from expert knowledge. For instance, we expect a clear separation between teams with offensive and defensive styles.
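
As a minimal sketch of this first step, the snippet below embeds a teams-by-statistics table in two dimensions with PCA, which is only one of the candidate methods. The data are synthetic stand-ins, and the names `pca_embed` and `team_stats` are hypothetical, not part of the project.

```python
import numpy as np

def pca_embed(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    # Standardize each column so statistics on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in for real data: 30 teams, 8 statistics per team (assumed sizes).
team_stats = rng.normal(size=(30, 8))
embedding, explained = pca_embed(team_stats)
```

Each row of `embedding` is one team in the 2-D latent space, so a scatter plot of it can be compared against expert expectations such as an offensive versus defensive split.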

Second, we apply the selected methods to lineup data sets and plot the lineups in the low-dimensional space. These patterns may reveal interesting team structures. By comparing lineups with and without certain players, we can assess the effect of those players on the team. We will focus only on important lineups and players.

## Project 2: Large-Scale Supervised Sparse Principal Component Analysis

### By: Wu Lin and Lei Wang and Zikun Xu

One important issue for dimension reduction techniques in real-world applications is scalability. Some recently published work addresses this issue. In [1], the authors propose a fast, large-scale algorithm for Sparse Principal Component Analysis. In our project, we would like to extend this algorithm to a supervised version by introducing a supervised criterion, such as the Hilbert-Schmidt independence criterion (HSIC), into the optimization framework.
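
To make the HSIC criterion concrete, here is a minimal sketch of the standard (biased) empirical estimate, tr(KHLH)/(n-1)^2, where K and L are kernel matrices on the inputs and the labels and H is the centering matrix. The RBF kernel, the `gamma` value, and the synthetic data are assumptions for illustration; this is not the algorithm of [1].

```python
import numpy as np

def rbf_kernel(A, gamma=1.0):
    """Gaussian kernel matrix for the rows of A (gamma is an assumed bandwidth)."""
    sq = np.sum(A**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.exp(-gamma * np.maximum(D, 0.0))

def hsic(K, L):
    """Biased empirical HSIC between two n-by-n kernel matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dep = x + 0.1 * rng.normal(size=(200, 1))  # depends on x
y_ind = rng.normal(size=(200, 1))            # independent of x
K = rbf_kernel(x)
hsic_dep = hsic(K, rbf_kernel(y_dep))
hsic_ind = hsic(K, rbf_kernel(y_ind))
```

A larger value indicates stronger dependence, which is why HSIC between the projected data and the labels can serve as a supervised objective.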

## Project 3: Efficient Group-Sparsed Matrix Decomposition

### By: M.Hassan Z.Ashtiani

Finding a low-rank decomposition of a matrix is a well-known problem with many applications. Sometimes there are additional constraints on the problem. For instance, we may require that the decomposition be sparse. In [2], an efficient method is suggested for this problem. In particular, regularization terms (e.g., L1-norms) are added to the optimization problem to induce sparsity. After some relaxations, the problem is cast as a biconvex optimization and solved iteratively, similar to the power method (but with an additional soft-thresholding operator).
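
The iteration described above can be sketched roughly as follows: alternate between the two factors, soft-thresholding and renormalizing each in turn. This is an illustrative rank-one version under assumed penalty parameters `lam_u` and `lam_v`, not necessarily the exact algorithm of [2]; all function names are hypothetical.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink each entry toward zero by lam (closed-form L1 proximal step)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def normalize(a):
    n = np.linalg.norm(a)
    return a / n if n > 0 else a

def sparse_rank_one(X, lam_u=0.0, lam_v=0.0, n_iter=100):
    """Alternating power-method-style updates with soft-thresholding."""
    # Start from the leading right singular vector.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = normalize(soft_threshold(X @ v, lam_u))    # update u with v fixed
        v = normalize(soft_threshold(X.T @ u, lam_v))  # update v with u fixed
    return u @ X @ v, u, v

rng = np.random.default_rng(0)
u0 = normalize(rng.normal(size=20))
v0 = np.zeros(10)
v0[:3] = 1.0 / np.sqrt(3.0)  # true right factor uses only 3 of 10 columns
X = 5.0 * np.outer(u0, v0) + 0.01 * rng.normal(size=(20, 10))
d, u, v = sparse_rank_one(X, lam_v=0.5)
```

With both penalties set to zero the updates reduce to ordinary power iterations for the leading singular pair; with `lam_v > 0` small entries of `v` are set exactly to zero.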

In this project, we want to employ this idea and develop new algorithms to deal with other types of constraints. Of particular interest is the group-sparsity constraint on the decomposition, which has applications in feature selection algorithms. The difficulty comes from the fact that the group-sparsity constraint can couple the problems of selecting different basis vectors. Therefore, we cannot select basis vectors one at a time; we need to select them jointly. Furthermore, the nature of the constraints (e.g., infinity-norms) makes it difficult to find a closed-form solution, whereas for the L1-norm one exists. These facts make finding an efficient solution hard. In this research, we want to see whether we can propose a solution that overcomes these difficulties.
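
To illustrate how a group penalty couples the basis vectors, the sketch below applies row-wise (block) soft-thresholding to a matrix `W` whose columns are basis vectors and whose rows are features. It assumes an L2 norm on each row, which does have a closed form, rather than the infinity-norm mentioned above; the function name and numbers are hypothetical.

```python
import numpy as np

def row_group_soft_threshold(W, lam):
    """Shrink each row of W by lam in L2 norm; rows with norm <= lam become zero."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zero.
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale

# Three features (rows), two basis vectors (columns).
W = np.array([[3.0, 4.0],
              [0.3, 0.4],
              [0.0, 0.0]])
W_shrunk = row_group_soft_threshold(W, lam=1.0)
```

Because the shrinkage factor depends on the whole row, a feature is kept or dropped for all basis vectors at once, so the columns can no longer be updated independently.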

## Project 4: Dimension reduction and metric learning for remote sensing data

### By: Fan Li

As more and more remote sensing imagery is used for business and research, efficient classification algorithms for remote sensing data are needed. Some remote sensing data, especially hyperspectral data, contain rich spectral information (hundreds of bands), and the feature dimension grows further if other features such as texture and shape are included. In this project, dimension reduction and metric learning will be explored for remote sensing data. We want to see whether classification accuracy can be maintained after the dimension is reduced by a dimension reduction technique. We also want to see whether a KNN classifier using a learned metric can be comparable to other well-known classifiers, such as kernel SVM and random forests, on remote sensing data.
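
As a minimal sketch of KNN with a learned metric, the snippet below classifies points by Euclidean distance after a linear map `L`, which is equivalent to a Mahalanobis distance with matrix LᵀL. Here `L` is fixed by hand as a stand-in for the output of a metric learning algorithm, and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, L, k=3):
    """Majority-vote KNN using the distance ||L (x - x')||."""
    A = X_train @ L.T
    B = X_test @ L.T
    # Squared distances between every test point and every training point.
    d2 = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

# Toy data: the class depends on feature 0; feature 1 is large-scale nuisance.
X_train = np.array([[0, 0], [0, 100], [0, -100],
                    [1, 0], [1, 100], [1, -100]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.1, 60.0], [0.9, -60.0]])
L = np.diag([1.0, 0.001])  # assumed metric that down-weights feature 1
pred = knn_predict(X_train, y_train, X_test, L)
```

Setting `L` to a matrix with fewer rows than columns would combine the metric with dimension reduction in a single step.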

## Project 5: A dimensionality reduction technique for time series data

### By: Han Sheng Sun

In this project, I plan to propose a dimensionality reduction technique to speed up similarity searches on time series data. So far I only have a vague intuition about this. In techniques such as SNE, we minimize the KL divergence between the distribution over the high-dimensional data points X and the distribution over the low-dimensional data points Y. Similarly, the goal here is to find a similarity measure between time series that still faithfully represents the underlying information after the dimension is reduced. I think the first step is to apply some vector quantization and translate the time series into matrix form; then some of the dimensionality reduction techniques discussed in this class could be applied. At this stage I have no solid ideas, and I will update the proposal as the project goes on.
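
For reference, the SNE objective mentioned above can be written as follows. The notation ($p_{j|i}$, $q_{j|i}$, $\sigma_i$) follows the standard SNE formulation and is an assumption here, since the proposal does not fix any symbols.

```latex
$$
C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_{j \neq i} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}},
$$
$$
p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)},
\qquad
q_{j|i} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq i} \exp\left(-\|y_i - y_k\|^2\right)}.
$$
```

For time series, the squared Euclidean distance $\|x_i - x_j\|^2$ would presumably be replaced by whichever time series similarity measure is chosen.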