s13Stat946proposal
Project 1 : How to Build a Bird House
By: Maseeh Ghodsi, Soroush Ghodsi and Ali Ghodsi
Project: Dimension Reduction for NBA data
By: Lu Xin and Jiaxi Liang
The National Basketball Association (NBA) is one of the largest sports leagues in North America. Thanks to advanced data collection techniques, very detailed statistics are available for teams, players, and games across seasons. One important goal of basketball data analysis is to evaluate the performance of teams, lineups, and players. Team performance can be read directly from the team rankings. However, the performance of lineups (5-player combinations) is less obvious. Furthermore, it is difficult to evaluate how much a player contributes to team play. For example, Kobe Bryant is certainly a great individual player, but is he a good team player? In this project, we apply dimension reduction approaches to such problems. The basic idea is to find low-dimensional representations of teams and lineups. We hope that the patterns of teams and lineups in the latent space will lead to interesting conclusions.
First, we will select dimension reduction approaches by applying them to team statistics. We will consider a range of approaches: linear and nonlinear, supervised and unsupervised. Since the overall performance of the teams (their rankings) is known, we can choose the methods whose visualizations agree with conclusions drawn from expert knowledge. For instance, we expect a clear separation between teams with offensive and defensive styles.
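As a minimal sketch of this first step, the code below embeds a team-statistics matrix in two dimensions with plain PCA, the simplest linear, unsupervised candidate. The function name pca_embed, the matrix shape (30 teams by 8 statistics), and the random placeholder data are assumptions made for illustration only; they are not the project's data or its final choice of method.

```python
import numpy as np

def pca_embed(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Columns are standardized first so that statistics measured on
    different scales (e.g. points vs. percentages) are comparable.
    """
    Xc = X - X.mean(axis=0)
    scale = Xc.std(axis=0)
    scale[scale == 0] = 1.0          # avoid dividing by zero for constant columns
    Xs = Xc / scale
    # SVD of the standardized data: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    Z = Xs @ Vt[:n_components].T     # low-dimensional coordinates
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return Z, explained[:n_components]

# Placeholder data (synthetic, NOT real NBA statistics): 30 teams x 8 statistics
rng = np.random.default_rng(0)
team_stats = rng.normal(size=(30, 8))

Z, ratio = pca_embed(team_stats)
print(Z.shape)   # one 2-D point per team
print(ratio)     # variance explained by each of the two components
```

The 2-D coordinates in Z could then be plotted and labelled by ranking or playing style to judge whether the embedding agrees with expert knowledge. The same check would be repeated for each candidate method.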
Second, we will apply the selected methods to lineup data sets and plot the lineups in the low-dimensional space. These patterns may reveal interesting team structures. By comparing lineups with and without certain players, we can assess the effect of those players on the team. We will focus only on important lineups and players.
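One simple way to make the with/without comparison concrete is to measure how far the average lineup position moves in the latent space when a player is on the floor. The sketch below assumes the lineups have already been embedded (the array Z), and uses a hypothetical helper player_effect together with made-up player labels and coordinates. It illustrates the idea and is not a committed analysis method.

```python
import numpy as np

def player_effect(Z, lineups, player):
    """Shift of the lineup centroid in latent space due to one player.

    Z       : (n_lineups, d) array of low-dimensional lineup coordinates
    lineups : list of sets of player names, one set per row of Z
    player  : name of the player of interest
    """
    has_player = np.array([player in lineup for lineup in lineups])
    if has_player.all() or not has_player.any():
        raise ValueError("need lineups both with and without the player")
    # Difference between the mean position with and without the player
    return Z[has_player].mean(axis=0) - Z[~has_player].mean(axis=0)

# Hypothetical embedded lineups (illustrative coordinates and player labels)
Z = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0], [2.0, -2.0]])
lineups = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "F"},
    {"B", "C", "D", "E", "F"},
    {"B", "C", "D", "F", "G"},
]

shift = player_effect(Z, lineups, "A")
print(shift)  # centroid with "A" minus centroid without "A"
```

A large shift along an axis that separates strong from weak lineups would suggest that the player has a noticeable effect on team play. Because players are not assigned to lineups at random, such a difference is descriptive rather than causal.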
Project: Large-Scale Supervised Sparse Principal Component Analysis
By: Wu Lin and Wei Wang
One important issue for dimension reduction techniques in real-world applications is scalability. Some recently published work addresses this issue. In the paper[], the authors propose a fast, large-scale algorithm for Sparse Principal Component Analysis. In our project, we would like to extend this algorithm to a supervised version by introducing a dependence measure, such as the Hilbert-Schmidt independence criterion (HSIC), into the optimization framework.
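For reference, the standard empirical HSIC estimate between two samples is tr(K H L H) / (n - 1)^2, where K and L are kernel matrices on the two variables and H = I - (1/n) 1 1^T is the centering matrix. The sketch below computes this quantity with NumPy. The RBF kernel, the bandwidth sigma = 1, and the synthetic data are assumptions for illustration, and the code shows only the dependence measure itself, not the proposed large-scale sparse algorithm.

```python
import numpy as np

def rbf_kernel(A, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of A."""
    sq = np.sum(A**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def hsic(K, L):
    """Empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Synthetic data: y depends on x, z is independent of x
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = x + 0.1 * rng.normal(size=(200, 1))
z = rng.normal(size=(200, 1))

hsic_dependent = hsic(rbf_kernel(x), rbf_kernel(y))
hsic_independent = hsic(rbf_kernel(x), rbf_kernel(z))
print(hsic_dependent, hsic_independent)  # the dependent pair should score higher
```

In a supervised setting, K would be computed on the projected data and L on the labels, so that maximizing HSIC favours projections that depend strongly on the response.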