proposal Fall 2010

Project 1 : Classifying New Data Points Using An Outlier Approach

By: Yongpeng Sun


Intuition:

In LDA, we assign a new data point to the class whose center is nearest. At the same time, however, it is desirable to assign a new data point to the class in which it is least of an outlier compared with every other class. To this end, compared with every other class, a new data point should be closer to the center of its assigned class and, after suitable weighting, also be closer to the directions of variation of its assigned class.


Suppose there are two classes 0 and 1 both having [math]\displaystyle{ \,d }[/math] dimensions, and a new data point is given. To assign the new data point to a class, we can proceed using the following steps:

Step 1: For each class, find its center and its [math]\displaystyle{ \,d }[/math] directions of variation.


Step 2: For the new data point, with regard to each of the two classes, sum up the point's distance to the center and the point's distance to each of the [math]\displaystyle{ \,d }[/math] directions of variation weighted (multiplied) by the ratio of the amount of variation in that direction to the total variation in that class.


Step 3: Assign the new point to the class having the smaller of these two sums.


These 3 steps can be easily generalized to the case where the number of classes is more than 2 because, to assign a new data point to a class, we only need to know, with regard to each class, the sum as described above.
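
Below is a minimal sketch of this scoring rule, assuming per-class PCA supplies the directions of variation and interpreting the distance to a direction as the distance to the line through the class center along that direction; all names and the toy data are illustrative.

<pre>
import numpy as np

def class_score(x, X_class):
    """Outlier-style score of x for one class: distance to the class center plus
    the distance to each direction of variation, weighted by that direction's
    share of the class's total variation (Steps 1 and 2)."""
    center = X_class.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_class - center, full_matrices=False)  # PCA directions
    weights = s**2 / np.sum(s**2)              # share of total variation per direction
    diff = x - center
    score = np.linalg.norm(diff)               # distance to the center
    for w, v in zip(weights, Vt):
        dist_to_direction = np.linalg.norm(diff - (diff @ v) * v)
        score += w * dist_to_direction
    return score

def classify(x, class_data):
    """Step 3: assign x to the class with the smallest score."""
    return min(class_data, key=lambda label: class_score(x, class_data[label]))

# Toy usage with two Gaussian classes in 3 dimensions.
rng = np.random.default_rng(0)
class_data = {0: rng.normal(0, 1, (50, 3)), 1: rng.normal(3, 1, (50, 3))}
print(classify(np.array([2.5, 3.1, 2.9]), class_data))
</pre>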


I would like to evaluate the effectiveness of my idea / algorithm as compared to LDA and QDA and other classifiers using data sets in the UCI database ( http://archive.ics.uci.edu/ml/ ).



Project 2: Apply Hadoop Map-Reduce to a Classification Method

By: Maia Hariri, Trevor Sabourin, and Johann Setiawan

Develop map-reduce processes that can properly classify large distributed data sets.

Potential projects:

1. Use Hadoop Map-Reduce to implement the Support Vector Machine (Kernel) classification algorithm.
2. Use Hadoop Map-Reduce to implement the LDA classification algorithm on a novel problem (e.g. forensic identification of handwriting.)
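
The sketch below illustrates how the LDA job in item 2 could be organized: it accumulates LDA's sufficient statistics (per-class counts, sums and scatter) in a map-reduce pattern. The mapper and reducer are plain Python stand-ins run locally rather than actual Hadoop code, and all names are illustrative.

<pre>
import numpy as np
from collections import defaultdict

def mapper(shard):
    """Map: for one data shard, emit sufficient statistics keyed by class label."""
    for label in np.unique(shard["y"]):
        X = shard["X"][shard["y"] == label]
        yield label, (len(X), X.sum(axis=0), X.T @ X)

def reducer(values):
    """Reduce: combine the per-shard statistics for one class."""
    n = sum(v[0] for v in values)
    s = sum(v[1] for v in values)
    ss = sum(v[2] for v in values)
    return n, s, ss

def lda_from_shards(shards):
    """Driver: simulate the job locally, then form LDA class means and a pooled covariance."""
    grouped = defaultdict(list)
    for shard in shards:
        for label, stats in mapper(shard):
            grouped[label].append(stats)
    means, scatter, total = {}, 0.0, 0
    for label, values in grouped.items():
        n, s, ss = reducer(values)
        mu = s / n
        means[label] = mu
        scatter = scatter + (ss - n * np.outer(mu, mu))   # within-class scatter
        total += n
    return means, scatter / (total - len(means))          # pooled covariance estimate

# Toy usage: four shards of two-class Gaussian data.
rng = np.random.default_rng(1)
shards = [{"X": rng.normal(i % 2, 1, (20, 2)), "y": np.full(20, i % 2)} for i in range(4)]
print(lda_from_shards(shards))
</pre>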


Project 3 : Hierarchical Locally Linear Classification

By: Pouria Fewzee

Extending an intrinsically two-class classifier to the multi-class setting can be challenging, as the common approaches either leave ambiguous regions in the feature space or are computationally inefficient. The linear classifier and the support vector machine are two well-known instances of intrinsically two-class classifiers, and the k-1 and k(k-1)/2 hyperplane schemes are the two most common approaches for extending them to multi-class tasks. The k-1 scheme suffers from leaving ambiguous regions in the feature space, and although the k(k-1)/2 scheme does not have this problem, it is not computationally efficient. Hierarchical classification is proposed as a solution. This not only improves the efficiency of the classifier, but the resulting tree could also provide specialists in the field with new insights.

Another purpose of this project is to build a general-purpose classifier that adapts to different patterns as much as demanded. To realize this goal, locally linear classification is proposed; locality in the classifier design is achieved by combining fuzzy computation tools with binary decision trees.
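
A minimal sketch of the hierarchical part of this idea is shown below, assuming a plain binary tree of linear classifiers over groups of class labels; the half/half split of the labels is an arbitrary stand-in for a learned hierarchy, and the fuzzy-computation component is not modelled.

<pre>
import numpy as np
from sklearn.linear_model import LogisticRegression

class HierarchicalLinearClassifier:
    """Binary tree of linear classifiers: each internal node separates two
    groups of class labels, and each leaf holds a single class."""

    def fit(self, X, y):
        self.tree = self._build(X, y, sorted(set(y)))
        return self

    def _build(self, X, y, labels):
        if len(labels) == 1:
            return labels[0]                                   # leaf
        left, right = labels[:len(labels) // 2], labels[len(labels) // 2:]
        mask = np.isin(y, left)
        clf = LogisticRegression(max_iter=1000).fit(X, mask.astype(int))
        return (clf,
                self._build(X[mask], y[mask], left),
                self._build(X[~mask], y[~mask], right))

    def predict_one(self, x):
        node = self.tree
        while isinstance(node, tuple):
            clf, left, right = node
            node = left if clf.predict(x.reshape(1, -1))[0] == 1 else right
        return node

# Toy usage with four well-separated Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(i, 0.3, (30, 2)) for i in range(4)])
y = np.repeat(np.arange(4), 30)
model = HierarchicalLinearClassifier().fit(X, y)
print(model.predict_one(np.array([2.1, 1.9])))
</pre>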


Project 4 : Cluster Ensembles for High Dimensional Clustering

By: Chun Bai, Lisha Yu

Clustering for unsupervised data exploration and analysis has been investigated for decades in machine learning. Its performance is directly influenced by the dimensionality of the data. Data with high dimensionality pose two fundamental challenges for clustering algorithms. First, the data tend to be sparse in a high-dimensional space. Second, there often exist noisy features that may mislead the clustering algorithm.

The paper studies cluster ensembles for high dimensional data clustering. Three different approaches to constructing cluster ensembles are examined:

1. Random projection based approach
2. Combining PCA and random subsampling
3. Combining random projection with PCA

Moreover, four different consensus functions for combining the clusterings of the ensemble are examined:

1. Consensus Functions Using Graph Partitioning
-Instance-Based Graph Formulation (IBGF)
-Cluster-Based Graph Formulation (CBGF)
-Hybrid Bipartite Graph Formulation (HBGF)
2. Consensus Function Using Centroid-based Clustering (KMCF)

Experiments on datasets from the UCI repository show that ensembles generated by random projection perform better than those generated by PCA, and that this can be attributed to the capability of random projection to produce diverse base clusterings. They also show that a recent consensus function based on bipartite graph partitioning achieves the best performance.
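
Below is a minimal sketch of a random-projection cluster ensemble; it uses a co-association matrix cut by average-linkage clustering as a simple consensus stand-in rather than the IBGF/CBGF/HBGF/KMCF functions examined in the paper, and all parameter values are illustrative.

<pre>
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def random_projection_ensemble(X, k, n_members=10, dim=5, seed=0):
    """Run k-means on several random low-dimensional projections of X,
    accumulate a co-association matrix, and extract k consensus clusters."""
    rng = np.random.RandomState(seed)
    n = len(X)
    coassoc = np.zeros((n, n))
    for _ in range(n_members):
        Z = GaussianRandomProjection(n_components=dim, random_state=rng).fit_transform(X)
        labels = KMeans(n_clusters=k, n_init=10, random_state=rng).fit_predict(Z)
        coassoc += (labels[:, None] == labels[None, :])      # co-clustered pairs
    dist = 1.0 - coassoc / n_members                          # dissimilarity between points
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    return fcluster(linkage(condensed, method="average"), t=k, criterion="maxclust")

# Toy usage: three Gaussian clusters embedded in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1, (40, 50)) for c in (0, 4, 8)])
print(random_projection_ensemble(X, k=3))
</pre>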

Project 5 : Observation Conditions to Localization Accuracy Association

By: Haitham Amar


Vehicle localization is a key issue that has recently attracted a significant amount of attention in a wide range of applications. Navigation, vehicle tracking, Emergency Calling (eCall) and Location Based Services (LBS) are examples of emerging applications that have a great demand for location information. Indeed, the Global Positioning System (GPS) has been the de facto standard solution for the vehicle localization problem. Nevertheless, GPS based localization is inaccurate and unreliable due to GPS' inherent positional errors such as poor performance in vertical positioning and the prevalent horizontal movement, in addition to anomalies caused by line-of-sight occlusions and multipath issues in urban canyons.


It is well recognized in the literature that the performance of GPS receivers has a stochastic behavior, which is influenced by the observation conditions. For example, localization accuracy is high in open-sky environments; however, in the presence of high-rise buildings the localization accuracy is low and is sometimes hard to define at all. Moreover, the GPS satellite signals may vanish if the vehicle goes through an underpass or a tunnel. The inability to obtain consistent localization accuracy cannot be tolerated by many applications. Therefore, recent pieces of research work have attempted to evaluate the localization performance of various positioning techniques as a first step toward improving the performance.


Since GPS technology is a crucial component in most vehicle localization techniques, the focus of this project will be on classifying the performance of a GPS receiver while monitoring certain parameters that are sensitive to the observation conditions. In the literature, the sensitivity of two parameters (namely, the Signal-to-Noise Ratio (SNR) of the signal received from the GPS satellites and the Dilution of Precision (DOP) value) has been investigated. Conceivably, the SNR is sensitive to the local environment of the receiver (high-rise buildings, trees, open sky, etc.), whereas the DOP reflects the goodness of the geometric arrangement of the GPS satellites used as reference points in the localization process. Nevertheless, by looking at plots of the SNR and DOP and comparing them with the localization errors, in many cases it is not trivial to derive a mapping function or classifier that can indicate the performance of the receiver.


Objectives of the project:

•Introducing more features similar to SNR and DOP, such as the number of satellites used in the localization process, the mean and the variance of the SNR, the change in the satellites’ constellation, the speed of the vehicle, etc. These features are expected to support the process of discriminant analysis.

•Constructing a rich learning data base for GPS receiver measurements.

•Implementing different classification techniques to classify a number of performance margins for the GPS localizations. These classification techniques may not be limited to the ones we have been taught in the course.

•Studying the sensitivity of the classification techniques to the features that will be introduced.


Challenges of the project:

•Sufficient GPS data need to be collected under different environmental conditions.

•Specifying the GPS performance margins that could be provided by the receiver under the different environmental conditions.

•Despite our enthusiasm for this research, there is a risk that this new experimental work yields no major contribution relative to the time spent on the investigation.


Current status:

•A GPS receiver is already in our hands, which will allow us to collect as much data as we need.

•Communication with the hardware is already set up, and we were able to capture some data on campus under various environmental conditions.

•We are now working on extracting the features from the raw data collected by the GPS receiver.
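
Below is a minimal sketch of how such features could feed a classifier once the data are collected; the per-epoch field names and the random-forest stand-in are illustrative assumptions, not part of the project plan.

<pre>
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def epoch_features(epoch):
    """Build one feature vector per GPS epoch from the quantities listed in the
    objectives: mean and variance of the SNR, DOP, number of satellites used,
    constellation change, and vehicle speed (field names are hypothetical)."""
    snr = np.asarray(epoch["snr_per_satellite"])
    return [snr.mean(), snr.var(), epoch["dop"], len(snr),
            epoch["constellation_change"], epoch["speed"]]

def train_accuracy_classifier(epochs, accuracy_labels):
    """Map observation-condition features to a localization-accuracy class
    (e.g. 'high', 'medium', 'low')."""
    X = np.array([epoch_features(e) for e in epochs])
    return RandomForestClassifier(n_estimators=100).fit(X, accuracy_labels)
</pre>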

Project 6 : Face Recognition Using Kernel Fisher Linear Discriminant Analysis and RBF Neural Network

By: Ahmed Ibrahim

Problem Description

Face recognition is one of the important areas in the fields of pattern recognition, computer vision and machine learning. It is used in a wide range of applications such as credit cards, passports, biometrics, law enforcement, identity authentication and surveillance. Ideally, a face recognition system should be able to take a new face and return a name identifying that person. Statistically, faces can also be very similar, and the similarities between faces provide a way to an identification approach that uses the full face. A system can be built that looks at the statistical relationships between individual pixels. One person may have a larger distance between his or her eyes than another, so two regions of pixels will be correlated with one another differently for image sets of these two people. Characterizing the dependencies between pixel values becomes a statistical problem. The eigenface technique finds a way to create ghost-like faces that represent the majority of the variance in an image database. We are trying to implement the novel approach of [1], which handles the face recognition problem by taking advantage of these similarities between faces to build a fairly accurate and computationally efficient system, applying Principal Component Analysis (PCA), Kernel Fisher's Linear Discriminant Analysis (KFLDA) and a Radial Basis Function Neural Network (RBFNN).

Proposed Approach

A new face recognition method is presented based on Principal Component Analysis (PCA), Kernel Fisher's Linear Discriminant Analysis (KFLDA) and a Radial Basis Function Neural Network (RBFNN), using the following steps:

1. The PCA technique is applied to reduce the facial image dimensions.

2. Then, KFLDA is used to extract the most discriminating features for appearance-based face recognition.

3. KFLDA provides better generalization by taking higher-order correlations into account, unlike FLDA, whose projection directions are based on second-order statistics.

4. As a classifier, an RBFNN is used, which classifies the face images based on the features extracted in the previous step.
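
The sketch below outlines the shape of such a pipeline in Python (the project itself plans a MATLAB implementation). KFLDA is approximated here by kernel PCA followed by LDA, and the RBFNN is replaced by a small least-squares RBF layer over the discriminant features; all parameter values are illustrative, and this is not the exact algorithm of [1].

<pre>
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_pipeline(X, y, n_pca=50, gamma=1e-3):
    """PCA for dimensionality reduction, an approximation of kernel FDA
    (kernel PCA + LDA), and an RBF layer trained by least squares.
    X: NumPy array of flattened face images; y: NumPy array of subject labels."""
    pca = PCA(n_components=n_pca).fit(X)
    Z = pca.transform(X)
    kpca = KernelPCA(n_components=n_pca, kernel="rbf", gamma=gamma).fit(Z)
    lda = LinearDiscriminantAnalysis().fit(kpca.transform(Z), y)
    F = lda.transform(kpca.transform(Z))                              # discriminant features
    classes = np.unique(y)
    centers = np.array([F[y == c].mean(axis=0) for c in classes])     # RBF centers
    sigma = np.mean(np.linalg.norm(F[:, None] - centers[None], axis=2))
    H = np.exp(-np.linalg.norm(F[:, None] - centers[None], axis=2) ** 2 / (2 * sigma ** 2))
    T = (y[:, None] == classes[None]).astype(float)                   # one-hot targets
    W, *_ = np.linalg.lstsq(H, T, rcond=None)                         # output weights
    return pca, kpca, lda, classes, centers, sigma, W

def predict(model, X_new):
    pca, kpca, lda, classes, centers, sigma, W = model
    F = lda.transform(kpca.transform(pca.transform(X_new)))
    H = np.exp(-np.linalg.norm(F[:, None] - centers[None], axis=2) ** 2 / (2 * sigma ** 2))
    return classes[np.argmax(H @ W, axis=1)]
</pre>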

Tools

The implementation will be developed on the MATLAB R2009a platform. Using MATLAB keeps the development effort focused on the algorithm itself.

Implementation Constraints

The implementation will operate under the following assumptions: there must be prior knowledge about the background, the background must be stable, and the camera will be fixed.

Dataset

In this project, we will use the ORL database, known as the AT&T "Database of Faces". This dataset has ten different images of each of 40 distinct subjects. The images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

Timeline

Week 1: Implementing the KFLDA algorithm. Extracting spatial-temporal features using PCA and KFLDA.

Week 2: Quantizing features to derive vocabulary for the model. Designing and Implementing the RBFNN network.

Week 3: Performing classification and recognition experiments on ORL dataset.

Week 4: Comparing results with current state-of-art research results.

Week 5: Write up the final report with complete details of the algorithms, experiments and results.

References

1. Sweta Thakur, Jamuna Kanta Sing, Dipak Kumar Basu, Mita Nasipuri. Face Recognition Using Kernel Fisher Linear Discriminant Analysis and RBF Neural Network. Contemporary Computing: Third International Conference, IC3 2010, Noida, India, August 9-11, 2010, Proceedings, Part I (Communications in Computer and Information Science 94), 2010, pp. 13-20.

2. Qingshan Liu, Xiaoou Tang, Hanqing Lu, Songde Ma. "Face recognition using kernel scatter-difference-based discriminant analysis," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 1081-1085, July 2006.

3. Qingshan Liu, Rui Huang, Hanqing Lu, Songde Ma. "Face recognition using kernel-based Fisher Discriminant Analysis," Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 197-201.

Project 7 : Application of Machine Learning to Epileptic Seizure Detection

By: Hanna Kazhamiaka

One application of machine learning approaches in the biomedical field is the construction of seizure detection algorithms for epileptic patients. In "Application of Machine Learning to Epileptic Seizure Detection", Shoeb and Guttag propose a patient-specific classifier that detects the onset of a seizure through analysis of scalp EEG. The objective is to achieve accurate detection by using an automated process for creating the feature vector: the time evolution of both the spectral and spatial properties of the brain's electrical activity is combined in a single feature space, so there is no need for an expert to combine the features manually. This is a binary classification problem: seizure and non-seizure activity is classified using a support vector machine with non-linear decision boundaries generated by an RBF kernel. In this project, I will apply this detection algorithm to a data set consisting of 916 hours of continuous scalp EEG activity from 23 pediatric patients, in an attempt to reproduce the results of the paper.
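
A minimal sketch of such a detector is shown below, assuming per-window band-energy features for each EEG channel, a short window history to capture the time evolution, and an RBF-kernel SVM; the band edges, sampling rate and window settings are illustrative assumptions rather than the paper's exact configuration.

<pre>
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC

def window_features(eeg_window, fs=256, bands=((0.5, 4), (4, 8), (8, 13), (13, 25))):
    """Spectral energy per channel and frequency band for one EEG window
    (eeg_window has shape (n_channels, n_samples))."""
    feats = []
    for channel in eeg_window:
        f, pxx = welch(channel, fs=fs, nperseg=fs)
        feats.extend(pxx[(f >= lo) & (f < hi)].sum() for lo, hi in bands)
    return np.array(feats)

def build_feature_vectors(windows, history=3):
    """Concatenate the features of `history` consecutive windows so the
    classifier sees the time evolution of the spectral/spatial pattern."""
    feats = [window_features(w) for w in windows]
    return np.array([np.concatenate(feats[i - history:i]) for i in range(history, len(feats))])

def train_detector(X, y):
    """Binary seizure / non-seizure SVM with non-linear boundaries from an RBF kernel."""
    return SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)
</pre>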



Project 8 : Stable Signal Recovery from Incomplete and Inaccurate Measurements

By: Azim Ansari, Fei Wang, Xin Xiong


We have selected a paper about compressive sensing and plan to reproduce its results. We also plan to implement a novel idea: classification from incomplete observations. Our proposal summary follows:

We will try to use the algorithm in the paper "Stable Signal Recovery from Incomplete and Inaccurate Measurements" (http://www-stat.stanford.edu/~candes/papers/StableRecovery.pdf) to recover an image from an incomplete and contaminated observation.

Step 01

  • We choose an image and change it into a vector x0.


Step 02

  • We design a matrix A which satisfies a suitable design principle and compute A*x0; that is, we make an incomplete observation of x0 (the image).

We will apply some classification algorithms to the incomplete information A*x0 and compare the results with those produced from the original data.


Step 03

  • We add an error term e (noise) to A*x0 and take y = A*x0 + e as our input (the incomplete and contaminated information from the image vector x0).


Step 04

  • Finally, we try to recover the image from our input y, obtaining y0 as our recovered image vector, and then we compare y0 with x0.
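
Below is a minimal sketch of the recovery step. It solves the l1-penalized least-squares problem by iterative soft-thresholding, a Lagrangian relative of the constrained l1 program in the paper, and is demonstrated on a synthetic sparse vector rather than an image (an image would first need to be made sparse in a suitable transform domain); all parameter values are illustrative.

<pre>
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_recover(A, y, lam=0.05, n_iter=500):
    """Recover a sparse x from y = A x0 + e by iterative soft-thresholding
    applied to min_x 0.5*||A x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Toy usage: a sparse vector observed through a random Gaussian matrix with noise.
rng = np.random.default_rng(0)
n, m, k = 64, 256, 8
x0 = np.zeros(m)
x0[rng.choice(m, k, replace=False)] = rng.normal(0, 1, k)
A = rng.normal(0, 1 / np.sqrt(n), (n, m))
y = A @ x0 + 0.01 * rng.normal(size=n)
print("recovery error:", np.linalg.norm(ista_recover(A, y) - x0))
</pre>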

Project 9 : Understanding affective expressions in human hand movements

By: Ali-Akbar Samadani

Humans infer and ascribe affective meaning to observed motions even if none is intended. In this work, affective expression demonstrated through human hand motions will be studied using a labeled expressive hand movement dataset. The human hand is the most dexterous of the human limbs and is capable of demonstrating a wide range of gestural movements, through which a rich source of information about the feelings and intentions of the demonstrator is conveyed. Furthermore, human hand gestures can be highly variable, even when performed by the same person (e.g. different instances of the same expressive movement performed by a single person can differ even though they are intended to convey the same expression). Therefore, it is necessary to identify inherent movement qualities associated with different classes of expressive movements. This project aims to identify a subset of features, or correlations of features, whose presence preserves the semantics or perception of the affect conveyed in the movement, while conversely their absence or any change in them might result in misperception of the movement's affect.

A summary of the proposed project is given below:
  • Understanding affective expression in high-dimensional hand movements and characterization of expressive hand movements:
    • Dimensionality reduction techniques map the data to lower-dimensional subspaces, spanned by a subset of features or a transformation of features, in which sufficient information exists to accurately classify the different classes of expressive movements.
      • Literature review on sufficient dimensionality reduction
    • Feature selection: relevant features.
      • e.g. forward selection/backward elimination, the Lasso approach.
    • Feature extraction: relevant feature correlations.
      • e.g. PCA, LLE, HSIC (a kernel dimensionality reduction), STIsomap (a spatio-temporal extension to the Isomap nonlinear dimensionality reduction).
    • Evaluation
      • In order to evaluate the outcome of the feature extraction and feature selection techniques, statistical classification will be performed on the labeled dataset, and hypothesis tests will be carried out to evaluate how representative each subset of features or feature correlation found in the previous steps is, based on the classification results.


This study potentially will lead to the development of appropriate metric spaces through which a notion of similarity for expressive hand movement can be defined for quantitative comparison and classification of these movements.

Project 10 : Use of Neural Networks and Genetic Algorithms on Demographic Data for Targeted Advertising

By: Laura Chelaru, Meng Lu, Nicholas Church

We have found a paper that claims to improve upon traditional demographic-targeted advertising methods by putting forward a classification technique that combines genetic algorithms and neural network learning. In brief, a genetic algorithm is used to pre-select the features that are learned by the neural network. The paper compares this classification technique with a more traditional method based on a PCA/logit rule. The paper was published in a 2005 issue of Management Science and is available online.
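
The sketch below illustrates the overall idea: a genetic algorithm searches over binary feature masks whose fitness is the cross-validated accuracy of a small neural network trained on the selected features. The GA operators and all parameter values are illustrative assumptions, not the paper's exact procedure.

<pre>
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Cross-validated accuracy of a small neural network on the feature subset
    encoded by the binary mask."""
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

def ga_select(X, y, pop=20, gens=10, p_mut=0.05, seed=0):
    """Bare-bones genetic algorithm over binary feature masks: truncation
    selection, uniform crossover and bit-flip mutation."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    population = rng.integers(0, 2, size=(pop, d))
    for _ in range(gens):
        scores = np.array([fitness(ind, X, y) for ind in population])
        parents = population[np.argsort(scores)[-pop // 2:]]        # keep the best half
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(d) < 0.5, a, b)             # uniform crossover
            child ^= (rng.random(d) < p_mut).astype(child.dtype)    # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    best = population[np.argmax([fitness(ind, X, y) for ind in population])]
    return best.astype(bool)

# Toy usage on synthetic demographic-style data with 12 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 12))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
print(ga_select(X, y, pop=8, gens=3).nonzero()[0])
</pre>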

The paper uses demographic data collected from customers of an insurance company, and the data is publicly available.

The classification methods used in this paper are not trivial, but we will attempt to reproduce the paper's results by implementing the algorithms that it puts forward. We will test our implementation on the same data that the paper used, and we will discuss any difficulties involved in implementing these algorithms.

In addition, we will review the classification techniques generally used in demographic-based targeted advertising, and we will put forward an opinion on which new advances in classification methodology are most likely to be used for this application in the future.

Project 11 : Exploration of Shallow Parsing techniques

By: Kaheer Suleman, Frank Thomas

Shallow parsing (or text chunking) involves decomposing a sentence of text into its constituent parts, such as noun phrases, verb phrases, and so on. Traditionally, this problem has been approached using Support Vector Machines (for supervised learning) and Expectation Maximization (for unsupervised learning).

We intend to explore this problem and implement an algorithm to solve it.
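
As a small illustration of the supervised formulation, the sketch below tags each token with an IOB chunk label using a linear classifier over simple window features; the feature set and the use of logistic regression (rather than an SVM) are illustrative choices.

<pre>
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(sentence, i):
    """Window features for token i; `sentence` is a list of (word, POS) pairs."""
    word, pos = sentence[i]
    return {
        "word": word.lower(),
        "pos": pos,
        "prev_pos": sentence[i - 1][1] if i > 0 else "BOS",
        "next_pos": sentence[i + 1][1] if i + 1 < len(sentence) else "EOS",
    }

def train_chunker(sentences, tag_sequences):
    """Train a per-token classifier predicting IOB chunk tags (e.g. B-NP, I-NP, O)."""
    X = [token_features(s, i) for s in sentences for i in range(len(s))]
    y = [t for tags in tag_sequences for t in tags]
    return make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000)).fit(X, y)

# Toy usage with one hand-tagged sentence.
sent = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("down", "RP")]
tags = ["B-NP", "I-NP", "B-VP", "I-VP"]
chunker = train_chunker([sent], [tags])
print(chunker.predict([token_features(sent, 1)]))
</pre>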


Project 12 : Parameter selection for smoothing splines using Stein's unbiased risk estimator.

By: Sepideh Seifzadeh and Mohammad Rostami

In class we learned how to apply Stein's unbiased risk estimator (SURE) to an RBF network in order to control the complexity and minimize the true error; we could determine the optimal number of basis functions. While the approach was presented for the RBF network, we can easily extend this method to any linear model. Linear models are a class of methods that learn a decomposition of a function in the linear span of some basis functions. Cubic splines are a class of linear models with many applications in computer graphics; in this model, the basis functions are [math]\displaystyle{ \,3^{rd} }[/math]-order polynomials, and the structure is similar to that of an RBF network. Smoothing splines are a special form of splines in which the spline is smooth, meaning that the function has a continuous second-order derivative. The smoothness of the function is controlled by a parameter [math]\displaystyle{ \,\lambda }[/math], and there is a trade-off between the accuracy of the estimated function and its smoothness. In this project we try to apply SURE in order to find the optimal value of the smoothness parameter.

We will take advantage of SURE to minimize the true error in our model.


The prediction model [math]\displaystyle{ \,\hat{f} }[/math] is defined by [math]\displaystyle{ \,\hat{f}=S_{\lambda}y }[/math], where the smoother matrix is

[math]\displaystyle{ \,S_{\lambda}= N(N^TN+\lambda\Omega_{N})^{-1}N^{T} }[/math]

[math]\displaystyle{ \,err=Err+n\sigma^{2}-2\sigma^{2}\sum_{i}\frac{\partial \hat{f}_{i}}{\partial y_{i}} }[/math]


where err is the empirical error and Err is the true error.


The sum of the derivatives of [math]\displaystyle{ \,\hat{f} }[/math] with respect to [math]\displaystyle{ \,y }[/math] is


[math]\displaystyle{ \,\sum_{i}\frac{\partial \hat{f}_{i}}{\partial y_{i}}=trace(S_{\lambda}) }[/math]

Therefore:

[math]\displaystyle{ \,err=Err+n\sigma^{2}-2\sigma^{2} trace(S_{\lambda}) }[/math]


[math]\displaystyle{ \,err=Err+n\sigma^{2}-2\sigma^{2}\,trace(N(N^TN+\lambda\Omega_{N})^{-1}N^{T}) }[/math]

We find the optimum [math]\displaystyle{ \,\lambda }[/math] which minimizes the error.
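
A minimal sketch of this selection procedure is shown below: it builds S_lambda, evaluates the SURE estimate of the true error over a grid of lambda values, and keeps the minimizer. The Gaussian-bump basis and second-difference penalty stand in for the spline basis N and penalty Omega_N, and the noise variance is assumed known; all values are illustrative.

<pre>
import numpy as np

def smoother_matrix(N, Omega, lam):
    """S_lambda = N (N^T N + lambda * Omega)^(-1) N^T."""
    return N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)

def sure_select_lambda(N, Omega, y, sigma2, lambdas):
    """Pick lambda by minimising the SURE estimate of the true error:
    Err_hat = err - n*sigma^2 + 2*sigma^2*trace(S_lambda)."""
    n = len(y)
    best_lam, best_score = None, np.inf
    for lam in lambdas:
        S = smoother_matrix(N, Omega, lam)
        err = np.sum((y - S @ y) ** 2)                     # empirical error
        score = err - n * sigma2 + 2 * sigma2 * np.trace(S)
        if score < best_score:
            best_lam, best_score = lam, score
    return best_lam

# Toy usage: noisy sine curve, Gaussian-bump basis, second-difference penalty.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.size)
centers = np.linspace(0, 1, 20)
N = np.exp(-(x[:, None] - centers[None, :]) ** 2 / 0.01)
D = np.diff(np.eye(len(centers)), n=2, axis=0)
Omega = D.T @ D
print(sure_select_lambda(N, Omega, y, sigma2=0.04, lambdas=np.logspace(-6, 2, 30)))
</pre>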

Project 13 : Finding Informative Gene based on ICA, R1D and Bayes Classifier

By: Fatemeh Dorri

Recently, independent component analysis (ICA) has become a popular method for finding the independent variables that are the main features of the data. ICA is basically an unsupervised algorithm that looks for original basis components that are independent. Another method is Principal Component Analysis (PCA), which also looks for the main components of the data, but ones that are uncorrelated. Unlike PCA, which only removes second-order statistical dependence, ICA seeks higher-order statistical independence. They are similar in that both ICA and PCA express a linear representation of the data, and neither handles nonlinear cases in its original form. ICA is also similar to the projection pursuit method in the sense that directions with maximally “non-Gaussian” distributions are of interest, since these projections are more useful for classification. The results of the ICA algorithm will be promising if the independence assumption is correct and the components do not have Gaussian distributions.

The project performs a sparse decomposition of the independent component analysis (ICA) mixing matrix based on Rank-one Downdate (R1D). Having obtained new features from those two algorithms, we will identify the most informative features using a Bayes classifier.
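
A minimal sketch of the ICA-plus-Bayes part of the pipeline is given below (the R1D sparse decomposition is not included); FastICA and Gaussian naive Bayes are used as stand-ins, and all parameter values are illustrative.

<pre>
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def ica_bayes_score(X, y, n_components=10):
    """Project expression profiles (samples x SNPs/genes) onto independent
    components with FastICA, then score them with a Gaussian naive Bayes
    classifier via cross-validation."""
    S = FastICA(n_components=n_components, max_iter=1000).fit_transform(X)
    return cross_val_score(GaussianNB(), S, y, cv=5).mean()

# Toy usage on synthetic data: 60 samples, 200 features, binary trait.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
y = (X[:, 0] - X[:, 5] > 0).astype(int)
print(ica_bayes_score(X, y))
</pre>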

Project 14: Computation and Bounds of the Littlestone Dimension

By: Erik Louie

In statistical/machine learning, the concept of learnability has been defined rigorously. Learnability has been shown to be equivalent to having a finite Vapnik–Chervonenkis dimension (vcdim) [1]. The mistake bound (mb), the maximum number of mistakes a learning algorithm can make over a data set of size n, has been shown to be equivalent to the Littlestone dimension (ldim) [3]. The Littlestone dimension is the maximum rank (the depth of the largest complete subtree) of the trees representing the possible paths of online learning. The vcdim has been shown to be bounded above by mb.

The goal of this project is twofold:

  • To analyze the properties of ldim
  • To analyze alternative methods of computing the vcdim, mb, and ldim

For example, the expected regret has been shown to be bounded below in terms of ldim [3], so ldim is directly related to a variety of concepts in machine learning. Since computing the vcdim has been shown to be computationally difficult [2], is it possible to compute a good bound on the ldim efficiently? In this project I will analyze possible optimization methods for ldim.
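
For a finite hypothesis class over a finite domain, ldim can be computed directly from its shattered-tree recursion; the brute-force sketch below (exponential in general, intended only for tiny examples) illustrates the quantity whose bounds the project studies.

<pre>
from functools import lru_cache

def littlestone_dimension(hypotheses, domain):
    """Brute-force Littlestone dimension of a finite class. Each hypothesis is a
    tuple of 0/1 labels indexed by position in `domain`. ldim(H) is the depth of
    the deepest complete tree of splitting points shattered by H."""

    @lru_cache(maxsize=None)
    def ldim(hset):
        best = 0
        for i in range(len(domain)):
            h0 = frozenset(h for h in hset if h[i] == 0)
            h1 = frozenset(h for h in hset if h[i] == 1)
            if h0 and h1:                       # the point must split the class
                best = max(best, 1 + min(ldim(h0), ldim(h1)))
        return best

    return ldim(frozenset(hypotheses))

# Toy usage: thresholds on 4 ordered points (5 hypotheses) have ldim 2.
domain = [0, 1, 2, 3]
thresholds = [tuple(1 if x >= t else 0 for x in domain) for t in range(5)]
print(littlestone_dimension(thresholds, domain))
</pre>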

References

1. Blumer, Anselm, et al. Journal of the Association for Computing Machinery. Vol. 36. No. 4. October 1989, pp. 929-965.

2. Papadimitriou, Christos H., Mihalis Yannakakis. Structure in Complexity Theory Conference. IEEE. May 1993, pp. 12-18.

3. Shai Ben-David, David Pal and Shai Shalev-Shwartz. Agnostic Online Learning. COLT, 2009.

Project 15: Controlling the False Discovery Rate of the Association/Causality Structure Learned with the PC Algorithm

By: Jenna Voisin

Graphical models are increasingly used in practice to model conditional-independence relationships through directed acyclic graphs (DAGs). These models are used for prediction and classification as well as for assessing any association or causality relationships that may exist in the experimental observations. Thus, in addition to controlling or knowing the error rate of the model's prediction or classification capabilities, it is also of interest to quantify the error rate of the association and causality relationships while still generating a model that adequately fits the data. One such error rate is the false discovery rate (FDR), which is itself seeing growing use in practice. A specific algorithm for creating the DAG, known as the PC algorithm, is investigated in this paper to determine the effect of controlling the FDR instead of the traditional type I and type II errors. The results of the paper will be investigated in detail and summarized; this will include an overview of DAGs, the PC algorithm, and the results (and advantages and limitations) of adapting the PC algorithm to FDR.

Paper: Li, Junning and Z. Jane Wang. Controlling the False Discovery Rate of the Association/Causality Structure Learned with the PC Algorithm. Journal of Machine Learning Research, Volume 10 (2009), pages 475-514.


Project 16 : Classifier Considering Distribution Gap between Training Set and Test Set

By: Dan Xie, Xiaohui Wang


Most machine learning algorithms are constructed under the assumption that the training and the test data are drawn from the same distribution.

However, in practice we are very often faced with the situation where the distributions of the training and the test data differ.

In the article "A Kernel Method for the Two-Sample-Problem", the authors define the maximum mean discrepancy (MMD) to measure the similarity between two sets with different distributions. We decide to get a classifier based on this measurement.