learning Fast Approximations of Sparse Coding: Difference between revisions

From statwiki
Jump to navigation Jump to search
(Created page with "= Introduction =")
 
Line 1: Line 1:
= Introduction =
= Overview =
 
In contrast to a dimensionality-reduction approach to feature-extraction, sparse coding is an unsupervised method which attempts to construct a novel representation of the data by mapping it linearly into a higher-dimensional space. This transformation is performed in the hope of being able to obtain a new feature space in which, for each vector, we can find a smaller subset of features that largely reconstruct it. Essentially, this allows us to perform case-specific feature-extraction, as for a given input, we seek a smaller subset of features to which we will assign the majority of the weight for its representation, with the remainder being negligibly small. This provides us with a procedure which attempts to flexibly represent unseen instances of the input space.
The introduction of a larger set of spanning vectors is a consequence of the objective of producing accurate reconstructions across a broad range of possible input from the original space. However, the algebra of linear transformations tells us that input vectors will no longer have a unique representation in the higher-dimensional feature space. This short-coming is alleviated by the fact that we would like to assign the majority of influence to only a subset of the new features. We implement this goal using the notion of sparsity; namely, we will penalize large weight values.
 
Unfortunately, there are some implementation issues which prevent the use of sparse coding in certain contexts. The fact that we must find a representation for each new case the system is provided with often renders the procedure infeasible for online processing tasks, as new data must be handled in real-time. Several approximation algorithms have been proposed to address issues in processing speed. However, these methods suffer from deficiencies in their ability to take into account some relevant conditional-independence structures in the data. To resolve these limitations, the authors introduce a feed-forward architecture which adapts some of these approximation schemes, giving a new procedure which is ~10 times more efficient than the previous state-of-the-art approximation.
 
= Review of Sparse Coding =

Revision as of 00:25, 21 November 2015

Overview

In contrast to a dimensionality-reduction approach to feature-extraction, sparse coding is an unsupervised method which attempts to construct a novel representation of the data by mapping it linearly into a higher-dimensional space. This transformation is performed in the hope of being able to obtain a new feature space in which, for each vector, we can find a smaller subset of features that largely reconstruct it. Essentially, this allows us to perform case-specific feature-extraction, as for a given input, we seek a smaller subset of features to which we will assign the majority of the weight for its representation, with the remainder being negligibly small. This provides us with a procedure which attempts to flexibly represent unseen instances of the input space.

The introduction of a larger set of spanning vectors is a consequence of the objective of producing accurate reconstructions across a broad range of possible input from the original space. However, the algebra of linear transformations tells us that input vectors will no longer have a unique representation in the higher-dimensional feature space. This short-coming is alleviated by the fact that we would like to assign the majority of influence to only a subset of the new features. We implement this goal using the notion of sparsity; namely, we will penalize large weight values.

Unfortunately, there are some implementation issues which prevent the use of sparse coding in certain contexts. The fact that we must find a representation for each new case the system is provided with often renders the procedure infeasible for online processing tasks, as new data must be handled in real-time. Several approximation algorithms have been proposed to address issues in processing speed. However, these methods suffer from deficiencies in their ability to take into account some relevant conditional-independence structures in the data. To resolve these limitations, the authors introduce a feed-forward architecture which adapts some of these approximation schemes, giving a new procedure which is ~10 times more efficient than the previous state-of-the-art approximation.

Review of Sparse Coding