supervised Dictionary Learning

From statwiki
Revision as of 11:06, 18 November 2010 by Mgangeh (talk | contribs)
Jump to navigation Jump to search

This paper proposes a novel discriminative formulation for sparse representation of images using learned dictionaries.

Introduction

Sparse models were originated from two different communities under two different names, one by neurologists mainly by the salient work done by Olshausen in <ref name="Olshausen1996">B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, vol. 381, pp. 607-609, 1996.</ref> as sparse coding, and second by researchers in the field of signal processing as independent component analysis (ICA) (for example see <ref name="ICABook">A. Hyvärinen, J. Karhunen, and E. Oja. Independent component analysis. New York: John Wiley and Sons, 2001.</ref> for a comprehensive overview of ICA). Although SC and ICA originated from two different problems (the former as the model of simple cells in visual cortex and the latter as the solution to decompose the independent sources of some mixed signals), they merged, eventually, into similar technique (with somewhat different description).
On the other hand, representation of a signal using a learned dictionary instead of predefined operators (such as wavelets in signal and image processing or local binary patterns (LBP) in texture classification) has led to state-of-the-art results in various applications such as denoising <ref name="Elad2006">M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. IP, vol. 54, no. 12, 2006.</ref> and texture classification <ref name="VZ2009">M. Varma and A. Zisserman. A statistical approach to material classification using image patch exemplars. IEEE Trans. PAMI, vol. 31, no. 11, pp. 2032-2047, 2009.</ref>.
It is well known that sparsity captures higher order statistics of the data. For example, in comparing PCA and ICA, while PCA can only capture up to the second order statistics of the data and hence is appropriate for Gaussian models, ICA can capture higher order statistics of the data. Whitening data is a preprocessing step in ICA and. ICA is hence appropriate for supergaussian models (such as data with Laplacian distributions) <ref name="ICABook"/>.
The previous work in the literature on sparse representation is done on either predefined (fixed) operators or learned dictionaries for reconstructive, discriminative, or generative models in various applications such as signal and face recognition <ref>K. Huang and S. Aviyente. Sparse representation for signal classification. In NIPS, 2006.</ref><ref>J.Wright, A.Y. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Trans. PAMI, vol.31, no. 2, pp. 210-227, 2009.</ref><ref>R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng. Self-taught learning: transfer learning from unlabeled data. In ICML, 2007.</ref><ref>J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Learning discriminative dictionaries for local image analysis. In CVPR, 2008.</ref><ref>M. Ranzato and M. Szummer. Semi-supervised learning of compact document representations with deep networks. In ICML, 2008.</ref>.
In this paper, the authors extend these approaches by proposing a framework for learning simultaneously a single shared dictionary as well as sparse models (for all classes) in a mixed generative and discriminative formulation. Although, this joint generative/discriminative framework have been also reported in probabilistic approaches and in neural networks, but not in sparse dictionary learning.

Sparse Representation and Dictionary Learning

Sparse representation can be exploited in two different ways. First, by representing a signal as a linear combination of predefined bases such as wavelets <ref name="MallatSparseBook">S. Mallat. A wavelet tour of signal processing, the sparse way. Burlington: Academic Press, 3rd ed., 2009.</ref>. Second, by using a dictionary of primitive elements learned from the signal and decomposing the signal into these primitive elements <ref name="Julesz1981">B. Juelsz. Textons, the elements of texture-perception, and their interactions. Nature, vol. 290, pp. 91–97, 1981.</ref><ref name="Olshausen1996"/><ref name="VZ2009"/>. There are two steps in the latter approach, i.e., learning the dictionary and computation of (sparse) coefficients for representing the signal using the elements of dictionary.
Having a fixed dictionary [math]\displaystyle{ D=[d_{1},...,d_{k}]\in \mathbb{R}^{n\times k} }[/math], a signal [math]\displaystyle{ \mathbf{x}\in \mathbb{R}^{n} }[/math] can be reconstructed using sparse coefficients [math]\displaystyle{ \mathbf{\alpha} }[/math] using

[math]\displaystyle{ EQUATION 1, \;\;\;(1) }[/math]


where, [math]\displaystyle{ \ell_{1} }[/math] penalty yields a sparse solution for coefficients [math]\displaystyle{ \mathbf{\alpha} }[/math].
In (1), the dictionary is fixed and the goal is to find the sparse coefficients [math]\displaystyle{ \mathbf{\alpha} }[/math] such that [math]\displaystyle{ \mathbf{x} }[/math] can be reconstructed using the bases in the dictionary. The dictionary can be learned using m training data [math]\displaystyle{ (\mathbf{x}_{i})_{i=1}^{m} }[/math] in [math]\displaystyle{ \mathbb{R}^{n} }[/math]. Hence, (1) can be modified as follows:

[math]\displaystyle{ EQUATION 2. \;\;\;(2) }[/math]


Since the reconstruction errors [math]\displaystyle{ \left \| \mathbf{x}_{i}-D\mathbf{\alpha}_{i} \right \|_{2}^{2} }[/math] in (2) are invariant to scaling dictionary [math]\displaystyle{ D }[/math] by a scalar and coefficients [math]\displaystyle{ \mathbf{\alpha}_{i} }[/math] by its inverse, the [math]\displaystyle{ \ell_{2} }[/math] norm of the columns of [math]\displaystyle{ D }[/math] should be constrained <ref name="Elad2006"/>. This reconstructive framework is called REC in this paper.

References

<references />