Inductive Kernel Low-rank Decomposition with Priors: A Generalized Nystrom Method

From statwiki

Revision as of 19:40, 30 June 2013

Introduction

Low-rankness is an important structure widely exploited in machine learning. Low-rank matrix decomposition produces a compact representation of large matrices, which is key to scaling up a wide variety of kernel learning algorithms. However, existing approaches have two limitations. First, most of them are intrinsically unsupervised and focus only on numerically approximating the given matrix, so they cannot incorporate prior knowledge. Second, in many decomposition methods the factorization can only be computed for the samples available during training, which makes it difficult to generalize the decomposition to new samples.

This paper introduces a low-rank decomposition algorithm that incorporates side information by generalizing the [http://en.wikipedia.org/wiki/Nyström_method Nystrom method]. Its novelty is to interpret the matrix-completion view of the Nystrom method as a bilateral extrapolation of a dictionary kernel, and to generalize this view so that prior information can be used to compute improved low-rank decompositions. The authors claim two advantages for the method: its generative structure and its linear complexity in sample size.
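The matrix-completion view mentioned above can be made concrete with the standard, unsupervised Nystrom approximation that the paper starts from. This is not the paper's prior-incorporating method. A small "dictionary" kernel [math]\displaystyle{ W = k(Z,Z) }[/math] on landmark points [math]\displaystyle{ Z }[/math] is extrapolated on both sides to all samples through [math]\displaystyle{ E = k(X,Z) }[/math], giving [math]\displaystyle{ K \approx E W^{+} E^T }[/math]. The sketch below makes several illustrative assumptions: the RBF kernel, gamma = 0.5, randomly chosen landmarks, and the hypothetical helper names <code>rbf_kernel</code> and <code>nystrom_factor</code>.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B (assumed kernel)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_factor(X, Z, kernel=rbf_kernel, tol=1e-10):
    """Standard Nystrom factor G with K(X, X) ~= G @ G.T = E @ pinv(W) @ E.T.

    W = k(Z, Z) is the small m x m dictionary kernel on the landmarks Z;
    E = k(X, Z) extrapolates it to every sample on both sides of K.
    Cost is O(n m^2), i.e. linear in the number of samples n.
    """
    W = kernel(Z, Z)
    E = kernel(X, Z)
    evals, evecs = np.linalg.eigh(W)
    keep = evals > tol * evals.max()                      # drop numerically null directions
    W_inv_sqrt = evecs[:, keep] / np.sqrt(evals[keep])    # W^{+1/2}, shape m x r
    return E @ W_inv_sqrt, W_inv_sqrt

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
idx = rng.choice(200, size=20, replace=False)
Z = X[idx]                                        # landmarks drawn from the samples (assumption)
G, W_inv_sqrt = nystrom_factor(X, Z)

K = rbf_kernel(X, X)                              # full kernel, built only to check the error
rel_err = np.linalg.norm(K - G @ G.T) / np.linalg.norm(K)
print(f"rank {G.shape[1]}, relative Frobenius error {rel_err:.3f}")

# A new sample is mapped through the same landmarks without refactorizing.
x_new = rng.normal(size=(1, 2))
g_new = rbf_kernel(x_new, Z) @ W_inv_sqrt
```

The factor <code>G</code> is stored instead of the n x n matrix, and its rows for the landmarks reproduce <code>W</code> exactly up to the truncation tolerance. The last two lines show that the standard method already extends to new points by evaluating the kernel against the landmarks. The paper's contribution is to shape this extrapolation using prior information.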

The Nystrom method originated in the numerical solution of integral equations and was introduced to the machine learning community by Williams and Seeger<ref name = "Williams & Seeger 2001"> Williams, C. and Seeger, M. Using the Nystrom method to speed up kernel machines. Advances in Neural Information Processing Systems 13, 2001. </ref> and by Fowlkes et al.<ref name = "Fowlkes etal 2004"> Fowlkes, C., Belongie, S., Chung, F., and Malik, J. Spectral grouping using the Nystrom method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2): 214-225, 2004. </ref>. Given a kernel function [math]\displaystyle{ k(\cdot,\cdot) }[/math] and a sample set with underlying distribution [math]\displaystyle{ p(\cdot) }[/math], the Nystrom method aims to solve the integral equation [math]\displaystyle{ \int k(x,y)p(y)\phi_i(y)dy = \lambda_i\phi_i(x) }[/math], where [math]\displaystyle{ \phi_i }[/math] and [math]\displaystyle{ \lambda_i }[/math] are the eigenfunctions and eigenvalues of the kernel operator.
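The integral equation becomes computable once the integral is replaced by an empirical average over [math]\displaystyle{ q }[/math] samples [math]\displaystyle{ x_1,\dots,x_q }[/math] drawn from [math]\displaystyle{ p }[/math]. This is the usual derivation, as in the Williams and Seeger reference: [math]\displaystyle{ \frac{1}{q}\sum_{j=1}^{q} k(x,x_j)\phi_i(x_j) \approx \lambda_i\phi_i(x) }[/math]. Evaluating this at the samples gives the eigenproblem [math]\displaystyle{ W U = U \Lambda }[/math] with [math]\displaystyle{ W_{jl} = k(x_j,x_l) }[/math]. It follows that [math]\displaystyle{ \lambda_i \approx \Lambda_{ii}/q }[/math] and [math]\displaystyle{ \phi_i(x_j) \approx \sqrt{q}\,U_{ji} }[/math]. Any new point is handled by the Nystrom extension [math]\displaystyle{ \phi_i(x) \approx \frac{\sqrt{q}}{\Lambda_{ii}}\sum_{j} k(x,x_j)U_{ji} }[/math]. The minimal NumPy sketch below assumes an RBF kernel with gamma = 0.5 and standard normal samples. These choices and the function names are illustrative and do not come from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B (assumed kernel)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_eigenfunctions(samples, kernel=rbf_kernel, n_components=5):
    """Approximate the leading eigenpairs of  int k(x,y) p(y) phi_i(y) dy = lambda_i phi_i(x).

    The integral is replaced by an average over the q samples drawn from p,
    which turns it into the eigenproblem W U = U Lambda with W = k(samples, samples).
    """
    q = samples.shape[0]
    W = kernel(samples, samples)
    evals, evecs = np.linalg.eigh(W)                 # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:n_components]     # keep the largest ones
    evals, evecs = evals[top], evecs[:, top]
    lambdas = evals / q                              # lambda_i ~ Lambda_ii / q

    def phi(x):
        """Nystrom extension: phi_i(x) ~ sqrt(q) / Lambda_ii * sum_j k(x, x_j) U_ji."""
        return kernel(np.atleast_2d(x), samples) @ evecs * (np.sqrt(q) / evals)

    return lambdas, phi

rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 1))          # q = 100 draws from p = N(0, 1) (assumption)
lambdas, phi = nystrom_eigenfunctions(samples)
print("estimated eigenvalues:", np.round(lambdas, 4))

x_new = np.linspace(-2.0, 2.0, 7)[:, None]   # points outside the sample set
values = phi(x_new)                          # eigenfunction values, shape (7, 5)
```

At the sample points the extension returns [math]\displaystyle{ \sqrt{q} }[/math] times the eigenvectors of [math]\displaystyle{ W }[/math], so the estimated eigenfunctions are orthonormal under the empirical measure. At new points they satisfy the discretized equation exactly. This is the out-of-sample property that the generalized method builds on.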