stat441w18/summary 1

Random features for large scale kernel machines

Group members

Faith Lee

Jacov Lisulov

Shiwei Gong

Introduction and problem motivation

In classification problems, kernel methods are used for pattern analysis and require only a user-specified kernel. Kernel methods can be thought of as instance-based methods: the training examples are "remembered," and a weight is learned for each of them. A prediction for an unseen example is then made using a similarity function k (also called a kernel) between that example and each of the training inputs, where a similarity function measures how alike two objects are. In conventional notation, we have that

[math]\displaystyle{ f(x; \alpha) = \sum_{i = 1}^{N} \alpha_i k(x, x_i) }[/math] where k is a kernel function that can be approximated by [math]\displaystyle{ k(x, x') \approx \sum_{j = 1}^{D} z(x; w_j) z(x'; w_j) }[/math]

and [math]\displaystyle{ \alpha }[/math] are the corresponding weights.
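For concreteness, here is a minimal sketch in Python of how this prediction rule is evaluated, assuming a Gaussian kernel and weights [math]\displaystyle{ \alpha }[/math] that have already been learned (the function and variable names are illustrative, not from the paper):

<pre>
import numpy as np

def gaussian_kernel(x, x_prime, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), a shift-invariant kernel
    return np.exp(-np.sum((x - x_prime) ** 2) / (2 * sigma ** 2))

def predict(x, X_train, alpha, sigma=1.0):
    # f(x; alpha) = sum_i alpha_i * k(x, x_i): each training point contributes
    # its weight alpha_i scaled by its similarity to the query point x
    return sum(a * gaussian_kernel(x, x_i, sigma) for a, x_i in zip(alpha, X_train))
</pre>

Note that evaluating f requires touching every training point, which is the source of the scaling problems discussed next.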

An example of a kernel method is the support vector machine. Kernel methods provide a means of approximating a non-linear function or decision boundary. However, the problems with using kernel methods are that they:

1) Scale poorly with the size of the dataset;

2) Require computing and storing large kernel matrices (and, in some methods, their inverses);

3) Require solving large linear systems of equations.

In this paper, the authors propose mapping the input data to a randomized low-dimensional feature space and then applying existing fast linear methods. The main goal is to remove this computational bottleneck of kernel-based inference.
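As an illustration of this two-step pipeline, here is a minimal sketch using scikit-learn, whose RBFSampler implements the random feature map studied in this paper (the dataset and parameter choices are illustrative only):

<pre>
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

# Toy data (hypothetical; any numeric feature matrix works)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Step 1: map inputs to a randomized low-dimensional (here 100-D) feature space
rff = RBFSampler(gamma=1.0, n_components=100, random_state=0)
Z = rff.fit_transform(X)

# Step 2: train a fast linear learner on the randomized features
clf = SGDClassifier(max_iter=1000, random_state=0).fit(Z, y)
print(clf.score(Z, y))
</pre>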

Methods

The Fourier transform of a function [math]\displaystyle{ f }[/math] is defined as [math]\displaystyle{ \hat{f}(\omega) = \int_{-\infty}^{\infty} f(x) e^{-2\pi i \omega x} dx }[/math]. The key tool is Bochner's theorem: a continuous shift-invariant kernel [math]\displaystyle{ k(x, y) = k(x - y) }[/math] is positive definite if and only if [math]\displaystyle{ k(\delta) }[/math] is the Fourier transform of a non-negative measure, which after proper scaling is a probability distribution [math]\displaystyle{ p(\omega) }[/math]. Hence [math]\displaystyle{ k(x - y) = \int p(\omega) e^{i \omega^T (x - y)} d\omega }[/math], and drawing samples [math]\displaystyle{ \omega_j \sim p }[/math] yields an unbiased Monte Carlo estimate of the kernel.

Consequently, for any shift-invariant kernel [math]\displaystyle{ k(x, y) = k(x - y) }[/math], we can build a low-dimensional randomized feature map [math]\displaystyle{ z }[/math] so that [math]\displaystyle{ \forall x, y }[/math], [math]\displaystyle{ k(x, y) = \langle \phi(x), \phi(y) \rangle \approx z(x)^T z(y) }[/math].

The right-hand side, the inner product of the randomized feature maps, approximates the exact kernel evaluation.
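Below is a minimal sketch in Python/NumPy of the paper's cosine feature construction [math]\displaystyle{ z(x) = \sqrt{2/D} \cos(W^T x + b) }[/math] for the Gaussian kernel, where the columns of W are drawn from [math]\displaystyle{ p(\omega) }[/math] and b is drawn uniformly from [math]\displaystyle{ [0, 2\pi] }[/math] (function names and parameter defaults here are our own, not the paper's):

<pre>
import numpy as np

def random_fourier_features(X, D, sigma=1.0, seed=None):
    """Map the rows of X to D random Fourier features so that
    Z @ Z.T approximates the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # For the Gaussian kernel, p(omega) is itself Gaussian with std 1/sigma.
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Usage: the Monte Carlo approximation error shrinks as D grows.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, D=5000, seed=1)
K_approx = Z @ Z.T
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
print(np.abs(K_approx - K_exact).max())  # small, on the order of 1e-2
</pre>

Once the features Z are computed, any linear method can be trained on them in time linear in the number of examples, which is the speedup the paper is after.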

Results

Conclusion