Conditional Neural Process

Introduction

Deep neural networks require large datasets to train a model effectively. One approach to mitigating this data-inefficiency problem is to learn in two phases: the first phase learns the statistics of a generic domain without committing to a specific learning task; the second phase learns a function for a specific task, but does so using only a small number of data points by exploiting the domain-wide statistics already learned.

For example, consider a data set $\{(x_i, y_i)\}$ with evaluations $y_i = f(x_i)$ for some unknown function $f$. Let $g$ be a function approximating $f$. The aim is to minimize the loss between $f$ and $g$ over the entire space $X$; in practice, however, the loss can only be evaluated on a finite set of observations.
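As a concrete formalization (a minimal sketch; the pointwise loss $\ell$, the number of observations $n$, and the squared-error example are assumptions for illustration), the empirical objective over the observed points is

$$\hat{\mathcal{L}}(g) = \frac{1}{n} \sum_{i = 0}^{n - 1} \ell\big(g(x_i), y_i\big), \qquad \text{e.g. } \ell(\hat{y}, y) = \lVert \hat{y} - y \rVert^2.$$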

In this work, the authors propose a family of models that represent solutions to the supervised learning problem, together with an end-to-end training approach for learning them, combining neural networks with features reminiscent of Gaussian Processes. They call this family of models Conditional Neural Processes (CNPs).

Model

Let the training set of observations be $O = \{(x_i, y_i)\}_{i = 0}^{n - 1}$, and the test set of target inputs be $T = \{x_i\}_{i = n}^{n + m - 1}$.
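For illustration, a minimal sketch of this split (the array names, the sizes, and the choice $f = \sin$ are ours, purely for demonstration):

```python
import numpy as np

n, m = 10, 5                                   # number of observed and target points
xs = np.random.uniform(-2.0, 2.0, (n + m, 1))  # inputs x_i
ys = np.sin(xs)                                # evaluations y_i = f(x_i); here f = sin

O = (xs[:n], ys[:n])  # observations O = {(x_i, y_i)}_{i=0}^{n-1}
T = xs[n:]            # target inputs T = {x_i}_{i=n}^{n+m-1}
```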

Let $P$ be a probability distribution over functions $f : X \to Y$, formally known as a stochastic process. $P$ thus defines a joint distribution over the random variables $\{f(x_i)\}_{i = 0}^{n + m - 1}$. The task is then to predict the output values $f(x_i)$ for every $x_i \in T$ given the observations $O$, i.e., to model the conditional distribution $P(f(x_i) \mid O, T)$.

Conditional Neural Process

Conditional Neural Process models directly parametrize conditional stochastic processes without imposing consistency with respect to some prior process. CNPs parametrize distributions over $f(T)$ given a distributed representation of $O$ of fixed dimensionality.
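A rough sketch of such a parametrization (a minimal PyTorch implementation of our own, not the authors' code; layer sizes and variable names are illustrative assumptions): an encoder embeds each observation pair, a mean aggregates the embeddings into a fixed-dimensional representation of $O$, and a decoder maps that representation together with each target input to a predictive Gaussian.

```python
import torch
import torch.nn as nn

class CNP(nn.Module):
    """Minimal Conditional Neural Process sketch; layer sizes are illustrative."""

    def __init__(self, x_dim=1, y_dim=1, r_dim=128):
        super().__init__()
        # Encoder h: maps each observation pair (x_i, y_i) to a representation r_i
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + y_dim, 128), nn.ReLU(),
            nn.Linear(128, r_dim),
        )
        # Decoder g: maps (aggregate r, target x) to a predictive mean and log-variance
        self.decoder = nn.Sequential(
            nn.Linear(r_dim + x_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * y_dim),
        )

    def forward(self, x_obs, y_obs, x_tgt):
        # Embed each observation, then aggregate with a permutation-invariant mean
        r_i = self.encoder(torch.cat([x_obs, y_obs], dim=-1))  # (n, r_dim)
        r = r_i.mean(dim=0, keepdim=True)                      # (1, r_dim)
        # Condition every target input on the same fixed-dimensional representation
        out = self.decoder(torch.cat([r.expand(x_tgt.size(0), -1), x_tgt], dim=-1))
        mean, log_var = out.chunk(2, dim=-1)
        return mean, log_var.exp().sqrt()  # predictive mean and standard deviation

# Usage: one Gaussian prediction per target point, conditioned on 10 observations
model = CNP()
x_obs, y_obs, x_tgt = torch.randn(10, 1), torch.randn(10, 1), torch.randn(5, 1)
mean, std = model(x_obs, y_obs, x_tgt)
```

In the paper, a model of this form is trained end-to-end by maximizing the conditional log-likelihood of the target outputs given a randomly chosen subset of observations.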