Conditional Neural Process

Introduction

To train a model effectively, deep neural networks require large datasets. One approach to mitigating this data-efficiency problem is to learn in two phases: the first phase learns the statistics of a generic domain without committing to a specific learning task; the second phase learns a function for a specific task, but does so using only a small number of data points by exploiting the domain-wide statistics already learned.

For example, consider a data set [math] \{x_i, y_i\} [/math] with evaluations [math]y_i = f(x_i) [/math] for some unknown function [math]f[/math]. Assume [math]g[/math] is an approximating function of [math]f[/math]. The aim is to minimize the loss between [math]f[/math] and [math]g[/math] over the entire space [math]X[/math]; in practice, however, the loss can only be evaluated on a finite set of observations.
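
Concretely (the symbols below are illustrative and not from the original text), if [math]\ell[/math] denotes a pointwise loss such as squared error, the ideal objective over the input space and the finite-sample surrogate that is used in practice are

[math] \mathcal{L}(g) = \mathbb{E}_{x}\big[\ell(g(x), f(x))\big] \approx \frac{1}{N} \sum_{i=1}^{N} \ell(g(x_i), y_i). [/math]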

In this work, the authors propose a family of models that represent solutions to the supervised problem, together with an end-to-end training approach for learning them, combining neural networks with features reminiscent of Gaussian Processes. They call this family of models Conditional Neural Processes.


Model

Let the training set (the observations) be [math] O = \{(x_i, y_i)\}_{i = 0} ^ {n-1}[/math], and the test set (the targets) be [math] T = \{x_i\}_{i = n} ^ {n + m - 1}[/math].

Let P be a probability distribution over functions [math] f : X \to Y[/math], formally known as a stochastic process. Thus, P defines a joint distribution over the random variables [math] \{f(x_i)\}_{i = 0} ^{n + m - 1}[/math]. Our task is to predict the output values [math]f(x_i)[/math] for [math] x_i \in T[/math] given [math] O[/math], that is, to model the conditional distribution [math] P(f(T)|O, T)[/math].
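
To make this concrete, below is a minimal sketch of how such a conditional model can be parameterized (assuming PyTorch; the class name, layer sizes, and toy data are illustrative and not from the paper): each observation pair [math](x_i, y_i)[/math] is encoded, the encodings are aggregated into a single representation, and a decoder maps that representation together with each target input to a predictive Gaussian over [math]f(x_i)[/math].

# Illustrative CNP-style conditional model: encode each observation (x_i, y_i),
# aggregate the encodings into one representation r, then decode (r, x) into a
# predictive Gaussian for every target x in T.
import torch
import torch.nn as nn

class CNPSketch(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(            # h: (x_i, y_i) -> r_i
            nn.Linear(x_dim + y_dim, r_dim), nn.ReLU(),
            nn.Linear(r_dim, r_dim))
        self.decoder = nn.Sequential(            # g: (r, x) -> (mu, raw scale)
            nn.Linear(r_dim + x_dim, r_dim), nn.ReLU(),
            nn.Linear(r_dim, 2 * y_dim))

    def forward(self, x_obs, y_obs, x_tgt):
        r_i = self.encoder(torch.cat([x_obs, y_obs], dim=-1))  # per-observation codes
        r = r_i.mean(dim=0, keepdim=True)                      # order-invariant aggregation
        r = r.expand(x_tgt.shape[0], -1)                       # one copy per target input
        mu, raw = self.decoder(torch.cat([r, x_tgt], dim=-1)).chunk(2, dim=-1)
        sigma = 0.1 + 0.9 * torch.nn.functional.softplus(raw)  # bounded standard deviation
        return mu, sigma

# Usage on a toy 1-D regression task: O = {(x_i, y_i)}_{i<n}, T = {x_i}_{n<=i<n+m}.
model = CNPSketch()
x_obs, y_obs = torch.randn(10, 1), torch.randn(10, 1)  # n = 10 observations
x_tgt, y_tgt = torch.randn(5, 1), torch.randn(5, 1)    # m = 5 targets (y_tgt used only in the loss)
mu, sigma = model(x_obs, y_obs, x_tgt)
loss = -torch.distributions.Normal(mu, sigma).log_prob(y_tgt).mean()  # conditional log-likelihood
loss.backward()

Averaging the per-observation encodings makes the prediction invariant to the order of the elements of [math] O[/math], and it lets the same network condition on any number of observations.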