Deep Generative Stochastic Networks Trainable by Backprop
Introduction
The deep learning boom of recent years was spurred initially by research in unsupervised learning techniques; however, most of the major successes of the last few years have been based on supervised techniques. A drawback of the unsupervised methods is that their models involve intractable sums and expensive computations (inference, learning, sampling, and partition functions). This paper puts forth a network that models a conditional distribution, [math]\displaystyle{ P(X|\bar{X}) }[/math], which can be seen as a local (and usually unimodal) representation of [math]\displaystyle{ P(X) }[/math], where [math]\displaystyle{ \bar{X} }[/math] is a corrupted version of the original data [math]\displaystyle{ X }[/math]. The Generative Stochastic Network (GSN) introduces arbitrary latent variables [math]\displaystyle{ H }[/math] that serve as input to a Markov chain which, built up in layers, eventually produces a representation of the original data. Training the network does not require Gibbs sampling or intractable partition functions; it is trained with backpropagation and all the tools that come with it.
Unsupervised learning is attractive because the quantity of unlabelled data far exceeds that of labelled data.
GSNs avoid the intractable sums and marginalizations that are inherent in many unsupervised techniques.
They generalize denoising autoencoders.
A GSN parametrizes the transition operator of a Markov chain rather than [math]\displaystyle{ P(X) }[/math] itself. This allows an unsupervised model to be trained by gradient descent on a maximum-likelihood-style objective with no partition function, just backpropagation (see the sketch below).
Graphical models require too many computations (inference, sampling, learning); MCMC can be used for estimation if only a few terms dominate the weighted sum that is being calculated.
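To make the "no partition function, just backpropagation" point concrete, below is a minimal sketch of the training idea: corrupt each input, reconstruct it, and minimize the reconstruction loss by plain gradient descent. It is not the authors' architecture; the single tied-weight layer, Gaussian corruption, and the chosen sizes and learning rate are illustrative assumptions. It only shows that nothing beyond standard backpropagation is required.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 binary examples of dimension 20 (stand-ins for real inputs).
X = (rng.random((200, 20)) > 0.5).astype(float)

d, h = X.shape[1], 50
W = rng.normal(0.0, 0.01, size=(d, h))   # tied encoder/decoder weights
b = np.zeros(h)                          # hidden bias
c = np.zeros(d)                          # reconstruction bias
lr, sigma = 0.1, 0.5                     # learning rate, corruption noise level

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for epoch in range(50):
    X_tilde = X + sigma * rng.normal(size=X.shape)   # corruption C(X~ | X)
    H = np.tanh(X_tilde @ W + b)                     # latent representation
    X_hat = sigmoid(H @ W.T + c)                     # models P(X | X~)

    # Cross-entropy reconstruction loss; gradients computed by hand here
    # (this is exactly what backprop automates in a deep network).
    grad_pre_out = (X_hat - X) / len(X)              # dL/d(pre-sigmoid)
    grad_pre_h = (grad_pre_out @ W) * (1.0 - H**2)   # back through tanh

    W -= lr * (X_tilde.T @ grad_pre_h + grad_pre_out.T @ H)
    b -= lr * grad_pre_h.sum(axis=0)
    c -= lr * grad_pre_out.sum(axis=0)
</syntaxhighlight>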
Generative Stochastic Network (GSN)
A GSN relies on learning the transition operator of a Markov chain whose stationary distribution is an estimator of the data-generating distribution [math]\displaystyle{ P(X) }[/math].
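Once such a conditional model is trained, new samples are obtained by repeatedly applying the transition operator: corrupt the current state, then sample a reconstruction. The sketch below shows the simplest such chain (the denoising-autoencoder special case, without a persistent latent state [math]\displaystyle{ H }[/math]); the corruption, chain length, and placeholder reconstruction function are illustrative assumptions, not the paper's setup.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def corrupt(x, sigma=0.5):
    # C(X~ | X): simple additive Gaussian noise (an illustrative assumption).
    return x + sigma * rng.normal(size=x.shape)

def run_chain(x0, reconstruct, n_steps=100, burn_in=20):
    # Alternate corruption and reconstruction/sampling; keep samples after burn-in.
    samples, x = [], x0
    for t in range(n_steps):
        x_tilde = corrupt(x)
        p = reconstruct(x_tilde)                          # parameters of P(X | X~)
        x = (rng.random(size=p.shape) < p).astype(float)  # sample the next X
        if t >= burn_in:
            samples.append(x.copy())
    return np.array(samples)

# Placeholder "reconstruction" that just squashes its input to (0, 1);
# in practice this would be the trained network from the previous sketch.
samples = run_chain(x0=np.zeros(20),
                    reconstruct=lambda x_tilde: 1.0 / (1.0 + np.exp(-x_tilde)))
print(samples.shape)   # (80, 20)
</syntaxhighlight>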
Experimental Results
The initial experimental results were obtained without extensive hyperparameter tuning. This was done to maintain consistency across the tests and to show that, even without such optimization, the results approach the performance of more established unsupervised learning networks. The main comparison was made to Deep Boltzmann Machines.
MNIST
In the MNIST experiments, zero-mean Gaussian noise is injected into each hidden unit both before and after the tanh nonlinearity: [math]\displaystyle{ h_i = \eta_{out} + \tanh(\eta_{in} + a_i) }[/math]
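A minimal sketch of this noisy activation, assuming zero-mean Gaussian noise with illustrative standard deviations:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)

def noisy_tanh(a, sigma_in=1.0, sigma_out=1.0):
    # h_i = eta_out + tanh(eta_in + a_i), with eta ~ N(0, sigma^2) drawn
    # independently for every unit on every pass.
    eta_in = sigma_in * rng.normal(size=np.shape(a))
    eta_out = sigma_out * rng.normal(size=np.shape(a))
    return eta_out + np.tanh(eta_in + a)

# Example: pre-activations of a small hidden layer.
a = np.linspace(-2.0, 2.0, 5)
print(noisy_tanh(a))
</syntaxhighlight>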
[[File:figure_3_bengio.png|thumb|upright=2|Figure 3]]
Faces
Comparison
Critique
Mentions Sum-Product Networks (SPNs).