semi-supervised Learning with Deep Generative Models
Introduction
Large labelled data sets have led to massive improvements in the performance of machine learning algorithms, especially supervised neural networks. However, the world in general is not labelled and there exist a far greater amount of unlabelled data than labelled data. A common situation is to have a comparatively small quantity of labelled data paired with a larger amount of unlabelled data. This leads to the idea of a semi-supervised learning model where the unlabelled data is used to prime the model for relevant features and the labels are then learned for classification. A prominent example of this type of model is the restricted Boltzmann machine based Deep Belief Network (DBN). Where layers of RBM are trained to learn unsupervised features of the data and then a final classification layer is applied such that labels can be assigned. Unsupervised learning creates what is known as a generative model which creates a joint distribution [math]\displaystyle{ P(x, y) }[/math] (which can be sampled from). This is contrasted by the supervised discriminative model, which create conditional distributions [math]\displaystyle{ P(y | x) }[/math]. The paper combines these two methods to achieve high performance on benchmark tasks and uses deep neural networks in an innovative manner to create a layered semi-supervised classification/generation model.
Current Models and Limitations
The paper claims that existing unlabelled data models do not scale well for very large sets of unlabelled data. One example that they discuss is the Transductive SVM, which they claim does not scale well and that optimization for them is a problem. Graph based models suffer from sensitivity to their graphical structure which may make them rigid. Finally they briefly discuss other neural network based methods such as the Manifold Tangent Classifier that uses contrastive auto-encoders (CAEs) to deduce the manifold on which data lies. Based on the manifold hypothesis this means that similar data should not lie far from the manifold and they then can use something called TangentProp to train a classifier based on the manifold of the data.
Proposed Method
Rather than use the methods mentioned above the team suggests that using generative models based on neural networks would be beneficial. Current generative models lack string inference and scalability though. The paper proposes a method that uses variational inference for semi-supervised classification that will employ deep neural networks.
Latent Feature Discriminative Model (M1)
[math]\displaystyle{ p(\mathbf{z}) = \mathcal{N}(\mathbf{z}|\mathbf{0,I}) }[/math]
[math]\displaystyle{ p(\mathbf{x|z}) = f(\mathbf{x};\mathbf{z,\theta}) }[/math]
Generative Semi-Supervised Model (M2)
[math]\displaystyle{ p(y) = Cat(y|\mathbf{\pi}) }[/math]
[math]\displaystyle{ p(\mathbf{z}) = \mathcal{N}(\mathbf{z}|\mathbf{0,I}) }[/math]
[math]\displaystyle{ p_{\theta}(\mathbf{x}|y, \mathbf{z}) = f(\mathbf{x};y,\mathbf{z,\theta}) }[/math]
Stacked Generative Semi-Supervised Model (M1+M2)
[math]\displaystyle{ p_{\theta}(\mathbf{x}, y, \mathbf{z1, z2}) = p(y)p(\mathbf{z2})p_{\theta}(\mathbf{z1}|y, \mathbf{z2})p_{\theta}(\mathbf{x|z1}) }[/math]
[math]\displaystyle{ \mathbf{z2})p_{theta}(\mathbf{x|z1}) }[/math] are parameterized as deep neural networks.