STAT946F17/ Coupled GAN
Introduction
Generative models attempt to characterize and estimate the underlying probability distribution of the data (typically images) and in doing so generate samples from the aforementioned learned distribution. Moment-matching generative networks, Variational auto-encoders, and Generative Adversarial Networks (GANs) are some of the most popular (and recent) class of techniques in this burgeoning literature on generative models. The authors of the paper we are reviewing focus on proposing an extension to the class of GANs.
The novelty of the proposed Coupled GAN (CoGAN) method lies in extending the GAN procedure (described in the next section) to the multi-domain setting. That is, the CoGAN methodology attempts to learn the (underlying) joint probability distribution of multi-domain images as a natural extension from the marginal setting associated with the vanilla GAN framework. Given the dense and active literature on generative models, generating images in multiple domains in far from ground breaking. Related works revolve around multi-modal learning including multi-modal deep learning, semi-coupled dictionary learning, joint embedding space learning, cross-domain image generation to name a few \TODO{inline citations}. Thus, the novelty of the author's contributions to this field comes from two key differentiating points. Firstly, this was (one of) the first papers to endeavor to generate multi-domain images with the GAN framework. Secondly, and perhaps more significantly, the authors proposed to learn the underlying joint distribution without requiring the presence of tuples of corresponding images in the training set. Only sets of images drawn from the (marginal) distributions of the separate domains is sufficient. As per the authors' claim constructing tuples of corresponding images to train from is challenging and a potential bottle-neck for multi-domain image generation. One way around this bottleneck is thus to use their proposed CoGAN methodology. More details of how the author's achieve joint-distribution learning will be provided in the Coupled GAN section below.
Generative Adversarial Networks
A typical GAN framework consists of a generative model and a discriminative model. The generative model, which often is a de-convolutional network, takes as input a random latent vector (typically uniform or Gaussian), and synthesizes novel images resembling the real images (training set). The discriminative model, often a convolutional network, on the other hand tries to distinguish between the fake synthesized images and the real images. The idea then is to let the two component models of the GAN framework "compete" with each other in the form of a minmax two player game.
To further clarify and fix this idea, we introduce the mathematical setup of GANs following the notation used by the authors of this paper for sake of consistency. Let us define the following:
[math]\displaystyle{ \mathbf{x}- }[/math] natural image drawn from underlying distribution, [math]\displaystyle{ p_X }[/math]// [math]\displaystyle{ \mathbf{z} \sim U[-1,1]^d- }[/math] be a latent random vector,