# Introduction

Early successes in the field of representation learning were based on supervised approaches, which used large labelled datasets to achieve impressive results. On the other hand, popular unsupervised generative modeling methods mainly consisted of probabilistic approaches focusing on low dimensional data. In recent years, there have been models proposed which try to combine these two approaches. One such popular method is called variational auto-encoders (VAEs). VAEs are theoretically elegant but have a major drawback of generating blurry sample images when used for modeling natural images. In comparison, generative adversarial networks (GANs) produce much sharper sample images but have their own list of problems which include lack of encoder, harder to train, and "mode collapse" problem. Mode collpase problem refers to the inability of the model to capture all the variability in the true data distribution. Currently, there has been a lot of activity around finding and evaluating numerous GANs architectures and combining VAEs and GANs but a model which combines the best of both GANs and VAEs is yet to be discovered.

The work done in this paper builds up on the theoretical work done in 4. The authors tackle generative modeling using optimal transport (OT). The OT cost is defined as measure of distance between probability distributions. One of the feature of OT cost which is beneficial is that it provides much weaker topology when compared to other costs including f-divergences which are associated with the original GAN algorithms. The problem with stronger notions of distances such f-divergences is that they often max out and provide no useful gradients for training. In comparison, the OT cost has been claimed to behave much more nicely [5, 8]. Despite the preceding claim, the implementation, which is similar to GANs, still requires addition of a constraint or a regularization term into the the objective function.

## Contributions

Let $P_X$ be the true but unknown data distribution, $P_G$ be the latent variable model specified by the prior distribution $P_Z$ of latent codes $Z \in \mathcal{Z}$ and the generative model $P_G(X|Z)$ of the data points $X \in \mathcal{X}$ given $Z$.The goal in this paper is to minimize $OT\ W_c(P_X, P_G)$.

The main contributions are given below: