STAT946F17/ Learning a Probabilistic Latent Space of Object Shapes via 3D GAN
= Introduction =


= Related Work =
Existing methods:
* Borrow parts from objects in existing CAD model libraries → realistic, but not novel shapes
* Learn deep object representations based on voxelized objects → fail to capture the highly structured differences between 3D objects
* Mostly learn based on a supervised criterion, rather than generatively


= Methodology =
Let us first review GANs. A generative adversarial network consists of two networks trained jointly in a minimax game: a generator G, which maps a latent noise vector to a sample, and a discriminator D, which estimates the probability that a given sample is real rather than generated. D is trained to tell real samples from generated ones, while G is trained to produce samples that D classifies as real.
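In symbols, the standard GAN objective (maximized by D and minimized by G) is

<math> \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))], </math>

where <math>p_{\text{data}}</math> denotes the data distribution and <math>p_z</math> the latent prior (this notation is the usual one and is not taken from this page).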


=== 3D-GANs ===
3D-GANs are a straightforward extension of GANs from 2D images to 3D voxel grids. Here, the model is composed of a
* Generator (G): maps a 200-dimensional latent vector <math>z</math>, randomly sampled from a probabilistic latent space (uniform over <math>[0, 1]</math>), to a 64 × 64 × 64 cube representing the object <math>G(z)</math> in voxel space.
* Discriminator (D): outputs a confidence value <math>D(x)</math> of whether a 3D object input <math>x</math> is real or synthetic.
and a loss function

<math> L_{\text{3D-GAN}} = \log D(x) + \log\big(1 - D(G(z))\big) </math>
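To make this concrete, here is a minimal PyTorch sketch of such a volumetric generator–discriminator pair. The layer widths, kernel sizes, and normalization choices below are illustrative assumptions chosen to produce a 64 × 64 × 64 output, not necessarily the exact architecture used in the paper.

<pre>
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Maps a 200-d latent vector z to a 64 x 64 x 64 voxel grid G(z)."""

    def __init__(self, z_dim=200):
        super().__init__()
        self.net = nn.Sequential(
            # z is reshaped to (batch, z_dim, 1, 1, 1) and upsampled with
            # transposed 3D convolutions: 1 -> 4 -> 8 -> 16 -> 32 -> 64 voxels per side.
            nn.ConvTranspose3d(z_dim, 512, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm3d(512), nn.ReLU(True),
            nn.ConvTranspose3d(512, 256, 4, 2, 1), nn.BatchNorm3d(256), nn.ReLU(True),
            nn.ConvTranspose3d(256, 128, 4, 2, 1), nn.BatchNorm3d(128), nn.ReLU(True),
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.BatchNorm3d(64), nn.ReLU(True),
            nn.ConvTranspose3d(64, 1, 4, 2, 1), nn.Sigmoid(),  # voxel occupancy in [0, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1, 1))


class Discriminator(nn.Module):
    """Outputs a confidence D(x) that a 1 x 64 x 64 x 64 voxel grid x is real."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv3d(64, 128, 4, 2, 1), nn.BatchNorm3d(128), nn.LeakyReLU(0.2, True),
            nn.Conv3d(128, 256, 4, 2, 1), nn.BatchNorm3d(256), nn.LeakyReLU(0.2, True),
            nn.Conv3d(256, 512, 4, 2, 1), nn.BatchNorm3d(512), nn.LeakyReLU(0.2, True),
            nn.Conv3d(512, 1, 4, 1, 0), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1)  # one confidence value per object in the batch
</pre>

A latent sample matching the uniform latent space above can then be drawn with <math>z = </math> torch.rand(batch_size, 200).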


=== 3D-VAE-GANs ===


An extension of the 3D-GAN that adds an image encoder E, which maps a 2D image <math>y</math> to the latent vector <math>z</math> in the spirit of a variational autoencoder. This lets the model infer a latent code for an observed image and reconstruct the corresponding shape with the generator, so a 3D object can be recovered from a single 2D view.
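The combined objective adds the usual VAE terms to the GAN loss above; roughly,

<math> L = L_{\text{3D-GAN}} + \alpha_1 L_{\text{KL}} + \alpha_2 L_{\text{recon}}, </math>

where <math>L_{\text{KL}}</math> is a KL-divergence term pulling the encoder's latent distribution <math>q(z \mid y)</math> toward the prior <math>p(z)</math>, <math>L_{\text{recon}} = \lVert G(E(y)) - x \rVert_2</math> is a reconstruction loss between the generated and true voxel grids, and <math>\alpha_1, \alpha_2</math> are weighting hyperparameters.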


= Training and Results =


=== Network Architecture ===


===== Generator =====
===== Discriminator =====
===== Encoder =====


=== Coupled Generator-Discriminator Training ===
Training GANs is tricky because, in practice, training a network to generate objects is more difficult than training a network to distinguish between real and fake samples. In other words, training the generator is harder than training the discriminator. Intuitively, it becomes difficult for the generator to extract a useful learning signal from a discriminator that is way ahead, since every example it generates is correctly identified as synthetic with high confidence. This problem is compounded for 3D generated objects (compared to 2D) due to the higher dimensionality of the output. There exist different strategies to overcome this challenge, some of which we saw in class:


* One discriminator (D) update for every N generator (G) updates
* Capped gradient updates, where only a bounded gradient is propagated back through the discriminator network, essentially capping how fast it can learn


 
The approach used in this paper is interesting in that it adaptively decides whether to train the discriminator at all. Here, for each batch, D is updated only if its classification accuracy on the previous batch was at most 80%. Additionally, the generator learning rate is set to <math>2.5 \times 10^{-3}</math>, whereas the discriminator learning rate is set to <math>10^{-5}</math>. This further caps the speed of training for the discriminator relative to the generator. A minimal sketch of such a training loop is given below.
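The following PyTorch sketch of the adaptive update rule reuses the Generator and Discriminator classes sketched earlier. The 80% accuracy threshold and the two learning rates follow the description above; the data iterator real_batches and the remaining optimizer settings are illustrative assumptions.

<pre>
import torch
import torch.nn as nn

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2.5e-3, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=1e-5, betas=(0.5, 0.999))
bce = nn.BCELoss()
last_d_accuracy = 0.0  # discriminator accuracy on the previous batch

for real in real_batches:              # hypothetical iterator of (batch, 1, 64, 64, 64) voxel grids
    batch = real.size(0)
    z = torch.rand(batch, 200)         # latent vectors sampled from U[0, 1]
    fake = G(z)

    # Discriminator step: only performed if D was <= 80% accurate on the last batch.
    d_real, d_fake = D(real), D(fake.detach())
    if last_d_accuracy <= 0.80:
        loss_d = bce(d_real, torch.ones(batch)) + bce(d_fake, torch.zeros(batch))
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()
    # Track accuracy: real samples scored > 0.5 and fakes scored < 0.5 count as correct.
    last_d_accuracy = 0.5 * ((d_real > 0.5).float().mean()
                             + (d_fake < 0.5).float().mean()).item()

    # Generator step: always performed, pushing D(G(z)) toward the "real" label.
    loss_g = bce(D(fake), torch.ones(batch))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
</pre>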


=== Training Method ===

=== Scoring Method ===


=== Results ===
= Open questions =

= Source =

Wu, J., Zhang, C., Xue, T., Freeman, W. T. & Tenenbaum, J. B. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Proc. Advances in Neural Information Processing Systems 29 (2016). <references />