STAT946F17/ Learning a Probabilistic Latent Space of Object Shapes via 3D GAN
Revision as of 14:01, 17 October 2017
Introduction
Related Work
Existing methods:
- Borrow parts from objects in existing CAD model libraries → realistic but not novel
- Learn deep object representations based on voxelized objects → fail to capture highly structured differences between 3D objects
- Mostly learn based on a supervised criterion
Methodology
Let us first review GANs...
3D-GANs
3D-GANs are a simple extension of GANs, which operate on 2D imagery, to 3D voxel data. The model is composed of:
- Generator (G): maps a 200-dimensional latent vector z, with each dimension sampled from the uniform distribution U[0, 1], to a 64 × 64 × 64 voxel grid representing the object G(z).
- Discriminator (D): outputs a confidence value D(x) indicating whether a 3D object input x is real or synthetic,
and a loss function L_3D-GAN = log D(x) + log(1 − D(G(z))).
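The pieces above can be sketched with illustrative stand-ins (these function names and the random "generator" are assumptions for illustration, not the authors' network):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_z(dim=200):
    """Latent vector with each dimension drawn i.i.d. from U[0, 1]."""
    return rng.uniform(0.0, 1.0, size=dim)

def generator_stub(z):
    """Stand-in for G: maps a latent vector to a 64 x 64 x 64 voxel grid."""
    return rng.uniform(0.0, 1.0, size=(64, 64, 64))

def loss_3d_gan(d_real, d_fake):
    """L_3D-GAN = log D(x) + log(1 - D(G(z)))."""
    return np.log(d_real) + np.log(1.0 - d_fake)

z = sample_z()
voxels = generator_stub(z)
print(z.shape, voxels.shape)  # (200,) (64, 64, 64)
```

Note that the loss is largest when D is confident on real inputs (D(x) near 1) and confident that generated inputs are fake (D(G(z)) near 0); D ascends this objective while G descends it.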
3D-VAE-GANs
An extension of ...
Training and Results
Network Architecture
Generator
Discriminator
Encoder
Coupled Generator-Discriminator Training
Training GANs is tricky because, in practice, training a network to generate objects is harder than training a network to distinguish real from fake samples. In other words, training the generator is harder than training the discriminator. Intuitively, it becomes difficult for the generator to extract a learning signal from a discriminator that is far ahead, since every example it generates is correctly identified as synthetic with high confidence. This problem is compounded for 3D generated objects (compared to 2D) due to the higher dimensionality. There exist different strategies to overcome this challenge, some of which we saw in class:
- One discriminator (D) update for every N generator (G) updates
- Capped gradient updates, where only a bounded gradient is propagated back through the discriminator network, essentially capping how fast it can learn
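The second strategy can be realized in several ways; a norm clip is one common choice (the exact scheme is not specified above, so this sketch is an assumption):

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Rescale the gradient so its norm never exceeds max_norm,
    capping how large a single discriminator update can be."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = clip_gradient(np.array([3.0, 4.0]), max_norm=1.0)
print(np.linalg.norm(g))  # ~1.0
```

Gradients already below the cap pass through unchanged, so the discriminator is slowed only when it would otherwise take a very large step.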
The approach used in this paper is interesting in that it adaptively decides whether to update the discriminator. For each batch, D is updated only if its accuracy on the previous batch is at most 80%. Additionally, the generator learning rate is set to 2.5 × 10^-3, whereas the discriminator learning rate is set to 10^-5. This further caps the speed of training for the discriminator relative to the generator.
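The adaptive rule amounts to a single threshold check per batch (a minimal sketch; the function name is assumed):

```python
LR_GENERATOR = 2.5e-3      # generator learning rate
LR_DISCRIMINATOR = 1e-5    # discriminator learning rate

def should_update_discriminator(last_batch_accuracy, threshold=0.80):
    """Update D only when its accuracy on the previous batch was <= 80%,
    so a dominant discriminator sits out until the generator catches up."""
    return last_batch_accuracy <= threshold

print(should_update_discriminator(0.75))  # True: D is still fallible
print(should_update_discriminator(0.95))  # False: D is too far ahead
```

Combined with the much smaller discriminator learning rate, this keeps D from ever pulling so far ahead that G's gradients vanish.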
Training Method
Scoring Method
Results
Open questions
Source
Wu, J., Zhang, C., Xue, T., Freeman, W. T. & Tenenbaum, J. B. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling. In Proc. Advances in Neural Information Processing Systems 29 (2016). <references />