1
Introduction
Generative Adversarial Networks
Yunjey Choi
Korea University
DAVIAN LAB
2
Speaker Introduction
B.S. in Computer Science & Engineering at Korea University
M.S. Student in Computer Science & Engineering at Korea University (Current)
Interests: Deep Learning, TensorFlow, PyTorch
GitHub Link: https://github.com/yunjey
3
Referenced Slides Introduction
• Namju Kim. Generative Adversarial Networks (GAN)
https://www.slideshare.net/ssuser77ee21/generative-adversarial-networks-70896091
• Taehoon Kim. 지적 대화를 위한 깊고 넓은 딥러닝 (Deep and Wide Deep Learning for Intelligent Conversation)
https://www.slideshare.net/carpedm20/ss-63116251
Introduction
01
5
Branches of ML Introduction
Semi-supervised
Learning Unsupervised
Learning
Supervised
Learning
Reinforcement
Learning
Machine Learning
No labeled data
No feedback
“find hidden structure”
Labeled data
Direct feedback
No labeled data
Delayed feedback
Reward signal
6
Supervised Learning Introduction
The discriminative model learns how to classify input to its class.
(Diagram: an input image (64x64x3) is fed to the discriminative model, which outputs a class such as "man" or "woman".)
7
Unsupervised Learning Introduction
The generative model learns the distribution of the training data.
(Diagram: a latent code (100) is fed to the generative model, which outputs an image (64x64x3).)
8
Generative Model
Probability Distribution Introduction
Probability Basics (Review)
A random variable X and its probability mass function P(X), for example:
X:    1    2    3    4    5    6
P(X): 1/6  1/6  1/6  0/6  1/6  2/6
(Plot: the same p(x) drawn as a bar chart over x, with bars at heights 1/6 and 2/6.)
9
Generative Model
What if x is an actual image in the training data?
In that case, x can be represented as a (for example) 64x64x3-dimensional vector.
Probability Distribution Introduction
10
Generative Model
Probability Distribution Introduction
There is a probability density function p_data(x) that represents the distribution of actual images.
(Plot: p_data(x) over x.)
11
Generative Model
Probability Distribution Introduction
Let's take an example with a human face image dataset.
Our dataset may contain few images of men with glasses.
x1 is a 64x64x3 high-dimensional vector representing a man with glasses, so the probability density value p_data(x1) is low.
(Plot: p_data(x) over x, with sample points x1, x2, x3, x4.)
12
Generative Model
Probability Distribution Introduction
Our dataset may contain many images of women with black hair.
x2 is a 64x64x3 high-dimensional vector representing a woman with black hair, so the probability density value p_data(x2) is high.
13
Generative Model
Probability Distribution Introduction
Our dataset may contain very many images of women with blonde hair.
x3 is a 64x64x3 high-dimensional vector representing a woman with blonde hair, so the probability density value p_data(x3) is very high.
14
Generative Model
Probability Distribution Introduction
Our dataset may not contain strange images like these.
x4 is a 64x64x3 high-dimensional vector representing a very strange image, so the probability density value p_data(x4) is almost 0.
15
Generative Model
Probability Distribution Introduction
The goal of the generative model is to find a p_model(x) that approximates p_data(x) well.
(Plot: p_data(x), the distribution of actual images, and p_model(x), the distribution of images generated by the model, over x with sample points x1, x2, x3, x4.)
Generative Adversarial Networks
02
17
Intuition in GAN GANs
(Diagram: a latent code z goes into the generator G, producing a fake image G(z); the discriminator D receives either the fake image G(z) or a real image x and outputs D(G(z)) or D(x).)
D's output is the probability (0~1) that its input came from the real data: it may be high for a real image and low for a fake image.
Training with real images / Training with fake images
18
Intuition in GAN GANs
Training with real images: a real image x (64x64x3) is fed to the discriminator (a neural network), which outputs D(x).
The discriminator should classify a real image as real, so this value should be close to 1.
19
Intuition in GAN GANs
Training with fake images: a fake image generated by the generator (64x64x3) is fed to the discriminator, which outputs D(G(z)).
The discriminator should classify a fake image as fake, so this value should be close to 0.
20
Intuition in GAN GANs
A latent code (100) is fed to the generator (a neural network), which outputs a generated image G(z) (64x64x3); the discriminator then outputs D(G(z)).
The generator should create an image that is indistinguishable from real to deceive the discriminator, so this value should be close to 1.
21
Objective Function of GAN GANs
Objective function:
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
D should maximize V(D, G).
Training with real images: sample x from the real data distribution; the first term is maximal when D(x) = 1, so train D to classify real images as real.
Training with fake images: sample the latent code z from a Gaussian distribution; the second term is maximal when D(G(z)) = 0, so train D to classify fake images as fake.
22
Objective Function of GAN GANs
Objective function:
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
G should minimize V(D, G). G is independent of the first term; the second term is minimal when D(G(z)) = 1, so train G to deceive D.
23
PyTorch Implementation GANs
Define the discriminator:
• input size: 784
• hidden size: 128
• output size: 1
24
PyTorch Implementation GANs
Discriminator: assume x is an MNIST image (784 dimensions); the output is a probability (1 dimension).
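The slide shows a code screenshot that is not reproduced in this text. A minimal PyTorch sketch matching the sizes above might look like the following; the LeakyReLU activation is an assumption, not necessarily the exact code from the linked repository.

```python
import torch.nn as nn

# Minimal sketch of the discriminator described on this slide:
# 784 -> 128 -> 1, with a sigmoid so the output is a probability.
D = nn.Sequential(
    nn.Linear(784, 128),   # input size: 784 (flattened 28x28 MNIST image)
    nn.LeakyReLU(0.2),     # activation choice is an assumption
    nn.Linear(128, 1),     # output size: 1
    nn.Sigmoid())          # probability that the input is real
```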
25
PyTorch Implementation GANs
Define the generator:
• input size: 100
• hidden size: 128
• output size: 784
The latent code (100 dimensions) is mapped to a generated image (784 dimensions).
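As above, the generator code itself is a screenshot on the original slide. A minimal sketch with the listed sizes could be the following; the Tanh output (images scaled to [-1, 1]) is an assumption.

```python
import torch.nn as nn

# Minimal sketch of the generator described on this slide: 100 -> 128 -> 784.
G = nn.Sequential(
    nn.Linear(100, 128),   # latent code: 100 dimensions
    nn.ReLU(),
    nn.Linear(128, 784),   # generated image: 784 dimensions (28x28)
    nn.Tanh())             # output range assumption: images scaled to [-1, 1]
```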
26
PyTorch Implementation GANs
Binary cross entropy loss: BCE(h(x), y) = −y·log h(x) − (1 − y)·log(1 − h(x))
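A short sketch of how this loss is typically instantiated in PyTorch; the tensors here are illustrative values, not data from the slides.

```python
import torch
import torch.nn as nn

# criterion(h, y) averages -y*log(h) - (1-y)*log(1-h) over the batch.
criterion = nn.BCELoss()

h = torch.tensor([0.9, 0.2])   # example discriminator outputs h(x)
y = torch.tensor([1.0, 0.0])   # example labels: 1 = real, 0 = fake
loss = criterion(h, y)
```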
27
PyTorch Implementation GANs
Optimizer for D and G
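A sketch of the two optimizers, assuming the D and G modules from the sketches above; the Adam learning rate is an assumption (the DCGAN slide later in this deck lists lr=0.0002).

```python
import torch

d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002)  # updates only D
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002)  # updates only G
```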
28
PyTorch Implementation GANs
x is a tensor of shape (batch_size, 784).
z is a tensor of shape (batch_size, 100).
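A sketch of how such tensors can be prepared; the random image batch only stands in for a real MNIST batch.

```python
import torch

batch_size = 64
images = torch.randn(batch_size, 1, 28, 28)  # stand-in for a real MNIST batch
x = images.view(batch_size, -1)              # shape (batch_size, 784)
z = torch.randn(batch_size, 100)             # shape (batch_size, 100), Gaussian latent codes
```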
29
PyTorch Implementation GANs
Train the discriminator with real images: D(x) gets closer to 1.
Train the discriminator with fake images: D(G(z)) gets closer to 0.
Forward, backward, and gradient descent.
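A sketch of one discriminator update implementing this slide, assuming D, G, criterion, x, z, batch_size and d_optimizer from the sketches above.

```python
import torch

real_labels = torch.ones(batch_size, 1)
fake_labels = torch.zeros(batch_size, 1)

# Train D with real images: push D(x) toward 1.
d_loss_real = criterion(D(x), real_labels)

# Train D with fake images: push D(G(z)) toward 0.
fake_images = G(z)
d_loss_fake = criterion(D(fake_images.detach()), fake_labels)  # detach so G is not updated here

# Forward, backward and gradient descent for D only.
d_loss = d_loss_real + d_loss_fake
d_optimizer.zero_grad()
d_loss.backward()
d_optimizer.step()
```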
30
PyTorch Implementation GANs
Train the generator to deceive the discriminator: D(G(z)) gets closer to 1.
Forward, backward, and gradient descent.
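A matching sketch of one generator update, continuing the names from the sketches above; it uses the binary cross entropy loss with the real label, which corresponds to the non-saturating objective discussed a few slides later.

```python
import torch

z = torch.randn(batch_size, 100)
fake_images = G(z)

# Train G to deceive D: push D(G(z)) toward 1, i.e. minimize -log D(G(z)) on average.
g_loss = criterion(D(fake_images), torch.ones(batch_size, 1))

g_optimizer.zero_grad()
g_loss.backward()
g_optimizer.step()
```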
31
PyTorch Implementation GANs
The complete code can be found here
https://github.com/yunjey/pytorch-tutorial
32
Objective Function of Generator
Non-Saturating Game GANs
Objective function of G: min_G E_{z~p_z(z)}[log(1 − D(G(z)))]
At the beginning of training, the discriminator can clearly classify the generated image as fake because the quality of the image is very low. This means that D(G(z)) is almost zero in the early stages of training.
(Plot: y = log(1 − x); the gradient is relatively small near D(G(z)) = 0. Shown alongside: images created by the generator at the beginning of training.)
33
Solution for Poor Gradient
Non-Saturating Game GANs
Modification (heuristically motivated): instead of min_G E_{z~p_z(z)}[log(1 − D(G(z)))], use max_G E_{z~p_z(z)}[log D(G(z))].
(Plot: y = log(x); the gradient is very large near D(G(z)) = 0.)
• Practical usage: use the binary cross entropy loss with the fake samples labeled as real (y = 1):
min_G E_{z~p_z(z)}[−y·log D(G(z)) − (1 − y)·log(1 − D(G(z)))] = min_G E_{z~p_z(z)}[−log D(G(z))]
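A tiny sketch (not from the slides) that makes the gradient difference concrete: when D(G(z)) is close to 0, the gradient of log(1 − x) is small while the gradient of −log(x) is large.

```python
import torch

d_out = torch.tensor(0.01, requires_grad=True)   # D(G(z)) early in training

saturating = torch.log(1 - d_out)                # term minimized in the original objective
saturating.backward()
print(d_out.grad)                                # about -1.01: small gradient magnitude

d_out.grad = None
non_saturating = -torch.log(d_out)               # term minimized in the practical objective
non_saturating.backward()
print(d_out.grad)                                # -100: much larger gradient magnitude
```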
34
Theory in GAN GANs
• Why do GANs work?
Because the GAN objective actually minimizes the distance between the real data distribution and the model distribution.
Objective function of GANs: min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
This is the same as min_G JSD(p_data || p_g), the Jensen-Shannon divergence between the two distributions:
JSD(P||Q) = (1/2)·KL(P||M) + (1/2)·KL(Q||M), where M = (1/2)(P + Q) and KL is the KL divergence.
(Plot: p_data(x) and p_g(x) over x.)
Please see the Appendix for details.
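To make the quantity being minimized concrete, here is a small sketch (not from the slides) that evaluates the JSD formula above for two discrete distributions; it assumes strictly positive probabilities so the logs are defined.

```python
import numpy as np

def kl(p, q):
    # KL(P||Q) = sum_x P(x) * log(P(x)/Q(x)); assumes p > 0 and q > 0
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.4, 0.6])
q = np.array([0.5, 0.5])
print(jsd(p, q))   # small positive value; 0 only when p == q
```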
Variants of GAN
03
36
DCGAN Variants of GAN
• Deep Convolutional GAN (DCGAN), 2015
Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
The authors present a model that is still highly preferred.
37
DCGAN Variants of GAN
Discriminator: use convolutions and Leaky ReLU. Generator: use deconvolutions and ReLU.
• No pooling layers (strided convolutions instead)
• Use batch normalization
• Adam optimizer (lr=0.0002, beta1=0.5, beta2=0.999)
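A minimal sketch of what these guidelines look like in PyTorch; the channel sizes are assumptions, and each block is one intermediate layer of the network, not the paper's full architecture.

```python
import torch.nn as nn

# Discriminator block: strided convolution (no pooling), batch norm, Leaky ReLU.
d_block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2))

# Generator block: transposed convolution ("deconvolution"), batch norm, ReLU.
g_block = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU())

# Optimizer settings from the slide (params is whichever network is being updated):
# torch.optim.Adam(params, lr=0.0002, betas=(0.5, 0.999))
```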
38
DCGAN Variants of
GAN
Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
• Latent vector arithmetic
39
LSGAN Variants of
GAN
Xudong Mao et al. Least Squares Generative Adversarial Networks, 2016
• Least Squares GAN (LSGAN)
Proposed a GAN model that adopts the least squares loss function for the discriminator.
40
LSGAN Variants of GAN
Vanilla GAN vs. LSGAN: remove the sigmoid non-linearity in the last layer of the discriminator.
41
LSGAN Variants of GAN
Vanilla GAN vs. LSGAN: the generator is the same as in the original.
42
LSGAN Variants of GAN
Vanilla GAN vs. LSGAN: replace the cross entropy loss with a least squares (L2) loss when training the discriminator.
D(x) gets closer to 1 and D(G(z)) gets closer to 0 (same as the original).
43
LSGAN Variants of GAN
Vanilla GAN vs. LSGAN: replace the cross entropy loss with a least squares (L2) loss when training the generator.
D(G(z)) gets closer to 1 (same as the original).
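A sketch of the LSGAN losses described on these slides, assuming the discriminator now returns raw (unbounded) scores because the sigmoid was removed; the helper names are illustrative.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # least squares (L2) loss replaces binary cross entropy

def lsgan_d_loss(d_real, d_fake):
    # push D(x) toward 1 and D(G(z)) toward 0
    return mse(d_real, torch.ones_like(d_real)) + mse(d_fake, torch.zeros_like(d_fake))

def lsgan_g_loss(d_fake):
    # push D(G(z)) toward 1
    return mse(d_fake, torch.ones_like(d_fake))
```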
44
LSGAN Variants of
GAN
• Results (LSUN dataset)
Xudong Mao et al. Least Squares Generative Adversarial Networks, 2016
45
LSGAN Variants of
GAN
• Results (CelebA)
46
SGAN Variants of GAN
• Semi-Supervised GAN
Augustus Odena et al. Semi-Supervised Learning with Generative Adversarial Networks, 2016
(Diagram: G maps a latent vector z to a fake image; D receives either a real image or the fake image and ends with (1) an FC layer with softmax over 11 dimensions (10 classes + fake).)
Training with real images: the discriminator's target is the one-hot vector for the image's class (e.g. the vector representing 2 or 5).
Training with fake images: the discriminator's target is the one-hot vector representing the fake label.
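A minimal sketch of the discriminator head described above; the 128-dimensional feature input is an assumption about the layers that come before it.

```python
import torch.nn as nn

# (1) FC layer with softmax over 11 outputs: 10 digit classes + 1 "fake" class.
sgan_head = nn.Sequential(
    nn.Linear(128, 11),
    nn.Softmax(dim=1))
```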
47
SGAN Variants of GAN
• Results (Game Character)
Given one-hot vectors representing class labels (poses 1 to 5), the generator can create a character image that takes a certain pose.
The code will be available soon: https://github.com/yunjey
48
ACGAN Variants of GAN
• Auxiliary Classifier GAN (ACGAN), 2016
Augustus Odena et al. Conditional Image Synthesis with Auxiliary Classifier GANs, 2016
Proposed a new method for improved training of GANs using class labels.
49
ACGAN Variants of GAN
• How does it work?
(Diagram: G maps a latent vector z and a class one-hot vector (e.g. representing 5) to a fake image; D receives either a real image (e.g. of class 2) or the fake image.)
The discriminator does multi-task learning with two heads:
(1) an FC layer with sigmoid that decides real or fake
(2) an FC layer with softmax that predicts the class label (the one-hot vector representing the image's class)
Training with real images / Training with fake images
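A minimal sketch of the two discriminator heads described on this slide; the shared 128-dimensional feature and the class count are assumptions.

```python
import torch.nn as nn

class ACGANHeads(nn.Module):
    """Multi-task discriminator heads: (1) real/fake, (2) class label."""
    def __init__(self, feature_dim=128, num_classes=10):
        super().__init__()
        self.adv = nn.Sequential(nn.Linear(feature_dim, 1), nn.Sigmoid())        # (1) FC + sigmoid
        self.cls = nn.Sequential(nn.Linear(feature_dim, num_classes),
                                 nn.Softmax(dim=1))                              # (2) FC + softmax

    def forward(self, features):
        return self.adv(features), self.cls(features)
```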
Extensions of GAN
04
51
CycleGAN Extensions
• CycleGAN: Unpaired Image-to-Image Translation
Presents a GAN model that transfers an image from a source domain A to a target domain B in the absence of paired examples.
Jun-Yan Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017
52
CycleGAN Extensions
• How does it work?
(Diagram: a real image in domain A (zebra) is mapped by the generator GAB to a fake image in domain B (horse); the discriminator DB for domain B receives either this fake image or a real image in domain B and outputs real or fake.)
The generator GAB should generate a horse image from the zebra image to deceive the discriminator DB.
53
CycleGAN Extensions
• How does it work?
(Diagram: the real image in domain A is mapped by GAB to a fake image in domain B, which DB, the discriminator for domain B, judges as real or fake; GBA then maps the fake image back to a reconstructed image in domain A, which is compared with the original using an L2 loss.)
GBA generates a reconstructed image of domain A. This forces the shape to be maintained when GAB generates a horse image from the zebra.
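A sketch of the reconstruction term described on this slide, assuming generator callables g_ab (A to B) and g_ba (B to A); the slide describes the comparison as an L2 loss.

```python
import torch.nn as nn

l2 = nn.MSELoss()

def cycle_loss(real_a, g_ab, g_ba):
    fake_b = g_ab(real_a)            # e.g. zebra -> horse
    reconstructed_a = g_ba(fake_b)   # horse -> reconstructed zebra
    return l2(reconstructed_a, real_a)
```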
54
CycleGAN Extensions
• Results
Jun-Yan Zhu et al. Unpaired Image-to-Image Translation using Cycle Consistent Adversarial Networks, 2017
55
CycleGAN Extensions
• Results
MNIST-to-SVHN and SVHN-to-MNIST
Odd columns contain real images and even columns contain generated images.
https://github.com/yunjey/mnist-svhn-transfer
56
Text2Image Extensions
• Generative Adversarial Text to Image Synthesis, 2016
Scott Reed et al. Generative Adversarial Text to Image Synthesis, 2016
Presents a novel model architecture that generates an image from a text description.
57
Text2Image Extensions
• Training with (real image, right text)
(Diagram: the real image (128x128) and the sentence embedding (100) of the right text "A small red bird with a black beak" are fed to D; the embedding is concatenated with D's features at the last conv layer.)
D decides whether the image is real and relevant to the sentence. The discriminator should say 'yes'.
58
Text2Image Extensions
• Training with (fake image, right text)
(Diagram: a latent code z (100) and the sentence embedding (100) of the right text "A small red bird with a black beak" are fed to G, which produces a fake image (128x128); D then receives the fake image and the sentence embedding.)
D decides whether the image is real and relevant to the sentence. The discriminator should say 'no'.
The generator should create an image relevant to the sentence to deceive the discriminator.
59
Text2Image Extensions
• Training with (real image, wrong text)
(Diagram: the real image (128x128) and the sentence embedding (100) of a wrong text, "A small yellow bird with a brown beak", sampled randomly from the training data, are fed to D.)
D decides whether the image is real and relevant to the sentence. The discriminator should say 'no'.
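Putting the three training cases together, a sketch of the discriminator loss could look like this; the discriminator d(image, text_embedding) and all argument names are assumptions for illustration.

```python
import torch

def text2image_d_loss(d, criterion, real_img, fake_img, right_txt, wrong_txt):
    ones = torch.ones(real_img.size(0), 1)
    zeros = torch.zeros(real_img.size(0), 1)
    loss_real_right = criterion(d(real_img, right_txt), ones)    # D should say 'yes'
    loss_fake_right = criterion(d(fake_img, right_txt), zeros)   # D should say 'no'
    loss_real_wrong = criterion(d(real_img, wrong_txt), zeros)   # D should say 'no'
    return loss_real_right + loss_fake_right + loss_real_wrong
```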
60
StackGAN
Han Zhang et al. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, 2016
• StackGAN: Text to Photo-realistic Image Synthesis
Extensions
61
StackGAN Extensions
• Generating 128x128 from scratch
(Diagram: a latent code z (100) is fed to G, which generates a fake 128x128 image; a discriminator for 128x128 images compares fake and real 128x128 images and outputs 'real' or 'fake'.)
Generating a 128x128 image from scratch does not guarantee a good result.
62
StackGAN Extensions
• Generating 128x128 from 64x64
(Diagram: a latent code z (100) is fed to G1, which generates a 64x64 fake image judged against real 64x64 images by D1, a discriminator for 64x64 images; G2 then upscales the 64x64 image to 128x128, judged against real 128x128 images by D2, a discriminator for 128x128 images; each discriminator outputs 'real' or 'fake'.)
Upscaling a 64x64 image to 128x128 is easier than generating a 128x128 image from scratch.
Future of GAN
05
64
Convergence Measure
• Boundary Equilibrium GAN (BEGAN)
Extensions
David Berthelot et al. BEGAN: Boundary Equilibrium Generative Adversarial Networks, 2017
65
Convergence Measure
• Reconstruction Loss
Extensions
Sitao Xiang. On the effect of Batch Normalization and Weight Normalization in Generative Adversarial Network, 2017
(Diagram: test images x and their reconstructions G(z) produced by the generator from latent codes z.)
66
Better Upsampling
• Deconvolution Checkerboard Artifacts
Extensions
http://distill.pub/2016/deconv-checkerboard/
67
Better Upsampling
• Deconvolution vs Resize-Convolution
Extensions
http://distill.pub/2016/deconv-checkerboard/
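A sketch contrasting the two upsampling options being compared; the channel sizes are illustrative.

```python
import torch.nn as nn

# "Deconvolution": transposed convolution, which can produce checkerboard artifacts.
deconv_up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)

# Resize-convolution: nearest-neighbor upsampling followed by a normal convolution.
resize_conv_up = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(64, 32, kernel_size=3, padding=1))
```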
68
GAN in Supervised Learning Extensions
• Machine Translation (Seq2Seq)
(Diagram: an encoder reads the source tokens A B C D; a decoder starts from <start> and emits X Y Z <end>.)
Should 'ABCD' be translated to 'XYZ'? Tackling supervised learning with a GAN.
69
GAN in Supervised Learning Extensions
• Machine Translation (GANs)
Lijun Wu et al. Adversarial Neural Machine Translation, 2017
Training with real sentences: the discriminator D receives A (English) and B (Korean) and is asked "Do A and B have the same meaning?"; it should say 'yes'.
Training with fake sentences: D receives A (English) and B (fake Korean, produced by the generator G) and is asked the same question; it should say 'no'.
The generator should generate a B that has the same meaning as A to deceive the discriminator.
Thank you
Appendix
06
72
Theory in GAN GANs
Objective function of GANs:
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
p_data(x): the real data distribution, over high-dimensional vectors (e.g. 64×64); p_z(z): a Gaussian distribution, over low-dimensional vectors (e.g. 100).
73
Theory in GAN GANs
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
Fix G so that V becomes a function of D alone, and take the D at which V(D) is maximum (the optimal D):
D*(x) = argmax_D V(D) = argmax_D { E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))] }
74
Theory in GAN GANs
D*(x) = argmax_D { E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))] }
      = argmax_D { E_{x~p_data(x)}[log D(x)] + E_{x~p_g(x)}[log(1 − D(x))] }
Here we sample x from p_g instead of sampling z from p_z: z is a 100-dimensional vector, G(z) is a 64×64-dimensional vector, and p_g is the distribution of the model G over such high-dimensional vectors.
75
Theory in GAN GANs
D*(x) = argmax_D { E_{x~p_data(x)}[log D(x)] + E_{x~p_g(x)}[log(1 − D(x))] }
      = argmax_D { ∫_x p_data(x) log D(x) dx + ∫_x p_g(x) log(1 − D(x)) dx }
By the definition of expectation, E_{x~p(x)}[f(x)] = ∫_x p(x) f(x) dx (integrate over all possible x).
76
Theory in GAN GANs
D*(x) = argmax_D { ∫_x p_data(x) log D(x) dx + ∫_x p_g(x) log(1 − D(x)) dx }
      = argmax_D ∫_x [ p_data(x) log D(x) + p_g(x) log(1 − D(x)) ] dx
(basic property of integrals)
77
Theory in GAN GANs
D*(x) = argmax_D ∫_x [ p_data(x) log D(x) + p_g(x) log(1 − D(x)) ] dx
Now we need to find the D(x) that makes the function inside the integral maximum.
78
Theoretical Results
Theory in GAN GANs
D*(x) = argmax_D V(D) = argmax_D [ p_data(x) log D(x) + p_g(x) log(1 − D(x)) ]
(the function inside the integral)
79
Theoretical Results
Theory in GAN GANs
D*(x) = argmax_D [ p_data(x) log D(x) + p_g(x) log(1 − D(x)) ]
Substitute a = p_data(x), y = D(x), b = p_g(x): the expression becomes a·log y + b·log(1 − y).
80
Theoretical Results
Theory in GAN GANs
a·log y + b·log(1 − y), with a = p_data(x), y = D(x), b = p_g(x)
Differentiate with respect to y = D(x) using d/dx log f(x) = f′(x)/f(x), noting that D(x) cannot affect p_data(x) and p_g(x):
a/y − b/(1 − y) = (a − (a + b)y) / (y(1 − y))
81
Theoretical Results
Theory in GAN GANs
Find the point where the derivative is 0 (a local extreme):
(a − (a + b)y) / (y(1 − y)) = 0  ⇒  y = a / (a + b)
a·log y + b·log(1 − y) has its maximum value at this point. Note that the local maximum is the global maximum when there are no other local extremes.
82
Theoretical Results
Theory in GAN GANs
Substituting a = p_data(x), y = D(x), b = p_g(x) back into y = a / (a + b) gives the optimal discriminator:
D*(x) = p_data(x) / (p_data(x) + p_g(x))
83
Theoretical Results
Theory in GAN GANs
C(G) = max_D V(D, G)
This is the objective function of G: G should minimize C(G).
84
Theoretical Results
Theory in GAN GANs
C(G) = max_D V(D, G)
     = E_{x~p_data}[log D*(x)] + E_{x~p_g}[log(1 − D*(x))]    (plug in the optimal discriminator D*)
     = E_{x~p_data}[log (p_data(x) / (p_data(x) + p_g(x)))] + E_{x~p_g}[log (p_g(x) / (p_data(x) + p_g(x)))]
     = ∫_x p_data(x) log (p_data(x) / (p_data(x) + p_g(x))) dx + ∫_x p_g(x) log (p_g(x) / (p_data(x) + p_g(x))) dx    (definition of expectation)
     = −log 4 + log 4 + ∫_x p_data(x) log (p_data(x) / (p_data(x) + p_g(x))) dx + ∫_x p_g(x) log (p_g(x) / (p_data(x) + p_g(x))) dx
     = −log 4 + ∫_x p_data(x) log (2·p_data(x) / (p_data(x) + p_g(x))) dx + ∫_x p_g(x) log (2·p_g(x) / (p_data(x) + p_g(x))) dx    (how? see below)
89
Theoretical Results
Theory in GAN GANs
Where does the 2 inside the log come from? Two facts:
∫_x p_data(x) dx = 1    (property of a probability density function)
log 2 = ∫_x p_data(x)·log 2 dx    (the trick: a constant times a density that integrates to 1)
Therefore
log 2 + ∫_x p_data(x) log (p_data(x) / (p_data(x) + p_g(x))) dx = ∫_x p_data(x) log (2·p_data(x) / (p_data(x) + p_g(x))) dx
and the same holds for the p_g term, so the +log 4 = 2·log 2 is absorbed into the two integrals.
92
Theoretical Results
Theory in GAN GANs
C(G) = −log 4 + ∫_x p_data(x) log (2·p_data(x) / (p_data(x) + p_g(x))) dx + ∫_x p_g(x) log (2·p_g(x) / (p_data(x) + p_g(x))) dx
     = −log 4 + KL(p_data || (p_data + p_g)/2) + KL(p_g || (p_data + p_g)/2)
by the definition of the KL divergence, KL(P||Q) = ∫_x P(x) log (P(x)/Q(x)) dx, since
∫_x p_data(x) log ( p_data(x) / ((p_data(x) + p_g(x))/2) ) dx = KL(p_data || (p_data + p_g)/2).
Finally, by the definition of the Jensen-Shannon divergence,
C(G) = −log 4 + 2·JSD(p_data || p_g)
98
Theory in GAN GANs
With the optimal discriminator D*,
min_G max_D V(D, G) = min_G V(D*, G)
and, from the derivation above,
V(D*, G) = −log 4 + 2·JSD(p_data || p_g)
G should minimize V(D*, G), i.e. G should minimize JSD(p_data || p_g).
Optimizing V(D, G) is the same as minimizing JSD(p_data || p_g).
