Generative Adversarial Networks (GANs) are deep learning models used for unsupervised learning tasks such as image generation. A GAN consists of two neural networks, a generator and a discriminator, that compete against each other: the generator creates synthetic images while the discriminator tries to distinguish real images from fake ones, which pushes the generator to produce increasingly realistic images that can fool the discriminator. This document covers the intuition behind GANs, a PyTorch implementation, and variants and extensions such as DCGAN, LSGAN, semi-supervised GANs, ACGAN, CycleGAN, and text-to-image models.
Speaker Introduction
• B.S. in Computer Science & Engineering at Korea University
• M.S. student in Computer Science & Engineering at Korea University (current)
• Interests: Deep Learning, TensorFlow, PyTorch
• GitHub: https://github.com/yunjey
Referenced Slides
• Namju Kim. Generative Adversarial Networks (GAN)
https://www.slideshare.net/ssuser77ee21/generative-adversarial-networks-70896091
• Taehoon Kim. 지적 대화를 위한 깊고 넓은 딥러닝 (Deep and Wide Deep Learning for Intellectual Conversation)
https://www.slideshare.net/carpedm20/ss-63116251
Branches of Machine Learning
• Supervised Learning: labeled data, direct feedback
• Unsupervised Learning: no labeled data, no feedback, "find hidden structure"
• Semi-supervised Learning: a mix of labeled and unlabeled data
• Reinforcement Learning: no labeled data, delayed feedback, reward signal
Generative Model: Probability Distribution
Probability basics (review): a random variable X takes one of the values 1-6, and its probability mass function P(X) assigns a probability to each value, for example:

X    | 1   | 2   | 3   | 4   | 5   | 6
P(X) | 1/6 | 1/6 | 1/6 | 0   | 1/6 | 2/6

Plotting these values against x gives the probability mass function p(x) as a bar chart.
What if x is an actual image from the training data? In that case, x can be represented as a (for example) 64x64x3-dimensional vector.
Let's take an example with a human face image dataset and its data distribution p_data(x), with example images x1, x2, x3, x4. Our dataset may contain only a few images of men with glasses: x1 is a 64x64x3 high-dimensional vector representing a man with glasses, so the probability density value p_data(x1) is low.
Our dataset may contain many images of women with black hair: x2 is a 64x64x3 high-dimensional vector representing a woman with black hair, so the probability density value p_data(x2) is high.
Our dataset may contain very many images of women with blonde hair: x3 is a 64x64x3 high-dimensional vector representing a woman with blonde hair, so the probability density value p_data(x3) is very high.
Our dataset may not contain strange images: x4 is a 64x64x3 high-dimensional vector representing a very strange image, so the probability density value p_data(x4) is almost 0.
The goal of the generative model is to find a p_model(x), the distribution of images generated by the model, that approximates p_data(x), the distribution of actual images, well.
Intuition in GAN
A GAN consists of two neural networks: a generator and a discriminator. The generator maps a latent code z to a fake image G(z); the discriminator takes an image and outputs D(x), the probability (between 0 and 1) that the image came from the real data. For a real image x, D(x) may be high; for a fake image, D(G(z)) may be low. The discriminator is trained with both real images and fake images.
Training the discriminator with real images: a real image x (64x64x3) is fed to the discriminator (a neural network), and D(x) should be close to 1. The discriminator should classify a real image as real.
Training the discriminator with fake images: a fake image G(z) (64x64x3) generated by the generator is fed to the discriminator, and D(G(z)) should be close to 0. The discriminator should classify a fake image as fake.
Training the generator: the generator (a neural network) maps a latent code z (of dimension 100) to a generated image G(z) (64x64x3), and D(G(z)) should be close to 1. The generator should create an image that is indistinguishable from real images in order to deceive the discriminator.
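To make this picture concrete, below is a minimal PyTorch sketch of the two networks. The 100-dimensional latent code and the 64x64x3 images come from the slides; the fully connected architecture and hidden sizes are illustrative assumptions, not the exact model from the talk.

```python
import torch
import torch.nn as nn

latent_size = 100          # dimension of the latent code z (from the slides)
image_size = 64 * 64 * 3   # a 64x64x3 image flattened into a vector

# Generator G: latent code z -> fake image G(z)
G = nn.Sequential(
    nn.Linear(latent_size, 256),
    nn.ReLU(),
    nn.Linear(256, image_size),
    nn.Tanh())             # outputs in [-1, 1], matching images normalized to that range

# Discriminator D: image -> probability that the image is real
D = nn.Sequential(
    nn.Linear(image_size, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid())          # D(x) in (0, 1)

z = torch.randn(16, latent_size)   # a batch of 16 latent codes
fake_images = G(z)                 # G(z): 16 fake images
print(D(fake_images).shape)        # D(G(z)): shape [16, 1], one probability per image
```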
Objective Function of GAN

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The discriminator D should maximize V(D, G). We sample x from the real data distribution and sample the latent code z from a Gaussian distribution. The first term is maximized when D(x) = 1 (train D to classify real images as real); the second term is maximized when D(G(z)) = 0 (train D to classify fake images as fake).
The generator G should minimize V(D, G). Since G is independent of the first term $\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]$, it only affects the second term, which is minimized when D(G(z)) = 1: train G to deceive D.
PyTorch Implementation
Training the discriminator (forward pass, backward pass, and gradient descent): the discriminator is trained with real images so that D(x) gets closer to 1, and with fake images so that D(G(z)) gets closer to 0.
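A minimal sketch of this discriminator update, assuming the G and D networks from the earlier sketch; the binary cross-entropy criterion, placeholder batch of real images, and optimizer settings are illustrative choices rather than the exact code from the talk.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()                             # binary cross-entropy loss
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002)

batch_size = 16
real_labels = torch.ones(batch_size, 1)              # target 1 for real images
fake_labels = torch.zeros(batch_size, 1)             # target 0 for fake images
real_images = torch.randn(batch_size, 64 * 64 * 3)   # placeholder for a batch of real images

# Train D with real images: push D(x) toward 1
d_loss_real = criterion(D(real_images), real_labels)

# Train D with fake images: push D(G(z)) toward 0
z = torch.randn(batch_size, 100)
fake_images = G(z)
d_loss_fake = criterion(D(fake_images.detach()), fake_labels)  # detach: do not update G here

# Forward, backward, and gradient descent for D
d_loss = d_loss_real + d_loss_fake
d_optimizer.zero_grad()
d_loss.backward()
d_optimizer.step()
```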
Training the generator (forward pass, backward pass, and gradient descent): the generator is trained to deceive the discriminator, so that D(G(z)) gets closer to 1.
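A matching sketch of the generator update, assuming the same G, D, criterion, batch size, and labels as above. Note that it uses the real label (1) as the target for fake images, which is the non-saturating loss -log D(G(z)) discussed in the next section.

```python
import torch

g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002)

# Train G to deceive D: push D(G(z)) toward 1.
# Using the real label as the target for fake images implements the
# non-saturating generator loss -log D(G(z)).
z = torch.randn(batch_size, 100)
fake_images = G(z)
g_loss = criterion(D(fake_images), real_labels)

# Forward, backward, and gradient descent for G
g_optimizer.zero_grad()
g_loss.backward()
g_optimizer.step()
```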
Non-Saturating Game
The original objective function of the generator is

$$\min_G \; \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

At the beginning of training, the discriminator can clearly classify the generated images as fake because their quality is very low, so D(G(z)) is almost zero in the early stages. Looking at the curve y = log(1 - x), the gradient is relatively small at D(G(z)) = 0, which gives the generator a poor training signal.
Solution for the poor gradient: a heuristically motivated modification is to have the generator maximize

$$\max_G \; \mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$$

instead of minimizing $\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$. Looking at the curve y = log(x), the gradient is very large at D(G(z)) = 0, so the generator receives a strong signal early in training.

Practical usage: apply the binary cross-entropy loss to the fake images with the label set to 1 (the real label):

$$\min_G \; \mathbb{E}_{z \sim p_z(z)}[-y \log D(G(z)) - (1 - y) \log(1 - D(G(z)))], \quad y = 1$$

which reduces to

$$\min_G \; \mathbb{E}_{z \sim p_z(z)}[-\log D(G(z))]$$
Theory in GAN
Why do GANs work? Because the objective actually minimizes the distance between the real data distribution and the model distribution. Solving

$$\min_G \max_D V(D, G)$$

with the GAN objective function is the same as solving

$$\min_G \; JSD(p_{data} \,\|\, p_g)$$

where p_g is the distribution of samples generated by the model and the Jensen-Shannon divergence is defined through the KL divergence as

$$JSD(P \,\|\, Q) = \frac{1}{2} KL(P \,\|\, M) + \frac{1}{2} KL(Q \,\|\, M), \qquad M = \frac{1}{2}(P + Q)$$

Please see the Appendix for details.
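As a brief sketch of why this equivalence holds (a standard result from the original GAN paper, stated here without the full derivation): for a fixed generator, the optimal discriminator has a closed form, and plugging it back into the objective leaves a Jensen-Shannon divergence plus a constant.

```latex
% For a fixed G with sample distribution p_g, maximizing V(D,G) pointwise over D
% gives the optimal discriminator:
\[ D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} \]
% Substituting D^* back into V(D,G):
\[ V(D^*, G) = \mathbb{E}_{x \sim p_{data}}\Big[\log \tfrac{p_{data}(x)}{p_{data}(x)+p_g(x)}\Big]
             + \mathbb{E}_{x \sim p_g}\Big[\log \tfrac{p_g(x)}{p_{data}(x)+p_g(x)}\Big]
             = -\log 4 + 2\, JSD(p_{data} \,\|\, p_g) \]
% Minimizing over G therefore minimizes JSD(p_data || p_g), which is zero exactly
% when p_g = p_data.
```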
DCGAN (Variants of GAN)
• Deep Convolutional GAN (DCGAN), 2015
The authors present a model that is still highly preferred.
Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
DCGAN architecture guidelines (a sketch follows this list):
• The generator uses deconvolution (transposed convolution) and ReLU; the discriminator uses convolution and Leaky ReLU.
• No pooling layers (strided convolutions instead).
• Use batch normalization.
• Adam optimizer (lr=0.0002, beta1=0.5, beta2=0.999).
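A minimal PyTorch sketch of DCGAN-style networks for 64x64x3 images that follows these guidelines; the channel counts and layer depths are illustrative assumptions rather than the exact architecture from the paper.

```python
import torch
import torch.nn as nn

nz = 100   # latent code size

# DCGAN-style generator: transposed convolutions + batch norm + ReLU
netG = nn.Sequential(
    nn.ConvTranspose2d(nz, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(True),    # 4x4
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),   # 8x8
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),   # 16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(True),     # 32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False), nn.Tanh())                               # 64x64x3

# DCGAN-style discriminator: strided convolutions + batch norm + Leaky ReLU (no pooling)
netD = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),                          # 32x32
    nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),   # 16x16
    nn.Conv2d(128, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),  # 8x8
    nn.Conv2d(256, 512, 4, 2, 1, bias=False), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, True),  # 4x4
    nn.Conv2d(512, 1, 4, 1, 0, bias=False), nn.Sigmoid())                                    # 1x1

# Adam optimizers with the hyperparameters from the slide
optG = torch.optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))
optD = torch.optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))

z = torch.randn(16, nz, 1, 1)    # latent codes shaped for ConvTranspose2d
print(netD(netG(z)).shape)       # torch.Size([16, 1, 1, 1])
```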
• Latent vector arithmetic: arithmetic on latent vectors (e.g., adding and subtracting them) produces semantically meaningful changes in the generated images.
Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
LSGAN (Variants of GAN)
• Least Squares GAN (LSGAN)
Proposed a GAN model that adopts the least squares loss function for the discriminator.
Xudong Mao et al. Least Squares Generative Adversarial Networks, 2016
Compared with the vanilla GAN, LSGAN replaces the cross-entropy loss with the least squares (L2) loss. When training the discriminator, D(x) gets closer to 1 and D(G(z)) gets closer to 0, the same targets as in the original GAN.
When training the generator, D(G(z)) gets closer to 1, again the same target as in the original GAN, but under the least squares loss. A sketch of both losses follows.
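A minimal sketch of the LSGAN losses, reusing the G, D, optimizers, batch size, real image batch, and label tensors from the earlier sketches; note that the LSGAN paper removes the sigmoid from the discriminator output, whereas this sketch keeps the earlier discriminator for simplicity.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()   # least squares (L2) loss replaces binary cross-entropy

# Discriminator: push D(x) toward 1 for real images and D(G(z)) toward 0 for fake images
z = torch.randn(batch_size, 100)
fake_images = G(z)
d_loss = mse(D(real_images), real_labels) + mse(D(fake_images.detach()), fake_labels)
d_optimizer.zero_grad()
d_loss.backward()
d_optimizer.step()

# Generator: push D(G(z)) toward 1 (same target as the original GAN)
g_loss = mse(D(G(z)), real_labels)
g_optimizer.zero_grad()
g_loss.backward()
g_optimizer.step()
```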
• Results: generated samples on the LSUN dataset.
Xudong Mao et al. Least Squares Generative Adversarial Networks, 2016
SGAN (Variants of GAN)
• Semi-Supervised GAN
The discriminator ends with a fully connected layer with softmax over 11 dimensions (10 real classes + 1 fake class). When training with real images, the target is the one-hot vector of the image's class (e.g., the one-hot vector representing 2). When training with fake images, the generator takes the latent vector z together with a one-hot class vector (e.g., the one representing 5) and produces a fake image, and the discriminator's target is the one-hot vector representing the fake label. A loss sketch follows.
Augustus Odena et al. Semi-Supervised Learning with Generative Adversarial Networks, 2016
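A minimal sketch of this 11-way discriminator objective. The networks sgan_D and sgan_G below are hypothetical stand-ins (the real models are convolutional), and the generator loss at the end is one common choice for class-conditional training, not necessarily the exact one from the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10
fake_class = num_classes                 # index 10 is the extra "fake" class
batch = 16

# Hypothetical networks: D outputs 11 logits (10 classes + fake);
# G is conditioned on a one-hot class vector concatenated with z.
sgan_D = nn.Sequential(nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2),
                       nn.Linear(256, num_classes + 1))   # FC layer; softmax is inside the loss
sgan_G = nn.Sequential(nn.Linear(100 + num_classes, 256), nn.ReLU(),
                       nn.Linear(256, 64 * 64 * 3), nn.Tanh())

ce = nn.CrossEntropyLoss()               # softmax + negative log-likelihood

# Real images with their class labels (e.g., the digit 2)
real_images = torch.randn(batch, 64 * 64 * 3)
real_classes = torch.randint(0, num_classes, (batch,))
d_loss_real = ce(sgan_D(real_images), real_classes)

# Fake images generated from z and a one-hot class vector (e.g., class 5);
# the discriminator should assign them to the fake class.
z = torch.randn(batch, 100)
cond = F.one_hot(torch.full((batch,), 5, dtype=torch.long), num_classes).float()
fake_images = sgan_G(torch.cat([z, cond], dim=1))
d_loss_fake = ce(sgan_D(fake_images.detach()),
                 torch.full((batch,), fake_class, dtype=torch.long))
d_loss = d_loss_real + d_loss_fake

# One common generator objective: make D assign the conditioning class to the fake image
g_loss = ce(sgan_D(fake_images), torch.full((batch,), 5, dtype=torch.long))
```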
• Results (game characters): given one-hot vectors representing class labels 1-5, the generator can create a character image that takes a certain pose. The code will be available soon: https://github.com/yunjey
ACGAN (Variants of GAN)
• Auxiliary Classifier GAN (ACGAN), 2016
Proposed a new method for improved training of GANs using class labels.
Augustus Odena et al. Conditional Image Synthesis with Auxiliary Classifier GANs, 2016
How does it work? The discriminator performs multi-task learning with two output heads: (1) a fully connected layer with sigmoid that predicts real or fake, and (2) a fully connected layer with softmax that predicts the class label. When training with real images, the discriminator should output "real" and the image's class (e.g., the one-hot vector representing 2). When training with fake images, the generator takes the latent vector z and a one-hot class vector (e.g., the one representing 5) and produces a fake image; the discriminator should output "fake" on the first head while the class head is trained on the conditioning class. A sketch of the two-headed discriminator follows.
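A minimal sketch of such a two-headed discriminator and its losses; the acD module below is a hypothetical stand-in with illustrative layer sizes, and the equal weighting of the two heads is an assumption.

```python
import torch
import torch.nn as nn

class ACDiscriminator(nn.Module):
    """Two-headed discriminator: (1) real/fake via sigmoid, (2) class via softmax."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2))
        self.source_head = nn.Linear(256, 1)            # (1) real or fake (sigmoid in the loss)
        self.class_head = nn.Linear(256, num_classes)   # (2) class label (softmax in the loss)

    def forward(self, x):
        h = self.features(x)
        return self.source_head(h), self.class_head(h)

acD = ACDiscriminator()
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()
batch = 16

real_images = torch.randn(batch, 64 * 64 * 3)
real_classes = torch.randint(0, 10, (batch,))

# Training with real images: source head -> "real", class head -> true class
src, cls = acD(real_images)
d_loss_real = bce(src, torch.ones(batch, 1)) + ce(cls, real_classes)

# Training with fake images: source head -> "fake", class head -> conditioning class
fake_images = torch.randn(batch, 64 * 64 * 3)                # placeholder for G(z, class)
fake_classes = torch.full((batch,), 5, dtype=torch.long)     # e.g., conditioned on class 5
src, cls = acD(fake_images)
d_loss_fake = bce(src, torch.zeros(batch, 1)) + ce(cls, fake_classes)
d_loss = d_loss_real + d_loss_fake
```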
CycleGAN (Extensions)
• CycleGAN: Unpaired Image-to-Image Translation
Presents a GAN model that transfers an image from a source domain A to a target domain B in the absence of paired examples.
Jun-Yan Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017
How does it work? A real image in domain A (e.g., a zebra) is fed to the generator GAB, which produces a fake image in domain B (e.g., a horse). The discriminator DB for domain B sees real images from domain B and these fake images and decides "real or fake". The generator GAB should generate a horse from the zebra in order to deceive the discriminator DB.
In addition, a second generator GBA maps the fake image in domain B back to domain A, producing a reconstructed image, and an L2 reconstruction loss is applied between the reconstruction and the real image in domain A. This forces the shape to be maintained when GAB generates a horse image from the zebra. A sketch of the two loss terms follows.
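A minimal sketch of the generator-side losses, with hypothetical stand-in networks G_AB, G_BA, and D_B; the slide describes an L2 reconstruction loss, while the CycleGAN paper itself uses an L1 cycle-consistency loss, and the 10x weighting below is an illustrative choice.

```python
import torch
import torch.nn as nn

# Hypothetical image-to-image networks: 3-channel image in, 3-channel image out.
G_AB = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())    # domain A -> B
G_BA = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())    # domain B -> A
D_B = nn.Sequential(nn.Conv2d(3, 1, 4, stride=4), nn.Flatten(),
                    nn.Linear(16 * 16, 1), nn.Sigmoid())           # discriminator for domain B

real_A = torch.randn(8, 3, 64, 64)     # real images in domain A (e.g., zebras)

fake_B = G_AB(real_A)                  # translated images in domain B (e.g., horses)
rec_A = G_BA(fake_B)                   # reconstructed images back in domain A

# Adversarial loss: G_AB tries to make D_B output "real" (1) for fake_B
adv_loss = nn.BCELoss()(D_B(fake_B), torch.ones(8, 1))

# Reconstruction (cycle-consistency) loss: L2 as described in the slide
# (the CycleGAN paper uses L1 here).
cycle_loss = nn.MSELoss()(rec_A, real_A)

g_loss = adv_loss + 10.0 * cycle_loss  # weighting factor is an illustrative choice
```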
Text2Image (Extensions)
• Generative Adversarial Text to Image Synthesis, 2016
Presents a model architecture that generates an image from a text description.
Scott Reed et al. Generative Adversarial Text to Image Synthesis, 2016
Training with (real image, right text): a real image (128x128) and the sentence embedding (dimension 100) of its matching caption, e.g., "A small red bird with a black beak", are fed to the discriminator, with the embedding concatenated at the last convolutional layer. The discriminator answers "is the image real and relevant to the sentence?" and here it should say "yes".
Training with (fake image, right text): the generator takes a latent vector z (dimension 100) and the sentence embedding (dimension 100) of the right caption, "A small red bird with a black beak", and produces a fake image (128x128). The discriminator should say "no", while the generator should create an image relevant to the sentence in order to deceive the discriminator.
Training with (real image, wrong text): a real image (128x128) is paired with the sentence embedding (dimension 100) of a wrong caption sampled randomly from the training data, e.g., "A small yellow bird with a brown beak". The discriminator should say "no". A sketch of the three discriminator terms follows.
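A minimal sketch of these three terms (a matching-aware discriminator), with a hypothetical MatchingDiscriminator module and random tensors standing in for images and sentence embeddings.

```python
import torch
import torch.nn as nn

class MatchingDiscriminator(nn.Module):
    """Hypothetical discriminator: is the image real AND relevant to the sentence?"""
    def __init__(self, embed_dim=100):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(3, 64, 4, stride=8), nn.LeakyReLU(0.2))  # 128 -> 16
        self.fc = nn.Sequential(nn.Linear(64 * 16 * 16 + embed_dim, 1), nn.Sigmoid())

    def forward(self, image, embedding):
        h = self.conv(image).flatten(1)
        # Concatenate the sentence embedding with the (flattened) last conv features
        return self.fc(torch.cat([h, embedding], dim=1))

text2img_D = MatchingDiscriminator()
bce = nn.BCELoss()
batch = 4

real_img = torch.randn(batch, 3, 128, 128)   # real images
fake_img = torch.randn(batch, 3, 128, 128)   # placeholder for G(z, right_text)
right_text = torch.randn(batch, 100)         # embedding of the matching caption
wrong_text = torch.randn(batch, 100)         # embedding of a randomly sampled caption

yes, no = torch.ones(batch, 1), torch.zeros(batch, 1)
d_loss = (bce(text2img_D(real_img, right_text), yes)     # (real image, right text) -> "yes"
          + bce(text2img_D(fake_img, right_text), no)    # (fake image, right text) -> "no"
          + bce(text2img_D(real_img, wrong_text), no))   # (real image, wrong text) -> "no"
```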
StackGAN (Extensions)
• StackGAN: Text to Photo-realistic Image Synthesis
Han Zhang et al. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, 2016
Generating 128x128 images from 64x64 images: the first generator G1 takes z (dimension 100) and generates a 64x64 fake image, and the discriminator D1 judges "real or fake" against real 64x64 images. The second generator G2 then upscales the 64x64 image to 128x128 (easier than generating a 128x128 image from scratch), and the discriminator D2 judges "real or fake" against real 128x128 images. A sketch of the two-stage forward pass follows.
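A minimal sketch of the two-stage forward pass as drawn on the slide, with placeholder architectures and omitting the text conditioning that the full StackGAN model uses; D1 and D2 would be separate discriminators for the 64x64 and 128x128 outputs.

```python
import torch
import torch.nn as nn

# Stage-I generator: latent code -> 64x64 image (placeholder architecture)
G1 = nn.Sequential(nn.ConvTranspose2d(100, 64, 4, 1, 0), nn.ReLU(),
                   nn.ConvTranspose2d(64, 3, 16, 16, 0), nn.Tanh())
# Stage-II generator: refines/upscales a 64x64 image to 128x128
G2 = nn.Sequential(nn.ConvTranspose2d(3, 3, 4, 2, 1), nn.Tanh())

z = torch.randn(8, 100, 1, 1)
fake64 = G1(z)           # stage I: generate a 64x64 image
fake128 = G2(fake64)     # stage II: upscale to 128x128
print(fake64.shape, fake128.shape)   # [8, 3, 64, 64], [8, 3, 128, 128]
```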
Convergence Measure (Extensions)
• Boundary Equilibrium GAN (BEGAN): proposes an equilibrium-based measure that can be used to track how well GAN training is converging.
David Berthelot et al. BEGAN: Boundary Equilibrium Generative Adversarial Networks, 2017
• Reconstruction Loss: convergence can also be measured with a reconstruction loss, comparing test images x with reconstructions G(z) produced by the generator.
Sitao Xiang. On the Effect of Batch Normalization and Weight Normalization in Generative Adversarial Networks, 2017
GAN in Supervised Learning (Extensions)
• Machine Translation (Seq2Seq): a standard sequence-to-sequence model encodes the source sentence "A B C D" and decodes the target sentence "X Y Z" token by token, from <start> to <end>. The question a GAN can ask here is: should "ABCD" be translated to "XYZ"? This tackles a supervised learning task with adversarial training.
• Machine Translation (GANs): the generator G translates an English sentence A into a Korean sentence B. The discriminator D is asked "do A and B have the same meaning?": when trained with real sentence pairs (A in English, B in Korean), it should say "yes"; when trained with fake pairs (A in English, B in generated Korean), it should say "no". The generator should generate a B that has the same meaning as A in order to deceive the discriminator. A sketch of the pair discriminator follows.
Lijun Wu et al. Adversarial Neural Machine Translation, 2017
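A minimal sketch of such a pair discriminator, with a hypothetical bag-of-embeddings encoder and random token IDs standing in for real sentences; actual adversarial NMT systems use recurrent or convolutional encoders and train the generator with policy-gradient methods.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64

class PairDiscriminator(nn.Module):
    """Answers: do source sentence A and target sentence B have the same meaning?"""
    def __init__(self):
        super().__init__()
        self.src_embed = nn.EmbeddingBag(vocab_size, embed_dim)   # encode A (English)
        self.tgt_embed = nn.EmbeddingBag(vocab_size, embed_dim)   # encode B (Korean)
        self.classifier = nn.Sequential(nn.Linear(2 * embed_dim, 1), nn.Sigmoid())

    def forward(self, src_ids, tgt_ids):
        h = torch.cat([self.src_embed(src_ids), self.tgt_embed(tgt_ids)], dim=1)
        return self.classifier(h)        # probability that (A, B) is a real pair

disc = PairDiscriminator()
src = torch.randint(0, vocab_size, (8, 12))        # batch of source sentences (token IDs)
real_tgt = torch.randint(0, vocab_size, (8, 15))   # human translations
fake_tgt = torch.randint(0, vocab_size, (8, 15))   # placeholder for generator output

bce = nn.BCELoss()
d_loss = (bce(disc(src, real_tgt), torch.ones(8, 1))      # real pair -> "yes"
          + bce(disc(src, fake_tgt), torch.zeros(8, 1)))   # fake pair -> "no"
```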