Generative Adversarial Networks
Aaron Mishkin
UBC MLRG 2018W2
Generative Adversarial Networks
“Two imaginary celebrities that were dreamed up by a random
number generator.”
https://research.nvidia.com/publication/2017-10 Progressive-Growing-of
Why care about GANs?
Why spend your limited time learning about GANs:
• GANs are achieving state-of-the-art results in a large variety
of image generation tasks.
• There’s been a veritable explosion in GAN publications over
the last few years – many people are very excited!
• GANs are stimulating new theoretical interest in min-max
optimization problems and “smooth games”.
Why care about GANs: Hyper-realistic Image Generation
StyleGAN: image generation with hierarchical style transfer [3].
https://arxiv.org/abs/1812.04948
Why care about GANs: Conditional Generative Models
Conditional GANs: high-resolution image synthesis via semantic
labeling [8].
Input: Segmentation Output: Synthesized Image
https://research.nvidia.com/publication/2017-12 High-Resolution-Image-Synthesis
Why care about GANs: Image Super Resolution
SRGAN: Photo-realistic super-resolution [4].
Bicubic Interp. SRGAN Original Image
https://arxiv.org/abs/1609.04802
Why care about GANs: Publications
Approximately 500 GAN papers as of September 2018!
See https://github.com/hindupuravinash/the-gan-zoo for the exhaustive list of papers.
Generative Models
Generative Modeling
Generative Models estimate the probabilistic process that generated a set of observations D.
• D = {(x_i, y_i)}_{i=1}^n : supervised generative models learn the joint distribution p(x_i, y_i), often to compute p(y_i | x_i).
• D = {x_i}_{i=1}^n : unsupervised generative models learn the distribution of D for clustering, sampling, etc. We can:
  • directly estimate p(x_i), or
  • introduce latents y_i and estimate p(x_i, y_i).
Generative Modeling: Unsupervised Parametric Approaches
• Direct Estimation: Choose a parameterized family p(x | θ) and learn θ by maximizing the log-likelihood
\[
\theta^* = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta).
\]
• Latent Variable Models: Define a joint distribution p(x, z | θ) and learn θ by maximizing the log-marginal likelihood
\[
\theta^* = \arg\max_{\theta} \sum_{i=1}^{n} \log \int_{z_i} p(x_i, z_i \mid \theta)\, dz_i.
\]
Both approaches require that p(x | θ) is easy to evaluate.
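As a concrete illustration of direct estimation (my example, not from the slides), the sketch below fits a univariate Gaussian family p(x | θ) = N(x; μ, σ²) by maximum likelihood; for this family the arg max has a closed form, so no iterative optimization is needed.

```python
import numpy as np

# Direct estimation for a Gaussian family p(x | theta) = N(x; mu, sigma^2).
# The maximizer of sum_i log p(x_i | theta) is the sample mean and the
# (biased) sample variance.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)    # observations D = {x_i}

mu_hat = data.mean()                                # arg max over mu
sigma2_hat = ((data - mu_hat) ** 2).mean()          # arg max over sigma^2

def log_likelihood(x, mu, sigma2):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2))

print(mu_hat, sigma2_hat, log_likelihood(data, mu_hat, sigma2_hat))
```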
Generative Modeling: Models for (Very) Complex Data
How can we learn such models for very complex data?
https://www.researchgate.net/figure/Heterogeneousness-and-diversity-of-the-CIFAR-10-entries-in-their-10-
Generative Modeling: Normalizing Flows and VAEs
Design parameterized densities with huge capacity!
• Normalizing flows: a sequence of non-linear transformations to a simple distribution pz(z):
\[
p(x \mid \theta_{0:k}) = p_z(z) \quad \text{where} \quad z = f_{\theta_k}^{-1} \circ \cdots \circ f_{\theta_1}^{-1} \circ f_{\theta_0}^{-1}(x).
\]
The f_{θ_j}^{-1} must be invertible with tractable log-det. Jacobians.
• VAEs: latent-variable models where inference networks specify parameters:
\[
p(x, y \mid \theta) = p(x \mid f_\theta(y))\, p_y(y).
\]
The marginal likelihood is maximized via the ELBO.
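To make the change-of-variables computation concrete, here is a minimal, hypothetical one-layer flow with an affine transformation; it evaluates p(x | θ) by inverting the map and adding the log-det Jacobian term whose tractability the bullet above requires.

```python
import numpy as np

# A one-layer "flow": x = f_theta(z) = a * z + b with standard normal base p_z.
# Density evaluation inverts the map and adds the log-det Jacobian of f^{-1}.
a, b = 2.0, -1.0                                      # hypothetical flow parameters

def log_prob_x(x):
    z = (x - b) / a                                   # z = f_theta^{-1}(x)
    log_pz = -0.5 * (np.log(2 * np.pi) + z ** 2)      # log N(z; 0, 1)
    log_det_jac = -np.log(np.abs(a))                  # log |d f^{-1} / dx|
    return log_pz + log_det_jac

print(log_prob_x(np.array([-1.0, 0.0, 1.0])))
```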
GANs
GANs: Density-Free Models
Generative Adversarial Networks (GANs) instead use an unrestricted generator Gθg(z) such that
\[
p(x \mid \theta_g) = p_z(\{z\}) \quad \text{where} \quad \{z\} = G_{\theta_g}^{-1}(x).
\]
• Problem: the inverse image of Gθg (z) may be huge!
• Problem: it’s likely intractable to preserve volume through
G(z; θg ).
So, we can’t evaluate p(x | θg ) and we can’t learn θg by maximum
likelihood.
GANs: Discriminators
GANs learn by comparing model samples with examples from D.
• Sampling from the generator is easy:
x̂ = Gθg (ẑ), where ẑ ∼ pz(z).
• Given a sample x̂, a discriminator tries to distinguish it from
true examples:
D(x) = Pr (x ∼ pdata) .
• The discriminator “supervises” the generator network.
GANs: Generator + Discriminator
https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
GANs: Goodfellow et al. (2014)
• Let z ∈ R^m and pz(z) be a simple base distribution.
• The generator Gθg(z) : R^m → D̃ is a deep neural network.
• D̃ is the manifold of generated examples.
• The discriminator Dθd(x) : D ∪ D̃ → (0, 1) is also a deep neural network.
https://arxiv.org/abs/1511.06434
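For concreteness, a minimal sketch of the two networks as small fully-connected PyTorch models (not the DCGAN architecture linked above; the layer sizes and latent dimension are hypothetical):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 100, 28 * 28          # hypothetical sizes (flattened images)

# Generator G_{theta_g}: R^m -> generated-data manifold.
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),     # outputs scaled to [-1, 1]
)

# Discriminator D_{theta_d}: data space -> (0, 1).
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),         # Pr(x came from p_data)
)

z_hat = torch.randn(16, latent_dim)          # z ~ p_z(z), a simple base distribution
x_hat = G(z_hat)                             # samples from the generator
d_fake = D(x_hat)                            # discriminator's probability of "real"
```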
GANs: Saddle-Point Optimization
Saddle-Point Optimization: learn Gθg(z) and Dθd(x) jointly via the objective V(θd, θg):
\[
\min_{\theta_g} \max_{\theta_d} \;
\underbrace{\mathbb{E}_{p_{\text{data}}}\big[\log D_{\theta_d}(x)\big]}_{\text{likelihood of true data}}
+ \underbrace{\mathbb{E}_{p_z(z)}\big[\log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)\big]}_{\text{likelihood of generated data}}
\]
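In code, the inner maximization is just logistic-regression-style training of the discriminator: V is the negated binary cross-entropy of a classifier with label 1 for data and label 0 for generated samples. A minimal sketch (my naming, assuming a discriminator D that outputs probabilities, like the one above):

```python
import torch

# Minibatch estimate of V(theta_d, theta_g): the discriminator ascends this value,
# the generator descends it. Equivalent to (negated) binary cross-entropy with
# labels 1 for real examples and 0 for generated samples.
def value_fn(D, x_real, x_fake, eps=1e-7):
    term_data = torch.log(D(x_real) + eps).mean()        # E_{p_data}[log D(x)]
    term_model = torch.log(1 - D(x_fake) + eps).mean()   # E_{p_z}[log(1 - D(G(z)))]
    return term_data + term_model
```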
GANs: Optimal Discriminators
Claim: Given Gθg defining an implicit distribution pg = p(x | θg), the optimal discriminator is
\[
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}.
\]
Proof Sketch:
\[
\begin{aligned}
V(\theta_d, \theta_g) &= \int_{\mathcal{D}} p_{\text{data}}(x) \log D(x)\, dx
+ \int_{\tilde{\mathcal{D}}} p(z) \log\big(1 - D(G_{\theta_g}(z))\big)\, dz \\
&= \int_{\mathcal{D} \cup \tilde{\mathcal{D}}} p_{\text{data}}(x) \log D(x) + p_g(x) \log\big(1 - D(x)\big)\, dx
\end{aligned}
\]
Maximizing the integrand for all x is sufficient and gives the result (see bonus slides).
Previous Slide: https://commons.wikimedia.org/wiki/File:Saddle point.svg
GANs: Jensen-Shannon Divergence and Optimal Generators
Given an optimal discriminator D*(x), the generator objective is
\[
\begin{aligned}
C(\theta_g) &= \mathbb{E}_{p_{\text{data}}}\big[\log D^*_{\theta_d}(x)\big]
+ \mathbb{E}_{p_g(x)}\big[\log\big(1 - D^*_{\theta_d}(x)\big)\big] \\
&= \mathbb{E}_{p_{\text{data}}}\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right]
+ \mathbb{E}_{p_g(x)}\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right] \\
&\propto \underbrace{\frac{1}{2}\,\mathrm{KL}\!\left(p_{\text{data}} \,\Big\|\, \frac{p_{\text{data}} + p_g}{2}\right)
+ \frac{1}{2}\,\mathrm{KL}\!\left(p_g \,\Big\|\, \frac{p_{\text{data}} + p_g}{2}\right)}_{\text{Jensen-Shannon Divergence}}
\end{aligned}
\]
C(θg) achieves its global minimum at pg = pdata given an optimal discriminator!
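A quick numeric check of this identity on discrete distributions (my own example, not from the slides): up to the additive constant −log 4, C(θg) equals twice the Jensen-Shannon divergence, and it bottoms out at −log 4 exactly when pg = pdata.

```python
import numpy as np

# C = E_{p_data}[log p_data/(p_data+p_g)] + E_{p_g}[log p_g/(p_data+p_g)]
#   = -log 4 + 2 * JSD(p_data || p_g), minimized when p_g = p_data.
def kl(p, q):
    return np.sum(p * np.log(p / q))

def c_value(p_data, p_g):
    s = p_data + p_g
    return np.sum(p_data * np.log(p_data / s)) + np.sum(p_g * np.log(p_g / s))

p_data = np.array([0.2, 0.5, 0.3])
for p_g in (np.array([0.6, 0.1, 0.3]), p_data):
    m = 0.5 * (p_data + p_g)
    jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)
    print(c_value(p_data, p_g), -np.log(4) + 2 * jsd)   # equal; -log(4) when p_g = p_data
```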
GANs: Learning Generators and Discriminators
Putting these results to use in practice:
• High-capacity discriminators Dθd approximate the Jensen-Shannon divergence when close to the global maximum.
• Dθd is a "differentiable program".
• We can use Dθd to learn Gθg with our favourite gradient descent method.
https://arxiv.org/abs/1511.06434
GANs: Training Procedure
for i = 1 . . . N do
    for k = 1 . . . K do
        • Sample noise samples {z1, . . . , zm} ∼ pz(z).
        • Sample examples {x1, . . . , xm} from pdata(x).
        • Update the discriminator Dθd by ascending its stochastic gradient:
        \[
        \theta_d = \theta_d + \alpha_d \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m}
        \Big[ \log D(x_i) + \log\big(1 - D(G(z_i))\big) \Big].
        \]
    end for
    • Sample noise samples {z1, . . . , zm} ∼ pz(z).
    • Update the generator Gθg by descending its stochastic gradient:
    \[
    \theta_g = \theta_g - \alpha_g \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m}
    \log\big(1 - D(G(z_i))\big).
    \]
end for
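Below is a minimal runnable sketch of this procedure, assuming small fully-connected networks and a toy 2-D Gaussian standing in for pdata(x); the architecture, optimizer (Adam), and hyperparameters are my own choices, not those of the original papers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, m = 8, 2, 64       # hypothetical sizes; m = minibatch size
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.LeakyReLU(0.2),
                  nn.Linear(64, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
eps = 1e-7                               # numerical safety inside the logs

def sample_data(n):
    # toy stand-in for p_data(x): an offset 2-D Gaussian
    return 0.5 * torch.randn(n, data_dim) + torch.tensor([2.0, -1.0])

N, K = 1000, 1                           # outer iterations, discriminator steps per generator step
for i in range(N):
    for k in range(K):
        z = torch.randn(m, latent_dim)                        # z ~ p_z(z)
        x = sample_data(m)                                    # x ~ p_data(x)
        d_loss = -(torch.log(D(x) + eps).mean()
                   + torch.log(1 - D(G(z)) + eps).mean())     # descend -V = ascend V
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    z = torch.randn(m, latent_dim)
    g_loss = torch.log(1 - D(G(z)) + eps).mean()              # minimax generator loss
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```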
Problems (c. 2016)
Problems with GANs
• Vanishing gradients: the discriminator becomes "too good" and the generator gradient vanishes.
• Non-Convergence: the generator and discriminator oscillate
without reaching an equilibrium.
• Mode Collapse: the generator distribution collapses to a
small set of examples.
• Mode Dropping: the generator distribution doesn’t fully
cover the data distribution.
Problems: Vanishing Gradients
• The minimax objective saturates when Dθd is close to perfect:
\[
V(\theta_d, \theta_g) = \mathbb{E}_{p_{\text{data}}}\big[\log D_{\theta_d}(x)\big]
+ \mathbb{E}_{p_z(z)}\big[\log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)\big].
\]
• A non-saturating heuristic objective for the generator is
\[
J(G_{\theta_g}) = -\mathbb{E}_{p_z(z)}\big[\log D_{\theta_d}(G_{\theta_g}(z))\big].
\]
https://arxiv.org/abs/1701.00160
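The difference is easy to see numerically. In the sketch below (a hypothetical setup of my own), a confident discriminator assigns D(G(z)) ≈ 0 to the fakes; the minimax loss then has almost no gradient with respect to those outputs, while the heuristic loss still does.

```python
import torch

def minimax_g_loss(d_fake, eps=1e-7):
    return torch.log(1 - d_fake + eps).mean()    # E[log(1 - D(G(z)))], saturates near D = 0

def heuristic_g_loss(d_fake, eps=1e-7):
    return -torch.log(d_fake + eps).mean()       # J(G) = -E[log D(G(z))]

d_fake = torch.full((4,), 1e-3, requires_grad=True)   # discriminator is nearly sure: fake
minimax_g_loss(d_fake).backward()
print(d_fake.grad.abs().max())   # ~0.25: barely any signal for the generator
d_fake.grad = None
heuristic_g_loss(d_fake).backward()
print(d_fake.grad.abs().max())   # ~250: strong signal where the generator needs it
```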
Problems: Addressing Vanishing Gradients
Solutions:
• Change Objectives: use the non-saturating heuristic
objective, maximum-likelihood cost, etc.
• Limit Discriminator: restrict the capacity of the
discriminator.
• Schedule Learning: try to balance training Dθd and Gθg.
Problems: Non-Convergence
Simultaneous gradient descent is not guaranteed to converge for
minimax objectives.
• Goodfellow et al. only showed convergence when updates are
made in the function space [2].
• The parameterization of Dθd and Gθg results in a highly non-convex objective.
• In practice, training tends to oscillate – updates “undo” each
other.
Problems: Addressing Non-Convergence
Solutions: Lots and lots of hacks!
https://github.com/soumith/ganhacks
Problems: Mode Collapse and Mode Dropping
One Explanation: SGD may optimize the max-min objective
\[
\max_{\theta_d} \min_{\theta_g} \; \mathbb{E}_{p_{\text{data}}}\big[\log D_{\theta_d}(x)\big]
+ \mathbb{E}_{p_z(z)}\big[\log\big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)\big]
\]
Intuition: the generator maps all z values to the x̂ that is most likely to fool the discriminator.
https://arxiv.org/abs/1701.00160
A Possible Solution
A Possible Solution: Alternative Divergences
There is a large variety of divergence measures for distributions:
• f-Divergences: (e.g. Jensen-Shannon, Kullback-Leibler)
\[
D_f(P \,\|\, Q) = \int_{\mathcal{X}} q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx
\]
• GANs [2], f-GANs [7], and more.
• Integral Probability Metrics: (e.g. Earth Mover's Distance, Maximum Mean Discrepancy)
\[
\gamma_{\mathcal{F}}(P \,\|\, Q) = \sup_{f \in \mathcal{F}} \int f \, dP - \int f \, dQ
\]
• Wasserstein GANs [1], Fisher GANs [6], Sobolev GANs [5], and more.
A Possible Solution: Wasserstein GANs
Wasserstein GANs: Strong theory and excellent empirical results.
• “In no experiment did we see evidence of mode collapse for
the WGAN algorithm.” [1]
https://arxiv.org/abs/1701.07875
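Under the IPM view, the discriminator becomes a critic f constrained to a function class F, and the log-loss disappears. A rough sketch of the weight-clipping WGAN update from [1] is below; the function names and clipping constant are illustrative, and clipping is only a crude way to keep the critic (roughly) inside the constraint set.

```python
import torch

def critic_step(critic, opt_c, x_real, x_fake, clip=0.01):
    # Ascend E_{p_data}[f(x)] - E_{p_g}[f(G(z))]; the critic outputs raw scores, no sigmoid.
    loss = -(critic(x_real).mean() - critic(x_fake.detach()).mean())
    opt_c.zero_grad(); loss.backward(); opt_c.step()
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)          # weight clipping, as in the original WGAN
    return loss

def generator_step(critic, G, opt_g, z):
    loss = -critic(G(z)).mean()            # raise the critic's score on generated samples
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss
```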
Summary
Summary
Recap:
• GANs are a class of density-free generative models with
(mostly) unrestricted generator functions.
• Introducing adversarial discriminator networks allows GANs to learn by minimizing the Jensen-Shannon divergence.
• Concurrently learning the generator and discriminator is challenging due to:
• Vanishing Gradients,
• Non-convergence due to oscillation,
• Mode collapse and mode dropping.
• A variety of alternative objective functions are being proposed.
Acknowledgements and References
There are lots of excellent references on GANs:
• Sebastian Nowozin’s presentation at MLSS 2018.
• NIPS 2016 tutorial on GANs by Ian Goodfellow.
• A nice explanation of Wasserstein GANs by Alex Irpan.
Bonus: Optimal Discriminators Cont.
The integrand
\[
h(D(x)) = p_{\text{data}}(x) \log D(x) + p_g(x) \log(1 - D(x))
\]
is concave for D(x) ∈ (0, 1). We take the derivative and compute a stationary point in the domain:
\[
\frac{\partial h(D(x))}{\partial D(x)} = \frac{p_{\text{data}}(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} = 0
\;\Rightarrow\; D(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}.
\]
This maximizes the integrand over the domain of the discriminator, completing the proof.
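A one-line numerical sanity check (hypothetical densities at a single point x):

```python
import numpy as np

# With p_data(x) = 0.7 and p_g(x) = 0.3, the maximizer of
# h(D) = p_data * log D + p_g * log(1 - D) should be 0.7 / (0.7 + 0.3) = 0.7.
p_data, p_g = 0.7, 0.3
D = np.linspace(1e-4, 1 - 1e-4, 100001)
h = p_data * np.log(D) + p_g * np.log(1 - D)
print(D[np.argmax(h)], p_data / (p_data + p_g))   # both ~ 0.7
```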
References i
Martin Arjovsky, Soumith Chintala, and Léon Bottou.
Wasserstein GAN.
arXiv preprint arXiv:1701.07875, 2017.
Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio.
Generative adversarial networks.
arXiv preprint arXiv:1406.2661, 2014.
Tero Karras, Samuli Laine, and Timo Aila.
A style-based generator architecture for generative adversarial
networks.
arXiv preprint arXiv:1812.04948, 2018.
References ii
Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes
Totz, Zehan Wang, et al.
Photo-realistic single image super-resolution using a generative
adversarial network.
In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 4681–4690, 2017.
Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, and Yu Cheng.
Sobolev GAN.
arXiv preprint arXiv:1711.04894, 2017.
Youssef Mroueh and Tom Sercu.
Fisher GAN.
In Advances in Neural Information Processing Systems, pages 2513–2523,
2017.
References iii
Sebastian Nowozin, Botond Cseke, and Ryota Tomioka.
f-GAN: Training generative neural samplers using variational
divergence minimization.
In Advances in Neural Information Processing Systems, pages 271–279,
2016.
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz,
and Bryan Catanzaro.
High-resolution image synthesis and semantic manipulation with
conditional gans.
In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 8798–8807, 2018.
