This document provides an overview of several deep generative models including autoencoders (AE), variational autoencoders (VAE), generative adversarial networks (GAN), adversarial autoencoders (AAE), VAE/GAN hybrid models, and adversarial domain adaptation (ADA). It describes the key objectives and training procedures of each model, such as minimizing reconstruction loss in AEs, matching the encoded latent distribution to a prior in VAEs, and making generated samples indistinguishable from real data for the discriminator in GANs. Graphical representations are also shown to illustrate how these models relate latent and observed variables.
Overview of deep generative models and their relevance in music and audio generation.
Discussion about MuseGAN, a model for composing pop songs using GANs, with references to the authors and publication.
Outline of the presentation covering various deep generative models such as AE, VAE, GAN, AAE, VAE/GAN, and ADA.
Introduction to Autoencoders (AE), emphasizing minimizing reconstruction loss and its components.
Description of VAE including minimizing reconstruction loss and distance between distributions, with details on KL divergence.
Exploration of GANs focused on the training process and the roles of generator and discriminator in minimizing distribution distance.
Comparative analysis of GAN and VAE highlighting their generation qualities, stability, and output characteristics.
Description of AAE architecture focusing on reconstruction loss and adversarial training to distinguish between distributions.
Overview of integrating VAE and GAN concepts, building on representation and training processes.
Clarifications on elements involved in models like VAE and GAN, focusing on latent variables and representation quality.Explanation of ADA's goal to classify unlabeled data in target domains using labeled data from source domains.
Discussion on unifying principles across various generative models and parameters involved in the generative and inference processes.
In-depth comparison of different models including VAE, GAN, InfoGAN, and AAE with regards to approach and performance.
List of references cited throughout the presentation regarding deep generative models and related research.
Introduction to Deep
GenerativeModels
Herman Dong
Music and Audio Computing Lab (MACLab),
Research Center for Information Technology Innovation,
Academia Sinica
2.
MuseGAN
Learn about ourrecent work on using GAN to compose
pop song at https://salu133445.github.io/musegan/
Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang and Yi-Hsuan Yang. 2017. MuseGAN: Symbolic-domain Music Generation
and Accompaniment with Multi-track Sequential Generative Adversarial Networks. arXiv preprint arXiv:1709.06298.
3.
Outline
• Brief introductionto deep generative models
• AE (Autoencoder)
• VAE (Variational Autoencoder)
• GAN (Generative Adversarial Networks)
• AAE (Adversarial Autoencoder)
• VAE/GAN
• ADA (Adversarial Domain Adaption)
• Reformulation
• Graphical model representation
• Connection to Wake-sleep Algorithm
VAE (Variational Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
PP(z) z ε
latentdata
8.
VAE (Variational Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
p(z)
Loss
Loss
PP(z) z ε
reconstruction
loss
KL
divergence
latentdata
≈
9.
VAE (Variational Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
p(z)
Loss
Loss
PP(z) z ε
reconstruction
loss
KL
divergence
latentdata
≈
10.
VAE (Variational Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
p(z)
Loss
Loss
PP(z) z ε
reconstruction
loss
KL
divergence
latentdata
μ σ
≈
11.
VAE (Variational Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
p(z)
Loss
Loss
PP(z) z ε
reconstruction
loss
KL
divergence
latentdata
μ σ
μ
=
z σ
+ ∙
ε
≈
12.
VAE (Variational Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
p(z)
Loss
Loss
PP(z) z ε
testing
(generation)
reconstruction
loss
KL
divergence
latentdata
μ σ
μ
=
z σ
+ ∙
ε
≈
13.
GAN (Generative AdversarialNetwork)
• minimize distance between the distribution of real data and generated
samples
Gz~p(z) G(z)
D
X
1/0
latent data
14.
GAN (Generative AdversarialNetwork)
• minimize distance between the distribution of real data and generated
samples
Gz~p(z) G(z)
D
X
1/0
adversarial
training
log(1-D(X))
+
log(D(G(z)))
log(1-D(G(z)))
Make G(z) undistinguishable
from real data for D
Distinguish G(z) being fake
from X being real
latent data
15.
GAN (Generative AdversarialNetwork)
• minimize distance between the distribution of real data and generated
samples
Gz~p(z) G(z)
D
X
1/0
adversarial
training
log(1-D(X))
+
log(D(G(z)))
log(1-D(G(z)))
Make G(z) undistinguishable
from real data for D
Distinguish G(z) being fake
from X being real
latent data
16.
GAN (Generative AdversarialNetwork)
• minimize distance between the distribution of real data and generated
samples
Gz~p(z) G(z)
D
X
1/0
adversarial
training
log(1-D(X))
+
log(D(G(z)))
log(1-D(G(z)))
Make G(z) undistinguishable
from real data for D
Distinguish G(z) being fake
from X being real
latent data
17.
GAN (Generative AdversarialNetwork)
• minimize distance between the distribution of real data and generated
samples
Gz~p(z) G(z)
D
X
1/0
adversarial
training
log(1-D(X))
+
log(D(G(z)))
log(1-D(G(z)))
Make G(z) undistinguishable
from real data for D
Distinguish G(z) being fake
from X being real
testing
(generation)
latent data
18.
GAN vs VAE
•GAN
• Generator aim to fool the discriminator
• Discriminator aim to distinguish generated data from real data
• output images are sharper
• higher diversity, lower stability
• VAE
• Objective: reconstruct real data
• using pixel-to-pixel loss
• output images are more blurred
• lower diversity, higher stability
GAN
VAE
A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
19.
GAN vs VAE
•GAN
• Generator aim to fool the discriminator
• Discriminator aim to distinguish generated data from real data
• output images are sharper
• higher diversity, lower stability
• VAE
• Objective: reconstruct real data
• using pixel-to-pixel loss
• output images are more blurred
• lower diversity, higher stability
GAN
VAE
A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.
20.
AAE (Adversarial Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
p(z)
Loss
PP(z) z ε
reconstruction
loss
latentdata
≈
Loss
KL
divergence
21.
AAE (Adversarial Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
p(z)
Loss
PP(z)
reconstruction
loss
latentdata
≈
D 1/0
adversarial
training
Make Q(X) undistinguishable
from real p(X) for D
Distinguish Q(X) being fake
from p(X) being real
22.
AAE (Adversarial Autoencoder)
•minimize reconstruction loss
• minimize distance between encoded latent distribution and prior distribution
QX Q(X)
p(z)
Loss
PP(z)
reconstruction
loss
latentdata
≈
D 1/0
adversarial
training
Make Q(X) undistinguishable
from real p(X) for D
Distinguish Q(X) being fake
from p(X) being real
VAE/GAN
Q Q(X)
p(z)
Loss
PP(z) zε
KL
divergence
latent
data
≈
G(z) z
D1/0
adversarial
training
Distinguish G(z) and P(z)
being fake from X being real
𝓛𝓛GAN
(G)
X
Make G(z) and P(z)
undistinguishable
from real data
28.
VAE/GAN
Q Q(X)
p(z)
Loss
PP(z) zε
KL
divergence
latent
data
≈
G(z) z
D1/0
adversarial
training
Distinguish G(z) and P(z)
being fake from X being real
𝓛𝓛GAN
Loss
𝓛𝓛DIS hidden representation
reconstruction loss
(G)
X
Make G(z) and P(z)
undistinguishable
from real data
29.
VAE/GAN
Q Q(X)
p(z)
Loss
PP(z) zε
KL
divergence
latent
data
≈
G(z) z
D1/0
adversarial
training
Distinguish G(z) and P(z)
being fake from X being real
𝓛𝓛GAN
Loss
𝓛𝓛DIS hidden representation
reconstruction loss
𝓛𝓛𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩
(G)
X
Make G(z) and P(z)
undistinguishable
from real data
𝓛𝓛 = 𝓛𝓛𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩 + 𝓛𝓛DIS + 𝓛𝓛GAN
30.
VAE/GAN
Q Q(X)
p(z)
Loss
PP(z) zε
KL
divergence
latent
data
≈
G(z) z
D1/0
adversarial
training
Distinguish G(z) and P(z)
being fake from X being real
𝓛𝓛GAN
Loss
𝓛𝓛DIS hidden representation
reconstruction loss
𝓛𝓛𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩
(G)
𝓛𝓛 = 𝓛𝓛𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩 + 𝓛𝓛DIS + 𝓛𝓛GAN
X
Make G(z) and P(z)
undistinguishable
from real data
VAE/GAN
Q Q(X)
p(z)
Loss
PP(z) zε
KL
divergence
latent
data
≈
G(z) zG
latent
D1/0
adversarial
training
Distinguish G(z) and P(z)
being fake from X being real
𝓛𝓛GAN
Loss
𝓛𝓛DIS hidden representation
reconstruction loss
𝓛𝓛𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩
(G)
𝓛𝓛 = 𝓛𝓛𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩𝐩 + 𝓛𝓛DIS + 𝓛𝓛GAN
!?
X
ADA (Adversarial DomainAdaption)
• Goal: given labeled data in source domain, aim to classify unlabeled data in
target domain.
G
G(Xsrc)
data feature
Xsrc C class
40.
ADA (Adversarial DomainAdaption)
• Goal: given labeled data in source domain, aim to classify unlabeled data in
target domain.
G
G(Xsrc)
Xtgt
data feature
Xsrc
G(Xtgt)
C class
bad features
41.
ADA (Adversarial DomainAdaption)
• Goal: given labeled data in source domain, aim to classify unlabeled data in
target domain.
G
G(Xsrc)
Xtgt
adversarial
training
Make G(Xtgt) and G(Xsrc)
undistinguishable for D
Distinguish G(Xtgt) as target domain
from G(Xsrc) as source domain
data feature
Xsrc
G(Xtgt) D 1/0
C class
42.
ADA (Adversarial DomainAdaption)
• Goal: given labeled data in source domain, aim to classify unlabeled data in
target domain.
G
G(Xsrc)
Xtgt
adversarial
training
Make G(Xtgt) and G(Xsrc)
undistinguishable for D
Distinguish G(Xtgt) as target domain
from G(Xsrc) as source domain
data feature
Xsrc
G(Xtgt) D 1/0
C class
43.
ADA (Adversarial DomainAdaption)
• Goal: given labeled data in source domain, aim to classify unlabeled data in
target domain.
G
G(Xsrc)
Xtgt
adversarial
training
Make G(Xtgt) and G(Xsrc)
undistinguishable for D
Distinguish G(Xtgt) as target domain
from G(Xsrc) as source domain
data feature
Xsrc
G(Xtgt) D 1/0
C class
testing
(classification)
44.
On Unifying DeepGenerative Models
z y
x
latent (data)
data (feature)
label
45.
On Unifying DeepGenerative Models
z y
x
latent (data)
data (feature)
label
𝒑𝒑 𝒛𝒛
46.
On Unifying DeepGenerative Models
• 𝑮𝑮𝜽𝜽 – 𝜃𝜃 are parameters in generator
• 𝑫𝑫𝝓𝝓 – 𝜙𝜙 are parameters in generator
z y
x
latent (data)
data (feature)
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
47.
On Unifying DeepGenerative Models
• 𝑮𝑮𝜽𝜽 – 𝜃𝜃 are parameters in generator
• 𝑫𝑫𝝓𝝓 – 𝜙𝜙 are parameters in generator
• Solid line – generative process
• Dashed line – inference process
• Hollow arrow – deterministic transformation
• Red arrow – adversarial mechanism
• 𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙 denotes 𝒒𝒒𝝓𝝓 𝒚𝒚 𝒙𝒙 and 𝒒𝒒𝝓𝝓 𝟏𝟏 − 𝒚𝒚 𝒙𝒙
z y
x
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
latent (data)
data (feature)
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
𝒚𝒚 = �
𝟏𝟏, 𝑖𝑖𝑖𝑖 𝑥𝑥 𝑖𝑖𝑖𝑖 𝑖𝑖𝑖𝑖 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑
𝟎𝟎, 𝑖𝑖𝑖𝑖 𝑥𝑥 𝑖𝑖𝑖𝑖 𝑖𝑖𝑖𝑖 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑
ADA
𝒚𝒚 = �
𝟏𝟏, 𝑖𝑖𝑖𝑖 𝑥𝑥 𝑖𝑖𝑖𝑖 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟
𝟎𝟎, 𝑖𝑖𝑖𝑖 𝑥𝑥 𝑖𝑖𝑖𝑖 𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓
GAN
48.
On Unifying DeepGenerative Models
• 𝑮𝑮𝜽𝜽 – 𝜃𝜃 are parameters in generator
• 𝑫𝑫𝝓𝝓 – 𝜙𝜙 are parameters in generator
• Solid line – generative process
• Dashed line – inference process
• Hollow arrow – deterministic transformation
• Red arrow – adversarial mechanism
• 𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙 denotes 𝒒𝒒𝝓𝝓 𝒚𝒚 𝒙𝒙 and 𝒒𝒒𝝓𝝓 𝟏𝟏 − 𝒚𝒚 𝒙𝒙
z y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
𝒚𝒚 = �
𝟏𝟏, 𝑖𝑖𝑖𝑖 𝑥𝑥 𝑖𝑖𝑖𝑖 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟
𝟎𝟎, 𝑖𝑖𝑖𝑖 𝑥𝑥 𝑖𝑖𝑖𝑖 𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓
GAN
𝒚𝒚 = �
𝟏𝟏, 𝑖𝑖𝑖𝑖 𝑥𝑥 𝑖𝑖𝑖𝑖 𝑖𝑖𝑖𝑖 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑
𝟎𝟎, 𝑖𝑖𝑖𝑖 𝑥𝑥 𝑖𝑖𝑖𝑖 𝑖𝑖𝑖𝑖 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑
ADA
latent (data)
data (feature)
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
(ADA)
On Unifying DeepGenerative Models
z y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
latent
data
labelz y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
latent
data
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
𝒑𝒑 𝒛𝒛
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
56.
On Unifying DeepGenerative Models
z y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
latent
data
label
𝒒𝒒𝜼𝜼 𝒛𝒛 𝒙𝒙, 𝒚𝒚
InfoGAN
z y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
latent
data
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
degenerated
code space
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
57.
On Unifying DeepGenerative Models
z y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
latent
data
labelz y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
latent
data
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
𝒑𝒑 𝒛𝒛
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
58.
On Unifying DeepGenerative Models
x y
z 𝒑𝒑𝜽𝜽 𝒛𝒛 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒛𝒛
data
latent
labelz y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
latent
data
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
𝒑𝒑 𝒙𝒙
𝒛𝒛 = 𝑮𝑮𝜽𝜽 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒙𝒙 𝒚𝒚
59.
On Unifying DeepGenerative Models
x y
z 𝒑𝒑𝜽𝜽 𝒛𝒛 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒛𝒛
data
latent
label
AAE
z y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
latent
data
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
𝒑𝒑 𝒙𝒙
𝒛𝒛 = 𝑮𝑮𝜽𝜽 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒙𝒙 𝒚𝒚
60.
GAN vs VAE
zy
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
latent
data
label
𝒑𝒑 𝒛𝒛
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
GAN
z y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒛𝒛, 𝒚𝒚
𝒒𝒒∗
𝒓𝒓
𝒚𝒚 𝒙𝒙
latent
data
𝒒𝒒𝜼𝜼 𝒛𝒛 𝒙𝒙, 𝒚𝒚
VAE
label
61.
InfoGAN vs VAE
zy
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒛𝒛, 𝒚𝒚
𝒒𝒒∗
𝒓𝒓
𝒚𝒚 𝒙𝒙
latent
data
𝒒𝒒𝜼𝜼 𝒛𝒛 𝒙𝒙, 𝒚𝒚
VAE
labelz y
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
latent
data
label
𝒒𝒒𝜼𝜼 𝒛𝒛 𝒙𝒙, 𝒚𝒚
InfoGAN
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
62.
InfoGAN vs AAE
zy
x 𝒑𝒑𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒙𝒙
latent
data
label
𝒒𝒒𝜼𝜼 𝒛𝒛 𝒙𝒙, 𝒚𝒚
InfoGAN
x y
z 𝒑𝒑𝜽𝜽 𝒛𝒛 𝒚𝒚
𝒒𝒒𝝓𝝓
𝒓𝒓
𝒚𝒚 𝒛𝒛
latent
label
𝒛𝒛 = 𝑮𝑮𝜽𝜽 𝒙𝒙
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒙𝒙 𝒚𝒚
𝒒𝒒𝜼𝜼 𝒙𝒙 𝒛𝒛, 𝒚𝒚
data
𝒑𝒑 𝑮𝑮𝜽𝜽 𝒛𝒛 𝒚𝒚
AAE
𝒙𝒙 = 𝑮𝑮𝜽𝜽 𝒛𝒛
63.
Wake-sleep Algorithm
• 𝒉𝒉- general latent variables
• 𝝀𝝀 - general parameters
• 𝜽𝜽 - generator parameters
• In wake phase, update 𝜽𝜽 by fitting 𝒑𝒑𝜽𝜽 𝒙𝒙|𝒉𝒉 to 𝒙𝒙 and 𝒉𝒉 inferred by 𝒒𝒒𝝀𝝀 𝒉𝒉|𝒙𝒙 .
• In sleep phase, update 𝝀𝝀 based on generated samples.
• VAE: 𝒉𝒉 → 𝒛𝒛, 𝝀𝝀 → 𝜼𝜼
• GAN: 𝒉𝒉 → 𝒚𝒚, 𝝀𝝀 → 𝝓𝝓
Wake: max
𝜽𝜽
𝔼𝔼𝒒𝒒𝝀𝝀 𝒉𝒉|𝒙𝒙 𝒑𝒑𝒅𝒅𝒅𝒅𝒅𝒅𝒅𝒅 𝒙𝒙 log 𝑝𝑝𝜃𝜃 𝑥𝑥 ℎ
Sleep: max
𝜆𝜆
𝔼𝔼𝒑𝒑𝜽𝜽 𝒙𝒙|𝒉𝒉 𝒑𝒑 𝒉𝒉 log 𝑞𝑞𝜆𝜆 ℎ 𝑥𝑥
64.
References
• D. P.Kingma, M. Welling. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2014
• I. J. GoodFellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio.
Genrative Adversrial Nets. arXiv:preprint arXiv:1406.2661, 2014
• A. B. L. Larsen, S. K. Sønderby and O. Winther. Autoencoding beyond pixels using a learned
similarity metric. arXiv preprint arXiv:1512.09300, 2015
• A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow and B. Frey. Adversarial Autoencoders. arXiv preprint
arXiv:1511.05644, 2016
• Z. Hu, Z. Yang, R. Salakhutdinov and E. P. Xing. On unifying deep generative models. arXiv preprint
arXiv:1706.00550, 2017