Deep Generative
Modelling
Agenda
Generative and Discriminative models
Autoencoders
Variational Autoencoders
Generative Adversarial Networks
Conditional Generative Models
Generative Models
❏ Generate new, random observable data by modelling the joint
distribution of all variables.
❏ Given some dataset D, generate new samples that resemble D
but are not copies of it.
❏ To do this, we need to fit the model's hidden parameters.
❏ Considered a branch of unsupervised learning, but they can
also be used for tasks like classification.
“What I cannot create, I do
not understand.”
—Richard Feynman
Motivation
❏ Tremendous amount of information out there in
the world
❏ Machines are good at solving specific tasks
❏ Better than humans at object recognition, speech
recognition, tumour segmentation, Go
❏ Cannot build compact representations of the world :(
Intelligence Gap
Help ML models to learn very compact
and disentangled representations.
Disentangled factors
❏ P(X | Z), where X is an image and Z is a vector that
causes (explains) X
❏ We would like the dimensions of Z to describe
real-world factors
❏ A Z with separate dimensions for lighting,
guitar, bookshelf and rotation is considered
more disentangled than the raw pixels of X
❏ P(guitar | Z) can be computed easily from a
disentangled representation.
Applications
❏ Short term applications
❏ Image translation, denoising, super-resolution
❏ Domain Adaptation, Synthetic data generation
❏ Music, Audio and Text Generation
❏ Long term applications
❏ Understanding of the real world
❏ Artificial General Intelligence
Discriminative models
❏ Example: ImageNet. Here y is a vector over the 1000
class labels and x is an image from the dataset.
❏ Training maximizes log P(y | x)
❏ The prediction is the most probable label: argmax_i P(y_i | x)
❏ Most classification models are discriminative.
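A minimal sketch of the discriminative setup above, assuming a classifier that outputs one raw score (logit) per class. The logit values, the 4-class toy setting and the function names are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def softmax(logits):
    """Turn raw class scores into probabilities P(y_i | x)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def log_likelihood(logits, label):
    """log P(y = label | x), the quantity maximized during training."""
    return np.log(softmax(logits)[label])

def predict(logits):
    """Prediction = argmax_i P(y_i | x)."""
    return int(np.argmax(softmax(logits)))

# Toy example: 4 classes instead of ImageNet's 1000 (hypothetical scores).
logits = np.array([1.0, 3.0, 0.5, -2.0])
probs = softmax(logits)
predicted_class = predict(logits)
```

Training a real classifier would adjust the network weights so that `log_likelihood` of the correct label increases over the dataset.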
Generative Models
❏ During training, maximize the log-probability log P(X)
❏ Generate new sampled images close to the real
distribution P(X*)
❏ During inference, depending on the model, you may be
able to estimate the probability of a given image X
under the model
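The three steps above (fit by maximizing log P(X), sample, evaluate probability) can be illustrated with the simplest possible generative model. This is only a sketch: a 1-D Gaussian stands in for a deep model, and the synthetic dataset and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # stand-in "dataset" (synthetic)

# Training: for a Gaussian, the parameters that maximize sum(log P(x))
# are simply the sample mean and standard deviation.
mu, sigma = data.mean(), data.std()

def log_prob(x, mu, sigma):
    """log P(x) under the fitted Gaussian."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Generation: draw new samples from the fitted model.
new_samples = rng.normal(loc=mu, scale=sigma, size=5)

# Inference: a point near the data is more probable than one far away.
typical, unusual = log_prob(2.0, mu, sigma), log_prob(5.0, mu, sigma)
```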
Properties and Drawbacks of Discriminative Models
❏ Good at capturing statistical regularities of the data
❏ Find features that are invariant to characteristics irrelevant to the
task
❏ Object classification: rotation, translation, lighting, color
❏ Segmentation: here you do care about rotation and translation
❏ Have difficulty building disentangled representations
❏ Adversarial examples are a good illustration of that
Generation from Discriminative Model (Example)
[Figure: Handwriting Model; sample text "This is regarding my friend, Kate Zack"]
Gradient ascent on the input image X
Generation from Discriminative Model (Example)
[Figure: Handwriting Model; labels "P E T K O", "X", "Maximize"]
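Gradient ascent on the input can be sketched with a tiny stand-in model. The example below assumes a hypothetical linear softmax classifier with random weights (not the handwriting model from the slides) and adjusts the input x, not the weights, to maximize log P(target | x).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def ascend_input(W, b, target, x0, steps=1000, lr=0.02):
    """Gradient ascent on the input x to maximize log P(target | x)."""
    x = x0.copy()
    for _ in range(steps):
        p = softmax(W @ x + b)
        # d/dx log P(target | x) = W[target] - sum_k p_k * W[k]
        grad = W[target] - p @ W
        x += lr * grad
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 16))          # hypothetical 5-class classifier on 16 "pixels"
b = np.zeros(5)
x0 = rng.normal(scale=0.01, size=16)  # start from near-blank noise
target = 3

x_generated = ascend_input(W, b, target, x0)
p_before = softmax(W @ x0 + b)[target]
p_after = softmax(W @ x_generated + b)[target]
```

The classifier becomes confident about `x_generated`, but nothing forces that input to look like real data, which is one reason discriminative models make poor generators.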
Generative Models
❏ Gaussian mixture model
❏ Hidden Markov model
❏ Naive Bayes
❏ Latent Dirichlet allocation
❏ … many others
Deep Generative Models
❏ Restricted Boltzmann Machines
❏ Variational Autoencoders
❏ PixelRNN, PixelCNN
❏ Generative Adversarial Networks
❏ Neural Language Models
❏ WaveNet
Deep Generative Model
[Figure: Latent variables (code) → Generator]
Autoencoder
[Figure: Autoencoder network]
Loss = pixelwise L2 or softmax.
Autoencoders
● Latent variables
● Lower dimensional than the input
[Figure: Encoder → Decoder]
Loss = ||X - Decoder(Encoder(X))||² (pixelwise L2)
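A minimal autoencoder sketch with a pixelwise L2 reconstruction loss. To stay self-contained it assumes a purely linear encoder and decoder and synthetic 8-dimensional "images"; real autoencoders use deep nonlinear networks, and every name and size here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "images": 8-dimensional inputs that really live on a 2-D subspace.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))

W_enc = rng.normal(scale=0.1, size=(8, 2))  # encoder: input -> latent code
W_dec = rng.normal(scale=0.1, size=(2, 8))  # decoder: latent code -> reconstruction

def l2_loss(X, W_enc, W_dec):
    Z = X @ W_enc        # latent code, lower dimensional than the input
    X_hat = Z @ W_dec    # reconstruction
    return np.mean(np.sum((X_hat - X) ** 2, axis=1))

initial_loss = l2_loss(X, W_enc, W_dec)
lr = 0.005
for _ in range(3000):
    Z = X @ W_enc
    err = Z @ W_dec - X                              # reconstruction error, shape (N, 8)
    grad_dec = 2 * Z.T @ err / len(X)                # dLoss/dW_dec
    grad_enc = 2 * X.T @ (err @ W_dec.T) / len(X)    # dLoss/dW_enc
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_loss = l2_loss(X, W_enc, W_dec)
```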
Autoencoders
❏ A random latent code won’t get us anywhere
❏ Pass an image to the encoder to get a “valid” code
[Figure: Encoder → Decoder]
Variational Autoencoders
❏ Encoder-Decoder architecture
❏ Forces the latent code to follow a Gaussian distribution
❏ To generate, sample the latent code from the Gaussian and pass
it to the decoder network
Variational Autoencoders
[Figure: Encoder, Mean, Sampled code, Decoder]
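A sketch of how the sampled code is usually obtained in a VAE, assuming the encoder outputs a mean and a log-variance per latent dimension (the log-variance output, the function names and the numbers are assumptions; the slide only labels the mean and the sampled code). The KL term is what pushes the latent code towards a standard Gaussian.

```python
import numpy as np

def sample_latent(mean, log_var, rng):
    """Reparameterization: z = mean + std * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mean ** 2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mean = np.array([0.5, -1.0, 0.0])      # hypothetical encoder output
log_var = np.array([0.0, -2.0, 0.3])   # hypothetical encoder output
z = sample_latent(mean, log_var, rng)  # this code is passed to the decoder
kl = kl_to_standard_normal(mean, log_var)
```

At generation time no encoder is needed: a code drawn directly from N(0, I) is passed to the decoder.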
Variational Autoencoders - Samples
[Figure: Input vs. Output]
❏ CIFAR-10
❏ Blurry images
❏ Good approximation of the likelihood of the input data
Deep Recurrent Attentive Writer
❏ Generates the image sequentially
❏ On each step the model decides where to focus and
draw
❏ Uses an attention mechanism to achieve it
Deep Recurrent Attentive Writer (DRAW)
❏ Google Street View House Numbers (SVHN)
❏ The red rectangle shows where the model is
attending at the current step
❏ Impressive: DRAW was among the first successful
models to generate images sequentially
Deep Recurrent Attentive Writer
VAEs DRAW
Deep Recurrent Attentive Writer
Image source: http://kvfrans.com/what-is-draw-deep-recurrent-attentive-writer/
Fully Convolutional Model
❏ Typically uses a pre-trained classification network as the
encoder
❏ Most often VGG-16, a simple and widely used
pre-trained network
❏ Uses transposed convolution layers as the decoder
until the desired output shape is reached
❏ Often the decoder architecture is the
transpose (mirror image) of the encoder
Fully Convolutional Decoder
Transposed Convolution (Deconvolution)
Parameters
❏ Kernel size = 3
❏ Stride = 1
Input layer
Output layer
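A minimal sketch of a 2-D transposed convolution with the parameters listed above (kernel size 3, stride 1). It assumes a single channel, a square kernel and no padding; the 4x4 input of ones is an arbitrary choice for illustration. Each input value "stamps" a scaled copy of the kernel onto the output, and overlapping stamps are summed.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=1):
    """Single-channel transposed convolution, square kernel, no padding."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # Each input pixel adds a scaled copy of the kernel to the output.
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

x = np.ones((4, 4))        # input layer
kernel = np.ones((3, 3))   # kernel size = 3
y = conv_transpose2d(x, kernel, stride=1)  # output layer, shape (6, 6)
```

The output is larger than the input: (4 - 1) * 1 + 3 = 6 along each axis, which is why stacking such layers lets a decoder grow a small code into a full-size image.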
Properties of Transposed Convolution
❏ The backward pass of a convolutional layer (the gradient
with respect to its input) is a transposed convolution
❏ A checkerboard pattern might appear in the
generated image (sensitive to kernel and stride
sizes)
Odena, et al., "Deconvolution and Checkerboard
Artifacts", Distill, 2016. http://doi.org/10.23915
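The sensitivity to kernel and stride sizes can be illustrated by counting how many kernel "stamps" overlap at each output position. This 1-D sketch follows the uneven-overlap explanation given by Odena et al.; the helper name and the sizes are illustrative assumptions.

```python
import numpy as np

def overlap_counts(n_inputs, kernel_size, stride):
    """How many input positions contribute to each output position (1-D)."""
    counts = np.zeros((n_inputs - 1) * stride + kernel_size, dtype=int)
    for i in range(n_inputs):
        counts[i * stride:i * stride + kernel_size] += 1
    return counts

# Kernel size not divisible by the stride: the overlap alternates 2, 1, 2, 1.
uneven = overlap_counts(4, kernel_size=3, stride=2)
# Kernel size divisible by the stride: the overlap is constant away from the border.
even = overlap_counts(4, kernel_size=4, stride=2)
```

In two dimensions the alternating overlap multiplies along both axes, which produces the checkerboard look.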
Pros of VAEs
❏ In practice, the dimensions of a VAE's latent code are often
interpretable
❏ To achieve this, the model collapses some latent dimensions
and doesn’t use them
❏ Able to generate samples close to the data distribution
Drawbacks of VAEs
❏ The L2 loss treats pixels as independent, which
leads to blurry images
❏ The exact probability of a generated image under the
model is intractable to compute
X - input image
Z - latent code
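Why the exact probability is intractable can be written out as follows. This is a standard formulation rather than a reconstruction of the slide's own equation; the approximate posterior Q(Z | X) (the encoder) is notation assumed here. The marginal requires integrating over every possible latent code, so VAEs optimize a lower bound instead:

```latex
\log P(X) \;=\; \log \int P(X \mid Z)\, P(Z)\, dZ
\;\ge\; \mathbb{E}_{Q(Z \mid X)}\big[\log P(X \mid Z)\big]
\;-\; \mathrm{KL}\big(Q(Z \mid X)\,\|\,P(Z)\big)
```

The first term on the right is the reconstruction term and the second keeps the latent code close to the Gaussian prior P(Z).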
Generative Adversarial Networks
❏ A generative model introduced by Ian Goodfellow et al. in
2014
❏ Already widely adopted and an area of massive
research
❏ New GAN papers are published every week
❏ Have many awesome applications. We’ll see some of
them later on.
❏ GANs define the generative problem as an
adversarial game between two networks
Generative Adversarial Networks (GANs)
[Figure: Generator → Sample; Real Images → Sample; Discriminator → Real / Fake → Loss]
Discriminator Training of GANs
[Figure: Generator → Sample; Real Images → Sample; Discriminator → Real / Fake → Classification Loss]
Generator Training of GANs
[Figure: Generator → Sample; Real Images → Sample; Discriminator → Real / Fake; Maximize]
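The two training steps can be sketched as two objectives computed from the discriminator's outputs. This assumes the discriminator returns a probability that its input is real, and uses the commonly used "maximize log D(G(z))" form for the generator; the function names and the example probabilities are assumptions for illustration.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Classification (binary cross-entropy) loss: real -> 1, generated -> 0."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_objective(d_fake, eps=1e-12):
    """What the generator maximizes: log-probability that its samples are judged real."""
    return np.mean(np.log(d_fake + eps))

# Hypothetical discriminator outputs, i.e. P(input is real).
d_real = np.array([0.9, 0.8, 0.95])   # on real images
d_fake = np.array([0.1, 0.2, 0.05])   # on generated samples

d_loss = discriminator_loss(d_real, d_fake)          # discriminator step minimizes this
g_obj = generator_objective(d_fake)                  # generator step maximizes this
confused = discriminator_loss(np.full(3, 0.5), np.full(3, 0.5))  # equals 2 * log(2)
```

In practice the two steps alternate: the discriminator is updated with the generator frozen, then the generator is updated through the frozen discriminator.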
GAN training is challenging
❏ Unstable during training
❏ Mode collapse
❏ Higher log-likelihood != better samples
However, GAN training is getting easier. Check out
Wasserstein GANs and LSGANs.
DCGAN
GAN Samples
Generative Adversarial Networks (Goodfellow et al., 2014)
Progressive Growing of GANs
Progressive Growing of GANs for Improved Quality, Stability, and Variation (Karras et al., 2018)
Conditional Generative Adversarial Networks
❏ Real data consists of pairs (X, Y)
❏ P(Y|X): generate Y given X
pix2pix: Image-to-Image Translation with Conditional Adversarial Networks (Isola et al., 2016)
Image to Image Translation (pix2pix)
Image to Image Translation (Example)
CycleGAN
❏ X and Y are unpaired collections of images
❏ Different domains of the same world
❏ Learn to translate an image from X into Y
CycleGAN
Zhu et al., 2017 (Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks)
CycleGAN (failure cases)
CycleGAN
❏ Cycle Consistency Loss
❏ ||F(G(X)) - X||
❏ ||G(F(Y)) - Y||
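The two cycle terms above can be sketched as a single function. The slide leaves the norm unspecified; this sketch assumes the mean absolute (L1) difference, and the toy "translators" G and F are simple invertible functions standing in for the real networks.

```python
import numpy as np

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """||F(G(X)) - X|| + ||G(F(Y)) - Y||, using a mean absolute (L1) difference."""
    forward = np.mean(np.abs(F(G(x_batch)) - x_batch))    # X -> Y -> back to X
    backward = np.mean(np.abs(G(F(y_batch)) - y_batch))   # Y -> X -> back to Y
    return forward + backward

# Toy stand-ins for the two translators (hypothetical, not neural networks).
G = lambda x: 2.0 * x + 1.0           # translates domain X -> Y
F_good = lambda y: (y - 1.0) / 2.0    # exact inverse of G
F_bad = lambda y: y                   # ignores the structure of G

x_batch = np.array([[0.0, 1.0], [2.0, 3.0]])
y_batch = np.array([[1.0, 3.0], [5.0, 7.0]])

consistent = cycle_consistency_loss(G, F_good, x_batch, y_batch)
inconsistent = cycle_consistency_loss(G, F_bad, x_batch, y_batch)
```

The loss is zero only when translating there and back returns the original image, which is what lets training work without paired examples.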
Generating Images from Text
Generative Adversarial Text to Image Synthesis (Reed et al. 2016)
References
❏ https://blog.openai.com/generative-models
❏ http://distill.pub/2016/deconv-checkerboard/
❏ https://github.com/junyanz/CycleGAN
❏ https://github.com/phillipi/pix2pix
❏ http://videolectures.net/deeplearning2015_bengio_generative_models/
❏ https://www.youtube.com/watch?v=P78QYjWh5sM&spfreload=1
❏ http://image-net.org/explore
❏ http://kvfrans.com/what-is-draw-deep-recurrent-attentive-writer/
Q&A
HyperScience wants You for...
• Machine Learning Engineer
• For more info: goo.gl/wqPoqU
• Or https://www.hyperscience.com/careers
So the DL course is over… What’s next?
We have one word for you - MEETUPS!
❏ Various engineering topics
❏ Different cool locations
❏ Smart and interesting speakers from HyperScience and
fellow companies
Coming up in the fall...
Follow our Facebook page for more details and updates.
Let’s celebrate the success of this course
together
❏ We’d love to demo our technologies and
products to you - we’ll be running two different demo
sessions simultaneously in this same hall;
❏ We are here to answer all sorts of questions -
happy to take them all;
❏ And last but not least - the bar is open :)
