InfoGAN
10/11/17
Zak Jost
Outline
• GANs
– Basics
– Architecture
– Training algorithm
– Results
– Problems
• InfoGAN
– Disentangled representations
– Basics
– Implementation
– Results
• Potential fraud applications
What’s a Generative Adversarial
Network?
Let’s break down the terms…
Generative Adversarial Network
• Most machine learning is focused on learning
Discriminative Models:
– Find the decision boundary line (i.e. on the left of line = dog, right
of line = cat)
– Given new data, what label am I?
– Are conditional probability distributions: P(y|X). The probability
of a label given the data.
• Generative Models:
– Find the distribution for the classes (i.e. the probability of dog/cat for
all images)
– Give me a new image of a cat/dog or tell me probability of cat/dog
for this X
– Are joint probability distributions: P(X,y). The joint probability of
all possible data points for each possible label.
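As a toy illustration of the P(y|X) vs. P(X,y) distinction (not from the slides; the feature, labels, and numbers below are made up), a discriminative model only needs the conditional for a given data point, while a generative model's joint distribution also lets you sample brand-new (X, y) pairs:

    import random

    # Made-up joint distribution P(X, y) over one binary feature X ("has whiskers")
    # and a label y in {"cat", "dog"}
    p_joint = {("whiskers", "cat"): 0.40, ("whiskers", "dog"): 0.05,
               ("no_whiskers", "cat"): 0.10, ("no_whiskers", "dog"): 0.45}

    # Discriminative view: P(y | X) -- what a classifier learns
    def p_label_given_x(x):
        norm = sum(v for (xi, _), v in p_joint.items() if xi == x)
        return {y: p_joint[(x, y)] / norm for y in ("cat", "dog")}

    # Generative view: sample a new (X, y) pair from the joint distribution
    def sample_example():
        pairs, weights = zip(*p_joint.items())
        return random.choices(pairs, weights=weights, k=1)[0]

    print(p_label_given_x("whiskers"))   # e.g. {'cat': 0.89, 'dog': 0.11}
    print(sample_example())              # e.g. ('no_whiskers', 'dog')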
Generative Adversarial Network
• Two player zero-sum game: the more I win, the more you
lose
• Minimax game:
– Each player chooses the action that minimizes their worst-case loss, i.e.
assumes the opponent will play its best counter-move
– This is opposed to maximizing your gain on the assumption that you will win
– AKA: maximize the minimum gain
• GANs create minimax game between two models:
generator and discriminator
Generative Adversarial Network
• Each model is a neural network
• Allows efficient updates of the parameters via the
backpropagation algorithm
Cliché “Counterfeiter” Example
(Diagram: the Generator model produces generated (fake) data; the Discriminator receives both the fake data and the real (training) data and decides: real or fake?)
How to Build a GAN
(Diagram) Noise z → Generator neural network → fake data G(z). Fake data G(z) and real data x → Discriminator neural network → probability the input is real: D(x) or D(G(z)).
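A minimal sketch of this two-network setup, assuming PyTorch and flattened MNIST-sized images (784 pixels); the layer sizes and the names G, D, latent_dim are illustrative choices, not from the slides:

    import torch
    import torch.nn as nn

    latent_dim = 100   # dimension of the noise vector z (illustrative choice)
    data_dim = 784     # e.g. a flattened 28x28 image

    # Generator: maps noise z to a fake data point G(z)
    G = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.ReLU(),
        nn.Linear(256, data_dim), nn.Tanh(),   # outputs scaled to [-1, 1]
    )

    # Discriminator: maps a data point to the probability it is real, D(x)
    D = nn.Sequential(
        nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    z = torch.randn(16, latent_dim)   # a batch of noise vectors
    fake = G(z)                       # G(z): fake data
    p_real = D(fake)                  # D(G(z)): probability the fake looks real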
Training Deep Dive
• Optimization Function
• Training algorithm
• Convergence Guarantees
Optimization Function
• D(x) is probability x is real
• G wants this small, D wants this large (i.e. Adversarial)
• Use stochastic gradient descent and back-propagate
min_G max_D V(D, G) = E_x~pdata[ log D(x) ] + E_z~pz[ log(1 − D(G(z))) ]
(first term: real data; second term: fake data)
Training Algorithm
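The slide's algorithm figure is not reproduced here; below is a minimal sketch of the alternating training procedure from the original GAN paper (k discriminator updates per generator update), assuming the illustrative PyTorch networks G, D, and latent_dim from the earlier sketch; the optimizer settings and the non-saturating generator loss are common practical choices, not prescribed by the slides:

    import torch

    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
    bce = torch.nn.BCELoss()

    def train_step(real_batch, k=1):
        batch_size = real_batch.size(0)
        ones, zeros = torch.ones(batch_size, 1), torch.zeros(batch_size, 1)

        # 1) Update D: push D(x) toward 1 (real) and D(G(z)) toward 0 (fake)
        for _ in range(k):
            z = torch.randn(batch_size, latent_dim)
            fake = G(z).detach()                  # don't backprop into G on this step
            loss_D = bce(D(real_batch), ones) + bce(D(fake), zeros)
            opt_D.zero_grad(); loss_D.backward(); opt_D.step()

        # 2) Update G: push D(G(z)) toward 1 (the "non-saturating" generator loss)
        z = torch.randn(batch_size, latent_dim)
        loss_G = bce(D(G(z)), ones)
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()
        return loss_D.item(), loss_G.item()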
Convergence Guarantees
• The original paper proves that this algorithm theoretically
converges to p_g = p_data—i.e. your generative model is a
perfect representation of your true data distribution
• In reality, we optimize the parameters of a neural network
instead of the distributions themselves, so the proof doesn’t
directly apply
GAN Results
Yellow boxes are real data samples that are the nearest matches to the last column of fake
images. This shows the generator didn’t merely memorize training examples.
Results
* From DCGAN paper:
https://arxiv.org/abs/1511.06434
Meaningful
Representations!
Reflection
• It’s worth stressing that this is an entirely unsupervised
technique—you don’t use the hand-written digit label to
learn how to write
– Labeled training data is often hard to come by, but unlabeled data
is plentiful
• This is useful because you could train a GAN in an
unsupervised fashion using a lot of images, and then
build a classifier with labeled data using the internal
layers of the GAN discriminator as input.
– You will need much less labeled training data if useful
representations have already been learned with unlabeled data
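As a rough illustration of that semi-supervised idea (not something the slides implement), assuming a trained PyTorch discriminator D like the earlier sketch, whose last hidden layer (256 units) is reused as a frozen feature extractor; all names and sizes here are illustrative:

    import torch
    import torch.nn as nn

    # Use everything except D's final classification layers as a frozen feature extractor
    feature_extractor = nn.Sequential(*list(D.children())[:-2])   # drop Linear(256, 1) + Sigmoid
    for p in feature_extractor.parameters():
        p.requires_grad = False

    # Small classifier trained on the few labeled examples available
    clf = nn.Linear(256, 10)   # 256 = size of D's last hidden layer, 10 classes
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def supervised_step(x_labeled, y_labeled):
        feats = feature_extractor(x_labeled)     # representations learned without labels
        loss = loss_fn(clf(feats), y_labeled)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()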
Problems
• Hard to train
– “Mode collapse”: Generator gives the same thing over
and over
– Lack of convergence: Discriminator and Generator
never learn
• Tangled representations
– Not clear how to change the input to the generator to
get meaningful changes in the fake examples
Solutions!
• Hard to train
– DCGAN paper gives architectural guidelines that stabilize model
training (see the sketch after this list)
– Convolutional layers; batch normalization in both networks; replace pooling
layers with strided convolutions to learn spatial up/down-sampling;
remove fully connected layers; use ReLU/Leaky ReLU activation
functions
– Many other papers address this problem
• Entangled Representations
– InfoGAN
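A minimal sketch of a DCGAN-style generator following those guidelines (strided transposed convolutions instead of pooling/upsampling layers, batch norm, ReLU, no fully connected layers), assuming PyTorch and 64x64 single-channel images; the channel counts are illustrative:

    import torch.nn as nn

    # DCGAN-style generator: z is reshaped to (latent_dim, 1, 1) and upsampled to 64x64
    dcgan_G = nn.Sequential(
        nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(256), nn.ReLU(True),                        # 4x4
        nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(128), nn.ReLU(True),                        # 8x8
        nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(64), nn.ReLU(True),                         # 16x16
        nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(32), nn.ReLU(True),                         # 32x32
        nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1, bias=False),
        nn.Tanh(),                                                 # 64x64 output in [-1, 1]
    )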
Entangled vs Disentangled
• To get a fake output, our only knob to turn is the generator input
noise
• What if you want a particular output, like the number 7 or a face
with sunglasses?
• It’s not clear how to modify the Generator input z to get the desired
results because the representation is entangled
(Figure: entangled vs. disentangled latent space)
InfoGAN
• Learns disentangled representations in an unsupervised
manner
• This is accomplished by splitting the generator input into
two parts: the noise, z and the latent code, c
• The codes are made meaningful by maximizing the
Mutual Information between the code and the generator
output
– See appendix for Mutual Info 101
Latent Code Intuition
• You know handwritten digits will be one of ten numbers, so why not
try to encode this structure?
• You could re-assign one of the z’s to be a 10-state discrete variable
• The hope is that all of the digit information is represented by this
one variable
• Maximizing the mutual information between this structured input
and the output will ensure it has meaning
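As a rough sketch of that idea (assuming PyTorch; the split between noise and code dimensions is an illustrative choice), the generator input becomes a concatenation of continuous noise z and a one-hot 10-state code c:

    import torch
    import torch.nn.functional as F

    noise_dim, code_dim = 62, 10            # 62 + 10 = 72-dim generator input (illustrative)

    def sample_generator_input(batch_size):
        z = torch.randn(batch_size, noise_dim)                  # unstructured noise
        c_idx = torch.randint(0, code_dim, (batch_size,))       # uniform prior over 10 states
        c = F.one_hot(c_idx, num_classes=code_dim).float()      # one-hot latent code
        return torch.cat([z, c], dim=1), c_idx                  # input for G(z, c), plus the true code

    gen_input, true_code = sample_generator_input(16)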
How to Implement?
• Adds a regularizer to the GAN minimax game to maximize Mutual Information:
    min_G max_D V_I(D, G) = V(D, G) − λ I(c; G(z, c))
• The mutual information term needs the posterior P(c|x), which is hard to compute,
so InfoGAN uses variational arguments to get a lower bound with an auxiliary
distribution Q(c|x), a parameterized neural network that approximates P(c|x):
    I(c; G(z, c)) ≥ L_I(G, Q) = E_c~P(c), x~G(z,c)[ log Q(c|x) ] + H(c)
(H(c) is the entropy of the prior over codes: easy to compute, and constant with
respect to the optimization, so not important)
• Final form:
    min_G,Q max_D V_InfoGAN(D, G, Q) = V(D, G) − λ L_I(G, Q)
InfoGAN Architecture
• Q(c|x) tries to recover the code from the input. In reality, it’s just a fully
connected layer attached to the end of the discriminator D.
(Diagram) Noise z and code c → Generator neural network → fake data G(z, c). Fake data G(z, c) and real data x → Discriminator neural network → probability the input is real: D(x) or D(G(z, c)). A Q neural network head on the discriminator → estimation of c.
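A minimal sketch of that shared-trunk layout, assuming PyTorch; the layer sizes are illustrative, and the Q head here produces logits for a 10-state discrete code:

    import torch
    import torch.nn as nn

    class InfoGANDiscriminator(nn.Module):
        """Shared trunk with two heads: D (real/fake) and Q (recover the code c)."""
        def __init__(self, data_dim=784, code_dim=10):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
            )
            self.d_head = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())   # D(x)
            self.q_head = nn.Linear(256, code_dim)                         # logits for Q(c|x)

        def forward(self, x):
            h = self.trunk(x)
            prob_real = self.d_head(h)        # probability the input is real
            code_logits = self.q_head(h)      # softmax over these = Q(c|x)
            return prob_real, code_logits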
Q Updates
• Pick a c; Pick a z; Calculate G(c,z); Calculate Q(c|x=G(c,z));
– If using a discrete code, Q network outputs softmax (i.e. prob(0) = 0.1, prob(1) =
0.05, prob(2) = 0.8, …, prob(9) = .01)
– If using a continuous code, Q network outputs sufficient statistics, like mean and
standard deviation of a normal distribution—your choice how to model it.
• Once you know the probability (or parameters of the distribution from which
you can calculate the probability), you can calculate log-likelihood:
log Q(c|X)
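A sketch of that log-likelihood term, assuming PyTorch and the illustrative networks above: for a discrete code, log Q(c|x) at the true code is just the negative cross-entropy of the Q head's softmax; for a continuous code, it is the log-density of, e.g., a Gaussian whose sufficient statistics the Q network outputs:

    import math
    import torch
    import torch.nn.functional as F

    def discrete_mi_loss(code_logits, true_code):
        # softmax(code_logits) is Q(c|x); cross-entropy at the true code = -log Q(c|x),
        # so minimizing this loss maximizes E[log Q(c|x)]
        return F.cross_entropy(code_logits, true_code)

    def continuous_mi_loss(c, mean, log_std):
        # Q outputs a Gaussian's sufficient statistics; log Q(c|x) is its log-density at c
        var = torch.exp(2 * log_std)
        log_q = -0.5 * (math.log(2 * math.pi) + torch.log(var)) - (c - mean) ** 2 / (2 * var)
        return -log_q.sum(dim=1).mean()       # minimizing = maximizing E[log Q(c|x)]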
Results
Appendix: Mutual Information 101
• I(X;Y) = H(X) – H(X|Y)
– How much is the uncertainty in X reduced if you know Y? AKA: How much
information about X is in Y?
– If X and Y are independent, H(X|Y) = H(X) => I(X;Y) = 0.
• Example: X = whether it’s raining outside, Y = whether it’s dark when you wake up
– If it rains 28% of the time and does not rain 72% of the time:
– H(X) = H(0.28, 0.72) = 0.86 bits
– Let p(x, y) be:

              Dark
    Rain      No      Yes
    No        0.70    0.02
    Yes       0.08    0.20

– H(X|Y) = 0.47 bits
– Mutual Information = 0.86 − 0.47 = 0.39 bits
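A short check of those numbers in plain Python (nothing here beyond the table above):

    import math

    def h(probs):
        """Shannon entropy in bits."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Joint distribution p(rain, dark) from the table
    p = {("no", "no"): 0.70, ("no", "yes"): 0.02,
         ("yes", "no"): 0.08, ("yes", "yes"): 0.20}

    p_rain = {r: sum(v for (ri, d), v in p.items() if ri == r) for r in ("no", "yes")}
    p_dark = {d: sum(v for (r, di), v in p.items() if di == d) for d in ("no", "yes")}

    H_rain = h(p_rain.values())                                  # ~0.86 bits
    H_rain_given_dark = sum(
        p_dark[d] * h([p[(r, d)] / p_dark[d] for r in ("no", "yes")])
        for d in ("no", "yes")
    )                                                            # ~0.47 bits
    mutual_info = H_rain - H_rain_given_dark                     # ~0.39 bits
    print(H_rain, H_rain_given_dark, mutual_info)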
