Practical Deep Learning with
TensorFlow
Barbara Fusinska
@BasiaFusinska
About me
Data Science Freelancer
Machine Learning
Programmer
@BasiaFusinska
Barbara@Fusinska.com
BarbaraFusinska.com
https://katacoda.com/basiafusinska/courses/deep-learning-with-tensorflow
Agenda
• Introduction to Machine Learning
• Main concepts of Deep Learning
• TensorFlow Basics
• Building Neural Networks with TensorFlow
• MNIST Classification
• Convolutional Networks
• TensorFlow abstraction levels
Machine Learning?
Movies Genres
Title           | # Kisses | # Kicks | Genre
Taken           |        3 |      47 | Action
Love story      |       24 |       2 | Romance
P.S. I love you |       17 |       3 | Romance
Rush hours      |        5 |      51 | Action
Bad boys        |        7 |      42 | Action
Question: What is the genre of "Gone with the wind"?
Data-based classification
Id | Feature 1 | Feature 2 | Class
1  |         3 |        47 | A
2  |        24 |         2 | B
3  |        17 |         3 | B
4  |         5 |        51 | A
5  |         7 |        42 | A
Question: What is the class of the entry with the following features: F1: 31, F2: 4?
Data Visualization
[Scatter plot: Feature 1 vs Feature 2, with a line separating class A from class B]
Rule 1: If on the left side of the line, then Class = A
Rule 2: If on the right side of the line, then Class = B
Chick sexing
Machine Learning Problems
• Classification
• Regression
• Clustering
• Anomaly detection
• Recommendation systems
• …
Supervised
learning
• Classification, regression
• Label, target value
• Training & Validation
phases
Unsupervised
learning
• Clustering, feature
selection
• Finding structure of data
• Statistical values
describing the data
Supervised Machine Learning workflow
Clean data → Preprocess data → Data split (training data / test data)
Training data → Machine Learning algorithm → Trained model
Test data + Trained model → Evaluation
Classification problem
Model training
Data & Labels
Classification data
Source     | #Links | #Characters | ... | Fake
TopNews    |     10 |        2750 | …   | T
Twitter    |      2 |         120 | …   | F
TopNews    |    235 |         502 | …   | F
Channel X  |   1530 |        3024 | …   | T
Twitter    |     24 |          70 | …   | F
StoryLeaks |    722 |        1408 | …   | T
Facebook   |     98 |         230 | …   | T
…          |      … |           … | …   | …
Features: Source, #Links, #Characters, …
Label: Fake (T/F)
Decision trees
• Use information gain and entropy
• Find the feature that best splits the dataset
• Build the tree
• Prune the tree
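A quick illustration (not part of the course materials): a decision tree can be fitted in a few lines with scikit-learn. The iris dataset and the max_depth value are only placeholders.

```python
# A minimal sketch, assuming scikit-learn and a toy dataset
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion='entropy' uses information gain; max_depth limits the tree (pruning-like control)
tree = DecisionTreeClassifier(criterion='entropy', max_depth=3)
tree.fit(X, y)
print(tree.predict(X[:5]))
```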
K-Nearest Neighbours Algorithm
• Object is classified by a majority
vote
• k – algorithm parameter
• Distance metrics: Euclidean
(continuous variables), Hamming
(text)
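A minimal sketch of k-NN majority voting, reusing the kisses/kicks toy data from the earlier slides; the scikit-learn classifier and the choice k = 3 are illustrative assumptions.

```python
# A minimal sketch, assuming scikit-learn
from sklearn.neighbors import KNeighborsClassifier

# toy features: [# kisses, # kicks], classes from the movie-genre example
X = [[3, 47], [24, 2], [17, 3], [5, 51], [7, 42]]
y = ['A', 'B', 'B', 'A', 'A']

knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn.fit(X, y)
print(knn.predict([[31, 4]]))   # majority vote among the 3 closest points -> 'B'
```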
Logistic regression
z = β₀ + β₁x₁ + ⋯ + βₖxₖ   (the coefficients β are found as the best fit to the data)

y = 1 for z > 0,  0 for z < 0
y = 1 for φ(z) > 0.5,  0 for φ(z) < 0.5   (φ – the logistic function, i.e. the sigmoid)

Classification task
Demo
Task: Logistic
Regression
• Load the dataset
• Split for the train and test set
• Train the algorithm, use Logistic
Regression
• Evaluate the algorithm
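A minimal sketch of these steps with scikit-learn; the breast-cancer toy dataset here only stands in for the course data, which may differ on katacoda.

```python
# A minimal sketch, assuming scikit-learn and a stand-in dataset
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load the dataset
X, y = load_breast_cancer(return_X_y=True)

# 2. Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 3. Train the algorithm using Logistic Regression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 4. Evaluate the algorithm
print(accuracy_score(y_test, model.predict(X_test)))
```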
What is Deep Learning?
Neural
Networks
• Interconnected units (neurons)
• Activation signal(s)
• Information processing
• Learning involves adjustments
to the synaptic connections
Brief History of Neural Networks
What has
changed?
• Big Data
• The Cloud
• GPU
• Training methods
• Tools
• Computer Vision, NLP
Artificial Neural Networks building blocks
Fully connected layers
Forward propagation
X = [x₁ … xₙ] – current input
Wⱼ = [w₁ⱼ … wₙⱼ] – weights of the j-th neuron
W = [w₁₁ … wₙ₁; ⋮ ⋱ ⋮; w₁ₖ … wₙₖ] – weights for every neuron in the layer (one row per neuron)
b = [b₁ … bₖ] – biases for the layer

Σᵢ₌₁ⁿ xᵢ · wᵢⱼ + bⱼ = X · Wⱼᵀ + bⱼ – computation in the j-th neuron

[Σᵢ₌₁ⁿ xᵢ · wᵢ₁ + b₁ … Σᵢ₌₁ⁿ xᵢ · wᵢₖ + bₖ] = [X · W₁ᵀ + b₁ … X · Wₖᵀ + bₖ] = X · Wᵀ + b
Classification output
• Binary classification
out ∈ (−∞, +∞)
y = 1 for φ(out) > 0.5,  0 for φ(out) < 0.5
Numpy basics
Demo
Task: Forward
Propagation
• Read data for both linear and
nonlinear examples
• Write the forward propagation
function
• Initialise weights and biases
• Perform forward propagation on
both datasets
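A minimal numpy sketch of the task above, assuming a single fully connected layer computed as X · Wᵀ + b; the layer sizes and input values are illustrative.

```python
# A minimal sketch, assuming numpy and illustrative sizes
import numpy as np

def forward_propagation(X, W, b):
    # X: (m, n) inputs, W: (k, n) weights (one row per neuron), b: (k,) biases
    return X.dot(W.T) + b

# initialise weights and biases for a layer with 3 neurons and 2 input features
np.random.seed(0)
W = np.random.randn(3, 2) * 0.01
b = np.zeros(3)

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(forward_propagation(X, W, b))   # shape (2, 3)
```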
Hidden layers
The output of a neuron layer
b⁽ˡ⁾ = [b₁ … bₖ] – biases vector for layer l (k neurons in the layer)
o⁽ˡ⁻¹⁾ – output of the previous layer
W⁽ˡ⁾ – weight matrix of layer l, with one row per neuron of layer l and one column per output of layer l−1

Z⁽ˡ⁾ = o⁽ˡ⁻¹⁾ · W⁽ˡ⁾ᵀ + b⁽ˡ⁾ – result of applying the weights and biases to the input
o⁽ˡ⁾ = φ⁽ˡ⁾(Z⁽ˡ⁾) – the activation function applied
Activation
function
• Non-linear problems
• Deep networks
• Vanishing gradient problem
Task: Hidden
layers
• Use the non-linear dataset
• Forward propagation function for
hidden layer (use tanh)
• Prepare the weights and biases for
both layers
• Set up forward propagation for the
whole network
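A minimal numpy sketch of the two-layer forward pass above, with tanh on the hidden layer; the sigmoid output and the layer sizes are assumptions, not the course solution.

```python
# A minimal sketch, assuming numpy, tanh hidden layer, sigmoid output
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    Z1 = X.dot(W1.T) + b1      # hidden layer pre-activation
    o1 = np.tanh(Z1)           # hidden layer activation
    Z2 = o1.dot(W2.T) + b2     # output layer pre-activation
    o2 = sigmoid(Z2)           # output activation
    return o1, o2

np.random.seed(0)
W1, b1 = np.random.randn(4, 2) * 0.01, np.zeros(4)   # 2 inputs -> 4 hidden units
W2, b2 = np.random.randn(1, 4) * 0.01, np.zeros(1)   # 4 hidden -> 1 output

X = np.random.randn(5, 2)
_, prediction = forward(X, W1, b1, W2, b2)
print(prediction.shape)   # (5, 1)
```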
Optimisation problem
• Loss function: 𝐽(𝜃)
• Minimising/Maximising
• Local extremes
• Finding the value of the function or of its arguments
• ML problems are usually converted into optimisation problems
Gradient Descent
• Climbing down the hill
• Iterative process
• Learning rate
• Tuning techniques
• Initialisation values matter
Z⁽ᵐ⁾ = Z⁽ᵐ⁻¹⁾ − α · ∂L/∂Z
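A tiny sketch of the update rule above, applied to the illustrative loss L(z) = z²; the learning rate and step count are arbitrary choices.

```python
# A minimal sketch of the gradient descent update rule
z = 5.0              # initial value
alpha = 0.1          # learning rate
for _ in range(100):
    grad = 2 * z     # dL/dz for L(z) = z^2
    z = z - alpha * grad
print(z)             # close to the minimum at 0
```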
Loss function for the classification process
Z⁽ᴸ⁾ = o⁽ᴸ⁻¹⁾ · W⁽ᴸ⁾ᵀ + b⁽ᴸ⁾
o⁽ᴸ⁾ = φ⁽ᴸ⁾(Z⁽ᴸ⁾)

Cross entropy:
J = −(1/m) · Σ (Y · log o⁽ᴸ⁾ + (1 − Y) · log(1 − o⁽ᴸ⁾))
Backpropagation
Hidden + Output backpropagation
dZ⁽²⁾ = o⁽²⁾ − Y
dW⁽²⁾ = (1/m) · (dZ⁽²⁾ᵀ · o⁽¹⁾)
db⁽²⁾ = (1/m) · Σᵢ dZ⁽²⁾ᵢ
do⁽¹⁾ = (dZ⁽²⁾ · W⁽²⁾) ∗ (1 − (o⁽¹⁾)²)
dW⁽¹⁾ = (1/m) · (do⁽¹⁾ᵀ · X)
db⁽¹⁾ = (1/m) · Σᵢ do⁽¹⁾ᵢ
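A minimal numpy sketch implementing exactly the formulas above (tanh hidden layer, sigmoid output); the function name and shapes are illustrative.

```python
# A minimal sketch, assuming numpy and the two-layer network from the previous slides
import numpy as np

def backprop(X, Y, o1, o2, W2):
    m = X.shape[0]
    dZ2 = o2 - Y                                 # output error
    dW2 = (1.0 / m) * dZ2.T.dot(o1)              # gradient for the output weights
    db2 = (1.0 / m) * dZ2.sum(axis=0)            # gradient for the output biases
    do1 = dZ2.dot(W2) * (1 - o1 ** 2)            # error pushed back through tanh
    dW1 = (1.0 / m) * do1.T.dot(X)               # gradient for the hidden weights
    db1 = (1.0 / m) * do1.sum(axis=0)            # gradient for the hidden biases
    return dW1, db1, dW2, db2
```

The returned gradients then feed the gradient descent update, e.g. W2 = W2 − α · dW2.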
Neural
Network
Training
Demo
Introduction to TensorFlow
• Computational Graph
• Loss Function Optimisation
• Neural Network Architectures
• Deep Learning components
• Machine Learning APIs
• …
Computational Graph
Computational
Graph
Demo
Task: Optimising
the function
• Set up the computation graph for
the quadratic function
• Define the Optimiser (use Gradient
Descent)
• Initialise the session and run the
optimisation
• Print the results and close the
session
y = x² − 10x + 24
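A minimal TensorFlow 1.x sketch of this task; the learning rate and the number of steps are illustrative choices.

```python
# A minimal sketch: minimising y = x^2 - 10x + 24 with gradient descent
import tensorflow as tf

x = tf.Variable(0.0)
y = x * x - 10.0 * x + 24.0

optimiser = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_step = optimiser.minimize(y)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(100):
    sess.run(train_step)
print(sess.run([x, y]))   # x close to 5, y close to -1
sess.close()
```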
Neural Network training in TensorFlow
Neural Network Architecture
Input (Placeholder) + W, b (Variables) → Loss function → Optimiser (adjusts W, b)
TensorFlow
Network
Training
Demo
Task: TensorFlow
Deep Network
Training
• Set up placeholders for the input
data and the labels
• Define hidden layer, connect it
with the input placeholders
• Define the output layer, connect it
with the hidden one
• Set up the feed_dict with the
actual data
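A minimal TensorFlow 1.x sketch of the task above; the layer sizes, the random toy data and the hyperparameters are assumptions, not the course solution.

```python
# A minimal sketch, assuming TensorFlow 1.x and illustrative toy data
import numpy as np
import tensorflow as tf

# placeholders for the input data and the labels
X = tf.placeholder(tf.float32, shape=[None, 2])
Y = tf.placeholder(tf.float32, shape=[None, 1])

# hidden layer connected to the input placeholder
W1 = tf.Variable(tf.random_normal([2, 4], stddev=0.1))
b1 = tf.Variable(tf.zeros([4]))
hidden = tf.nn.tanh(tf.matmul(X, W1) + b1)

# output layer connected to the hidden one
W2 = tf.Variable(tf.random_normal([4, 1], stddev=0.1))
b2 = tf.Variable(tf.zeros([1]))
logits = tf.matmul(hidden, W2) + b2

loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logits))
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# feed_dict with the actual data (random toy data here)
data = np.random.randn(100, 2).astype(np.float32)
labels = (data[:, :1] > 0).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train, feed_dict={X: data, Y: labels})
    print(sess.run(loss, feed_dict={X: data, Y: labels}))
```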
Classification task
Network training
Data & Labels
Digit classes: 0–9
Dataset: http://yann.lecun.com/exdb/mnist/
Data preparation
Each 28 × 28 pixel image is flattened into a vector of 784 values
MNIST
Dataset
Demo
Working with batches
• Accuracy vs. training time trade-off
• The batch size becomes a hyperparameter
• Stochastic Gradient Descent
• The true gradient is approximated by the gradient at a single example (or a small batch)
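A minimal numpy sketch of mini-batch iteration; the generator and the commented usage lines are illustrative, not the course code.

```python
# A minimal sketch, assuming numpy arrays X and Y
import numpy as np

def batches(X, Y, batch_size):
    indices = np.random.permutation(len(X))     # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        idx = indices[start:start + batch_size]
        yield X[idx], Y[idx]

# usage: one optimisation step per batch instead of per full dataset, e.g.
# for X_batch, Y_batch in batches(train_X, train_Y, 128):
#     sess.run(train_step, feed_dict={X: X_batch, Y: Y_batch})
```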
Training process
Dataset → Batches → Parameters adjusting (per batch)
Input/Output layers
Input layer: 784 values (the flattened image) fed as the input data
Output layer: 10 neurons, one per digit class (0–9)

o = φ(W · x + b)

Prediction: arg max over the output layer gives the label (e.g. 9)
MNIST Simple
Neural
Network
Training
Demo
Hidden layer
Input layer → Hidden layer → Output layer (10 classes, 0–9)

h = tanh(W · x + b)
Task: MNIST
Dataset Deep
Learning
• Load MNIST dataset
• Define placeholders for the input
data and labels
• Set up hidden layer (weights, biases
and connect with input data)
• Set up output layer and connect it
with the hidden one
• Define the Adam Optimizer
• Set up retrieving the batches and the
feed_dict values for the training
• Set up feed_dict for the training
examples in the evaluation phase
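A minimal TensorFlow 1.x sketch covering the task steps above end to end; the layer sizes and hyperparameters are illustrative, not the course solution.

```python
# A minimal sketch, assuming TensorFlow 1.x and its bundled MNIST helper
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

X = tf.placeholder(tf.float32, [None, 784])      # flattened 28x28 images
Y = tf.placeholder(tf.float32, [None, 10])       # one-hot labels

# hidden layer
W1 = tf.Variable(tf.truncated_normal([784, 128], stddev=0.1))
b1 = tf.Variable(tf.zeros([128]))
hidden = tf.nn.tanh(tf.matmul(X, W1) + b1)

# output layer connected to the hidden one
W2 = tf.Variable(tf.truncated_normal([128, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(hidden, W2) + b2

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits))
train = tf.train.AdamOptimizer(0.001).minimize(loss)

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_x, batch_y = mnist.train.next_batch(100)          # retrieve a batch
        sess.run(train, feed_dict={X: batch_x, Y: batch_y})     # training feed_dict
    print(sess.run(accuracy,                                    # evaluation feed_dict
                   feed_dict={X: mnist.test.images, Y: mnist.test.labels}))
```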
Weights initialisation issue
• Weights range:
• Too small causes the signal to shrink
• Too big amplifies the signal until it's too large
• Symmetry problem
• All hidden nodes compute the same values
• Zero output
Weights initialisation methods
• Constant:
• Zero – critical point, error signal will
not propagate, gradient will be zero
(no progress)
• Symmetry
• Random, [-1, +1 ], [0, 1]
• Use small random values
• E.g. Gaussian 𝜇 = 0, constant 𝜎
• Xavier-Glorot
• Keeping the weights 'just right': the signal stays in the proper range through many layers
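A minimal numpy sketch of Xavier/Glorot-style initialisation (TensorFlow 1.x also ships an equivalent initializer, tf.glorot_uniform_initializer); the sizes are illustrative.

```python
# A minimal sketch, assuming numpy and the common uniform Glorot formula
import numpy as np

def xavier_init(n_in, n_out):
    # variance scaled by the number of incoming and outgoing connections
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(-limit, limit, size=(n_in, n_out))

W = xavier_init(784, 128)
print(W.std())   # small spread, so the signal neither shrinks nor blows up
```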
MNIST Neural
Network
Solution
Demo
Convolutional
Neural Network
• Inputs have higher
dimensions
• Reduce the number of
parameters (W, b)
• Neurons arranged in 3D
• Connected to the region
of the previous layer
Pooling
• Reduce spatial size
• Control overfitting
• Applied to every depth slice
• Window size, stride
• Max, Average
Two convolutional layers
Input layer → 1st Convolutional → 2nd Convolutional → Flattened convolution → Dense layer → Output layer (10 classes, 0–9)
Task: MNIST
Dataset
Convolution
• Reshape the data to 2D images
• Initialise the variables for the first
convolutional layer
• Define the convolutional and the
max pooling layer
• Define the second convolutional
layer
• Flatten the output
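A minimal TensorFlow 1.x sketch of the task above using tf.nn; the 5×5 filters and the 32/64 feature maps are common illustrative choices, not necessarily the course values.

```python
# A minimal sketch, assuming TensorFlow 1.x and MNIST-sized inputs
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 784])
images = tf.reshape(X, [-1, 28, 28, 1])          # reshape back to 2D images

# first convolutional layer: 5x5 filters, 1 input channel, 32 feature maps
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b1 = tf.Variable(tf.zeros([32]))
conv1 = tf.nn.relu(tf.nn.conv2d(images, W1, strides=[1, 1, 1, 1], padding='SAME') + b1)
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# second convolutional layer: 32 -> 64 feature maps
W2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b2 = tf.Variable(tf.zeros([64]))
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W2, strides=[1, 1, 1, 1], padding='SAME') + b2)
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# flatten the output for the dense layers: 7 * 7 * 64 values per image
flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
```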
Dropout
• Overfitting
• Probability of the unit being
“dropped out”
• Dense layers
• Applied only at training time
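A minimal TensorFlow 1.x sketch of dropout on a dense layer, with the keep probability fed as 0.5 during training and 1.0 during evaluation; the sizes are illustrative.

```python
# A minimal sketch, assuming TensorFlow 1.x
import tensorflow as tf

flat = tf.placeholder(tf.float32, [None, 3136])      # e.g. the flattened convolution
keep_prob = tf.placeholder(tf.float32)               # 0.5 for training, 1.0 for evaluation

dense = tf.layers.dense(flat, 1024, activation=tf.nn.relu)
dropped = tf.nn.dropout(dense, keep_prob=keep_prob)  # units dropped only when keep_prob < 1
```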
Dropout phase
Input layer → 1st Convolutional → 2nd Convolutional → Flattened convolution → Dense layer → Dropout → Output layer (10 classes, 0–9)
Layers with
tf.layers
package
Demo
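A minimal sketch of the same kind of network expressed with the higher-level tf.layers package (TensorFlow 1.x); all sizes and the dropout rate are illustrative.

```python
# A minimal sketch, assuming TensorFlow 1.x and its tf.layers package
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 784])
training = tf.placeholder(tf.bool)                 # switches dropout on/off

images = tf.reshape(X, [-1, 28, 28, 1])
conv1 = tf.layers.conv2d(images, filters=32, kernel_size=5, padding='same', activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2)
conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=5, padding='same', activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, pool_size=2, strides=2)
flat = tf.layers.flatten(pool2)
dense = tf.layers.dense(flat, 1024, activation=tf.nn.relu)
dropped = tf.layers.dropout(dense, rate=0.4, training=training)   # dropout only when training=True
logits = tf.layers.dense(dropped, 10)
```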
Keep in touch
BarbaraFusinska.com
Barbara@Fusinska.com
@BasiaFusinska
https://katacoda.com/basiafusinska
Deep learning with TensorFlow