Deep Learning and Angular
Angular Meetup (06/14/2017)
Google (Mountain View)
Oswald Campesato
ocampesato@yahoo.com
The Data/AI Landscape
Gartner Hype Curve: Where is Deep Learning?
The Impact of AI
“Robot trucks will kill far fewer people (if any).
Machines don’t get distracted or look at phones
instead of the road.
Machines don’t drink alcohol, do drugs, or things that
contribute to accidents.”
Robot trucks don’t need salaries, vacations, health
insurance, rest periods, or sick time.
The only costs will be upkeep of the machinery.
AI/ML/DL: How They Differ
Traditional AI (20th century):
based on collections of rules
Led to expert systems in the 1980s
The era of LISP and Prolog
AI/ML/DL: How They Differ
Machine Learning:
Started in the 1950s (approximate)
Alan Turing and “learning machines”
Data-driven (not rule-based)
Many types of algorithms
Involves optimization
AI/ML/DL: How They Differ
Deep Learning:
Started in the 1950s (approximate)
The “perceptron” (basis of NNs)
Data-driven (not rule-based)
large (even massive) data sets
Involves neural networks (CNNs: ~1970s)
Lots of heuristics
Heavily based on empirical results
The Rise of Deep Learning
Massive and inexpensive computing power
Huge volumes of data/Powerful algorithms
The “big bang” in 2009:
“deep-learning neural networks and NVidia GPUs”
Google Brain used NVidia GPUs (2009)
AI/ML/DL: Commonality
All of them involve a model
A model represents a system
Goal: a good predictive model
The model is based on:
Many rules (for AI)
data and algorithms (for ML)
large sets of data (for DL)
A Basic Model in Machine Learning
Let’s perform the following steps:
1) Start with a simple model (2 variables)
2) Generalize that model (n variables)
3) See how it might apply to a NN
Linear Regression
One of the simplest models in ML
Fits a line (y = m*x + b) to data in 2D
Finds the best line by minimizing MSE
Closed-form (“least squares”) solution:
m = SUM((x-xbar)*(y-ybar)) / SUM((x-xbar)^2)
b = ybar - m*xbar (xbar/ybar: means of x/y)
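A minimal numpy sketch of this closed-form fit, with illustrative data (the variable names are mine, not the slide’s):

import numpy as np

# illustrative 2D data points
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_mean, y_mean = x.mean(), y.mean()
# closed-form ("least squares") slope and intercept
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean
print(m, b)   # roughly 1.96 and 0.14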
Linear Regression in 2D: example
Linear Regression: alternatives
Fitting a polynomial (degree 2, 3, …)
Can lead to overfitting
Polynomials diverge faster than lines
Can reduce predictive accuracy
NB: Linear Regression != Curve Fitting
Linear Regression: example #1
One feature (independent variable):
X = number of square feet
Predicted value (dependent variable):
Y = cost of a house
A very “coarse grained” model
We can devise a much better model
Linear Regression: example #2
Multiple features:
X1 = # of square feet
X2 = # of bedrooms
X3 = # of bathrooms (dependency?)
X4 = age of house
X5 = cost of nearby houses
X6 = corner lot (or not): Boolean
a much better model (6 features)
Linear Multivariate Analysis
General form of multivariate equation:
Y = w1*x1 + w2*x2 + . . . + wn*xn + b
w1, w2, . . . , wn are numeric values
x1, x2, . . . , xn are variables (features)
Properties of variables:
Can be independent (Naïve Bayes)
weak/strong dependencies can exist
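A small sketch of this equation as a dot product, reusing the six house features from example #2 (the weight values are hypothetical, purely for illustration):

import numpy as np

# hypothetical weights for the six house features
w = np.array([120.0, 15000.0, 8000.0, -500.0, 0.4, 10000.0])
b = 25000.0

# one house: sq ft, bedrooms, bathrooms, age, nearby cost, corner lot
x = np.array([1800.0, 3.0, 2.0, 20.0, 450000.0, 1.0])

# Y = w1*x1 + w2*x2 + ... + wn*xn + b
y = np.dot(w, x) + b
print(y)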
Neural Network with 3 Hidden Layers
Neural Networks: equations
Node “values” in first hidden layer:
N1 = w11*x1+w21*x2+…+wn1*xn
N2 = w12*x1+w22*x2+…+wn2*xn
N3 = w13*x1+w23*x2+…+wn3*xn
. . .
Nn = w1n*x1+w2n*x2+…+wnn*xn
Similar equations for other pairs of layers
Neural Networks: Matrices
From inputs to first hidden layer:
Y1 = W1*X + B1 (X/Y1/B1: vectors; W1: matrix)
From first to second hidden layer:
Y2 = W2*Y1 + B2 (Y1/Y2/B2: vectors; W2: matrix)
From second to third hidden layer:
Y3 = W3*Y2 + B3 (Y2/Y3/B3: vectors; W3: matrix)
Apply an “activation function” to each Y (sketch below)
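A minimal numpy sketch of these layered matrix equations, assuming ReLU activation, small random initial weights, and illustrative layer sizes:

import numpy as np

def relu(v):
    return np.maximum(0, v)

rng = np.random.default_rng(0)
sizes = [4, 5, 5, 5, 3]   # input, three hidden layers, output

# initial weights: small random numbers (see the next slide)
Ws = [rng.normal(0, 0.1, (n_out, n_in))
      for n_in, n_out in zip(sizes[:-1], sizes[1:])]
Bs = [np.zeros(n_out) for n_out in sizes[1:]]

X = rng.normal(size=4)      # one input vector
Y = X
for W, B in zip(Ws, Bs):
    Y = relu(W @ Y + B)     # Yk = activation(Wk*Y(k-1) + Bk)
# NB: a real network usually uses a different activation at the output
print(Y)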
Neural Networks (general)
Multiple hidden layers:
Layer composition is your decision
Activation functions: sigmoid, tanh, ReLU
https://en.wikipedia.org/wiki/Activation_function
Back propagation (1980s)
https://en.wikipedia.org/wiki/Backpropagation
=> Initial weights: small random numbers
Activation Functions (Examples)
import numpy as np

# sample weights and input (illustrative values):
W = np.array([0.5, -0.25, 0.75])
x = np.array([1.0, 2.0, 3.0])

# Python sigmoid example:
z = 1/(1 + np.exp(-np.dot(W, x)))

# Python tanh example:
z = np.tanh(np.dot(W, x))

# Python ReLU example:
z = np.maximum(0, np.dot(W, x))
What’s the “Best” Activation Function?
Initially sigmoid was popular
Then tanh became popular
Now ReLU is preferred (better results)
NB: sigmoid and tanh are still used inside LSTMs
Sample Cost Function #1
Sample Cost Function #2
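The two sample cost functions appear only as images in the original deck; whether or not they match those images, two standard examples (in the deck’s plain-text notation) are:
MSE: J = (1/n) * SUM[(y_i - yhat_i)^2], i = 1..n
Cross-entropy: J = -SUM[y_i * log(yhat_i)], i = 1..n
(y_i: true values/labels; yhat_i: predicted values)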
How to Select a Cost Function
1) Depends on the learning type:
=> supervised/unsupervised/RL
2) Depends on the activation function
3) Other factors
Example:
cross-entropy cost function for supervised
learning on multiclass classification
GD versus SGD
SGD (Stochastic Gradient Descent):
+ involves a SUBSET of the dataset
+ aka Minibatch Stochastic Gradient Descent
GD (Gradient Descent):
+ involves the ENTIRE dataset
More details:
http://cs229.stanford.edu/notes/cs229-notes1.pdf
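A minimal numpy sketch of minibatch SGD fitting y = m*x + b (the data, learning rate, and batch size are illustrative):

import numpy as np

# toy data drawn from y = 3*x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3 * x + 1 + rng.normal(0, 0.5, size=200)

m, b = 0.0, 0.0
lr, batch_size = 0.01, 32   # hyperparameters (see the next slides)

for epoch in range(100):
    idx = rng.permutation(len(x))             # reshuffle each epoch
    for start in range(0, len(x), batch_size):
        batch = idx[start:start + batch_size] # a SUBSET of the data
        err = (m * x[batch] + b) - y[batch]
        # gradients of MSE on the minibatch
        m -= lr * 2 * np.mean(err * x[batch])
        b -= lr * 2 * np.mean(err)

print(m, b)   # should approach 3 and 1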
What are Hyperparameters?
Higher-level concepts about the model, such as its
complexity or capacity to learn
Cannot be learned directly from the data in the
standard model training process
They must be predefined
Hyperparameters (examples)
# of hidden layers in a neural network
the learning rate (in many models)
# of leaves or depth of a tree
# of latent factors in a matrix factorization
# of clusters in a k-means clustering
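A hedged Keras sketch (Keras-2-era API) showing where three of these hyperparameters live; the layer count, node count, and learning rate below are illustrative choices, not recommendations:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# hyperparameters: chosen before training, not learned from data
num_hidden_layers = 2
nodes_per_layer = 64
learning_rate = 0.01

model = Sequential()
model.add(Dense(nodes_per_layer, activation='relu', input_dim=10))
for _ in range(num_hidden_layers - 1):
    model.add(Dense(nodes_per_layer, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=SGD(lr=learning_rate))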
How Many Layers in a DNN?
Algorithm #1 (from Geoffrey Hinton):
1) add layers until you start overfitting your
training set
2) then add dropout or another
regularization method
Algorithm #2 (Yoshua Bengio):
"Add layers until the test error does not improve
anymore.”
How Many Hidden Nodes in a DNN?
Based on a relationship between:
# of input and # of output nodes
Amount of training data available
Complexity of the cost function
The training algorithm
Use Cases for Neural Networks
CNNs (Convolutional NNs):
Good for image processing
2000: CNNs processed 10-20% of all checks
=> Approximately 60% of all NNs
RNNs (Recurrent NNs):
Good for NLP and audio
CNN: Sample Filters
CNN Filters (examples)
Types of RNNs
LSTMs (Long Short Term Memory)
GRUs (Gated Recurrent Units)
ResNets (Residual NNs; not recurrent, but often mentioned alongside)
Features of LSTMs
Used in Google speech recognition + AlphaGo
input/output/forget gates
they avoid the vanishing gradient problem
Can track 1000s of discrete time steps
Used by international competition winners
Often combined with CTC (Connectionist Temporal Classification)
Inside an LSTM
Keras/LSTM Code Snippet
import numpy
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
...
GANs: Generative Adversarial Networks
Make imperceptible changes to images
Can consistently defeat all NNs
Can have extremely high error rate
Some images create optical illusions
https://www.quora.com/What-are-the-pros-and-cons-of-using-generative-adversarial-networks-a-type-of-neural-network
ML/DL Frameworks
Caffe (templates instead of code)
Theano (influenced TensorFlow)
TensorFlow
TensorFlow Lite (release date?)
Keras (“layer” over Theano+TF)
Tefla (mini framework over TF)
Torch (Lua) + PyTorch (Facebook)
MXNet (Amazon)
CNTK (Microsoft)
Languages for ML/DL
Popular languages for ML:
R (popular among statisticians)
Python (sklearn/pandas/etc)
Popular languages for DL:
Python (Keras/Theano/TF modules)
some Java/C++/Go
“Challenges” in Deep Learning
overfitting/underfitting of a model
vanishing/exploding gradient
learning rate (too high or too low)
Debugging NNs (good luck)
Miscellaneous Topics
* Data versus algorithms:
Option A: good data + average algorithm
Option B: average data + good algorithm
=> Option A is preferred over Option B
* “Cleaning” a dataset:
De-duplicate and fix invalid/missing data (how?)
* Dimensionality reduction:
eliminate “unimportant” features (columns)
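A small pandas/sklearn sketch of both ideas (the file name and the mean-imputation choice are hypothetical):

import pandas as pd
from sklearn.decomposition import PCA

df = pd.read_csv('houses.csv')   # hypothetical dataset
df = df.drop_duplicates()        # de-duplicate rows
df = df.fillna(df.mean(numeric_only=True))   # one way to fix missing values

# dimensionality reduction: e.g. 6 original features -> 3 derived ones
pca = PCA(n_components=3)
reduced = pca.fit_transform(df.select_dtypes(include='number'))
print(pca.explained_variance_ratio_)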
Miscellaneous Topics
* XOR requires a hidden layer to solve (why?)
* A dataset whose columns are interchangeable cannot be
solved with a CNN (why?)
* Second-generation TPUs
* TensorFlow Lite (open source later in 2017)
www.tensorflow.org/tutorials
D3 Fun Samples
D3 Animation effects:
MouseMoveFadeAnim1Back1.html
SVG tiger:
svg-tiger-d3.svg
D3 and SVG tiger:
svg-tiger-d3.html
Deep Learning Playground
TF playground home page:
http://playground.tensorflow.org
Demo #1:
https://github.com/tadashi-aikawa/typescript-playground
Converts playground to TypeScript
D3/TypeScript/Deep Learning
Download playground_master.zip
npm install
npm start
Demo converts playground to TypeScript
D3/TypeScript/Deep Learning
TypeScript files in ‘src’ directory:
state.ts
seedrandom.d.ts
playground.ts
linechart.ts
heatmap.ts
dataset.ts
nn.ts (<= activations/nodes in a neural net)
Activations in TypeScript (nn.ts)
export class Activations {
  public static TANH: ActivationFunction = {
    output: x => (Math as any).tanh(x),
    der: x => {
      let output = Activations.TANH.output(x);
      return 1 - output * output;
    }
  };
  public static RELU: ActivationFunction = {
    output: x => Math.max(0, x),
    der: x => x <= 0 ? 0 : 1
  };
Activations in TypeScript (nn.ts)
  public static SIGMOID: ActivationFunction = {
    output: x => 1 / (1 + Math.exp(-x)),
    der: x => {
      let output = Activations.SIGMOID.output(x);
      return output * (1 - output);
    }
  };
  public static LINEAR: ActivationFunction = {
    output: x => x,
    der: x => 1
  };
}
Angular/Deep Learning App (Demo #2)
Create NGDeepLearning via ‘ng new’
Copy the ./src/*.ts files from playground_master into
the NGDeepLearning/src subdirectory
Merge the two package.json files
Merge the two index.html files
Install D3: npm install d3 --save
Angular/Deep Learning
Add import * as d3 from 'd3'; to the files:
dataset.ts
heatmap.ts
linechart.ts
playground.ts
Launch the app: ng serve
Deep Learning and Art/“Stuff”
“Convolutional Blending” images:
=> 19-layer Convolutional Neural Network
www.deepart.io
Bots created their own language:
https://www.recode.net/2017/3/23/14962182/ai-learning-language-open-ai-research
https://www.fastcodesign.com/90124942/this-google-engineer-taught-an-algorithm-to-make-train-footage-and-its-hypnotic
About Me
I provide training for the following:
=> Deep Learning/TensorFlow/Keras
=> Android
=> Angular 4
Recent/Upcoming Books
1) HTML5 Canvas and CSS3 Graphics (2013)
2) jQuery, CSS3, and HTML5 for Mobile (2013)
3) HTML5 Pocket Primer (2013)
4) jQuery Pocket Primer (2013)
5) HTML5 Mobile Pocket Primer (2014)
6) D3 Pocket Primer (2015)
7) Python Pocket Primer (2015)
8) SVG Pocket Primer (2016)
9) CSS3 Pocket Primer (2016)
10) Android Pocket Primer (2017)
11) Angular Pocket Primer (2017)