CS295: Modern Systems:
Application Case Study
Neural Network Accelerator
Sang-Woo Jun
Spring 2019
Many slides adapted from
Hyoukjun Kwon's Georgia Tech "Designing CNN Accelerators"
Usefulness of Deep Neural Networks
 No need to further emphasize the obvious
Convolutional Neural Network for
Image/Video Recognition
ImageNet Top-5 Classification Accuracy
Over the Years
image-net.org “ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2017,” 2017
AlexNet, The Beginning
1.2 million training images, 1,000 classes in the ImageNet challenge
"The first* fast** GPU-accelerated Deep Convolutional Neural Network to win an image recognition contest"
Convolutional Neural Networks Overview
[Figure: a chain of Convolution Layers ("Convolution") followed by Fully Connected Layers ("Neural Network"), ending in class scores such as goldfish: 0.002%, shark: 0.08%, magpie: 0.02%, …, palace: 89%, paper towel: 1.4%, spatula: 0.001%, …]
Training vs. Inference
 Training: Tuning parameters using training data
o Backpropagation using stochastic gradient descent is the most popular algorithm
o Training in data centers and distributing the trained model is a common practice*
o Because training algorithms change rapidly, GPU clusters are the most popular hardware
(Low demand for application-specific accelerators)
 Inference: Determining the class of new input data
o Using a trained model, determine the class of new input data
o Inference usually occurs close to clients
o Low latency and power efficiency are required
(High demand for application-specific accelerators)
Deep Neural Networks (“Fully Connected”*)
Chris Edwards, “Deep Learning Hunts for Signals Among the Noise,” Communications of the ACM, June 2018
 Each layer may have a different number of neurons
[Figure: fully connected network assigning class scores, e.g. goldfish: 0.002%, palace: 89%, paper towel: 1.4%, spatula: 0.001%]
An Artificial Neuron
 Effectively a weight vector multiplied by an input vector to obtain a scalar
 May apply an activation function to the output
o Adds non-linearity
[Figure: Sigmoid and Rectified Linear Unit (ReLU) activation functions]
Jed Fox, “Neural Networks 101,” 2017
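As a minimal sketch in C (the function names and the bias term are illustrative, not from the slides), a neuron is a dot product followed by an activation:

#include <stddef.h>

// ReLU activation: clamps negative values to zero, adding non-linearity.
static float relu(float x) { return x > 0.0f ? x : 0.0f; }

// One artificial neuron: weight vector times input vector plus a bias,
// passed through an activation function to yield a scalar output.
float neuron(const float *w, const float *x, size_t n, float bias) {
    float sum = bias;
    for (size_t i = 0; i < n; i++)
        sum += w[i] * x[i];  // dot product of weights and inputs
    return relu(sum);
}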
Convolution Layer
[Figure: a feature map passing through a convolution layer and an optional pooling layer]
Convolution Example
Filter:
 1  2  3
-2  0 -1
 5 -2  4

Input map patch (after zero padding):
0 1 0
2 4 3
5 2 7

Channel partial sum[0][0] =
1 × 0 + 2 × 1 + 3 × 0
+ (-2) × 2 + 0 × 4 + (-1) × 3
+ 5 × 5 + (-2) × 2 + 4 × 7
= 44

Typically zero padding is added to the source matrix to maintain its dimensions

[Figure: Convolution Filter × Input map = Output map, the filter sliding over each position of the padded input]
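The arithmetic above is just a 3×3 element-wise multiply-accumulate. A minimal sketch in C (array names are illustrative) reproduces the 44:

#include <stdio.h>

// One output element of a single-channel convolution: the element-wise
// multiply-accumulate of a 3x3 filter with a 3x3 input patch.
int main(void) {
    const int filter[3][3] = {{ 1,  2,  3},
                              {-2,  0, -1},
                              { 5, -2,  4}};
    const int patch[3][3]  = {{ 0,  1,  0},
                              { 2,  4,  3},
                              { 5,  2,  7}};
    int sum = 0;
    for (int j = 0; j < 3; j++)
        for (int i = 0; i < 3; i++)
            sum += filter[j][i] * patch[j][i];
    printf("partial sum = %d\n", sum);  // prints: partial sum = 44
    return 0;
}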
Multidimensional Convolution
 “Feature Map” usually has multiple layers
o An image has R, G, B layers, or “channels”
 One layer has many convolution filters, which create a multichannel
output map
[Figure: input feature map × 3×3×3 filter = output feature map]
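A minimal sketch in C of one output element for a 3-channel input (names and fixed sizes are illustrative): the per-channel 3×3 partial sums accumulate into a single value, which is why a 3×3×3 filter produces one output channel.

// One output element of a multichannel convolution: each input channel is
// convolved with its matching 3x3 filter slice, and the per-channel
// partial sums are accumulated into one value of one output channel.
float conv_point_3ch(const float filter[3][3][3],   // [channel][row][col]
                     const float patch[3][3][3]) {  // [channel][row][col]
    float sum = 0.0f;
    for (int c = 0; c < 3; c++)           // input channels (e.g. R, G, B)
        for (int j = 0; j < 3; j++)       // filter row
            for (int i = 0; i < 3; i++)   // filter column
                sum += filter[c][j][i] * patch[c][j][i];
    return sum;
}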
Multiple Convolutions
[Figure: Filter 0 and Filter 1 applied to the same input feature map, producing output feature map 0 and output feature map 1]
Example Learned Convolution Filters
Alex Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS, 2012
Multidimensional Convolution
Image found online. Original source unknown
Computation in the Convolution Layer
for (n = 0; n < N; n++) {               // Input feature maps (IFMaps)
  for (m = 0; m < M; m++) {             // Weight filters
    for (c = 0; c < C; c++) {           // IFMap/weight channels
      for (y = 0; y < H; y++) {         // Input feature map row
        for (x = 0; x < H; x++) {       // Input feature map column (square map)
          for (j = 0; j < R; j++) {     // Weight filter row
            for (i = 0; i < R; i++) {   // Weight filter column (square filter)
              O[n][m][y][x] += W[m][c][j][i] * I[n][c][y + j][x + i];
}}}}}}}
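Every iteration of the innermost loop body is one multiply-accumulate (MAC), so the total work of a convolution layer can be read directly off the loop bounds (assuming square H×H maps and R×R filters, as the nest does):

MACs per layer = N × M × C × H × H × R × R

This product is why convolution layers dominate the compute cost of CNN inference.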
Pooling Layer
 Reduces the size of the feature map
o Max pooling, average pooling, …
[Figure: 2×2 max pooling on a 4×4 feature map, e.g. max(31, 7, 65, 35) = 65]
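A minimal sketch of 2×2 max pooling with stride 2 in C (dimensions and names are illustrative; C99 variable-length arrays keep it short):

// 2x2 max pooling, stride 2: each output element is the maximum of a
// 2x2 window of the input, halving both dimensions. H must be even.
void max_pool_2x2(int H, const float in[H][H], float out[H / 2][H / 2]) {
    for (int y = 0; y < H / 2; y++) {
        for (int x = 0; x < H / 2; x++) {
            float m = in[2 * y][2 * x];
            if (in[2 * y][2 * x + 1] > m)     m = in[2 * y][2 * x + 1];
            if (in[2 * y + 1][2 * x] > m)     m = in[2 * y + 1][2 * x];
            if (in[2 * y + 1][2 * x + 1] > m) m = in[2 * y + 1][2 * x + 1];
            out[y][x] = m;
        }
    }
}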
Real Convolutional Neural Network
-- AlexNet
Alex Krizhevsky et al., “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS, 2012
96 11×11×3 kernels, then 256 5×5×48, then 384 3×3×128, …
Simplified intuition: higher-order information at later layers
Real Convolutional Neural Network
-- VGG 16
Heuritech blog (https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/)
Contains 138 million weights and requires 15.5G MACs to process one 224 × 224 input image
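As a worked check of that figure (layer shapes taken from the standard VGG-16 configuration), apply the MACs-per-layer product from the loop nest earlier: the first convolution layer has 64 filters of 3×3×3 over a 224×224 map, i.e. 64 × 3 × 3 × 3 × 224 × 224 ≈ 86.7M MACs, and the second has 64 filters of 3×3×64, i.e. 64 × 64 × 3 × 3 × 224 × 224 ≈ 1.85G MACs. Summing over all 13 convolution layers plus the fully connected layers' roughly 124M weight-MACs yields the ~15.5G total.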
There are Many, Many Neural Networks
 GoogLeNet, ResNet, YOLO, …
o Share common building blocks, but look drastically different
GoogLeNet (ImageNet 2014 winner)
ResNet
(ImageNet 2015 winner)
