This document introduces speech recognition with deep learning. It discusses how speech recognition works and traces the development of the field from early methods such as hidden Markov models (HMMs) to modern approaches based on deep neural networks. It defines deep learning and explains why it is called "deep". It also outlines common deep learning architectures for speech recognition, including CNN-RNN models and sequence-to-sequence models. Finally, it describes the layers of a CNN: convolutional, pooling, ReLU, and fully-connected layers.
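
As a rough illustration of the four CNN layer types named above, the following pure-Python sketch passes a tiny 5x5 grid through a convolution, a ReLU, a 2x2 max pool, and a fully-connected layer. The input values, kernel, weights, and function names are all made up for illustration and are not taken from the document; a real speech model would operate on spectrogram features with learned weights, typically through a deep learning framework.

```python
def conv2d(x, kernel):
    """Slide one kernel over a single-channel 2-D input (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(x):
    """Replace every negative activation with zero."""
    return [[max(v, 0.0) for v in row] for row in x]


def max_pool2d(x, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    out_h, out_w = len(x) // size, len(x[0]) // size
    return [[max(x[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


def fully_connected(x, weights, bias):
    """Flatten the feature map and compute one weighted sum per output unit."""
    flat = [v for row in x for v in row]
    return [sum(f * w for f, w in zip(flat, unit_weights)) + b
            for unit_weights, b in zip(weights, bias)]


# Hypothetical 5x5 input standing in for a patch of a spectrogram.
patch = [[float(i - j) for j in range(5)] for i in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]           # illustrative 2x2 filter

conv_out = conv2d(patch, kernel)            # 4x4 feature map
features = max_pool2d(relu(conv_out))       # 2x2 after ReLU and pooling

weights = [[1.0, 1.0, 1.0, 1.0],            # one weight row per output unit
           [1.0, 0.0, 0.0, -1.0]]
bias = [0.0, 0.5]
scores = fully_connected(features, weights, bias)

print(features)  # [[2.0, 0.0], [6.0, 2.0]]
print(scores)    # [10.0, 0.5]
```

Each stage shrinks or reshapes the data: the convolution turns the 5x5 input into a 4x4 feature map, pooling halves it to 2x2, and the fully-connected layer maps the four flattened values to two output scores.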