{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lecture 6: Accelerating gradient descent with Chebyshev polynomials\n",
"\n",
"[EE227C course page](https://ee227c.github.io/) \n",
"[Download ipynb file](https://ee227c.github.io/code/lecture6.ipynb)\n",
"\n",
"In this lecture we will derive an intriguing way to speed up gradient descent using Chebyshev polynomials. In doing so, we'll also fill in the details for a blog post I wrote four and a half years ago called [zen of gradient descent](http://blog.mrtz.org/2013/09/07/the-zen-of-gradient-descent.html), just in case you were still waiting on that.\n",
"\n",
"Overview:\n",
"\n",
"* [A closer look at quadratics](#quadratics)\n",
"* [Connection to polynomial approximation](#polynomials)\n",
"* [Chebyshev polynomials](#chebyshev)\n",
"* [Accelerated gradient descent for quadratics](#acceleration)\n",
"\n",
"Bibliographic note: *Linear Algebra and its Applications* by Peter D. Lax has a fantastic exposition of this material in Chapter 17. It's generally a fabulous linear algebra text that I highly recommend."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
" \n",
"$$f(x) = \\frac12 x^\\top A x - b^\\top x$$\n",
" \n",
"$$\\nabla f(x) = Ax-b\\qquad\\text{and}\\qquad \\nabla^2 f(x) = A.$$\n",
" \n",
"$$\n",
"e_t = \\|x_t-x^*\\|\n",
"$$\n",
" \n",
"$$\n",
"x^* = (I-\\eta_t A)x^* +\\eta b.\n",
"$$\n",
" \n",
"$$\n",
"\\begin{align}\n",
"e_{t+1} & = x_{t+1}-x^* \\\\\n",
" & = (I-\\eta_t A)x_t +\\eta_t b - ( (I-\\eta_t A)x^* +\\eta_t b ) \\\\\n",
" & = (I-\\eta_t A)e_t.\n",
"\\end{align}\n",
"$$\n",
" \n",
"$$\n",
"p_t(a) = \\prod_{i=1}^t (1-\\eta_ta)\\,.\\qquad(*)\n",
"$$\n",
" \n",
"$$\n",
"\\|e_t\\|\\le \\|p_t(A)\\|\\cdot\\|e_0\\|\\,.\n",
"$$\n",
" \n",
"$$\n",
"\\|p_t(A)\\|\\le \\max_{\\alpha\\le a\\le\\beta} \\left|p_t(a)\\right|\\,.\n",
"$$\n",
" \n",
"$$\n",
"\\|e_t\\|\\le \\left(1-\\frac1{\\kappa}\\right)^t\\|e_1\\|\n",
"\\le \\exp\\left(-\\alpha t/\\beta\\right)\\|x_0-x^*\\|\\,.\n",
"$$\n",
" \n",
"$$\n",
"\\begin{align}\n",
"T_0(a) &= 1\\\\\n",
"T_1(a) &= x\\\\\n",
"T_k(a) &=2aT_{k-1}(a)-T_{k-2}(a),\\qquad k\\ge 2.\\\\\n",
"\\end{align}\n",
"$$\n",
" \n",
"$$\n",
"P_k(a) = T_k\\left(\\frac{\\beta+\\alpha-2a}{\\beta-\\alpha}\\right)\\cdot T\\left(\\frac{\\beta+\\alpha}{\\beta-\\alpha}\\right)^{-1}\\,.\n",
"$$\n",
" \n",
"$$\n",
"|P_k(a)| \\le P_k(\\alpha)= T\\left(\\frac{\\beta+\\alpha}{\\beta-\\alpha}\\right)^{-1}.\n",
"$$\n",
" \n",
"$$\n",
"\\frac{\\beta+\\alpha}{\\beta-\\alpha}\n",
"=\\frac{\\kappa+1}{\\kappa-1}=1+\\epsilon\n",
"$$\n",
"for some $\\epsilon\\approx 2/\\kappa.$\n",
" \n",
"$$\n",
"T_k(1+\\epsilon)\\ge \\frac{\\big(1+\\sqrt{\\epsilon}\\big)^k}2\n",
"$$\n",
" \n",
"$$\n",
"e^{\\phi} = 1+\\epsilon+\\sqrt{2\\epsilon+\\epsilon^2}\n",
"\\ge 1+\\sqrt{\\epsilon}\\,.\n",
"$$\n",
" \n",
"$$\n",
"T_k(1+\\epsilon)=\\frac{(e^{\\phi})^k+(e^{\\phi})^{-k}}{2}\n",
"\\ge \\frac{\\big(1+\\sqrt{\\epsilon}\\big)^k}2\\,.\n",
"\\qquad\\qquad\\square\n",
"$$\n",
" \n",
"$$\n",
"T_k\\left(\\frac{\\beta+\\alpha}{\\beta-\\alpha}\\right)^{-1}\n",
"\\le 2\\big(1+\\sqrt{\\epsilon}\\big)^{-k}\\,.\n",
"$$\n",
" \n",
"$$\n",
"\\|e_k\\|\\le 2\\big(1+\\sqrt{\\epsilon}\\big)^{-k}\\|e_0\\|\n",
"$$\n",
" \n",
"$$\n",
"P_{k+1}(a) = (\\eta_k a + \\gamma_k)P_k(a) + \\mu_k P_{k-1}(a),\n",
"$$\n",
" \n",
"$$\n",
"\\begin{align}\n",
"x_{k+1} &= (\\eta_k A + (1-\\mu_k))x_k + \\mu_k x_{k-1}-\\eta_k b\\\\\n",
"&= x_k - \\eta_k\\nabla f(x_k) + \\mu_k (x_k - x_{k-1})\n",
"\\end{align}\n",
"$$\n",
"