Regression Methods in
Machine Learning
Simple Linear Regression
Portland Data Science Group
Andrew Ferlitsch
Community Outreach Officer
July, 2017
Linear Regression
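[Figure: data plotted with X (independent variable) on the horizontal axis and Y (dependent variable) on the vertical axis, with a line drawn through the points.]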
• Used to model the relationship between one or more
independent variables and a dependent variable, so that the
dependent variable can be predicted.
e.g., speeding is correlated with traffic deaths.
• Applies when the data, plotted on a graph, appears to
follow a straight-line relationship.
(Simple) Linear Regression
• Used to predict a dependent variable from a single
independent variable.
• Finds the line that best approximates the relationship between
the independent variable (usually referred to as x) and
the dependent variable (usually referred to as y).
• In Machine Learning, x is referred to as the feature,
and y is referred to as the label.
(Simple) Linear Regression by Many Names
• Elementary Geometry: Definition of a Line
y = mx + b
• Linear Algebra
y = a + bx
• Machine Learning
y = b0 + b1x1
b0: y-intercept or bias, where the line crosses the y-axis
b1: slope, weight, or coefficient
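
The Machine Learning form y = b0 + b1x1 can be sketched in a few lines of Python. This is only an illustration: the function name `predict` and the sample values for b0 and b1 are assumptions, not taken from the slides.

```python
def predict(x1: float, b0: float, b1: float) -> float:
    """Evaluate the line y = b0 + b1 * x1 for a single feature value."""
    return b0 + b1 * x1


# Illustrative values only (assumed, not from the slides).
b0, b1 = 1.0, 2.0

# At x1 = 0 the prediction equals the bias b0 (the y-intercept).
print(predict(0.0, b0, b1))

# Each additional unit of x1 adds b1 (the slope) to the prediction.
print(predict(3.0, b0, b1))
```

The same function covers the other two notations: m or b (slope) plays the role of b1, and b or a (intercept) plays the role of b0.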
(Simple) Linear Regression
It’s In The Line
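[Figure: scatter plot of the data, with Age (x), the feature (data), on the horizontal axis starting at 0 and Spend (y), the label (to learn), on the vertical axis. The best-fitted line is y = a + bx, where a is the y-intercept and b is the slope.]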
Loss Function
Minimize Loss (Estimated Error) when Fitting a Line
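[Figure: actual values (y1 to y6) scattered around the fitted line. The predicted values (yhat) lie on the line, and each error is the vertical distance (y - yhat).]
Mean Square Error (MSE):
$$\mathrm{MSE} = \frac{1}{n} \sum_{j=1}^{n} (y_j - \hat{y}_j)^2$$
where y_j is the actual value and \hat{y}_j (yhat) is the predicted value for sample j.
• Sum the square of the difference (y - yhat) over all samples.
• Divide by the number of samples (n).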
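
The two steps above (sum the squared differences, then divide by the number of samples) can be sketched in Python. The function name `mean_squared_error` and the sample numbers are illustrative assumptions, not from the slides.

```python
def mean_squared_error(actual, predicted):
    """Return the mean of the squared differences (y - yhat)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    n = len(actual)
    # Sum the square of the difference for every sample...
    squared_error_sum = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))
    # ...then divide by the number of samples.
    return squared_error_sum / n


# Illustrative values only (assumed): squared errors are 1, 0 and 4.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(mean_squared_error(actual, predicted))
```

Squaring keeps positive and negative errors from cancelling and penalizes large misses more than small ones.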
Solving Simple Linear Equation
The solution to the equation can be computed directly:
$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$$
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
Compute the following summations first; a and b are then easy to compute:
(∑y) sum of all values of y
(∑x) sum of all values of x
(∑xy) sum of all x ∗ y pairs
(∑x²) sum of all values of x²
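
The closed-form solution for a and b can be sketched as a small Python function. The name `fit_simple_linear` and the sample points are assumptions made for illustration. The check for a zero denominator is an added safeguard, since the formula is undefined when every x value is the same.

```python
def fit_simple_linear(xs, ys):
    """Return (a, b) for the line y = a + b*x using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # The four summations the formulas depend on.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Shared denominator: n * sum(x^2) - (sum(x))^2.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b


# Illustrative points chosen to lie on y = 1 + 2x (assumed data).
a, b = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the sample points lie exactly on a line, the function should recover that line's intercept and slope.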
(Simple) Linear Regression Example
Spreadsheet (Excel) Process for Computing Simple Linear Regression

     Raw Data                   Computed Values
     Age (X)    Spending (Y)    X²      XY
     20         10              400     200
     25         30              625     750
     30         50              900     1500
     35         70              1225    2450
∑    110        160             3150    4900     (Summations)

With n = 4 samples:
(∑y)(∑x²) - (∑x)(∑xy) = 160 ∗ 3150 - 110 ∗ 4900 = 504000 - 539000 = -35000
n(∑x²) - (∑x)² = 4 ∗ 3150 - 110² = 12600 - 12100 = 500
n(∑xy) - (∑x)(∑y) = 4 ∗ 4900 - 110 ∗ 160 = 19600 - 17600 = 2000
a = -35000 / 500 = -70        b = 2000 / 500 = 4
Fitted line: y = -70 + 4x (for example, x = 20 gives y = 10, matching the first row).
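
As a cross-check on the spreadsheet process, the sketch below recomputes every sum from the four raw (Age, Spending) rows in Python and then evaluates the fitted line at each age. The variable names are illustrative assumptions; only the four data rows come from the slide.

```python
# Raw data from the example table.
ages = [20, 25, 30, 35]       # X
spending = [10, 30, 50, 70]   # Y

n = len(ages)
sum_x = sum(ages)
sum_y = sum(spending)
sum_x2 = sum(x * x for x in ages)
sum_xy = sum(x * y for x, y in zip(ages, spending))
print("sums:", sum_x, sum_y, sum_x2, sum_xy)

# Intercept a and slope b from the summation formulas.
denom = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
b = (n * sum_xy - sum_x * sum_y) / denom
print("a =", a, "b =", b)

# Evaluate the fitted line at each age and measure the error.
predictions = [a + b * x for x in ages]
mse = sum((y - yhat) ** 2 for y, yhat in zip(spending, predictions)) / n
print("predictions:", predictions, "MSE:", mse)
```

The four rows lie exactly on a straight line, so the predictions reproduce the Spending column and the mean square error is 0.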
