Principal Component Analysis (PCA)
PCA - Overview
• A backbone of modern data analysis.
• A black box that is widely used but poorly understood.
• It is a mathematical tool from applied linear algebra.
• It is a simple, non-parametric method of extracting relevant
information from confusing data sets.
• It provides a roadmap for how to reduce a complex data set to a
lower dimension.
• PCA is used to reduce the dimensionality of data without much loss of
information.
• PCA is “an orthogonal linear transformation that transforms the data to a new
coordinate system such that the greatest variance by any projection of the data
comes to lie on the first coordinate (first principal component), the second
greatest variance lies on the second coordinate (second principal component),
and so on.” (A minimal NumPy sketch of this idea follows this list.)
• An exploratory technique used to reduce the dimensionality of the data set to
2D or 3D
• Can be used to:
• Reduce number of dimensions in data
• Find patterns in high-dimensional data
• Visualize data of high dimensionality
• Example applications:
• Face recognition, Image compression, Gene expression analysis
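As a minimal sketch of the idea in the bullets above (not code from the lecture): center the data, compute its covariance matrix, and project onto the eigenvectors with the largest eigenvalues. The function name and the made-up data are for illustration only.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the k directions of greatest variance.

    Minimal sketch: center the data, compute the covariance matrix,
    and keep the eigenvectors with the largest eigenvalues.
    """
    X_centered = X - X.mean(axis=0)
    C = np.cov(X_centered, rowvar=False)           # covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigh: C is symmetric
    order = np.argsort(eigenvalues)[::-1]          # sort by decreasing variance
    components = eigenvectors[:, order[:k]]        # first k principal components
    return X_centered @ components                 # data in the new coordinates

# Made-up example: reduce 5-dimensional data to 2 dimensions for visualization.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X_2d = pca(X, k=2)
print(X_2d.shape)  # (200, 2)
```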
PCA - Overview
Background
• Linear Algebra
• Principal Component Analysis (PCA)
• Independent Component Analysis (ICA)
• Linear Discriminant Analysis (LDA)
• Examples
Variance
• Variance – measure of the deviation from the mean for points in one
dimension, e.g., heights
• A measure of the spread of the data in a data set around its mean (see the
sample formula below).
• Variance is often described as the original statistical measure of the spread of data.
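For a one-dimensional sample x_1, ..., x_n with mean x̄, the sample variance referred to above (assuming the usual unbiased n − 1 denominator) is

\[
\operatorname{var}(X) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 .
\]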
Covariance
• Covariance – a measure of how much each of the dimensions varies from
the mean with respect to each other.
• Covariance is measured between 2 dimensions to see if there is a
relationship between the 2 dimensions, e.g., number of hours studied
and grade obtained.
• The covariance between one dimension and itself is the variance (made explicit
by the sample formula below).
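Analogously (again assuming the n − 1 denominator), the sample covariance between two dimensions X and Y with means x̄ and ȳ is

\[
\operatorname{cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}),
\]

and taking Y = X recovers the variance, as the last bullet states.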
Covariance
• Covariance calculations are used to find relationships between dimensions in
high-dimensional data sets (usually more than 3 dimensions), where visualization
is difficult.
• Suppose we have n attributes, A1, ..., An.
• Covariance matrix:
• Example for three attributes (x,y,z):
Covariance matrix
C_{n×n} = (c_{i,j}), where c_{i,j} = cov(A_i, A_j)

        | cov(x,x)  cov(x,y)  cov(x,z) |
  C  =  | cov(y,x)  cov(y,y)  cov(y,z) |
        | cov(z,x)  cov(z,y)  cov(z,z) |

Covariance matrix
• Example for two attributes, e.g. hours studied (H) and mark obtained (M):

        | var(H)    cov(H,M) |     |  47.7   104.5 |
  C  =  | cov(M,H)  var(M)   |  =  | 104.5   370   |
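The raw data behind the 47.7 / 104.5 / 370 example is not shown in this extract, so the sketch below builds a covariance matrix from made-up hours-and-marks data using NumPy's np.cov:

```python
import numpy as np

# Hypothetical data: hours studied (H) and mark obtained (M) for 5 students.
# These numbers are made up for illustration; they are not the data behind
# the 47.7 / 104.5 / 370 values quoted above.
H = np.array([9.0, 15.0, 25.0, 14.0, 10.0])
M = np.array([39.0, 56.0, 93.0, 61.0, 50.0])

# Each row is one attribute (dimension); np.cov then returns the 2x2
# covariance matrix [[var(H), cov(H, M)], [cov(M, H), var(M)]],
# using the unbiased (n - 1) denominator by default.
C = np.cov(np.vstack([H, M]))
print(C)

# The diagonal entries are the variances, and the matrix is symmetric.
assert np.isclose(C[0, 0], H.var(ddof=1))
assert np.isclose(C[0, 1], C[1, 0])
```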
Linear algebra
Matrix and vector multiplication
Transformation Matrices
Transformation Matrices
Eigenvalue Problem
Eigenvalue Problem
A v = λ v

  | 2  3 |   | 3 |     | 12 |         | 3 |
  | 2  1 | · | 2 |  =  |  8 |  =  4 · | 2 |
• Going back to our example: as A . v = λ . v
• Therefore, (3,2) is an eigenvector of the square matrix A and 4 is an
eigenvalue of A.
• The question is:
Given matrix A, how can we calculate the eigenvectors and eigenvalues for A?
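Before the by-hand procedure on the following slides, here is a minimal NumPy check of the example (the matrix A and the eigenpair 4, (3, 2) are taken from the slide; np.linalg.eig is a standard library routine, not part of the lecture):

```python
import numpy as np

# The example matrix and eigenvector from the slide.
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
v = np.array([3.0, 2.0])

# Direct check that v is an eigenvector of A with eigenvalue 4: A v = 4 v.
assert np.allclose(A @ v, 4 * v)

# np.linalg.eig returns all eigenvalues and the matching (column) eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # 4 and -1, in some order
print(eigenvectors)  # each column is an eigenvector, scaled to unit length
```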
How to Calculate Eigenvalues & Eigenvectors
Calculating Eigenvectors & Eigenvalues
Calculating Eigenvectors & Eigenvalues
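The worked steps on these slides appear to be image-only in this extract; as a reconstruction (not the slide's own wording), the standard procedure for the running example is to solve the characteristic equation det(A − λI) = 0:

\[
\det(A - \lambda I) =
\begin{vmatrix} 2-\lambda & 3 \\ 2 & 1-\lambda \end{vmatrix}
= (2-\lambda)(1-\lambda) - 6
= \lambda^2 - 3\lambda - 4
= (\lambda - 4)(\lambda + 1) = 0,
\]

so the eigenvalues are λ = 4 and λ = −1. Substituting λ = 4 into (A − λI)v = 0:

\[
\begin{pmatrix} -2 & 3 \\ 2 & -3 \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = 0
\;\Rightarrow\; 2v_1 = 3v_2
\;\Rightarrow\; v = \begin{pmatrix} 3 \\ 2 \end{pmatrix} \text{ (up to scale),}
\]

which matches the eigenvector (3, 2) found earlier.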
Going through the same procedure for the second eigenvalue:
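Again as a reconstruction of the image-only working: substituting λ = −1 into (A − λI)v = 0 gives

\[
\begin{pmatrix} 3 & 3 \\ 2 & 2 \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = 0
\;\Rightarrow\; v_1 = -v_2
\;\Rightarrow\; v = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \text{ (up to scale).}
\]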
Calculating Eigenvectors & Eigenvalues
Calculating Eigenvectors & Eigenvalues
Properties of Eigenvectors and Eigenvalues
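The content of this slide appears to be image-only in the extract. One property that matters directly for PCA, and is easy to verify numerically, is that the eigenvectors of a symmetric matrix (such as any covariance matrix) are mutually orthogonal, and that eigenvectors are only determined up to scale. A small NumPy check with made-up data:

```python
import numpy as np

# Eigenvectors of a symmetric matrix (such as any covariance matrix)
# are mutually orthogonal; np.linalg.eigh is the symmetric-matrix routine.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 samples, 3 dimensions (made up)
C = np.cov(X, rowvar=False)     # 3x3 symmetric covariance matrix

eigenvalues, eigenvectors = np.linalg.eigh(C)

# Orthogonality: distinct eigenvectors have (numerically) zero dot product,
# so the matrix of eigenvector columns satisfies V^T V = I.
assert np.allclose(eigenvectors.T @ eigenvectors, np.eye(3))

# Scaling an eigenvector gives another eigenvector for the same eigenvalue.
v = eigenvectors[:, 0]
assert np.allclose(C @ (5 * v), eigenvalues[0] * (5 * v))
```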