Bangladesh University of Professionals
Principal Component Analysis (PCA)
Presented By
Wahid Ullah
Sr. Software Engineer
ITDSC, ITDTE, Bangladesh Army
Outline
1. What is PCA?
2. Dimensionality Reduction
3. Why PCA?
4. Important Terminologies
5. How does PCA work?
6. Applications of PCA
7. Advantages and Limitations
Introduction
Principal Component Analysis, commonly referred to as
PCA, is a powerful mathematical technique used in data
analysis and statistics. At its core, PCA is designed to
simplify complex datasets by transforming them into a
more manageable form while retaining the most critical
information.
- Reducing the dimensionality of the dataset
- Increasing interpretability without losing information
Dimensionality Reduction
Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset.
Why DR?
- Fewer dimensions for a given dataset mean less computation or training time
- Redundant, near-duplicate features are removed from the dataset
- Data compression (reduced storage space)
- It helps identify the most significant features and discard the rest
- Leads to better human interpretation
Why PCA?
- Dimensionality Reduction
- Noise Reduction
- Visualization
- Feature Engineering
- Mitigating Overfitting
- Data Compression
- Faster Machine Learning Processing
Important Terminologies
- Variance
- Covariance
- Eigenvalues
- Eigenvectors
- Principal Component
Important Terminologies (Variance)
- Variance is the average of the squared differences between each value and the mean.
- Variance (σ²) = (Sum of the squared differences from the mean) / (Total number of values)
- In mathematical notation: σ² = Σ(x - μ)² / n
Here:
- μ is the mean of the independent feature
- Mean (μ) = (Sum of all values) / (Total number of values)
Important Terminologies (Variance)
- The variance is a measure of how widely the data points scatter around the mean
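As a quick check of these formulas, here is a minimal NumPy sketch; the data values are made up purely for illustration:

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # illustrative values

mu = x.mean()                        # μ = Σx / n
variance = ((x - mu) ** 2).mean()    # σ² = Σ(x - μ)² / n
print(mu, variance)                  # 5.0 4.0
print(np.var(x))                     # NumPy's built-in gives the same: 4.0
```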
Important Terminologies (Covariance)
1. It describes the relationship between a pair of random variables: how a change in one variable is associated with a change in the other.
2. It can take any value from -∞ to +∞, where a negative value represents a negative relationship and a positive value represents a positive relationship.
3. It captures the linear relationship between variables.
4. It gives the direction of the relationship between variables.
Important Terminologies (Covariance)
The formula for the covariance (Cov) between two random variables X and Y, each with N data points, is:
Cov(X, Y) = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / N
Where:
- Cov(X, Y) is the covariance between X and Y.
- N is the number of data points.
- Xᵢ and Yᵢ represent individual data points for X and Y, and X̄ and Ȳ are their respective means.
Important Terminologies (Covariance)
X     Y
10    40
12    48
14    56
8     21
Covariance Matrix
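The covariance matrix for this small table appeared as a figure in the original slides; a minimal NumPy sketch that computes it (note that np.cov divides by N - 1 by default, while the formula above divides by N):

```python
import numpy as np

X = np.array([10, 12, 14, 8])
Y = np.array([40, 48, 56, 21])

cov_matrix = np.cov(X, Y)  # 2x2 matrix: [[Var(X), Cov(X,Y)], [Cov(Y,X), Var(Y)]]
print(np.round(cov_matrix, 2))
# [[  6.67  37.67]
#  [ 37.67 224.92]]
```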
Compute Eigenvalues/EigenVectors
Let A be a square N×N matrix and x a non-zero vector for which:
Ax = λx
for some scalar value λ. Then:
λ = eigenvalue of matrix A
x = eigenvector of matrix A
Eigenvalues:
det(A - λI) = 0 [the characteristic equation, which yields N eigenvalues]
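In practice the characteristic equation is rarely solved by hand. A minimal NumPy sketch, with an arbitrary symmetric example matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # arbitrary symmetric 2x2 example

# eigh is for symmetric matrices; eigenvalues come back in ascending
# order, and the eigenvectors are the columns of the second result
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)   # [1. 3.]
print(eigenvectors)  # each column v satisfies A @ v = λ * v
```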
How does PCA work?
Step 1: Standardize the data.
Step 2: Calculate the covariance matrix.
Step 3: Compute the eigenvectors and
eigenvalues.
Step 4: Select the principal components.
Step 5: Project data onto the new basis.
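Libraries such as scikit-learn perform these five steps internally. A minimal sketch, using the same small X/Y dataset as the worked example that follows:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Rows are samples, columns are the features X and Y
data = np.array([[2, 4], [3, 5], [5, 7], [7, 8], [10, 11]], dtype=float)

scaled = StandardScaler().fit_transform(data)  # Step 1: standardize
pca = PCA(n_components=2).fit(scaled)          # Steps 2-4 happen inside fit()
projected = pca.transform(scaled)              # Step 5: project onto the PCs

print(pca.explained_variance_ratio_)  # share of total variance per component
print(np.round(projected, 2))         # the data in PC coordinates
```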
Step-By-Step Explanation of PCA (Principal Component Analysis)
Step 1: Standardization
The main aim of this step is to standardize the range of the attributes so that each one of them lies within a similar range. Each value is replaced by its z-score:
z = (x - μ) / σ
- μ is the mean of the independent features
- σ is the standard deviation of the independent features
σ = √[ Σ(x - μ)² / N ] (the sample version divides by n - 1, as in the example that follows)
Standardization
Dataset:
Consider a small dataset with two variables, X and Y, represented by the following data points:
X: [2, 3, 5, 7, 10]
Y: [4, 5, 7, 8, 11]
- For variable X:
- Mean (μX) = (2 + 3 + 5 + 7 + 10) / 5 = 5.4
- Standard Deviation (σX) = √[Σ(Xᵢ - μX)² / (n - 1)] = √[(11.56 + 5.76 + 0.16 + 2.56 + 21.16) /
4] ≈ 3.21
- For variable Y:
- Mean (μY) = (4 + 5 + 7 + 8 + 11) / 5 = 7
- Standard Deviation (σY) = √[Σ(Yᵢ - μY)² / (n - 1)] = √[(9 + 4 + 0 + 1 + 16) / 4] ≈ 2.74
Standardized X: [-1.06, -0.75, -0.12, 0.50, 1.43]
Standardized Y: [-1.10, -0.73, 0.00, 0.37, 1.46]
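These values can be verified with a few lines of NumPy (ddof=1 matches the n - 1 convention used above):

```python
import numpy as np

X = np.array([2, 3, 5, 7, 10], dtype=float)
Y = np.array([4, 5, 7, 8, 11], dtype=float)

X_std = (X - X.mean()) / X.std(ddof=1)  # z-scores for X
Y_std = (Y - Y.mean()) / Y.std(ddof=1)  # z-scores for Y

print(np.round(X_std, 2))  # [-1.06 -0.75 -0.12  0.5   1.43]
print(np.round(Y_std, 2))  # [-1.1  -0.73  0.    0.37  1.46]
```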
Covariance Matrix Computation
The covariance matrix is used to express the relationship between any two or more attributes in a multidimensional dataset.
- Variance is denoted by Var
- Covariance is denoted by Cov
Covariance Matrix Computation
Cov(X, X) Cov(X, Y)
Cov(Y, X) Cov(Y, Y)
Using the formula for covariance on the standardized data (dividing by n - 1):
- Cov(X, X) = Σ(Standardized X * Standardized X) / (n - 1) = (1.12 + 0.56 + 0.02 + 0.25 + 2.05) / 4
≈ 1.000
- Cov(X, Y) = Σ(Standardized X * Standardized Y) / (n - 1) = (1.16 + 0.55 + 0.00 + 0.18 + 2.09) / 4
≈ 0.996
- Cov(Y, X) = Cov(X, Y) ≈ 0.996 (the covariance matrix is symmetric)
- Cov(Y, Y) = Σ(Standardized Y * Standardized Y) / (n - 1) = (1.20 + 0.53 + 0.00 + 0.13 + 2.13) / 4
≈ 1.000
After standardization the diagonal entries are exactly 1, and the off-diagonal entry is the correlation coefficient between X and Y.
Covariance Matrix:
1.000  0.996
0.996  1.000
Compute Eigenvalues and Eigenvectors of Covariance Matrix to
Identify Principal Components
For this covariance matrix, the eigenvalues and corresponding eigenvectors are:
Eigenvalue 1 (λ1) ≈ 1.996
Eigenvector 1 (v1) = [0.707, 0.707]
Eigenvalue 2 (λ2) ≈ 0.004
Eigenvector 2 (v2) = [-0.707, 0.707]
λ1 accounts for about 99.8% of the total variance (λ1 + λ2 = 2), so the first principal component captures almost all of the information in the data.
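A short NumPy check of the covariance matrix and its eigendecomposition (the signs of the eigenvectors may come back flipped, which does not change the components):

```python
import numpy as np

X = np.array([2, 3, 5, 7, 10], dtype=float)
Y = np.array([4, 5, 7, 8, 11], dtype=float)
X_std = (X - X.mean()) / X.std(ddof=1)
Y_std = (Y - Y.mean()) / Y.std(ddof=1)

cov = np.cov(X_std, Y_std)               # matches the matrix above
eigenvalues, eigenvectors = np.linalg.eigh(cov)
print(np.round(eigenvalues, 3))          # [0.004 1.996] (ascending order)
print(np.round(eigenvectors, 3))         # columns are the eigenvectors
```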
Select the Principal Components.
1. The first principal component is the direction of greatest variability (variance) in the data.
2. The second is the next orthogonal (uncorrelated) direction of greatest variability.
Project Data onto Principal Components
To transform the data into the new principal component space, we take the dot product of each standardized data point with each eigenvector:
- PC1 score = 0.707 · (Standardized X) + 0.707 · (Standardized Y)
- PC2 score = -0.707 · (Standardized X) + 0.707 · (Standardized Y)
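A minimal sketch of the projection, using the standardized values and eigenvectors from above:

```python
import numpy as np

# Standardized data from the earlier step (rows = samples, columns = X, Y)
data_std = np.array([[-1.06, -1.10],
                     [-0.75, -0.73],
                     [-0.12,  0.00],
                     [ 0.50,  0.37],
                     [ 1.43,  1.46]])

v1 = np.array([0.707, 0.707])   # first principal component direction
v2 = np.array([-0.707, 0.707])  # second principal component direction

pc1_scores = data_std @ v1      # dot product of each row with v1
pc2_scores = data_std @ v2      # dot product of each row with v2
print(np.round(pc1_scores, 2))  # coordinates along PC1
print(np.round(pc2_scores, 2))  # near zero: X and Y are almost perfectly correlated
```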
Applications of PCA
- Netflix Movie Recommendations
- Grocery Shopping
- Fitness Trackers
- Car Shopping
- Real Estate
- Manufacturing and Quality Control
- Sports Analytics
- Renewable Energy
- Smart Cities
Advantages of PCA
- Prevents Overfitting
- Speeds Up Other Machine Learning Algorithms
- Improves Visualization
- Dimensionality Reduction
- Noise Reduction
Limitations of PCA
- Linearity Assumption
- Loss of Interpretability
- Loss of Information
- Sensitivity to Scaling
- Orthogonal Components
Some Mathematical Problem
Given the following data, use PCA to reduce the dimension from 2 to 1.
Feature   Example 1   Example 2   Example 3   Example 4
X         4           8           13          7
Y         11          4           5           14
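As a way to check a hand-worked answer against the solution videos linked below, here is a minimal NumPy sketch. Note that it mean-centers the features rather than fully standardizing them, which is the convention in many textbook versions of this exercise:

```python
import numpy as np

# The exercise data: 4 examples, 2 features (X, Y)
data = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)

centered = data - data.mean(axis=0)    # mean-center each feature
cov = np.cov(centered, rowvar=False)   # 2x2 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

v1 = eigenvectors[:, -1]               # eigenvector of the largest eigenvalue
reduced = centered @ v1                # project the 2-D data down to 1-D
print(reduced)
```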
References
1. https://www.simplilearn.com/tutorials/machine-learning-tutorial/principal-component-analysis
2. https://www.geeksforgeeks.org/principal-component-analysis-pca/
3. https://www.cuemath.com/algebra/covariance-matrix/
4. https://www.youtube.com/watch?v=FgakZw6K1QQ&t=933s
Mathematical Problem Solution Links:
1. https://www.youtube.com/watch?v=MLaJbA82nzk&t=1082s
2. https://www.youtube.com/watch?v=Kc4Fbg3zRTs
Thank You
Q&A
