
This project focuses on analyzing employee-related data to gain insights into key factors that influence employee performance and attrition within an organization.


siddheshwarkoli/Employee-Performance-Analysis-Classification


Employee-Performance-Analysis-Classification

PROJECT SUMMARY:

BUSINESS CASE & GOAL OF PROJECT: BASED ON THE GIVEN FEATURES OF THE DATASET, WE NEED TO PREDICT THE PERFORMANCE RATING OF EACH EMPLOYEE

This data science project is an analysis of employee performance.

THE GOAL AND INSIGHTS OF THE PROJECT ARE AS FOLLOWS:

1. Analysis

The data were analyzed by describing the features present in the dataset. The features play the largest part in the analysis because they show the relationship between the dependent and independent variables. Pandas helps describe the dataset early in the project. The features are divided into numerical and categorical data.
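The describing step above can be sketched with pandas as follows. This is a minimal illustration, not the project's notebook code: the tiny DataFrame and its column names (`Age`, `Gender`, `PerformanceRating`) are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real project loads its own dataset.
df = pd.DataFrame({
    "Age": [32, 47, 40],
    "Gender": ["Male", "Male", "Female"],
    "PerformanceRating": [3, 3, 4],
})

# Summary statistics for every column, numerical and categorical alike.
summary = df.describe(include="all")

# Split the features by data type.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(summary)
print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```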

2. Univariate, Bivariate & Multivariate Analysis:

Library Used: Matplotlib & Seaborn

Plots Used: Histplot, Lineplot, Countplot, Barplot

3. Exploratory Data Analysis

Basic Check & Statistical Measures: There is no constant column in either the numerical or the categorical data.
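One possible way to run the constant-column check is to count the distinct values per column. The sketch below uses a hypothetical DataFrame that deliberately contains a constant column (`Country`) so the check has something to find; the project's data has none.

```python
import pandas as pd

# Hypothetical data; "Country" is constant on purpose to show what the check catches.
df = pd.DataFrame({
    "Age": [32, 47, 40, 29],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Country": ["IN", "IN", "IN", "IN"],
})

# A column is constant when it holds at most one distinct value.
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

print("Constant columns:", constant_cols)
```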

4. Data Pre-Processing

  1. Check Missing Value: There are no missing values in the data.

  2. Categorical Data Conversion: Handle categorical data with frequency encoding and manual encoding, because the features contain many labels.

  3. Outlier Handling: Some features contain outliers, so we impute these outliers using the IQR, because the feature data is not normally distributed.

  4. Feature Transformation: YearsSinceLastPromotion shows some skewness and kurtosis, so we apply a square root transformation.

  5. Scaling The Data: Scale the data with StandardScaler.
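The pre-processing steps above might look roughly like this in pandas and scikit-learn. It is a sketch under stated assumptions: the data is made up, `Department` and `Experience` are hypothetical column names (only `YearsSinceLastPromotion` comes from the project), outliers are assumed to be the values outside the usual 1.5 × IQR fences, and they are assumed to be replaced with the column median. Manual encoding is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "Development", "HR", "Development", "Sales"],
    "Experience": [4, 6, 5, 7, 5, 40],
    "YearsSinceLastPromotion": [0, 1, 4, 9, 1, 0],
})

# 1. Missing values per column.
missing_per_column = df.isnull().sum()

# 2. Frequency encoding: replace each label with how often it occurs.
df["Department_freq"] = df["Department"].map(df["Department"].value_counts())

# 3. IQR-based outlier imputation (assumption: 1.5 * IQR fences, median replacement).
q1, q3 = df["Experience"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = (df["Experience"] < lower) | (df["Experience"] > upper)
df["Experience"] = df["Experience"].mask(outliers, df["Experience"].median())

# 4. Square root transformation to reduce skew (new column name is hypothetical).
df["YearsSinceLastPromotion_sqrt"] = np.sqrt(df["YearsSinceLastPromotion"])

# 5. Standard scaling: zero mean and unit variance per feature.
feature_cols = ["Department_freq", "Experience", "YearsSinceLastPromotion_sqrt"]
scaled = StandardScaler().fit_transform(df[feature_cols])

print(missing_per_column)
print(pd.DataFrame(scaled, columns=feature_cols).round(2))
```

Replacing outliers with the median rather than capping them at the fences is only one reading of "impute"; capping with `clip(lower, upper)` would be an equally plausible choice.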

5. Feature Selection

  1. Drop unique and constant features: Drop the employee number because it is a unique identifier column, and drop YearsSinceLastPromotion because a new feature was created from it with the square root transformation.

  2. Checking Correlation: Check correlation with a heat map; no highly correlated features are present.

  3. Check Duplicates: There are no duplicates in this data.

  4. PCA: Use PCA to reduce the dimension of the data. The data contains 27 features in total after dropping the unique and constant columns. PCA shows that 25 features retain the data with little variance loss, so we select 25 features.

  5. Saving Pre-Processed Data: Save all the preprocessed data in a new file and add the target feature to it.
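The correlation, duplicate, and PCA checks above can be sketched as follows. The 27-column random matrix is a hypothetical stand-in for the scaled features, and the 0.9 correlation threshold is an assumption, since the project does not state which cutoff it used.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the 27 scaled features.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 27)), columns=[f"f{i}" for i in range(27)])

# Highly correlated pairs: look only at the upper triangle of |corr|.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
high_pairs = pairs[pairs > 0.9]  # assumed threshold

# Duplicate rows.
duplicate_rows = int(X.duplicated().sum())

# Cumulative explained variance shows how much information k components keep.
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)

# Keep 25 components, as in the project.
pca = PCA(n_components=25)
X_reduced = pca.fit_transform(X)

print("Highly correlated pairs:", len(high_pairs))
print("Duplicate rows:", duplicate_rows)
print("Variance kept by 25 components:", round(float(cumulative[24]), 4))
```

Note that PCA returns 25 principal components, which are linear combinations of the original features rather than a subset of them.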

6. Machine Learning Model Creation & Evaluation

  1. Define Dependent and Independent Features:

  2. Balancing the data: The data is imbalanced, so we balance it with SMOTE.

  3. Splitting Training And Testing Data: 80% of the data is used for training & 20% is used for testing
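To illustrate the balancing and splitting steps, here is a simplified sketch of the idea behind SMOTE followed by an 80/20 split. It is not the project's code: the usual implementation is `SMOTE` from the imbalanced-learn package, while `smote_like_oversample` below is a hypothetical helper written only to show the interpolation between a minority sample and one of its nearest minority neighbours. The data and class labels are made up, and each class is assumed to have at least two samples.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X, y, k=5, seed=42):
    """Oversample every minority class up to the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue
        X_cls = X[y == cls]
        # Ask for k + 1 neighbours because each point is its own nearest neighbour.
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_cls))).fit(X_cls)
        neighbours = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        # Pick a random base sample and one of its neighbours for each new point.
        base = rng.integers(0, len(X_cls), size=n_new)
        picked = neighbours[base, rng.integers(0, neighbours.shape[1], size=n_new)]
        # Place the synthetic point somewhere on the segment between the two.
        gap = rng.random((n_new, 1))
        X_parts.append(X_cls[base] + gap * (X_cls[picked] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Hypothetical imbalanced data: 70 / 20 / 10 samples across three classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([0] * 70 + [1] * 20 + [2] * 10)

X_bal, y_bal = smote_like_oversample(X, y)

# 80% training, 20% testing, keeping the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, random_state=42, stratify=y_bal
)

print("Class counts after balancing:", np.bincount(y_bal))
print("Train size:", len(X_train), "Test size:", len(X_test))
```

The sketch follows the order given above (balance, then split). A common alternative is to split first and oversample only the training part, so that no synthetic samples end up in the test set.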

Algorithms:

  • AIM: Create a sweet spot model (Low bias, Low variance)

HERE WE WILL BE EXPERIMENTING WITH THREE ALGORITHMS

  1. Support Vector Machine
  2. Random Forest
  3. Artificial Neural Network [MLP Classifier]
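A minimal sketch of comparing the three algorithms with scikit-learn is shown below. The synthetic data from `make_classification` and all hyperparameters are assumptions for illustration; the project's tuned settings are not given here. Comparing training and testing accuracy is one simple way to look for the low-bias, low-variance "sweet spot": a large gap between the two suggests overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Hypothetical three-class data standing in for the preprocessed features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Default-style settings; these are assumptions, not the project's values.
models = {
    "Support Vector Machine": SVC(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "MLP Classifier": MLPClassifier(max_iter=1000, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),
    }

for name, result in scores.items():
    print(f"{name}: train={result['train']:.3f} test={result['test']:.3f}")
```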

8. Saving Model

Save the model to a pickle file.
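Saving and reloading a trained model with pickle can be sketched like this. The model, data, and file name `model.pkl` are hypothetical, and the file is written to a temporary directory only to keep the example self-contained.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical trained model.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Save the fitted model to disk.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back and confirm it predicts the same labels.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print("Restored model matches:", same_predictions)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that was used to save them.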

Tools and Library Used:

Tools:

Jupyter

Library Used:

  • Pandas
  • Numpy
  • Matplotlib
  • Seaborn
  • pylab
  • Scipy
  • Sklearn
  • Pickle
