Salary Classification Web Application (Flask + Machine Learning)

A machine learning web application that predicts whether a person's annual salary exceeds $50K or is at most $50K, based on demographic, education, and employment attributes.

The model is trained using Scikit-learn pipelines and deployed locally using Flask, allowing users to input their information through a web interface and receive real-time predictions.

Project Overview

This project demonstrates an end-to-end machine learning workflow, including:

Data exploration and visualization
Feature preprocessing
Machine learning model training
Hyperparameter tuning
Model comparison
Pipeline serialization
Web deployment using Flask
Interactive user interface for predictions

The final deployed model is a tuned Random Forest Classifier integrated within a preprocessing pipeline.

Problem Statement

Predict whether an individual's annual salary exceeds $50K based on demographic and employment attributes.

This is a binary classification problem where the target variable is:

salary ∈ {<=50K, >50K}

Machine Learning Pipeline

The project implements a Scikit-learn pipeline to ensure consistent preprocessing during both training and prediction.

Pipeline Components

Feature separation
- Numerical features
- Categorical features
Numerical preprocessing
- StandardScaler
Categorical preprocessing
- OneHotEncoder
Feature transformation
- ColumnTransformer
Model training
- RandomForestClassifier
Pipeline serialization
- Saved using joblib
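
The components above can be wired together roughly as follows. This is a minimal sketch, not the project's actual training code: the split of columns into numerical and categorical groups is inferred from the Dataset Features table, and the `build_pipeline` and `save_pipeline` helpers, the step names, `handle_unknown="ignore"`, and the forest settings are all assumptions made for illustration.

```python
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed grouping of the documented features by type.
NUMERIC_FEATURES = [
    "age", "fnlwgt", "education_num",
    "capital_gain", "capital_loss", "hours_per_week",
]
CATEGORICAL_FEATURES = [
    "workclass", "education", "marital_status", "occupation",
    "relationship", "race", "sex", "native_country",
]


def build_pipeline(random_state=42):
    """Return an unfitted preprocessing + Random Forest pipeline."""
    preprocessor = ColumnTransformer(
        transformers=[
            # Scale numerical columns to zero mean and unit variance.
            ("num", StandardScaler(), NUMERIC_FEATURES),
            # One-hot encode categorical columns; categories unseen during
            # training are encoded as all zeros (an assumption, not confirmed).
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_FEATURES),
        ]
    )
    return Pipeline(
        steps=[
            ("preprocess", preprocessor),
            ("model", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ]
    )


def save_pipeline(pipeline, path):
    """Serialize the fitted pipeline (preprocessing + model) with joblib."""
    joblib.dump(pipeline, path)
```

Because the scaler and encoder live inside the pipeline, a single `fit` call learns both the preprocessing statistics and the model, and the same transformations are replayed automatically at prediction time.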

Dataset Features

Feature	Description
age	Age of the individual
workclass	Type of employment
fnlwgt	Final sampling weight
education	Highest education level
education_num	Total years of education
marital_status	Marital status
occupation	Job occupation
relationship	Family relationship
race	Race category
sex	Gender
capital_gain	Income from investments
capital_loss	Loss from investments
hours_per_week	Working hours per week
native_country	Country of origin

Target Variable

salary: <=50K or >50K

Model Comparison

Two machine learning models were evaluated:

Model	Description
Logistic Regression	Linear baseline classifier
Random Forest	Ensemble tree-based classifier

Evaluation Metrics

Accuracy
Precision
Recall
F1 Score
ROC-AUC
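
The metrics listed above could be computed with `sklearn.metrics` along these lines. This is an illustrative sketch: the `evaluate` helper is hypothetical, and treating `>50K` as the positive class is an assumption rather than something the project states.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_pred, y_score, positive_label=">50K"):
    """Compute the five listed metrics for a binary salary classifier.

    y_true / y_pred hold class labels; y_score holds the predicted
    probability of the positive class for each sample.
    """
    # ROC-AUC needs a binary indicator of the positive class.
    y_true_binary = [int(label == positive_label) for label in y_true]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, pos_label=positive_label),
        "recall": recall_score(y_true, y_pred, pos_label=positive_label),
        "f1": f1_score(y_true, y_pred, pos_label=positive_label),
        "roc_auc": roc_auc_score(y_true_binary, y_score),
    }
```

Running the same function on the predictions of each candidate model gives directly comparable numbers.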

Final Model

The tuned Random Forest model achieved the best performance and was selected for deployment.
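
This README does not say which search strategy or parameter ranges were used for tuning. One common approach is a cross-validated grid search over the whole pipeline, sketched below. `GridSearchCV`, the parameter grid, the scoring choice, and the assumption that the forest is the pipeline step named `model` are all illustrative, not the project's confirmed setup.

```python
from sklearn.model_selection import GridSearchCV

# Hypothetical search space; the "model__" prefix targets the pipeline
# step assumed to be named "model".
PARAM_GRID = {
    "model__n_estimators": [100, 300],
    "model__max_depth": [None, 10, 20],
    "model__min_samples_leaf": [1, 5],
}


def tune_random_forest(pipeline, X, y, param_grid=PARAM_GRID, cv=5, scoring="roc_auc"):
    """Grid-search the pipeline and return the best refit estimator."""
    search = GridSearchCV(pipeline, param_grid=param_grid, cv=cv, scoring=scoring)
    # Preprocessing is refit inside every fold, so the validation folds
    # never influence the scaler or encoder.
    search.fit(X, y)
    return search.best_estimator_, search.best_params_, search.best_score_
```

Searching over the full pipeline rather than the bare classifier keeps preprocessing inside each cross-validation fold, which avoids leaking validation data into the fitted transformers.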

Project Structure

Salary-Classification-Flask-App/
│
├── training.py
├── salary_classification_app.py
│   Flask application that loads the trained ML pipeline
│   and serves predictions through a web interface.
│
├── model_artifacts/
│   Saved machine learning artifacts.
│   │
│   ├── random_forest_tuned.pkl
│   │   Final trained pipeline containing preprocessing + model.
│   │
│   └── random_forest_tuned_pickle.pkl
│       Alternate serialized model object.
│
├── templates/
│   HTML templates rendered by Flask using Jinja2.
│   │
│   ├── index.html
│   │   User interface form for entering input features.
│   │
│   └── model_results.html
│       Displays salary prediction results.
│
├── static/
│   Static assets used by the web interface.
│   │
│   └── style.css
│       CSS styling for the application layout.
│
├── notebooks/
│   Jupyter notebooks used during experimentation.
│   │
│   └── salary_classification_pipeline.ipynb
│       Data exploration, visualization, model training,
│       and pipeline serialization.
│
├── requirements.txt
│   Python dependencies required to run the project.
│
└── README.md

Application Workflow

User Input (Web Form)
        │
        ▼
Flask Server
        │
        ▼
Input Converted to Pandas DataFrame
        │
        ▼
Saved ML Pipeline (Preprocessing + Random Forest)
        │
        ▼
Prediction
        │
        ▼
Render Result Page
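
The workflow above might look like this in Flask. This is a simplified sketch under stated assumptions, not the contents of `salary_classification_app.py`: the `/predict` route, the `create_app` factory, the numeric/categorical field split, and the inline result template (used here instead of `model_results.html` so the example is self-contained) are all hypothetical.

```python
import pandas as pd
from flask import Flask, render_template_string, request

# Column order follows the Dataset Features table.
FEATURE_NAMES = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
]
# Assumed to be the numeric form fields; everything else stays a string.
NUMERIC_FIELDS = {
    "age", "fnlwgt", "education_num",
    "capital_gain", "capital_loss", "hours_per_week",
}

# Stand-in for the real result page template.
RESULT_TEMPLATE = "Predicted Salary: {{ label }} Probability: {{ probability }}%"


def create_app(pipeline):
    """Build a Flask app around an already loaded, fitted pipeline."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # 1. Read the submitted form (a missing field yields HTTP 400).
        #    No further validation is done in this sketch.
        row = {}
        for name in FEATURE_NAMES:
            value = request.form[name]
            row[name] = float(value) if name in NUMERIC_FIELDS else value
        # 2. Convert the input to a one-row DataFrame.
        frame = pd.DataFrame([row], columns=FEATURE_NAMES)
        # 3. Let the saved pipeline preprocess and predict.
        label = pipeline.predict(frame)[0]
        probability = float(max(pipeline.predict_proba(frame)[0]))
        # 4. Render the result page.
        return render_template_string(
            RESULT_TEMPLATE, label=label, probability=f"{probability * 100:.2f}"
        )

    return app
```

With this layout, the app could be started by passing in the serialized pipeline, for example `create_app(joblib.load("model_artifacts/random_forest_tuned.pkl")).run()`. Loading the model once at startup, rather than on every request, keeps predictions fast.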

Installation

Clone the Repository

git clone https://github.com/chonzadaniel/salary-classification-flask-app.git

Install Dependencies

pip install -r requirements.txt

Running the Application

Start the Flask server:

python salary_classification_app.py

Open your browser and navigate to:

http://127.0.0.1:5000/

Enter the required information and click Predict Salary.

Example Prediction Output

Predicted Salary: >50K
Probability: 82.47%
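
Output in this shape can be derived from a classifier's class labels and predicted probabilities. The helper below is a hypothetical illustration; it assumes the displayed probability is that of the predicted class, which this README does not confirm.

```python
def format_prediction(classes, probabilities):
    """Return the two display lines for the most probable class.

    classes: class labels, e.g. pipeline.classes_
    probabilities: matching probabilities, e.g. one row of predict_proba
    """
    # Pick the index of the class with the highest probability.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return (
        f"Predicted Salary: {classes[best]}",
        f"Probability: {probabilities[best] * 100:.2f}%",
    )
```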

Technologies Used

Backend

Python
Flask

Machine Learning

Scikit-learn
RandomForestClassifier
LogisticRegression
Pipeline
ColumnTransformer

Data Processing

Pandas
NumPy

Visualization

Matplotlib
Seaborn

Frontend

HTML5
CSS3
Jinja2 Templates

Future Improvements

Possible enhancements include:

Deploying the application on AWS / Render / Heroku
Containerizing the application using Docker
Adding input validation
Implementing feature importance visualization
Integrating SHAP explainability
Creating a REST API endpoint

Author

Emmanuel Daniel Chonza

Data Scientist | Monitoring & Evaluation Expert | Generative AI Enthusiast

GitHub:
https://github.com/chonzadaniel

License

This project is licensed under the MIT License.
