- The goal of this project is to build a regression model that forecasts flight fares from factors such as airline, journey date, route, and number of stops.
- Flight ticket prices are highly unpredictable and often change from day to day: the fare shown today for a flight may be different tomorrow, which makes it hard for travelers and airlines to forecast costs.
- Airline: The airline operating the flight, such as Indigo, Jet Airways, or Air India.
- Date_of_Journey: The date on which the passenger's journey starts.
- Source: The place from which the passenger's journey starts.
- Destination: The place to which the passenger is traveling.
- Route: The route the flight takes from the source to the destination.
- Arrival_Time: The time at which the passenger reaches the destination.
- Duration: The total time the flight takes to travel from source to destination.
- Total_Stops: The number of stops the flight makes over the whole journey.
- Additional_Info: Information about food, the kind of food, and other amenities.
- Price: The price of the flight for the complete journey, including all expenses before boarding.
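
Several of these columns are usually stored as text and need converting to numbers before a regression model can use them. The sketch below shows one way to do that with the standard library only. The raw formats it expects (`24/03/2019` for Date_of_Journey, `2h 50m` for Duration, `non-stop` or `2 stops` for Total_Stops) are assumptions, since this README does not show raw values, and the helper names are hypothetical.

```python
import re
from datetime import datetime


def parse_duration_minutes(duration: str) -> int:
    """Convert text such as '2h 50m', '19h' or '45m' to total minutes."""
    hours = re.search(r"(\d+)\s*h", duration)
    minutes = re.search(r"(\d+)\s*m", duration)
    if hours is None and minutes is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total


def parse_total_stops(stops: str) -> int:
    """Convert 'non-stop' to 0 and 'N stop(s)' to N."""
    text = stops.strip().lower()
    if text == "non-stop":
        return 0
    match = re.match(r"(\d+)\s+stops?$", text)
    if match is None:
        raise ValueError(f"unrecognised stop count: {stops!r}")
    return int(match.group(1))


def parse_journey_date(date_str: str) -> dict:
    """Split a date in the assumed day/month/year format into numeric parts."""
    parsed = datetime.strptime(date_str, "%d/%m/%Y")
    return {
        "journey_day": parsed.day,
        "journey_month": parsed.month,
        "journey_weekday": parsed.weekday(),  # Monday is 0, Sunday is 6
    }


# Hypothetical raw row: the values are illustrative, not taken from the dataset.
row = {
    "Date_of_Journey": "24/03/2019",
    "Duration": "2h 50m",
    "Total_Stops": "non-stop",
}

features = {
    **parse_journey_date(row["Date_of_Journey"]),
    "duration_minutes": parse_duration_minutes(row["Duration"]),
    "total_stops": parse_total_stops(row["Total_Stops"]),
}
print(features)
```

Categorical columns such as Airline, Source, and Destination would still need an encoding step (for example one-hot encoding), which is not shown here.
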
Create a predictive model that helps customers predict future flight prices and plan their journeys accordingly.
- This is a regression problem: given the features above, we need to estimate the price.
- Linear Regression
- Support Vector Regressor
- DecisionTreeRegressor
- RandomForestRegressor
- GradientBoostingRegressor
- XGBRegressor
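
The models above are compared by R-squared in the results table. As a reminder of what that number measures, here is a minimal pure-Python sketch of the coefficient of determination. The function name and the sample prices are illustrative assumptions, not code or data from this project.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean price.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


# Illustrative prices only, not values from the dataset.
actual = [3000.0, 5000.0, 7000.0, 9000.0]
perfect = list(actual)      # a model that predicts every price exactly
mean_only = [6000.0] * 4    # a model that always predicts the mean price

print(r_squared(actual, perfect))    # 1.0
print(r_squared(actual, mean_only))  # 0.0
```

A score of 1.0 means the predictions match every price, while a score near 0 means the model does little better than always predicting the average price.
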
| S.No | Algorithm | Train R-squared | Test R-squared | Cross Val Score |
|---|---|---|---|---|
| 1 | Linear Regression | 0.64 | 0.64 | 0.633269 |
| 2 | SVR | 0.01 | 0.01 | 0.006415 |
| 3 | SVR Tuning | 0.61 | 0.61 | 0.613411 |
| 4 | Decision Tree | 0.96 | 0.67 | 0.667105 |
| 5 | Decision Tree Tuning | 0.75 | 0.75 | 0.724305 |
| 6 | Random Forest | 0.95 | 0.79 | 0.793353 |
| 7 | Random Forest Tuning | 0.89 | 0.82 | 0.809537 |
| 8 | Gradient Boosting | 0.76 | 0.75 | 0.748531 |
| 9 | Gradient Boosting Tuning | 0.91 | 0.83 | 0.825861 |
| 10 | XGBoosting | 0.92 | 0.83 | 0.823278 |
| 11 | XGBoosting Tuning | 0.90 | 0.84 | 0.828162 |
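
One way to read the table is to compare each model's train and test R-squared: a large gap suggests overfitting. The sketch below does this arithmetic on the values copied from the table. The 0.10 cut-off is a hypothetical threshold chosen for illustration, not a rule used in this project.

```python
# (algorithm, train R-squared, test R-squared, cross-val score), copied from the table.
results = [
    ("Linear Regression", 0.64, 0.64, 0.633269),
    ("SVR", 0.01, 0.01, 0.006415),
    ("SVR Tuning", 0.61, 0.61, 0.613411),
    ("Decision Tree", 0.96, 0.67, 0.667105),
    ("Decision Tree Tuning", 0.75, 0.75, 0.724305),
    ("Random Forest", 0.95, 0.79, 0.793353),
    ("Random Forest Tuning", 0.89, 0.82, 0.809537),
    ("Gradient Boosting", 0.76, 0.75, 0.748531),
    ("Gradient Boosting Tuning", 0.91, 0.83, 0.825861),
    ("XGBoosting", 0.92, 0.83, 0.823278),
    ("XGBoosting Tuning", 0.90, 0.84, 0.828162),
]

GAP_THRESHOLD = 0.10  # hypothetical cut-off for flagging overfitting


def overfit_gap(train_r2, test_r2):
    """Train minus test R-squared, rounded to the table's two decimals."""
    return round(train_r2 - test_r2, 2)


# Models whose train score exceeds the test score by more than the threshold.
flagged = [
    (name, overfit_gap(train, test))
    for name, train, test, _ in results
    if overfit_gap(train, test) > GAP_THRESHOLD
]

# Best model by test R-squared, with the cross-val score as a tie-breaker.
best = max(results, key=lambda r: (r[2], r[3]))

print("Possible overfitting:", flagged)
print("Best by test R-squared:", best[0])
```

With this threshold, the untuned Decision Tree (gap 0.29) and the untuned Random Forest (gap 0.16) are flagged, and XGBoosting Tuning ranks first.
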
- Tuned XGBoost (row 11) has the highest test R² (0.84) and cross-validation score (0.828), and its train R² (0.90) is only 0.06 above its test score, so it offers the best balance between performance and generalization.
- Tuned Gradient Boosting (test R²: 0.83) and Tuned Random Forest (test R²: 0.82) are close runners-up.
- Untuned SVR fits very poorly (test R²: 0.01).
- Untuned Decision Tree overfits badly (train R²: 0.96 vs. test R²: 0.67).
- For every model that was tuned, tuning improved both the test R² and the cross-validation score, which shows the value of hyperparameter optimization.
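
The effect of tuning can be checked directly from the results table. The sketch below computes the change in test R-squared and cross-validation score for each model that has both an untuned and a tuned row. Linear Regression is left out because the table has no tuned counterpart for it. The variable names are illustrative.

```python
# model: ((untuned test R-squared, untuned CV), (tuned test R-squared, tuned CV)),
# copied from the results table.
pairs = {
    "SVR": ((0.01, 0.006415), (0.61, 0.613411)),
    "Decision Tree": ((0.67, 0.667105), (0.75, 0.724305)),
    "Random Forest": ((0.79, 0.793353), (0.82, 0.809537)),
    "Gradient Boosting": ((0.75, 0.748531), (0.83, 0.825861)),
    "XGBoosting": ((0.83, 0.823278), (0.84, 0.828162)),
}

# Gain from tuning as (change in test R-squared, change in CV score).
gains = {
    name: (round(tuned[0] - untuned[0], 2), round(tuned[1] - untuned[1], 6))
    for name, (untuned, tuned) in pairs.items()
}

for name, (test_gain, cv_gain) in gains.items():
    print(f"{name}: test {test_gain:+.2f}, CV {cv_gain:+.6f}")

# True only if tuning raised both scores for every model in the table.
all_improved = all(test_gain > 0 and cv_gain > 0 for test_gain, cv_gain in gains.values())
print("Tuning improved every model:", all_improved)
```

The gain is largest for SVR and smallest for XGBoosting, whose untuned scores were already close to the best in the table.
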