- Loaded the data in Google Colab and analyzed it to produce a clean dataset.
- Figure 1: Pairwise relationship plots of all variables, used to identify correlations.
- Figure 2: Correlation heat map, used to select the relevant features.
- Figure 3: Bivariate distribution of `salary_in_usd` and `experience_level`.
- Framed the task as a regression problem based on exploratory data analysis (EDA), since the target (salary) is a continuous value.
- Chose XGBoost because the dataset is medium-sized, has a large number of features, and the predictions need to be highly accurate.
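
The correlation analysis behind Figures 2 and 3 can be sketched with pandas. This is a minimal illustration, not the project's code: the ordinal coding of `experience_level` (`EN`/`MI`/`SE`/`EX`), the helper names `encode_experience` and `relevant_features`, the 0.1 threshold, and the toy rows are all hypothetical assumptions.

```python
import pandas as pd

# Hypothetical ordinal coding for experience_level (assumed labels, not taken
# from the project): entry < mid < senior < executive.
EXPERIENCE_ORDER = {"EN": 0, "MI": 1, "SE": 2, "EX": 3}


def encode_experience(df: pd.DataFrame) -> pd.DataFrame:
    """Add a numeric copy of experience_level so it can enter a correlation matrix."""
    out = df.copy()
    out["experience_level_code"] = out["experience_level"].map(EXPERIENCE_ORDER)
    return out


def relevant_features(df: pd.DataFrame, target: str = "salary_in_usd",
                      threshold: float = 0.1) -> pd.Series:
    """Rank numeric columns by |Pearson r| with the target and keep those >= threshold."""
    corr = df.corr(numeric_only=True)  # the matrix a correlation heat map displays
    ranked = corr[target].drop(target).abs().sort_values(ascending=False)
    return ranked[ranked >= threshold]


# Synthetic toy rows for illustration only (not the project's data).
TOY = pd.DataFrame({
    "experience_level": ["EN", "MI", "SE", "EX", "SE", "MI"],
    "work_year": [2020, 2021, 2022, 2022, 2021, 2020],
    "remote_ratio": [0, 50, 100, 0, 100, 50],
    "salary_in_usd": [50000, 70000, 120000, 180000, 115000, 65000],
})

if __name__ == "__main__":
    print(relevant_features(encode_experience(TOY)))
```

On these toy rows the function keeps `experience_level_code` (r ≈ 0.97) and `work_year` (r ≈ 0.86) and drops `remote_ratio`. The full `corr` matrix is what `seaborn.heatmap(corr, annot=True)` would draw as a heat map.
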
- Used two classes: one for data handling and one for tools.
  - Data Processing class:
    - Attributes: `df` for the DataFrame, `model` for the model, and `model_fit` for the trained model.
    - Methods: `doClean` (data cleaning), `build_chart` (data graphs), `train_data` (model training), and `forecasts` (prediction).
  - Tools class:
    - One method for converting images to base64.
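
The two classes described above might look roughly like the following sketch. The attribute and method names come from this README. Everything else is an assumption: the class names, the cleaning policy (drop duplicates and missing rows), one-hot encoding with `pd.get_dummies`, the helper `_features`, and the `image_to_base64` name. `model` can be any estimator with scikit-learn-style `fit`/`predict`, such as `xgboost.XGBRegressor()`.

```python
import base64
import io
from typing import Any, Optional

import pandas as pd


class DataProcessing:
    """Hypothetical sketch of the data-handling class; names follow the README."""

    def __init__(self, df: pd.DataFrame, model: Any, target: str = "salary_in_usd"):
        self.df = df                          # raw data as a pandas DataFrame
        self.model = model                    # unfitted estimator, e.g. xgboost.XGBRegressor()
        self.model_fit: Optional[Any] = None  # fitted estimator, set by train_data()
        self.target = target
        self.columns: Optional[pd.Index] = None  # feature columns seen during training

    def doClean(self) -> pd.DataFrame:
        # Assumed cleaning policy: drop exact duplicates and rows with missing values.
        self.df = self.df.drop_duplicates().dropna().reset_index(drop=True)
        return self.df

    def _features(self, frame: pd.DataFrame) -> pd.DataFrame:
        # One-hot encode categorical columns; align to the training columns if known.
        X = pd.get_dummies(frame)
        if self.columns is not None:
            X = X.reindex(columns=self.columns, fill_value=0)
        return X.astype(float)

    def build_chart(self, column: str) -> bytes:
        # Bar chart of the mean target per category, returned as PNG bytes.
        import matplotlib
        matplotlib.use("Agg")  # render off-screen (no display on a server)
        import matplotlib.pyplot as plt

        fig, ax = plt.subplots()
        self.df.groupby(column)[self.target].mean().plot.bar(ax=ax)
        ax.set_ylabel(self.target)
        buf = io.BytesIO()
        fig.savefig(buf, format="png")
        plt.close(fig)
        return buf.getvalue()

    def train_data(self) -> Any:
        self.columns = None  # re-derive feature columns from the current data
        X = self._features(self.df.drop(columns=[self.target]))
        self.columns = X.columns
        self.model_fit = self.model.fit(X, self.df[self.target])
        return self.model_fit

    def forecasts(self, new_rows: pd.DataFrame):
        if self.model_fit is None:
            raise RuntimeError("call train_data() before forecasts()")
        return self.model_fit.predict(self._features(new_rows))


class Tools:
    @staticmethod
    def image_to_base64(image_bytes: bytes) -> str:
        # ASCII base64 text, usable in an HTML data URI (data:image/png;base64,...).
        return base64.b64encode(image_bytes).decode("ascii")
```

Storing the training columns and calling `reindex` in `_features` is what lets a single-row input from a web form produce the same feature layout the model was trained on. `matplotlib` is imported inside `build_chart` so the rest of the class works without it.
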
- Built a Flask web application to provide a user-friendly interface.
- Loaded the data into the model, trained it, and evaluated its reliability.
- Combined model training and prediction into one workflow, as depicted in the diagram.
- Entered parameters to predict `salary` values.
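
The README says the model's reliability was evaluated but does not name the metrics. The following minimal sketch assumes the common regression metrics R² and mean absolute error (MAE), computed on held-out rows. scikit-learn provides the same metrics as `sklearn.metrics.r2_score` and `sklearn.metrics.mean_absolute_error`. This version is pure Python, so it runs without extra packages.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute prediction error, in the target's units (e.g. USD)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

An R² of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean salary. MAE reports the typical error in dollars, which is easier to explain to users of the web interface.
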
- Automatically infer the user's address (location) from their IP address.
- Increase the sample size to improve prediction accuracy.
- Improve the look of the input interface.
- Add validation checks for input data.
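
The planned input checks could start as a small server-side validator that a Flask view calls on `request.form` before predicting. This is a minimal sketch under stated assumptions. `experience_level` appears in this README, but the `remote_ratio` field, the allowed labels, the 0 to 100 range, and the function name `validate_input` are hypothetical.

```python
from typing import Any, Dict, List, Mapping

# Hypothetical allowed labels; in practice use the categories seen during training.
ALLOWED_EXPERIENCE = {"EN", "MI", "SE", "EX"}


def validate_input(form: Mapping[str, str]) -> Dict[str, Any]:
    """Check raw form fields; return cleaned values or raise ValueError listing every problem."""
    errors: List[str] = []
    cleaned: Dict[str, Any] = {}

    level = form.get("experience_level", "").strip().upper()
    if level in ALLOWED_EXPERIENCE:
        cleaned["experience_level"] = level
    else:
        errors.append(f"experience_level must be one of {sorted(ALLOWED_EXPERIENCE)}")

    try:
        ratio = int(form.get("remote_ratio", ""))  # int() tolerates surrounding spaces
    except ValueError:
        errors.append("remote_ratio must be an integer")
    else:
        if 0 <= ratio <= 100:
            cleaned["remote_ratio"] = ratio
        else:
            errors.append("remote_ratio must be between 0 and 100")

    if errors:
        raise ValueError("; ".join(errors))
    return cleaned
```

Collecting all errors before raising lets the interface show every problem at once instead of one per submission.
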