Credit Scoring Probability Prediction 💳

Author: Marco Pellegrino
Year: 2024

This project aims to build a simple model to predict the probability of loan default based on loan application data. This information helps assess business risk and improve loan approval decisions.

Description

The main steps consist of processing the input data, training models, and evaluating their performance.

1. Data Processing

Raw input data is processed in the following steps:

Data Inspection
Domain-based Feature Selection
Values Formatting
Correlation Analysis
Final Distribution Inspection
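
As an illustration of the correlation analysis step, the sketch below lists pairs of numeric features whose absolute Pearson correlation exceeds a threshold. It assumes the data is held in a pandas DataFrame; the function name `highly_correlated_pairs` and the 0.9 threshold are assumptions made for this example, not values taken from the notebooks.

```python
import pandas as pd


def highly_correlated_pairs(
    df: pd.DataFrame, threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return pairs of numeric columns with |Pearson correlation| above threshold."""
    # Only numeric columns take part in the correlation matrix.
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, first in enumerate(cols):
        # Visit each unordered pair once (upper triangle of the matrix).
        for second in cols[i + 1:]:
            value = corr.loc[first, second]
            if abs(value) > threshold:
                pairs.append((first, second, float(value)))
    return pairs


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the real dataset.
    toy = pd.DataFrame({
        "company_rating": [10.0, 20.0, 30.0, 40.0],
        "person_scoring": [11.0, 19.0, 31.0, 41.0],
        "prev_contr_count": [3, 0, 2, 1],
    })
    print(highly_correlated_pairs(toy))
```

A pair flagged this way is a candidate for dropping one of the two features, which is a judgment call that also depends on domain knowledge.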

Note: raw data can be read from a local CSV file or from an AWS S3 bucket.
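
A minimal sketch of how both sources could be read through one function, assuming pandas is used for loading. The function name `load_raw_data` is hypothetical; the date column names come from the feature table in the Input data section. Reading an `s3://` URI with pandas requires the optional `s3fs` package and AWS credentials available in the environment.

```python
import pandas as pd


def load_raw_data(source: str) -> pd.DataFrame:
    """Read the raw dataset from a local CSV path or an s3:// URI.

    pandas hands s3:// URIs to fsspec, so the same call covers both cases
    as long as s3fs is installed and credentials are configured.
    """
    return pd.read_csv(source, parse_dates=["applic_date", "incorporation_date"])
```

With this in place, `load_raw_data("data/raw/loan_application_data-raw.csv")` reads the local file, and passing an `s3://` URI for a bucket you control reads the same data from S3.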

2. Data Modeling

The goal is a machine learning model that predicts the probability of loan default from the provided data. The model should perform well on the evaluation metrics listed in the next section and be easily interpretable by business stakeholders.

Some missing values are removed up front; the others are imputed during the training phase to avoid data leakage.
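
One common way to impute during training without leakage is to place the imputer inside a pipeline, so that it is re-fitted on the training part of every cross-validation fold. The sketch below assumes scikit-learn; the median strategy, the tree depth, and the function name are illustrative assumptions, not the settings used in the notebooks.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def cross_validated_default_probabilities(X, y, n_splits=5, seed=0):
    """Return out-of-fold default probabilities, imputing within each fold."""
    pipeline = Pipeline([
        # The imputer learns its medians from the training fold only.
        ("impute", SimpleImputer(strategy="median")),
        ("model", DecisionTreeClassifier(max_depth=3, random_state=seed)),
    ])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    proba = cross_val_predict(pipeline, X, y, cv=cv, method="predict_proba")
    # Column 1 holds the probability of the positive class (default = 1).
    return proba[:, 1]
```

Imputing the full dataset before splitting would let validation rows influence the imputed values, which is the leakage this arrangement avoids.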

Different models are implemented:

Decision Tree
Random Forest
XGBoost

3. Data Evaluation

Model performance is evaluated using the following metrics:

Log Loss
AUC Score
F1 score (for class prediction)

Data visualizations are provided to compare the models on the above metrics.
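
The three metrics can be computed from predicted probabilities as sketched below, assuming scikit-learn. The function name and the 0.5 decision threshold used to turn probabilities into classes for the F1 score are assumptions made for this example.

```python
from sklearn.metrics import f1_score, log_loss, roc_auc_score


def evaluate_probabilities(y_true, y_proba, threshold=0.5):
    """Compute Log Loss, AUC, and F1 from true labels and default probabilities."""
    # F1 needs class labels, so probabilities are cut at the threshold.
    y_pred = [int(p >= threshold) for p in y_proba]
    return {
        "log_loss": log_loss(y_true, y_proba),
        "auc": roc_auc_score(y_true, y_proba),
        "f1": f1_score(y_true, y_pred),
    }


if __name__ == "__main__":
    # Toy labels and probabilities for illustration only.
    print(evaluate_probabilities([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Log Loss and AUC judge the probabilities themselves, while F1 depends on the chosen threshold, so a different threshold can change F1 without changing the other two.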

Input data

The dataset contains loan application data and cannot be attached to this repository. Each data point includes information relevant to assessing loan risk, such as financial ratios, company demographics, and loan terms. The following features are provided:

Feature	Description	Type	Values
`r_application_id`	Application ID	Integer
`applic_date`	Application Date	Date
`company_type`	Company Type	String	Fixed "AB"
`company_rating`	Company Rating	Float	Scale 0-100: 0=worst, 100=best
`incorporation_date`	Incorporation Date	Date
`net_turnover`	Net Turnover	Float
`person_scoring`	Person's Scoring	Float	Scale 0-100: 0=worst, 100=best
`prev_contr_count`	Number of Previous Loan Contracts	Integer
`max_late_1yr`	Longest Payment Delay in Previous 12 Months	Float
`uc_risk_class`	UC Risk Class	Integer	Scale 1-5: 1=worst, 5=best
`default`	Loan Default	Integer	Binary: 1 if loan defaulted (was sent to collection), 0 if no default
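
The value ranges in the table lend themselves to a quick sanity check during data inspection. The sketch below counts values outside the documented ranges, assuming the data is loaded in a pandas DataFrame; the names `VALUE_RANGES` and `out_of_range_counts` are hypothetical and not part of the repository.

```python
import pandas as pd

# Inclusive ranges copied from the feature table above.
VALUE_RANGES = {
    "company_rating": (0, 100),
    "person_scoring": (0, 100),
    "uc_risk_class": (1, 5),
    "default": (0, 1),
}


def out_of_range_counts(df: pd.DataFrame) -> dict[str, int]:
    """Count non-missing values outside the documented range, per column."""
    counts = {}
    for column, (low, high) in VALUE_RANGES.items():
        if column not in df.columns:
            continue  # Skip features absent from this frame.
        values = df[column].dropna()
        counts[column] = int(((values < low) | (values > high)).sum())
    return counts
```

Missing values are ignored here on purpose, since they are handled separately by removal or imputation.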

Install Required Libraries

To install the required Python libraries:

pip install -r requirements.txt

Note: In some environments, use pip3 instead of pip.

The code has been tested with Python 3.11.

Project Structure

Note: paths of resources (data frames, plots, ...) are defined in config.py.

.
├── data/
│   ├── raw/
│   │   └── loan_application_data-raw.csv    # Raw input dataset
│   ├── preprocessed/
│   │   └── loan_application_data-preprocessed.csv    # Preprocessed data for training
│   └── evaluation/
│       ├── all/
│       │   └── evaluation-*.csv    # AUC, Log Loss, F1 Scores
│       ├── tpf/
│       │   └── evaluation_tpr-*.csv    # True Positive Rates for the ROC curve, per model
│       └── fpr/
│           └── evaluation_fpr-*.csv    # False Positive Rates for the ROC curve, per model
├── plots/
│   ├── raw_data/    # Plots of raw data
│   └── models/    # Plots of model results
│       ├── models_feature_importance/    # Plots of model feature importance
│       ├── model_comparison/    # Plots to compare models
│       ├── models_auc_roc_curve/    # Plots of ROC curves during cross-validation
│       └── rules_decision_tree.png    # Decision rules of the Decision Tree model
├── 1_EDA-preprocessing.ipynb    # Process raw data for modeling
├── 2_training_*.ipynb    # Training and evaluation for different models
├── 3_compare_models.ipynb    # Compare model performance with visualizations
├── requirements.txt    # Required Python libraries
├── config.py    # Path definitions
└── README.md    # This description
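
For orientation, a path module in the spirit of config.py could look like the sketch below. The variable and function names are assumptions made for this example and may differ from the actual file; only the directory and file names follow the tree above. Filling the `*` in `evaluation-*.csv` with a model name is likewise an assumption.

```python
from pathlib import Path

# Illustrative names; the real config.py may define these differently.
PROJECT_ROOT = Path(".")
DATA_DIR = PROJECT_ROOT / "data"
RAW_DATA_PATH = DATA_DIR / "raw" / "loan_application_data-raw.csv"
PREPROCESSED_DATA_PATH = DATA_DIR / "preprocessed" / "loan_application_data-preprocessed.csv"
EVALUATION_DIR = DATA_DIR / "evaluation"
PLOTS_DIR = PROJECT_ROOT / "plots"


def evaluation_file(model_name: str) -> Path:
    """Build the metrics file path for one model under data/evaluation/all/."""
    return EVALUATION_DIR / "all" / f"evaluation-{model_name}.csv"
```

Keeping every path in one module means the notebooks can import them instead of repeating string literals.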

License

This repository is licensed under the GNU General Public License v3.0 (GPL-3.0). For more details, see the LICENSE file.
