This project analyzes and predicts credit defaults using machine learning models. It examines which explanatory variables significantly affect the probability of default and checks their importance through hypothesis testing.
- Null Hypothesis ($H_0$): The selected explanatory variables do not have a significant impact on the probability of default.
- Alternative Hypothesis ($H_1$): The selected explanatory variables have a significant impact on the probability of default.
The analysis involves statistical testing and machine learning models to explore relationships and predict outcomes.
To run this notebook, ensure the following libraries are installed:
pip install numpy pandas matplotlib seaborn scikit-learn ucimlrepo imbalanced-learn statsmodels
The notebook includes:
- Data Loading and Exploration:
  - Imports and prepares the dataset.
  - Visualizes and summarizes data distributions.
- Feature Engineering:
  - Processes data for model compatibility.
  - Handles class imbalance using techniques like SMOTE.
- Model Building and Evaluation:
  - Trains machine learning models such as Logistic Regression, Decision Trees, and Random Forests.
  - Evaluates models with performance metrics like accuracy, precision, recall, and AUC.
- Statistical Analysis:
  - Tests the significance of predictors.
  - Tests the null hypothesis stated above.
- Clone this repository:
    git clone https://github.com/yourusername/your-repo-name.git
    cd your-repo-name
- Open the notebook:
    jupyter notebook Credit_prediction.ipynb
- Execute cells in order.
The results include:
- Statistical evidence supporting significant predictors of credit default.
- Model comparisons for predictive accuracy and reliability.
Libraries used:
- NumPy
- pandas
- Matplotlib
- seaborn
- scikit-learn
- statsmodels
- imbalanced-learn