credible-clients

This repository contains a small pre-task for potential ML team members for UBC Launch Pad.

Overview

The dataset bundled in this repository contains information about credit card bill payments, courtesy of the UCI Machine Learning Repository. Your task is to train a model on this data to predict whether or not a customer will default on their next bill payment.

Most of the work should be done in model.py. It contains a barebones model class; your job is to implement the fit and predict methods, in whatever way you want (feel free to import any libraries you wish). You can look at main.py to see how these methods will be called. Don't worry about getting "good" results (this dataset is very tough to predict on) — treat this as an exploratory task!

To run this code, you'll need Python and three libraries: NumPy, SciPy, and scikit-learn. After invoking python main.py from your shell of choice, you should see the model accuracy printed: approximately 50% if you haven't changed anything, since the provided model predicts completely randomly.

Summary

According to the paper ("The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients", I-Cheng Yeh, Che-hui Lien) which analyzed the same data, the research shows that the forecasting model produced by artificial neural network has the highest coefficient of determination. Thus, I used neural network model using Keras to predict the given data. In addition, I used Gradient Boosting framework using XGBoost which is a scalable tree boosting system and it is faster than Scikit-learn GradienBoostingClassifier. The result shows that XGBoost method estimates the real probability of default sliglty better than Keras.

Result

1.Neural Network Model using Keras

Accuracy: 81.947%, Precision: 67.127%, Recall: 32.883%

2.Gradient Boosting framework using XGBoost

Accuracy: 82.013%, Precision: 66.088%, Recall: 35.095%

Sources

1.The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients, https://bradzzz.gitbooks.io/ga-dsi-seattle/content/dsi/dsi_05_classification_databases/2.1-lesson/assets/datasets/DefaultCreditCardClients_yeh_2009.pdf

2.Build your First Deep Learning Neural Network Model using Keras in Python, https://medium.com/@pushkarmandot/build-your-first-deep-learning-neural-network-model-using-keras-in-python-a90b5864116d

Data Format

X_train and X_test contain data of the following form:

Column(s)	Data
0	Amount of credit given, in dollars
1	Gender (1 = male, 2 = female)
2	Education (1 = graduate school; 2 = university; 3 = high school; 4 = others)
3	Marital status (1 = married; 2 = single; 3 = others)
4	Age, in years
5–10	History of past payments over 6 months (-1 = on-time; 1 = one month late; …)
11–16	Amount of previous bill over 6 months, in dollars
17–22	Amount of previous payment over 6 months, in dollars

y_train and y_test contain a 1 if the customer defaulted on their next payment, and a 0 otherwise.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.idea		.idea
data		data
.gitignore		.gitignore
README.md		README.md
main.py		main.py
model.py		model.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

credible-clients

Overview

Summary

Result

Sources

Data Format

About

Releases

Packages

Languages

jessHuh/credible-clients

Folders and files

Latest commit

History

Repository files navigation

credible-clients

Overview

Summary

Result

Sources

Data Format

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages