The motivation for this project is to build a predictive model that lets a banking system detect whether a credit card holder will make the next payment. Banks initially focused on issuing more credit cards, which led to more customers defaulting; a model that predicts likely defaulters can help address this.
The project examines the problem banks face when issuing credit cards to customers: avoiding default. Many customers use their credit cards beyond their capacity to repay, which eventually results in high debt accumulation.
The main aim of this project is to help banks analyse customer behaviour more efficiently and predict whether a customer will default on the next payment.
The data has been sourced from https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005. There are 25 variables:
• ID: ID of each client
• LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)
• SEX: Gender (1=male, 2=female)
• EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
• MARRIAGE: Marital status (1=married, 2=single, 3=others)
• AGE: Age in years
• PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)
• PAY_2: Repayment status in August, 2005 (scale same as above)
• PAY_3: Repayment status in July, 2005 (scale same as above)
• PAY_4: Repayment status in June, 2005 (scale same as above)
• PAY_5: Repayment status in May, 2005 (scale same as above)
• PAY_6: Repayment status in April, 2005 (scale same as above)
• BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)
• BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)
• BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)
• BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)
• BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)
• BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)
• PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)
• PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)
• PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)
• PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)
• PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)
• PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)
• default.payment.next.month: Default payment (1=yes, 0=no)
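The coded variables in the list above can be checked against their documented values before any analysis. The following is a minimal sketch using only the standard library; the function name `undocumented_codes` and the sample rows are hypothetical and made up for illustration, not taken from the dataset.

```python
# Documented codes, copied from the variable list above.
DOCUMENTED_CODES = {
    "SEX": {1, 2},
    "EDUCATION": {1, 2, 3, 4, 5, 6},
    "MARRIAGE": {1, 2, 3},
    "default.payment.next.month": {0, 1},
}

# Repayment status: -1 = pay duly, 1..9 = months of payment delay.
PAY_COLUMNS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
for col in PAY_COLUMNS:
    DOCUMENTED_CODES[col] = {-1, *range(1, 10)}


def undocumented_codes(rows):
    """Return {column: sorted values that are not in the documented code set}."""
    found = {}
    for row in rows:
        for col, allowed in DOCUMENTED_CODES.items():
            if col in row and row[col] not in allowed:
                found.setdefault(col, set()).add(row[col])
    return {col: sorted(values) for col, values in found.items()}


# Hypothetical rows, invented for this example (not real records).
sample_rows = [
    {"SEX": 2, "EDUCATION": 2, "MARRIAGE": 1, "PAY_0": 2,
     "default.payment.next.month": 1},
    {"SEX": 1, "EDUCATION": 0, "MARRIAGE": 3, "PAY_0": -1,
     "default.payment.next.month": 0},
    {"SEX": 1, "EDUCATION": 3, "MARRIAGE": 2, "PAY_0": 0,
     "default.payment.next.month": 0},
]

report = undocumented_codes(sample_rows)
print(report)
```

Any column that appears in the report holds values the description does not explain, so those values would need a decision (recode, group, or drop) before modelling.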
The dataset contains 25 variables, of which default.payment.next.month is the target variable: it indicates whether the customer defaults (1) or not (0) in the next month.
• We pre-processed the data to check for missing values and outliers.
• We used Tableau for exploratory data analysis, i.e. to visualise the relationships between the target variable and the other variables.
• For modelling we used classification algorithms: CatBoost classifier, LightGBM, logistic regression, K-nearest neighbours (KNN), and probabilistic classifiers such as Bayes classifiers. We evaluated the models with appropriate evaluation strategies and selected the model that gave the best accuracy.
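The pre-processing check for missing values and outliers might look like the sketch below. It assumes pandas is available and uses the common 1.5 × IQR rule for outliers; the source does not say which rule was actually applied. The helper `iqr_outlier_counts` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def iqr_outlier_counts(df, columns, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each column."""
    counts = {}
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        outside = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
        counts[col] = int(outside.sum())
    return counts


# Toy frame with invented values; the real table would be loaded from file.
toy = pd.DataFrame({
    "LIMIT_BAL": [20000, 50000, 50000, 80000, 90000, 120000, 1000000],
    "AGE": [24, 26, 34, 37, 41, np.nan, 29],
})

missing = toy.isna().sum().to_dict()          # missing values per column
outliers = iqr_outlier_counts(toy, ["LIMIT_BAL", "AGE"])

print("missing:", missing)
print("outliers:", outliers)
```

A flagged value is not necessarily an error. Monetary columns such as credit limits tend to be skewed, so a high value may be legitimate and worth keeping.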
We used Python for analysis and prediction, and Tableau for all visualisations.
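The model comparison described in the modelling step could be sketched as follows, assuming scikit-learn. Several choices here are assumptions rather than the project's actual setup: the data is a synthetic stand-in generated with `make_classification`, Gaussian naive Bayes stands in for "Bayes classifiers", the evaluation uses stratified 5-fold cross-validation, and CatBoost and LightGBM are left out because they need separate packages. ROC AUC is reported next to accuracy because accuracy alone can look good on an imbalanced target.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real table: 23 features and an imbalanced
# binary target (the class balance is an assumption for illustration).
X, y = make_classification(
    n_samples=2000, n_features=23, n_informative=8,
    n_clusters_per_class=1, weights=[0.8, 0.2], random_state=0,
)

# Scale features for the distance-based and linear models.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "gaussian_nb": GaussianNB(),
}

# Stratified folds keep the default rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def compare_models(models, X, y, cv):
    """Return mean cross-validated accuracy and ROC AUC for each model."""
    results = {}
    for name, model in models.items():
        scores = cross_validate(
            model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
        results[name] = {
            "accuracy": float(scores["test_accuracy"].mean()),
            "roc_auc": float(scores["test_roc_auc"].mean()),
        }
    return results


results = compare_models(models, X, y, cv)
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

With the real data, `X` would hold the 23 predictor columns (all variables except ID and the target) and `y` the default.payment.next.month column.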