ElvisKoech/Deloitte_Hackathon_predict_Loan_Defaulter

Deloitte_Hackathon_predict_Loan_Defaulter

PROBLEM STATEMENT

  • The aim is to predict loan status from the attributes listed below.

  • Dataset Description: Train.csv - 67463 rows x 35 columns (includes the target column, Loan Status)

Attributes:

  • ID: unique ID of representative
  • Loan Amount: loan amount applied
  • Funded Amount: loan amount funded
  • Funded Amount Investor: loan amount approved by the investors
  • Term: term of loan (in months)
  • Batch Enrolled: batch number assigned to the representative
  • Interest Rate: interest rate (%) on loan
  • Grade: grade by the bank
  • Sub Grade: sub-grade by the bank
  • Employment Duration: duration of employment
  • Home Ownership: ownership status of the home
  • Verification Status: Income verification by the bank
  • Payment Plan: if any payment plan has started against loan
  • Loan Title: loan title provided
  • Debit to Income: ratio of the representative's total monthly debt repayments (excluding mortgage) to self-reported monthly income
  • Delinquency - two years: number of 30+ days delinquency in past 2 years
  • Inquires - six months: total number of inquiries in last 6 months
  • Open Account: number of open credit lines in the representative's credit line
  • Public Record: number of derogatory public records
  • Revolving Balance: total credit revolving balance
  • Revolving Utilities: amount of credit a representative is using relative to revolving_balance
  • Total Accounts: total number of credit lines available in the representative's credit line
  • Initial List Status: unique listing status of the loan - W(Waiting), F(Forwarded)
  • Total Received Interest: total interest received till date
  • Total Received Late Fee: total late fee received till date
  • Recoveries: post charge off gross recovery
  • Collection Recovery Fee: post charge off collection fee
  • Collection 12 months Medical: total collections in last 12 months excluding medical collections
  • Application Type: indicates whether the representative applied individually or jointly
  • Last week Pay: indicates how long (in weeks) a representative has paid EMI after batch enrolled
  • Accounts Delinquent: number of accounts on which the representative is delinquent
  • Total Collection Amount: total collection amount ever owed
  • Total Current Balance: total current balance from all accounts
  • Total Revolving Credit Limit: total revolving credit limit
  • Loan Status: 1 = Defaulter, 0 = Non-Defaulter

Test.csv - 28913 rows x 34 columns (Includes target column as Loan Status)

Sample Submission.csv - Please check the Evaluation section for more details on how to generate a valid submission.

The challenge is to predict the Loan Status

Knowledge and Skills

  • Big dataset, underfitting vs overfitting
  • Optimising log_loss to generalise well on unseen data

Data Preprocessing

  • The values of the Employment Duration and Home Ownership columns are interchanged in the data, so the two columns are renamed to match their contents.
  • Categorical attributes are encoded using LabelEncoder, as Random Forest will be used for building the model.
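
The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the repository's notebook: the three-row frame and its values are invented, the list `categorical_cols` is a hypothetical choice, and swapping the names through a single `rename` mapping is just one reasonable way to do it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Invented sample in which the two columns hold each other's values.
df = pd.DataFrame({
    "Employment Duration": ["MORTGAGE", "RENT", "OWN"],  # looks like home ownership
    "Home Ownership": [176346.6, 39833.9, 91506.7],      # looks like a numeric duration
    "Grade": ["B", "C", "B"],
})

# Swap the two labels in one step so each name matches its contents.
df = df.rename(columns={
    "Employment Duration": "Home Ownership",
    "Home Ownership": "Employment Duration",
})

# Hypothetical list of categorical columns to encode.
categorical_cols = ["Home Ownership", "Grade"]

# Keep one fitted encoder per column so the same mapping can be reused on test data.
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

print(df)
```

LabelEncoder assigns integers in sorted order of the class labels, which imposes an arbitrary ordering. Tree-based models such as Random Forest split on thresholds and are generally tolerant of this, which is consistent with the reasoning given above.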

Feature Selection

  • ExtraTreesClassifier is used to select the best features.
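
One common way to select features with ExtraTreesClassifier is to rank them by the fitted model's impurity-based importances. The sketch below assumes that approach; the source does not say exactly how the selection was done. The data is synthetic, and `feature_names` and the cutoff `top_k` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the encoded training data (not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    random_state=0,
)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances are normalised to sum to 1.
importances = model.feature_importances_

# Rank features from most to least important and keep the top few.
order = np.argsort(importances)[::-1]
top_k = 3  # hypothetical cutoff
selected = [feature_names[i] for i in order[:top_k]]

for i in order:
    print(f"{feature_names[i]}: {importances[i]:.3f}")
print("selected:", selected)
```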

Evaluation Metric

The competition evaluation metric used is Log-loss.
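
For binary labels, log loss is the mean of -(y log p + (1 - y) log(1 - p)), where p is the predicted probability of class 1. The sketch below computes it with the standard library only, to show how confident wrong predictions are penalised. The function name `binary_log_loss` and the clipping constant `eps` are illustrative choices, not taken from the competition.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# One defaulter (1) and one non-defaulter (0), scored three ways.
confident_right = binary_log_loss([1, 0], [0.9, 0.1])
uninformed = binary_log_loss([1, 0], [0.5, 0.5])
confident_wrong = binary_log_loss([1, 0], [0.1, 0.9])

print(f"{confident_right:.4f} {uninformed:.4f} {confident_wrong:.4f}")
```

Because the penalty grows without bound as a wrong prediction approaches certainty, the metric rewards well-calibrated probabilities rather than hard class labels.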

Approach

As this is a classification problem, predicting whether a loan applicant will default, Logistic Regression, Random Forest and XGBoost models were built. Each model was evaluated by log loss, and XGBoost performed best with a log loss of 0.32. Hyperparameter tuning was then performed and the XGBoost model was trained on AWS SageMaker instances, which improved the model's F1 score from 0.91 to 0.94.
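
The model comparison described above might look roughly like the sketch below. It runs on synthetic data, so its scores have no relation to the 0.32 reported here. The hyperparameters are arbitrary assumptions, and XGBoost is included only if the `xgboost` package is installed. The SageMaker tuning step is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed training data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
try:
    # Optional dependency; skipped if not installed.
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(
        n_estimators=200, learning_rate=0.1, random_state=0
    )
except ImportError:
    pass

# Fit each model and score its predicted probabilities on held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = log_loss(y_va, model.predict_proba(X_va))

best = min(scores, key=scores.get)  # lower log loss is better
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.4f}")
print("best:", best)
```

Scoring on a held-out split rather than the training data is what makes the comparison informative about overfitting, which the problem statement lists as a concern.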
