This project identifies fraudulent activity in financial transactions using a dataset of more than 6.3 million records. The core challenge is extreme class imbalance: only 0.13% of the transactions are fraudulent. I took an iterative approach, combining resampling (SMOTE and undersampling) with XGBoost and Random Forest models, which raised AUC from 86-87% at baseline to above 99%.
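To make the resampling idea concrete, here is a minimal pure-Python sketch of the interpolation step behind SMOTE. It is not the notebook's code, which presumably relies on a library implementation. The function name `smote_sketch`, the toy rows, and the values of `k` and `n_new` are all assumptions chosen for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding the row itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy fraud rows as (amount, oldbalanceOrg) pairs; values are invented.
fraud_rows = [
    (9000.0, 9000.0),
    (12000.0, 12500.0),
    (15000.0, 15000.0),
    (11000.0, 11000.0),
]
new_rows = smote_sketch(fraud_rows, n_new=6, k=2)
print(len(new_rows), "synthetic fraud rows generated")
```

Because every synthetic row is an interpolation between two real minority rows, the new points stay inside the region already occupied by fraud cases instead of simply duplicating them. Oversampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class ratio.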
Across five iterations of data engineering and class balancing, the models achieved the following AUC scores:
| Iteration | Strategy | XGBoost AUC | Random Forest AUC |
|---|---|---|---|
| 1 | Baseline (Default Params) | 87% | 86% |
| 2 | Hyperparameter Tuning | 99.5% | 84.3% |
| 3 | SMOTE + Tuning | 99.4% | 98.7% |
| 4 | Undersampling + Tuning | 98.8% | 99.6% |
| 5 | SMOTE + Subsampling | 99.0% | 92.1% |
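The table reports AUC rather than accuracy, and the sketch below illustrates why that choice matters on imbalanced data. It computes AUC from its pairwise definition: the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one. The helper `roc_auc` and the toy labels and scores are assumptions for illustration, not values from the notebook.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 8 legitimate transactions (0) and 2 fraudulent ones (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A "model" that scores everything as legitimate.
always_legit = [0.0] * len(labels)
accuracy = sum((s >= 0.5) == (y == 1) for y, s in zip(labels, always_legit)) / len(labels)

# A model that ranks most fraud cases above legitimate ones.
useful = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.7, 0.9, 0.6]

print(accuracy)                       # 0.8
print(roc_auc(labels, always_legit))  # 0.5
print(roc_auc(labels, useful))        # 0.9375
```

The constant scorer looks respectable on accuracy yet sits at an AUC of 0.5, the same as random ranking. With a fraud rate of 0.13%, the same trivial model would reach roughly 99.87% accuracy, which is why AUC is the more informative metric here.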
The analysis revealed that the most influential features for detecting fraud are:
- oldbalanceOrg (balance before the transaction)
- newbalanceDest (recipient's new balance)
- amount (size of the transaction)
├── Fraud_detection.ipynb # Main analysis and model training
├── requirements.txt # Dependencies
├── .gitignore # Excludes large data files
└── README.md # Project documentation
- Clone the repo:
git clone https://github.com/Akshat8510/Fraud-Detection-Project.git
- Install dependencies:
pip install -r requirements.txt
- Run the Jupyter notebook Fraud_detection.ipynb.