Ecommerce-Fraud-Detection

Summary:

The objective of this project is to develop the fraud detection algorithm.

The given case is divided in three parts:

Data Exploration
Data Prep
Data Modelling

Data Exploration: In this module the given dataset was analyzed to understand the data. Here are some insights: Libraries Used: Numpy : To perform operations Pandas : To create dataframes Matplotlib: To perform Visualization Sklearn: To import Random Forest Model

Load the csv files.
The Fraud_data contains rows & columns(151112, 14)
The Ip_Address_country contains rows & columns (138846, 3)
Lower Bound and Upper Bound Ip address are mapped to the Ip_Address and country in Fraud_Data.
The data is imbalanced as target variables contains 90.50% No Fraud data and 9.50% Fraud data.
The Fraud occur most on the purchase date of Jan 2nd 2015.
The purchase time is uniformly distributed for the Fraud data.
The fraud data was recorded on the Chrome browser among all.
United States has the most fraud data.
The channel was uniformly distributed when analyzing the fraud and no fraud data.
The age is uniformly distributed for both fraud and no fraud data.
Gender is uniformly distributed for both fraud and no fraud data.
Most of the purchase value between 0 and 40 has the fraud data.
For the age group of 28-35 most fraud occurred.
From the analysis we can find that most frauds occur when the purchase_date and signup_date are same while purchase_time and sign_up time are uniformly distributed.
Deleted the NaN rows from the country column.

Data Prep:

All the columns are converted into numeric form for data modelling.
Since the data is unbalanced, undersampling method (since we have enough data, used this method over oversampling method) is used to balance the data for improving the classification performance.
Signup_time and Purchase_time driver variables are excluded from the analysis as they have least significant impact on the target variable.

Data Modelling:

Random Forest model is applied on the data to classify the prediction of Fraud and No-Fraud.
It is used as it can achieve the higher accuracy for the large datasets.
Random Forest has Precision of 86% , Recall of 59% and F1 of 69%.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitignore		.gitignore
EC_Fraud_Data.ipynb		EC_Fraud_Data.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ecommerce-Fraud-Detection

About

Releases

Packages

Languages

annu06/Ecommerce-Fraud-Detection

Folders and files

Latest commit

History

Repository files navigation

Ecommerce-Fraud-Detection

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages