Prediction of Order Returns of an Online Clothing Retailer (Real-World Data) With XGBoost and Random Forest

AnFrBo/order_returns_prediction

Prediction of Order Returns of an Online Clothing Retailer

Customers send back a substantial share of the products they purchase online. Return shipping is expensive for online platforms, and return rates are said to reach 50% for certain industries and products. Nevertheless, free or inexpensive return shipping has become a customer expectation and the de facto standard in the fiercely competitive online clothing market. Shops do, however, have indirect ways to influence customer purchase behaviour: for purchases where a return seems likely, a shop could, for example, restrict payment options or display additional marketing communication.

To estimate how likely a return is, I built a targeting model that predicts the probability of a customer returning an order, which the shop can use to optimize its revenue. The (real-world) data was provided by an online clothing retailer. Note, however, that the data was artificially balanced (1:1 ratio between returns and non-returns).

My final model was an ensemble of differently tuned XGBoost and random forest models. Its predictions achieved an AUC of 0.73725, which placed within the top 20% of 178 participants.
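As an illustration of the two ideas above (combining several models and scoring the result by AUC), here is a minimal sketch. It is not the repository's code, which is written in R; it uses plain Python, made-up toy scores, and assumes the ensemble is a simple average of predicted return probabilities. The README does not state how the models were actually combined, and the names `average_ensemble`, `auc`, `xgb_scores` and `rf_scores` are hypothetical.

```python
from statistics import mean


def average_ensemble(*model_scores):
    """Average per-order return probabilities across several models.

    Assumption: a plain unweighted average; the original ensemble
    may have combined its models differently.
    """
    return [mean(scores) for scores in zip(*model_scores)]


def auc(labels, scores):
    """AUC as the share of (returned, kept) order pairs in which the
    returned order (label 1) gets the higher score; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one order of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up toy data, not taken from the repository.
labels = [1, 0, 1, 0, 1, 0]                      # 1 = order was returned
xgb_scores = [0.9, 0.4, 0.35, 0.2, 0.7, 0.6]     # hypothetical model A
rf_scores = [0.8, 0.3, 0.6, 0.5, 0.4, 0.1]       # hypothetical model B

ensemble_scores = average_ensemble(xgb_scores, rf_scores)
print("model A AUC: ", auc(labels, xgb_scores))
print("model B AUC: ", auc(labels, rf_scores))
print("ensemble AUC:", auc(labels, ensemble_scores))
```

Because AUC depends only on how orders are ranked, it can be compared across models whose probabilities are on different scales. In this toy data the averaged scores rank the orders better than either model alone, but that is a property of the example, not a general guarantee.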

The full description of the Kaggle challenge, as well as the data and leaderboard, can be accessed here.

All code was written in R, using RStudio.

Organization

Author: Anna Franziska Bothe
Institute: Humboldt University Berlin, Chair of Information Systems
Course: Business Analytics and Data Science
Semester: WS 2019/20

Content

.
├── data                  # original data sets plus the cleaned, engineered data produced by data_preparation.R
├── code                  # data_preparation.R (data cleaning, preparation and feature engineering) and model&prediction.R (model tuning, training, selection and the final prediction)
├── final_prediction.csv  # final prediction submitted to the Kaggle challenge
├── README.md             # this readme file
├── requirements.txt      # lists all libraries used
└── setup.txt             # describes how to run the pipeline in detail
