This project analyzes transaction data from merchants that use our company's payment processing services. The primary objectives are to categorize businesses by their transaction patterns and to predict merchant churn to inform retention strategies.
Project Overview

Business Categorization:
Analyzed two years of transaction data to identify different types of merchants. Implemented three clustering algorithms to segment merchants based on their transaction patterns:

- DBSCAN
- Label Propagation
- K-Means

Churn Prediction:
Developed a predictive model using Logistic Regression to forecast merchant churn within a 90-day period.

Data Overview

The dataset consists of the following columns:
- merchant: Unique Merchant ID
- time: Timestamp of each transaction
- amount_usd_in_cents: Transaction amount in US cents

Feature Engineering

Derived several features to understand merchant behavior:
- First and Last Payment Dates: To determine the active period of each merchant.
- Total Amount Paid: Total revenue generated by each merchant.
- Number of Payments: Transaction volume for each merchant.
- Lifespan: Duration between first and last payments.
- Acquisition and Recency Metrics: To analyze the acquisition timeline and recent activity.
- Average Order Value (AOV): Average transaction size.
- Lifetime Value (LTV): Cumulative value per merchant.

Data Visualization

Created a 3D scatter plot to visualize relationships between key metrics:
- X-axis: Average Order Value
- Y-axis: Lifespan in Days
- Z-axis: Number of Payments Per Day

Clustering Analysis
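The per-merchant metrics that feed the clustering can be derived from the three raw columns roughly as follows. This is a minimal sketch using only the Python standard library, not the project's actual code: the sample rows, the ISO-8601 timestamp format, the function name `merchant_features`, and the exact formulas (AOV as total divided by payment count, payments per day as count divided by lifespan with a one-day floor) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (merchant, time, amount_usd_in_cents).
transactions = [
    ("m1", "2020-01-01T10:00:00", 1000),
    ("m1", "2020-01-11T10:00:00", 3000),
    ("m2", "2020-03-05T09:30:00", 500),
]


def merchant_features(rows):
    """Aggregate raw transactions into one feature dict per merchant."""
    grouped = defaultdict(list)
    for merchant, time, cents in rows:
        grouped[merchant].append((datetime.fromisoformat(time), cents))

    features = {}
    for merchant, payments in grouped.items():
        times = [t for t, _ in payments]
        total_cents = sum(c for _, c in payments)
        first, last = min(times), max(times)
        lifespan_days = (last - first).days
        count = len(payments)
        features[merchant] = {
            "first_payment": first,
            "last_payment": last,
            "total_amount_cents": total_cents,  # also serves as LTV here
            "num_payments": count,
            "lifespan_days": lifespan_days,
            "aov_cents": total_cents / count,
            # Assumed floor of one day so single-day merchants do not
            # cause a division by zero.
            "payments_per_day": count / max(lifespan_days, 1),
        }
    return features


features = merchant_features(transactions)
```

The three plotted axes correspond to `aov_cents`, `lifespan_days`, and `payments_per_day` in this sketch. Because these live on very different scales, distance-based algorithms such as K-Means and DBSCAN are usually run on standardized versions of them.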
- Label Propagation: Identified 45 clusters, but results showed potential instability.
- DBSCAN: Formed 3 clusters; however, the distribution appeared irregular based on the 3D plot.
- K-Means: Determined the optimal number of clusters using the elbow method. Chose k=3 as it provided the best representation of the data.

Cluster Labels:
- Cluster 0: Steady Engagers - Moderate AOV, long-term customer relationships.
- Cluster 1: Fast Movers - Higher frequency, lower value transactions.
- Cluster 2: High-Value Niche - High frequency, high LTV, possibly luxury or premium businesses.

Churn Prediction Model

Used Logistic Regression for churn prediction. Evaluated the model using a confusion matrix and calculated the accuracy score.

Results

The model's predictions achieved a high accuracy score, effectively identifying merchants at risk of churning. The clustering analysis helped in segmenting merchants into meaningful groups, aiding in tailored business strategies.
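The evaluation step can be illustrated with a small standard-library sketch. It assumes a merchant counts as churned when their last payment is more than 90 days before a cutoff date. That is one plausible reading of the 90-day period, not a definition confirmed by the project. The dates and the `y_pred` values are hypothetical stand-ins for real data and for the Logistic Regression model's output.

```python
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(days=90)  # the 90-day period used for churn


def is_churned(last_payment, cutoff):
    """Assumed label: no payment in the 90 days before the cutoff."""
    return cutoff - last_payment > CHURN_WINDOW


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for boolean labels; True means churned."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif not actual and predicted:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Share of merchants whose churn label was predicted correctly."""
    return (tn + tp) / (tn + fp + fn + tp)


# Hypothetical cutoff and last-payment dates for four merchants.
cutoff = datetime(2021, 1, 1)
last_payments = [
    datetime(2020, 12, 20),
    datetime(2020, 6, 1),
    datetime(2020, 9, 15),
    datetime(2020, 11, 30),
]
y_true = [is_churned(t, cutoff) for t in last_payments]

# Hypothetical model output; a real run would use the classifier's predictions.
y_pred = [False, True, False, False]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
acc = accuracy(tn, fp, fn, tp)
```

In this toy example one churned merchant is missed (a false negative) even though three of four predictions are correct, which is why the confusion matrix is worth reading alongside the accuracy score.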