E-commerce Sales and Customer Behavior Analysis using the Olist Dataset

Project Description

This project uses the Olist dataset to analyze sales and customer behavior from a Brazilian e-commerce platform

Dataset Information

The Olist Dataset is an open source database from a Brazilian e-commerce platform that connects small businesses with customers from all over the country. It contains data on customer orders, products, sellers, and reviews, offering a complete view of the shopping experience.

Data Schema

The following image is showing how the different tables are related.

Dataset key tables and columns

Orders: Information about each order, including:
- order_id: Unique identifier for each order.
- order_purchase_timestamp: The date and time when the order was placed.
- order_delivered_customer_date: Delivery date of the order.
Order Items: Details on the products within each order:
- order_id: Order identifier linking to the Orders table
- product_id: Product identifier
- price: Price of each product within the order
- freight_value: Cost of freight for each product
Customers: Customer demographic information:
- customer_id: Unique customer identifier
- customer_city: City of the customer
- customer_state: State of the customer
Products: Product details:
- product_id: Unique identifier for each product
- product_category_name: Category of the product
Reviews: Customer feedback on orders:
- order_id: Order identifier linking to the Orders table
- review_score: Rating given by the customer (1-5)
Sellers: Information about the sellers on the platform:
- seller_id: Unique identifier for each seller
- seller_city, seller_state: Seller location data

Dataset Link:

The Olist dataset can be accessed on Kaggle here

Analysis of Data using SQL

Data Cleaning and Preparation

In order to facilitate the analysis, the first step is clean and prepare the data to ensure that:

The data is consistent and easy to analyze.
Only relevant columns are retained for improved performance.
Key fields for grouping and analysis are standardized and cleaned.

Cleaning process:

Orders Table:
- Changed the data type of purchase date columns from timestamp to date.
- Added separate columns for purchase month and purchase year to facilitate time-based analysis.
Order Items Table:
- Created a new column for the total value of each order (price + freight_value).
Products Table:
- Created a cleaned version of the table containing:
  - product_id and category names in English, replacing NULL values with valid translations in order to ensure consistency in category names.
- Excluded irrelevant columns such as product_name_length, product_description_length, and product dimensions, as they are not pertinent to this analysis. This optimizes the dataset for faster processing.
Customers Table:
- Added a new column for the region of the customer, based on the state. Simplified analysis by grouping states into the 5 geographical regions of Brazil.

For the full SQL queries used in this process, see Data Cleaning Queries

Exploratory Analysis

The exploratory analysis focuses on understanding the dataset through key metrics and visualizations. The following analyses were conducted:

Total Sales:
- Measured by total revenue, items sold and number of orders placed. This provides an overview of the business.
Average Order Value:
- Calculated the average value of each order to understand customer spending patterns.
Total Sales per Month and Year:
- Analyzed sales trends over time by breaking down total revenue by month and year.
Customer Distribution:
- Explored customer distribution by region and state to identify geographic trends.
Orders by Payment Type:
- Examined the distribution of orders based on the payment method used.
Orders by Status:
- Analyzed the count of orders by status, such as delivered, shipped, or canceled.

Each analysis provides insights into customer behavior, sales trends, and operational performance.

SQL queries used for this part of the analysis Here

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
power_bi_visualizations		power_bi_visualizations
sql_queries		sql_queries
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

E-commerce Sales and Customer Behavior Analysis using the Olist Dataset

Project Description

Dataset Information

Data Schema

Dataset key tables and columns

Dataset Link:

Analysis of Data using SQL

Data Cleaning and Preparation

Cleaning process:

Exploratory Analysis

About

Releases

Packages

pmmaria/olist-bigquery

Folders and files

Latest commit

History

Repository files navigation

E-commerce Sales and Customer Behavior Analysis using the Olist Dataset

Project Description

Dataset Information

Data Schema

Dataset key tables and columns

Dataset Link:

Analysis of Data using SQL

Data Cleaning and Preparation

Cleaning process:

Exploratory Analysis

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages