Exploratory Data Analysis (EDA) and Feature Engineering on Black Friday Dataset
Overview
This repository contains the code and analysis for exploratory data analysis (EDA) and feature engineering on the Black Friday dataset. The dataset consists of transactional data on purchases made by customers during a Black Friday sale.
Dataset
Dataset link: https://www.kaggle.com/datasets/sdolezel/black-friday
The Black Friday dataset contains the following columns:
User_ID: Unique identifier for each user.
Product_ID: Unique identifier for each product.
Gender: Gender of the user.
Age: Age group of the user.
Occupation: Occupation of the user.
City_Category: Category of the city where the user resides.
Stay_In_Current_City_Years: Number of years the user has stayed in the current city.
Marital_Status: Marital status of the user.
Product_Category_1, Product_Category_2, Product_Category_3: Product categories of the purchased items.
Purchase: Purchase amount in dollars.
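A quick way to confirm the columns listed above is to check them at load time. The following is a minimal sketch, not the notebook's own code: `EXPECTED_COLUMNS`, `SAMPLE_CSV`, and `load_and_check` are hypothetical names, and the three sample rows are invented for illustration so the snippet runs without the real file. It also assumes that the secondary product categories may be empty and that the stay column can hold a value such as `4+`.

```python
import io

import pandas as pd

# Column names taken from the list above.
EXPECTED_COLUMNS = [
    "User_ID", "Product_ID", "Gender", "Age", "Occupation", "City_Category",
    "Stay_In_Current_City_Years", "Marital_Status",
    "Product_Category_1", "Product_Category_2", "Product_Category_3",
    "Purchase",
]

# Invented rows standing in for black_friday_data.csv (illustration only).
SAMPLE_CSV = """User_ID,Product_ID,Gender,Age,Occupation,City_Category,Stay_In_Current_City_Years,Marital_Status,Product_Category_1,Product_Category_2,Product_Category_3,Purchase
1,P00000001,F,0-17,10,A,2,0,3,,,8000
2,P00000002,M,55+,16,C,4+,0,8,,,7000
3,P00000003,M,26-35,15,A,3,1,1,2,,15000
"""


def load_and_check(source):
    """Read a CSV and fail early if any expected column is absent."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


df = load_and_check(io.StringIO(SAMPLE_CSV))
print(df.dtypes)        # inferred type of each column
print(df.isna().sum())  # missing-value count per column
```

To run the same check against the real data, pass the file path instead of the in-memory sample, for example `load_and_check("black_friday_data.csv")`.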
Files
black_friday_data.csv: The main dataset containing transactional data.
Black_Friday_EDA_Feature_Engineering.ipynb: Jupyter Notebook containing the code for EDA and feature engineering.
README.md: This file, which provides an overview of the project.
Analysis
The Jupyter Notebook Black_Friday_EDA_Feature_Engineering.ipynb contains the following analysis:
Data Cleaning: Handling missing values, correcting data types, and removing duplicates if any.
Exploratory Data Analysis (EDA): Analyzing the distribution of various features, identifying patterns, and gaining insights into customer behavior during Black Friday sales.
Feature Engineering: Creating new features or transforming existing ones to improve model performance or derive meaningful insights.
Visualization: Visualizing the relationships between different features using plots and charts.
Statistical Analysis: Performing statistical tests or calculations to validate assumptions or hypotheses.
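The cleaning and feature engineering steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the notebook's actual implementation: the helper names `clean` and `engineer`, the derived column names, the choice to fill missing secondary categories with 0, and the age-group ordering are all assumptions, and the sample rows are invented.

```python
import pandas as pd

NAN = float("nan")

# Invented rows for illustration; the last two are exact duplicates.
raw = pd.DataFrame({
    "User_ID": [1, 1, 2, 2],
    "Product_ID": ["P1", "P2", "P1", "P1"],
    "Gender": ["F", "F", "M", "M"],
    "Age": ["0-17", "0-17", "55+", "55+"],
    "Stay_In_Current_City_Years": ["2", "2", "4+", "4+"],
    "Product_Category_1": [3, 1, 8, 8],
    "Product_Category_2": [NAN, 6.0, NAN, NAN],
    "Product_Category_3": [NAN, 14.0, NAN, NAN],
    "Purchase": [8000, 15000, 7000, 7000],
})

# Assumed ordering of the age groups, youngest to oldest.
AGE_ORDER = ["0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+"]


def clean(df):
    """Drop duplicate rows and fill missing secondary categories."""
    out = df.drop_duplicates().copy()
    # Assumption: a missing secondary category means "none", encoded as 0.
    for col in ["Product_Category_2", "Product_Category_3"]:
        out[col] = out[col].fillna(0).astype(int)
    return out


def engineer(df):
    """Add a few simple derived features."""
    out = df.copy()
    # Turn values such as "4+" into the integer 4.
    out["Stay_Years"] = (
        out["Stay_In_Current_City_Years"]
        .astype(str)
        .str.replace("+", "", regex=False)
        .astype(int)
    )
    # Binary flag for gender.
    out["Gender_M"] = (out["Gender"] == "M").astype(int)
    # Ordinal code for the age group.
    out["Age_Code"] = out["Age"].map({label: i for i, label in enumerate(AGE_ORDER)})
    # Total spend per user, repeated on each of that user's rows.
    out["User_Total_Purchase"] = out.groupby("User_ID")["Purchase"].transform("sum")
    return out


features = engineer(clean(raw))
print(features)
```

Filling with 0 keeps the category columns as integers, but it is only one option. Dropping a sparse column or adding an explicit "is missing" flag may suit some models better.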
Dependencies
To run the Jupyter Notebook, you will need the following Python libraries:
Pandas
NumPy
Matplotlib
Seaborn
scikit-learn (imported as sklearn)
You can install these libraries using pip:
pip install pandas numpy matplotlib seaborn scikit-learn
Usage
Clone this repository to your local machine.
Navigate to the directory containing the repository.
Launch Jupyter Notebook.
Open Black_Friday_EDA_Feature_Engineering.ipynb.
Follow the instructions in the notebook to execute the code cells and analyze the dataset.
Acknowledgments
The dataset used in this analysis was obtained from Kaggle.
Author
Ghulam Mustafa