About Us

The information in this GitHub repository is part of the materials for the subject High Performance Data Processing (SECP3133). This folder contains general Exploratory Data Analysis (EDA) information as well as EDA case studies using Malaysian datasets. This case study was created by a Bachelor of Computer Science (Data Engineering) student at Universiti Teknologi Malaysia.

Exploratory Data Analysis

Exploratory data analysis (EDA) involves using graphics and visualizations to explore and analyze a data set. The goal is to explore, investigate and learn, as opposed to confirming statistical hypotheses.

When do I use it? Exploratory data analysis is a powerful way to explore a data set. Even when your goal is to perform planned analyses, EDA can be used for data cleaning, for subgroup analyses or simply for understanding your data better. An important initial step in any data analysis is to plot the data.

📖 Notes

Basic Concept

Code & Practice

projectpro: Exploratory Data Analysis in Python - Stop, Drop and Explore

Videos

Exploratory Data Analysis Tutorial | What Is EDA | How EDA Works | EDA In Python | Intellipaat

Kaggle


📖 Lab

No	Title	Colab	GitHub
1	Introduction to Exploratory Data Analysis
2	Exploratory data analysis in Python
3	Housing Dataset
4	Exploring data and missing values

🚀 Case Study: Instructions

Your submission will be evaluated using the following criteria:

The dataset must contain at least 5 columns and 1500 rows of data.
You must ask and answer at least 5 questions about the dataset.
Your submission must include at least 5 visualizations (graphs).
Your submission must include explanations in markdown cells, in addition to the code.
Your work must not be plagiarized, i.e. copy-pasted from somewhere else.
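
The size criterion above can be checked before any analysis starts. The following is a minimal sketch, assuming the dataset is already loaded into a Pandas data frame; the helper name `meets_size_requirement` and the stand-in data are hypothetical and not part of the course materials.

```python
import pandas as pd

# Thresholds taken from the evaluation criteria above.
MIN_COLUMNS = 5
MIN_ROWS = 1500


def meets_size_requirement(df: pd.DataFrame) -> bool:
    """Return True if df has at least MIN_ROWS rows and MIN_COLUMNS columns."""
    n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
    return n_rows >= MIN_ROWS and n_cols >= MIN_COLUMNS


# Hypothetical stand-in data: 6 columns, 2000 rows.
example = pd.DataFrame({f"col_{i}": range(2000) for i in range(6)})
print(meets_size_requirement(example))
```

Replace `example` with your own data frame once it is loaded.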

Follow this step-by-step guide to work on your project.

Step 1: Select a real-world dataset

A Malaysian dataset must be used for your case study.
The dataset is available at:

Step 2: Perform data preparation & cleaning

Load the dataset into a data frame using Pandas
Explore the number of rows & columns, ranges of values etc.
Handle missing, incorrect and invalid data
Perform any additional steps (parsing dates, creating additional columns, merging multiple datasets etc.)
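
The preparation steps above might look like the following sketch. The CSV content, column names (`date`, `state`, `price`, `rooms`) and cleaning rules are hypothetical, chosen only to illustrate the calls; your own dataset will need its own rules for what counts as invalid.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a real dataset file.
raw_csv = io.StringIO(
    "date,state,price,rooms\n"
    "2021-01-05,State A,350000,3\n"
    "2021-02-11,State B,,2\n"
    "2021-03-20,State A,-1,4\n"
    "not a date,State C,410000,3\n"
)

# Load the dataset into a data frame.
df = pd.read_csv(raw_csv)

# Explore the number of rows and columns, and the ranges of values.
print(df.shape)
print(df.describe())

# Treat negative prices as invalid (an assumption for this example).
df.loc[df["price"] < 0, "price"] = np.nan

# Fill missing prices with the median of the valid prices.
df["price"] = df["price"].fillna(df["price"].median())

# Parse dates; values that cannot be parsed become NaT and are dropped.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
df = df.dropna(subset=["date"]).reset_index(drop=True)

# Create an additional column from the parsed date.
df["month"] = df["date"].dt.month
print(df)
```

Filling with the median is only one option; dropping the rows or filling per group may suit a given dataset better, and the choice is worth explaining in a markdown cell.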

Step 3: Perform exploratory analysis & visualization

Compute the mean, sum, range and other interesting statistics for numeric columns
Explore distributions of numeric columns using histograms etc.
Explore relationships between columns using scatter plots, bar charts etc.
Make a note of interesting insights from the exploratory analysis
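
As an illustration of the steps above, the sketch below computes summary statistics and draws a histogram and a scatter plot. The data is synthetic and the column names `size_sqft` and `price` are hypothetical; the `Agg` backend line is only there so the code runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data for illustration only: price grows with size, plus noise.
rng = np.random.default_rng(seed=0)
size = rng.uniform(500, 3000, 200)
df = pd.DataFrame(
    {"size_sqft": size, "price": size * 400 + rng.normal(0, 50000, 200)}
)

# Mean, sum, minimum and maximum for each numeric column, plus the range.
stats = df.agg(["mean", "sum", "min", "max"])
stats.loc["range"] = stats.loc["max"] - stats.loc["min"]
print(stats)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Distribution of a single numeric column.
ax_hist.hist(df["price"], bins=20)
ax_hist.set_xlabel("price")
ax_hist.set_ylabel("count")

# Relationship between two numeric columns.
ax_scatter.scatter(df["size_sqft"], df["price"], s=10)
ax_scatter.set_xlabel("size_sqft")
ax_scatter.set_ylabel("price")

fig.tight_layout()

# A single number that summarizes the relationship seen in the scatter plot.
print(df["size_sqft"].corr(df["price"]))
```

In a notebook the figure is displayed inline; record what each plot suggests in a markdown cell next to it.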

Step 4: Ask & answer questions about the data

Ask at least 5 interesting questions about your dataset
Answer the questions either by computing the results using Numpy/Pandas or by plotting graphs using Matplotlib/Seaborn
Create new columns, merge multiple datasets and perform grouping/aggregation wherever necessary
Wherever you're using a library function from Pandas/Numpy/Matplotlib etc. explain briefly what it does
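
One way to work through a question is sketched below, using the hypothetical question "Which state has the highest average listing price?". Both tables, their column names and their values are made up for illustration. The comments briefly explain each library function, as the last point above asks.

```python
import pandas as pd

# Hypothetical listings table.
listings = pd.DataFrame(
    {
        "state": ["State A", "State A", "State B", "State B", "State C"],
        "price": [350000, 450000, 300000, 320000, 410000],
    }
)

# Hypothetical second table to merge in.
households = pd.DataFrame(
    {
        "state": ["State A", "State B", "State C"],
        "households": [1200, 800, 500],
    }
)

# groupby splits the rows by state; mean averages the price within each group.
avg_price = (
    listings.groupby("state")["price"].mean().rename("avg_price").reset_index()
)

# merge joins the two tables on their shared "state" column.
merged = avg_price.merge(households, on="state", how="left")

# sort_values orders the rows; iloc[0] picks the first row after sorting.
top = merged.sort_values("avg_price", ascending=False).iloc[0]

print(merged)
print(top["state"])
```

A bar chart of `avg_price` by state would answer the same question visually and would count toward the required visualizations.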

Step 5: Summarize your inferences & write a conclusion

Write a summary of what you've learned from the analysis
Include interesting insights and graphs from previous sections
Share ideas for future work on the same topic using other relevant datasets
Share links to resources you found useful during your analysis

Step 6: Make a submission

Upload your notebook to GitHub.

Example Projects

Refer to these projects for inspiration:

🌟 Case Study: Exploratory Data Analysis

Team	Title	Colab	GitHub
404 Error	Property in Kuala Lumpur
Alrite	ABC
BEFE	ABC
Boboiboy	Property Listings in Kuala Lumpur
COLBY	ABC
FANTOM	ABC
HAHA	Foreign Direct Investment In Malaysia
HD	ABC
KIA	Malaysia State Election 2018
LAB	Malaysia Air Pollution Analysis
MAAM	ABC
MEOW	Capacity and utilisation of Intensive Care Unit (ICU) beds during COVID-19
MM	Malaysia's 14th State Election Result
PIXALATED	Number of deaths in Malaysia from 2001 to 2018
POTATO	ABC
QnX	ABC
SAMVERSE	ABC
SMOL	Population in Malaysia from 2010-2019
SQ	Number of Cases and Incidents Rate of Communicable Disease by State
TUK	ABC
UWU	Property Listings in Kuala Lumpur
