Don't forget to hit the ⭐ if you like this repo.
The information in this GitHub repository is part of the materials for the subject High Performance Data Processing (SECP3133). This folder contains general big data information as well as big data case studies using Malaysian datasets. The case studies were created by Bachelor of Computer Science (Data Engineering) students at Universiti Teknologi Malaysia.
Exploratory Data Analysis

| Team | Title | Colab | GitHub |
| --- | --- | --- | --- |
| 404 Error | Property in Kuala Lumpur | | |
| Alrite | The Exportation of Plantation in Sarawak | | |
| BEFE | Covid-19 Clusters in Malaysia | | |
| Boboiboy | Property Listings in Kuala Lumpur | | |
| COLBY | Malaysia GE-14 Result | | |
| FANTOM | Daily Recorded COVID-19 Cases at State Level in Malaysia | | |
| HAHA | Foreign Direct Investment in Malaysia | | |
| HD | Guna Tanah Tampin 2021 (Land Use in Tampin 2021) | | |
| KIA | Malaysia State Election 2018 | | |
| LAB | Malaysia Air Pollution Analysis | | |
| MAAM | Malaysia Hospital Patient Movement Analysis | | |
| MEOW | Capacity and Utilisation of Intensive Care Unit (ICU) Beds during COVID-19 | | |
| MM | Malaysia's 14th State Election Result | | |
| PIXALATED | Number of Deaths in Malaysia from 2001 to 2018 | | |
| POTATO | Death by State, Sex and Age Group Malaysia 2001-2018 | | |
| QnX | Real Estate Kuala Lumpur Malaysia | | |
| SAMVERSE | Restaurant Rating in Malaysia | | |
| SMOL | Population in Malaysia from 2010-2019 | | |
| SQ | Number of Cases and Incidents Rate of Communicable Disease by State | | |
| TUK | Number of Government School Pupils by District Education Office and State 2017-2018 | | |
| UWU | Property Listings in Kuala Lumpur | | |

Pandas - Data Processing

| Team | Title | GitHub |
| --- | --- | --- |
| 404 Error | Sales Analysis | |
| Alrite | EDA on The NASA JPL Asteroid | |
| BEFE | Summary of Google Play Store Application | |
| Boboiboy | Car Sales Data | |
| COLBY | Banking Loan Credit | |
| FANTOM | Google Playstore App | |
| HAHA | Car Sales in Russia by Region | |
| HD | New York Yellow Taxi Trip Data 2016-03 | |
| KIA | US Road Construction and Closures 2016-2021 Analysis | |
| LAB | Fraudulent Transaction Analysis and Prediction | |
| MAAM | US Accidents (2016 - 2021) Analysis | |
| MEOW | Apple AppStore App Data | |
| MM | 2015 Flight Delays and Cancellations | |
| PIXALATED | Google Playstore Application Summary | |
| POTATO | Flight Delays and Cancellations at 2015 | |
| QnX | Trump vs Biden on Twitter | |
| SAMVERSE | Google Playstore Management | |
| SMOL | USA House Listing | |
| SQ | Online Payment Fraud Detection | |
| TUK | Fraud Detection in Online Payment | |
| UWU | Airline Delay 2017 | |

Alternatives to Pandas for Processing Large Datasets

| Team | Library | Title | GitHub |
| --- | --- | --- | --- |
| AdMiPeQa | DataTable | Health Insurance Marketplace | |
| QwQ | Polars | Health Insurance Marketplace | |
| BigMac | Vaex | Health Insurance Marketplace | |
| Sepuluh | PySpark | Health Insurance Marketplace | |
| High Five | Koalas | Health Insurance Marketplace | |
| SIX | cuDF | Health Insurance Marketplace | |
| No name | DataTable | Health Insurance Marketplace | |
| QUAD | Polars | NYC yellow taxi trip data | |
| Rojak | Vaex | Health Insurance Marketplace | |
| SamVerse | PySpark | 1000000 Sales Records | |
| SDS | Koalas | NYC yellow taxi trip data | |

Processing Large Datasets: Library Comparison

| Team | Library | Title | GitHub |
| --- | --- | --- | --- |
| AdMiPeQa | Pandas vs DataTable | Health Insurance Marketplace | |
| QwQ | Pandas vs Polars | Health Insurance Marketplace | |
| BigMac | Pandas vs Vaex | Health Insurance Marketplace | |
| SamVerse | Pandas vs PySpark | 1000000 Sales Records | |
| High Five | Pandas vs Koalas | Health Insurance Marketplace | |
| SIX | Pandas vs cuDF | Health Insurance Marketplace | |
| No name | Pandas vs DataTable | Health Insurance Marketplace | |
| QUAD | Pandas vs Polars | NYC Yellow Taxi Trip | |
| Rojak | Pandas vs Vaex | Health Insurance Marketplace | |
| Sepuluh | Pandas vs PySpark | Health Insurance Marketplace | |
| SDS | Pandas vs Koalas | NYC Yellow Taxi Trip | |


| Team | Library 1 | Library 2 | Library 3 | Dataset | Open in GitHub |
| --- | --- | --- | --- | --- | --- |
| AdMiPeQa | Pandas | Dask | Koalas | Air Flight Analysis | |
| BigMac | Vaex | Koalas | PySpark | Airline Delay and Cancellation Data 2016 - 2018 | |
| No Name | Pandas | PySpark | Koalas | Amazon Book Review | |
| QUAD | Polars | Koalas | DataTable | NYC yellow taxi trip data | |
| QwQ | Koalas | PySpark | Dask | NYC Automated Traffic Volume Counts | |
| Rojak | Pandas | Vaex | Koalas | 15 Million Chess Games from Lichess (2013-2014) | |
| SDS | Pandas | Polars | Koalas | Analysis of Amazon Books Review | |
| SIX | Dask | PySpark | Koalas | NYC Parking Tickets | |
| SamVerse | Pandas | PySpark | Koalas | Spotify Charts | |
| Sepuluh | PySpark | Polars | Pandas | Airline Delay and Cancellation Data 2017 - 2018 | |
| High Five | Pandas | Koalas | Modin | Airline Delay and Cancellation Data 2015 - 2016 | |

Please create an Issue for any improvements, suggestions or errors in the content.
You can also contact me on LinkedIn with any other queries or feedback.