🌟 Hit the star button to save this repo to your profile.
The information in this GitHub repository is part of the materials for the subject High Performance Data Processing (SECP3133). This folder contains general big data information as well as big data case studies using Malaysian datasets. This case study was created by a student in the Bachelor of Computer Science (Data Engineering) programme at Universiti Teknologi Malaysia.
Welcome to the High-Performance Data Processing (HPDP) class! We are thrilled to embark on this exciting learning journey together. Before our first class, there are a few important steps you need to take:
- Complete Information Form: Please fill in 🧑‍💻 your details in the provided Google Sheets document here.
- Create a GitHub Account: Ensure you have a GitHub account by signing up at GitHub.
- Access Teaching Materials: All teaching materials will be available on my GitHub account. Please follow this link to access the materials.
- Fork and Star Repository: To kick off our first meeting, please fork and star the repository available here. We will be using this repository extensively.
- GitHub Usage for Course: Throughout this course, we will use GitHub to share materials and to submit tasks, projects, and more. Make sure your account has a meaningful GitHub username.
Please be sure to complete these tasks before our first class, as they are essential for our learning and collaboration.
Looking forward to an amazing and productive class!
- Course Information
- Student information
- AWS Academy Cloud Foundations
- AWS Academy Cloud Architecting
- AWS Academy Data Engineering
- Python for beginners
- Web scraping and Python web framework
- Exploratory data analysis
- Big data processing
- Case Study
You can practice with online Python interpreters or codepads. There isn't much difference between the two: an interpreter is more interactive than a codepad, but both let you execute code and see the results.
Below, you’ll find links to some of the most popular online interpreters and codepads. Give them a go to find your favorite.
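If you want something to try first, the short snippet below should run unchanged in either kind of tool. It is only a suggested warm-up, assuming the tool runs Python 3: it prints the Python version the site is using and the result of a small list comprehension. In an interactive interpreter you can also type each line at the `>>>` prompt and see the result immediately.

```python
import sys

# Show which Python version the online tool is running.
print(sys.version)

# A tiny computation to confirm that code executes and output is displayed.
squares = [n * n for n in range(1, 6)]
print(squares)  # prints [1, 4, 9, 16, 25]
```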
Python was first released in 1991 and has a rich history. You can read more about it on the History of Python Wikipedia page or in the "History of the software" section of the official Python documentation.
Python has recently been called the fastest-growing programming language. If you're interested in why this is and how it's measured, you can find out more in these articles:
| Title | Link |
|---|---|
| Python for Data Analysis, 3rd Edition, by Wes McKinney | |
| DevFreeBooks | |
Learn skills or discover useful resources with these repositories.
Unleashing the power of geospatial data: 20 python libraries transforming location-based services & beyond
As data scientists, we all know how essential it is to have a solid understanding of pandas, Python's go-to library for data manipulation and analysis. This is an excellent article; it highlights these commonly used functions:
🎯 pd.read_csv()
🎯 df.describe()
🎯 df.info()
🎯 df.plot()
🎯 df.iloc[]
🎯 df.loc[]
🎯 df.assign()
🎯 df.query()
🎯 df.sort_values()
🎯 df.sample()
🎯 df.isnull()
🎯 df.fillna()
🎯 df.dropna()
🎯 df.drop()
🎯 pd.pivot_table()
🎯 df.groupby()
🎯 df.transpose()
🎯 df.merge()
🎯 df.rename()
🎯 df.to_csv()
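The functions above can be tried end to end on a tiny dataset. This is a minimal sketch, assuming pandas is installed; the CSV contents, the column names and the `regions` lookup table are made up purely for illustration and are not official statistics. Note that `loc` and `iloc` are indexers used with square brackets rather than method calls. `df.plot()` is left out because it additionally requires matplotlib.

```python
import io

import pandas as pd

# Made-up CSV standing in for a real dataset (illustrative numbers only).
CSV_TEXT = """state,year,population,area_km2
Johor,2020,100,10
Selangor,2020,200,5
Penang,2020,50,1
Johor,2021,,10
Selangor,2021,210,5
"""

# pd.read_csv(): load tabular text into a DataFrame (a file path or URL also works).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# df.info() prints dtypes and non-null counts; df.describe() summarises numeric columns.
df.info()
print(df.describe())

# df.isnull(): boolean mask of missing cells; summing it counts them per column.
missing_per_column = df.isnull().sum()

# df.fillna() fills missing values; df.dropna() drops rows that contain any.
filled = df.fillna({"population": df["population"].mean()})
complete_rows = df.dropna()

# df.loc[] selects by label, df.iloc[] by integer position.
first_state = df.loc[0, "state"]
second_row = df.iloc[1]

# df.assign(): return a copy with a derived column added.
dense = filled.assign(density=filled["population"] / filled["area_km2"])

# df.query() filters rows with a string expression; df.sort_values() orders them.
big = dense.query("population > 100").sort_values("population", ascending=False)

# df.sample(): draw random rows; random_state makes the draw reproducible.
some_rows = df.sample(n=2, random_state=0)

# df.groupby(): split-apply-combine, here the mean population per state.
mean_pop_by_state = filled.groupby("state")["population"].mean()

# pd.pivot_table(): reshape into a state x year grid (mean is the default aggfunc).
pivot = pd.pivot_table(filled, values="population", index="state", columns="year")

# df.transpose(): swap rows and columns (years become rows).
pivot_t = pivot.transpose()

# df.merge(): SQL-style left join with a hypothetical lookup table.
regions = pd.DataFrame(
    {"state": ["Johor", "Selangor", "Penang"], "region": ["South", "Central", "North"]}
)
merged = filled.merge(regions, on="state", how="left")

# df.rename() renames columns; df.drop() removes columns no longer needed.
tidy = merged.rename(columns={"area_km2": "area"}).drop(columns=["year"])

# df.to_csv(): with no path it returns the CSV text; pass a filename to write a file.
csv_out = tidy.to_csv(index=False)
print(csv_out)
```

Each step returns a new DataFrame instead of modifying `df` in place, which keeps the original data available for comparison while you explore.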
- The only Performance Metrics article you will ever need!
- Analysing Data with ChatGPT (Data Analysis and ML)
- ChatGPT: Use Case 1 - Generating Datasets
- Awesome Public Datasets
- Portal Data Terbuka Malaysia (Malaysia's open data portal)
- Department of Statistics Malaysia
- data.world
- Dataportal.asia
- Knoema
- The World Bank
- Dataset Search - Google
- UCI Machine Learning Repository
- Kaggle datasets
- Datahub.io
- Earthdata
- CERN Open Data Portal
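Many of these portals offer downloads in CSV format. A minimal sketch for taking a first look at such a file with pandas, assuming the data is a CSV; the function name `summarize_csv` and the sample rows are hypothetical. It reports each column's dtype, number of missing values and number of distinct values, which is usually enough to decide what cleaning a dataset needs.

```python
import io

import pandas as pd


def summarize_csv(source) -> pd.DataFrame:
    """Return one row per column with its dtype, missing-value count and distinct-value count.

    `source` can be anything pd.read_csv accepts: a local path, a URL or a file-like object.
    """
    df = pd.read_csv(source)
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isnull().sum(),
            "unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


# Hypothetical sample standing in for a downloaded file.
sample = io.StringIO(
    "date,state,value\n2023-01-01,Johor,1.5\n2023-01-02,Johor,\n2023-01-01,Kedah,2.0\n"
)
print(summarize_csv(sample))
```

For a real download you would pass the saved file's path (or the portal's direct CSV link) instead of the in-memory sample.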
Please create an Issue for any improvements, suggestions, or errors in the content.
You can also contact me on LinkedIn with any other queries or feedback.