
nadiamel/HPDP

 
 


🌟 Hit the star button to save this repo to your profile

About Us

The information in this GitHub repository is part of the materials for the subject High Performance Data Processing (SECP3133). The repository contains general big data information as well as big data case studies using Malaysian datasets. This case study was created by a student of the Bachelor of Computer Science (Data Engineering) program at Universiti Teknologi Malaysia.

Essential Preparations for a Successful Start in High-Performance Data Processing Class 🚀

Welcome to the High-Performance Data Processing (HPDP) class! We are thrilled to embark on this exciting learning journey together. Before our first class, there are a few important steps you need to take:

  1. Complete Information Form: Please fill in 🧑‍💻 your details in the provided Google Sheets document here.

  2. Create a GitHub Account: Ensure you have a GitHub :octocat: account by signing up at GitHub.

  3. Access Teaching Materials: All teaching materials will be available on my GitHub account. Please follow this link to access the materials.

  4. Fork and Star Repository: To kick off our first meeting, please fork and star the repository available here. We will be using this repository extensively.

  5. GitHub Usage for Course: Throughout this course, we will utilize GitHub for sharing materials, submitting tasks, projects, and more. Make sure you have a meaningful GitHub username associated with your account.

Please be sure to complete these tasks before our first class, as they are essential for our learning and collaboration.

Looking forward to an amazing and productive class!

🔥 Important things

  1. Course Information
  2. Student information
  3. AWS Academy Cloud Foundations
  4. AWS Academy Cloud Architecting
  5. AWS Academy Data Engineering

📚 Course: High Performance Data Processing

Notes

Python practice resources

You can practice by using online Python interpreters or codepads available online. There’s not much difference between an interpreter and a codepad. An interpreter is more interactive than a codepad, but they both let you execute code and see the results.

Below, you’ll find links to some of the most popular online interpreters and codepads. Give them a go to find your favorite.

Python history and current status

Python was first released in 1991 and has a rich history. You can read more about it on the History of Python Wikipedia page or in the section on the history of the software from the official Python documentation.

Python has recently been called the fastest growing programming language. If you're interested in why this is and how it’s measured, you can find out more in these articles:

Python: E-book

Title Link
Python for Data Analysis, 3E. By: Wes McKinney
DevFreeBooks

Python: Cheatsheet

Title Link
Python For Data Science: Basic, Jupyter Notebook, NumPy, SciPy - Linear Algebra, Pandas, Scikit-Learn, Matplotlib, Seaborn, Bokeh. By: DataCamp
Pandas
Python Notes/Cheat Sheet. By: @Mark_Graph
Python Cheat Sheet. By: WebsiteSetup.org
Matplotlib Cheatsheets. By: Matplotlib Development Team
Python: 8 Amazing Snippets

:octocat: Amazing GitHub repos for Data Science!

Learn skills or discover useful resources with these repositories.

Title GitHub
Data-scientist-roadmap. By: MrMimic :octocat:
Data Science Resources. By: jb :octocat:
Awesome Data Science By: Fatih Aktürk, Hüseyin Mert & Osman Ungur, Recep Erol. :octocat:
Data Science Interviews: Prepare for your upcoming interview with this repository of questions :octocat:
ML for Beginners by Microsoft: Learn machine learning with Microsoft’s hands-on curriculum :octocat:
Deep Learning Drizzle: Find top universities’ publicly available deep learning classes :octocat:
Awesome Machine Learning: Discover machine learning tools and resources for beginners and advanced practitioners alike :octocat:
500 ML Projects with code :octocat:
All algorithms implemented in Python By: The Algorithms :octocat:
Data Science Best Resources. By: Tirthajyoti Sarkar :octocat:
Learn Python 3 By: Jerry Pussinen :octocat:
Machine Learning Bookcamp By: alexeygrigorev :octocat:

Top 20 Pandas Functions for 80% of Your Data Science Tasks!!!

As data scientists, we all know how essential it is to have a solid understanding of pandas, Python's go-to library for data manipulation and analysis. Amazing article.

🎯 pd.read_csv()

🎯 df.describe()

🎯 df.info()

🎯 df.plot()

🎯 df.iloc[]

🎯 df.loc[]

🎯 df.assign()

🎯 df.query()

🎯 df.sort_values()

🎯 df.sample()

🎯 df.isnull()

🎯 df.fillna()

🎯 df.dropna()

🎯 df.drop()

🎯 pd.pivot_table()

🎯 df.groupby()

🎯 df.transpose()

🎯 df.merge()

🎯 df.rename()

🎯 df.to_csv()
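
To make the list above more concrete, here is a minimal sketch that chains most of these functions in one small workflow. The sample data, the column names (`store`, `product`, `units`, `price`), and the `regions` lookup table are all made up for illustration and do not come from any course dataset. Note that `loc` and `iloc` are indexers, so they are used with square brackets rather than called like methods.

```python
import io

import pandas as pd

# Made-up sample data for illustration only; not taken from any course dataset.
CSV_TEXT = """store,product,units,price
A,apple,10,1.5
A,banana,,0.5
B,apple,7,1.6
B,banana,12,0.4
C,apple,3,
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Inspect: df.info() prints dtypes and non-null counts,
# df.describe() returns summary statistics for numeric columns.
df.info()
summary = df.describe()

# Missing values: count them, then fill or drop.
missing_per_column = df.isnull().sum()
filled = df.fillna({"units": 0})         # replace missing units with 0
clean = filled.dropna(subset=["price"])  # drop rows that still lack a price

# Selection: .iloc is position-based, .loc is label-based.
first_row = clean.iloc[0]
apples = clean.loc[clean["product"] == "apple", ["store", "units"]]

# Derived column, filtering, and sorting.
clean = clean.assign(revenue=clean["units"] * clean["price"])
big = clean.query("revenue > 5").sort_values("revenue", ascending=False)

# Aggregation with groupby and pivot_table.
per_store = clean.groupby("store")["revenue"].sum()
pivot = pd.pivot_table(
    clean, index="store", columns="product", values="units", aggfunc="sum"
)

# Merge with a second (hypothetical) table, rename a column, export to CSV text.
regions = pd.DataFrame(
    {"store": ["A", "B", "C"], "region": ["north", "south", "east"]}
)
merged = clean.merge(regions, on="store", how="left").rename(
    columns={"units": "units_sold"}
)
csv_out = merged.to_csv(index=False)

print(big)
print(per_store)
print(pivot)
```

Each step returns a new object instead of modifying `df` in place, which makes it easy to inspect intermediate results in a notebook cell.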

Others

Dataset

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also contact me on LinkedIn for any other queries or feedback.
