Don't forget to hit the ⭐ if you like this repo.
This course introduces students to recent research and industry issues in data engineering, database systems, and related technologies. It explores and discusses topics of interest that directly or indirectly affect, or are influenced by, these fields. Students are encouraged to take part in forums and in face-to-face interaction with researchers and practitioners on these topics, so that they can then conduct their own investigations and draw their own conclusions. The course also exposes students to industry experience in managing database systems and technologies through knowledge-sharing sessions and work-based learning activities with selected organizations.
No | Module | Description | Notes |
---|---|---|---|
1 | Data Engineer, Data Engineering, Data Science, Data Scientist | Data engineering and data science are related disciplines, practiced by data engineers and data scientists respectively, that handle and process large amounts of data. Both are parts of the data lifecycle: data engineering focuses on building and maintaining the data infrastructure, while data science focuses on extracting insights from the data using various techniques. Both require a strong understanding of the tools and technologies used in data processing and analysis, such as SQL, Python, Hadoop, Spark, and cloud computing, as well as a solid grasp of data structures, algorithms, and programming concepts. | |
2 | Application Programming Interface (API) | An API, or Application Programming Interface, is a set of protocols and tools for building software applications. In data science, APIs are often used to access and integrate data from external sources into data analysis workflows. APIs enable developers and data scientists to retrieve data in a structured way, typically in JSON or XML format, and to perform data analysis tasks programmatically. Some popular APIs for data science include the Twitter API, Google Maps API, and Spotify API. These APIs provide access to a wide range of data, including social media data, geographic data, and music data, and can be used to extract insights and build predictive models. API tools such as Postman, Swagger, and Insomnia can be used to test, document, and automate API requests, and to build more complex workflows using multiple APIs. | |
3 | Data Scraping | Data scraping, also known as web scraping, is the process of extracting data from websites using automated software programs. It involves writing code that sends automated requests to a website, parses the HTML or XML content, and extracts the desired information. Data scraping tools automate this process and can be used to collect data for research, analysis, or business intelligence purposes. Some popular data scraping tools include Beautiful Soup, Scrapy, Octoparse, Parsehub, and WebHarvy. These tools provide a range of features and capabilities, such as the ability to extract data from different types of web pages, the ability to handle complex data structures, and the ability to schedule and automate scraping tasks. However, it is important to ensure that data scraping is done in compliance with applicable laws and regulations, and with respect for the privacy of individuals. | |
4 | Data Integration | Data integration in data science is the process of combining data from multiple sources into a unified view for analysis. This involves identifying relevant data sources, transforming and cleansing the data to ensure consistency and quality, and integrating the data into a common format. Data integration tools automate the process of data integration and enable organizations to manage the entire data integration process, including data mapping, data transformation, and data quality. These tools can be categorized into three types: ETL (Extract, Transform, Load), ELT (Extract, Load, Transform), and EAI (Enterprise Application Integration). Popular data integration tools include Apache Kafka, Apache NiFi, Talend, Informatica PowerCenter, and Microsoft SQL Server Integration Services. These tools help organizations to streamline their data integration process, enabling data scientists to analyze data from multiple sources, and gain a comprehensive understanding of a particular phenomenon or problem. | |
5 | Types of Data & NoSQL Database | Data can be broadly categorized as structured, semi-structured, or unstructured. Structured data is organized and formatted in a specific way that makes it easy to store and analyze using traditional relational database management systems (RDBMS). Unstructured data, on the other hand, has no predefined structure; examples include text, images, videos, and audio files. Semi-structured data, such as JSON or XML, sits in between: it carries keys or tags but no fixed schema. NoSQL databases are non-relational databases designed to handle unstructured or semi-structured data. There are several types of NoSQL databases, including document-oriented, key-value, column-family, and graph databases. Document-oriented databases such as MongoDB store data in flexible JSON-like documents, while key-value databases such as Redis store data as key-value pairs. Column-family databases such as Apache Cassandra store data in column families, and graph databases such as Neo4j are designed to handle highly connected data such as social networks. Each type of NoSQL database has its own strengths and weaknesses and is suited to different use cases depending on the nature of the data and the requirements of the application. | |
6 | Data Wrangling | Data wrangling, also known as data cleaning or data preprocessing, is the process of cleaning, transforming, and preparing raw data for analysis. This involves identifying and addressing issues such as missing or inconsistent data, formatting errors, and duplicates. Data wrangling tools automate this process and can be used to streamline data cleaning and preparation tasks. Some popular data wrangling tools include OpenRefine, Trifacta, DataWrangler, KNIME, and Talend. These tools provide a range of features and capabilities, such as the ability to handle large datasets, automate data cleaning tasks, and visualize data for exploration and analysis. Data wrangling is an essential step in the data analysis process, as it helps to ensure that the data is accurate, consistent, and relevant for analysis. | |
7 | Feature Engineering | Feature engineering is the process of selecting, creating, and transforming variables (or features) in a dataset to improve the performance of a machine learning model. This involves identifying relevant variables, transforming variables to make them more useful, and creating new variables that capture important information. Feature engineering tools automate this process and can be used to streamline feature selection and creation tasks. Some popular feature engineering tools include Featuretools, TPOT, and H2O.ai, along with AutoML tools more generally. These tools provide a range of features and capabilities, such as the ability to automate feature selection and creation, identify important variables, and optimize feature pipelines for machine learning models. Feature engineering is an important step in the machine learning process, as it helps to ensure that the model is able to learn from relevant data and make accurate predictions. | |
8 | Artificial Intelligence vs Machine Learning vs Deep Learning | Artificial Intelligence, Machine Learning, and Deep Learning are all related to the field of computer science and are focused on enabling computers to learn and make decisions based on data. Artificial intelligence involves building systems that can perform tasks that typically require human intelligence, such as language understanding, decision making, and problem-solving. Machine learning is a subset of AI that focuses on building algorithms that can learn patterns and make decisions based on data without being explicitly programmed. Deep learning is a subset of machine learning that utilizes neural networks with multiple layers to learn and identify patterns in data. Overall, all three fields involve leveraging data to build intelligent systems that can learn from experience and make decisions based on that learning. | |
9 | Visualization | Data visualization is the process of representing data graphically to help people understand and make sense of complex data. Visualization tools in data science allow users to create visual representations of data, such as charts, graphs, and maps, that can be easily interpreted and analyzed. Some popular data visualization tools include Tableau, Power BI, Google Data Studio, and D3.js. These tools provide a range of features and capabilities, such as the ability to create interactive dashboards, explore data in real-time, and collaborate with others on visualizations. Data visualization is an important part of the data analysis process, as it helps to uncover patterns, trends, and insights in the data that might not be apparent from raw data alone. | |
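
The data integration, data wrangling, and feature engineering modules in the table above can be sketched as one tiny pipeline. This is only an illustrative sketch using the Python standard library, not course material: the sample data, the field names, and the function names (`extract_csv`, `wrangle`, `integrate`, `add_bmi`, `run_pipeline`) are all hypothetical.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export with stray whitespace,
# a missing value, and a duplicated row.
CSV_SOURCE = """id,name,height_cm
1, Alice ,170
2,bob,
2,bob,
3,Chen,182
"""

# Hypothetical source 2: a JSON payload, as an API might return it.
JSON_SOURCE = (
    '[{"id": 1, "weight_kg": 65},'
    ' {"id": 2, "weight_kg": 80},'
    ' {"id": 3, "weight_kg": 90}]'
)


def extract_csv(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def wrangle(rows):
    """Data wrangling: trim and normalise names, parse numbers, drop duplicate ids."""
    seen, cleaned = set(), []
    for row in rows:
        record_id = int(row["id"])
        if record_id in seen:  # skip duplicate records
            continue
        seen.add(record_id)
        height = row["height_cm"].strip()
        cleaned.append({
            "id": record_id,
            "name": row["name"].strip().title(),
            # A missing height becomes None instead of an empty string.
            "height_cm": float(height) if height else None,
        })
    return cleaned


def integrate(people, weights):
    """Data integration: join both sources on id into one unified record."""
    weight_by_id = {w["id"]: w["weight_kg"] for w in weights}
    return [{**p, "weight_kg": weight_by_id.get(p["id"])} for p in people]


def add_bmi(records):
    """Feature engineering: derive BMI = weight_kg / height_m^2 when both exist."""
    for r in records:
        if r["height_cm"] and r["weight_kg"]:
            r["bmi"] = round(r["weight_kg"] / (r["height_cm"] / 100) ** 2, 1)
        else:
            r["bmi"] = None
    return records


def run_pipeline():
    """Run extract -> wrangle -> integrate -> feature engineering end to end."""
    people = wrangle(extract_csv(CSV_SOURCE))
    weights = json.loads(JSON_SOURCE)
    return add_bmi(integrate(people, weights))


if __name__ == "__main__":
    for record in run_pipeline():
        print(record)
```

Running the script prints one cleaned, merged record per person. In practice the dedicated tools named in the table would likely replace these hand-written steps, but the order of operations stays the same.
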
- Visual Studio Code Web Dev Setup In 6 Minutes
- Visual Studio Code: HTML, CSS & JS Tips
- Top 10 Best VS Code Extensions
- How to Setup Live Server in VS Code?
- How to setup and use GitHub with Visual Studio Code - (2023)
- How to Host a Website on GitHub [2023] | GitHub Pages Site
- Download: Visual Studio Code
- VS Code: Getting Started
- Collaborate with Live Share
- Install and sign in to Live Share in Visual Studio Code
- Marketplace: Live Share Extension Pack
- VS Code Live SHARE | How to use LiveShare in VS code for live online collaboration [ Quick Guide ]
- GitHub Desktop
- Udemy: Beginner VS Code
- Learn MongoDB
- Wikipedia: MongoDB
- Web Application using Mongodb
- MongoDB Atlas
- Sign in MongoDB Atlas
- MongoDB Compass
- Download: MongoDB Compass
- Connect MongoDB Atlas With MongoDB Compass
- w3schools: MongoDB Tutorial
- MongoDB - Quick Guide
- Github: mongodb
- Youtube: Complete MongoDB
- Youtube: Belajar MongoDB
- Data science project using MongoDB
- How to Install MongoDB && Compass | MacOS
- MongoDB Sample Dataset
- Python Dash Web Application Connected to Live Database - MongoDB
- Web Data Dashboard with Plotly express and Flask Python and JavaScript
- Flask Course - Python Web Application Development
- Jupyter Notebook: How to use it to create a web application using Django and MongoDb
- Visual Studio Code: develop web application using MongoDb and Django
- Python Web Development Libraries - Quick Guide
- Python Web Framework — A Detailed List of Web Frameworks in Python
- 13 Project Ideas for Intermediate Python Developers
- Top 11 Python Frameworks for Web Development In 2023
- Github: Real Python - materials
- PyScript - GitHub
- Run Python Visualizations on the Web Using PyScript
- Run Python in Your HTML
- PyScript demos
- Pyscript Tutorial With Simple Code Examples
- How to Run Python Visualizations on a Web Browser using PyScript
- A First Look at PyScript: Python in the Web Browser
- How to Embed Interactive Python Visualizations on Your Website with Python and Matplotlib
- How to Easily Run Python Visualizations On a Web Browser with PyScript
- Meet Django
- Github: django
- Django Tutorial: w3schools
- A Practical Introduction to Web Scraping in Python
- Creating and Viewing HTML files with Python
- Running Django on Google Colab
- Django tutorial
- 9 Best Django Website Templates 2023
- College Management System using Django – Python Project
- Best Python Django Tutorial For Beginners – With Project Structure
- Django Dashboards — Open Source and Free
- Python TurboGears: The Web Framework that scales with you
- The TurboGears Documentation
- Wikipedia: TurboGears
- Github: TurboGears
- Turbogears - Quick Guide
- web2py Web Framework
- web2py Examples
- Book: Complete Reference Manual, 6th Edition (pre-release), written by Massimo Di Pierro in English
- Wikipedia: web2py
- Github: web2py
- Lab 1: Hello World Program
- Lab 2: PyScript With the src Attribute
- Lab 3: Working With Python Environment
- Lab 4: Interactive Embedded Shell
- Lab 5: Rendering Bokeh Plot With PyScript
- Lab 6: Matplotlib
Please create an Issue for any improvements, suggestions or errors in the content.
You can also contact me on LinkedIn with any other queries or feedback.