🎓 The Modern Machine Learning Engineer

🤗 Welcome!

This is a course which teaches you how to build useful ML software.

This course aims wide. We will cover all the topics required to be a successful ML engineer: data engineering, data exploration, data analysis, and ML engineering, as well as frameworks like numpy, pandas, sklearn, keras, and pytorch (for a full list, check the Syllabus and Learning Goals sections).

This course is lean. You will learn just enough to build AI solutions from scratch. However, ML is a vast subject, so resources are provided for deeper dives into each of its fields.

This course is pragmatic. All lectures consist of slides explaining key theoretical concepts, followed by a hands-on Python notebook with coding exercises.

✅ What this course will do:

teach you all the theory and skills required to create effective, scalable, robust, ethical machine learning systems
give you hands-on experience training, improving, and deploying ML models
make you employable in the AI industry
turn you into a jack of all trades
guide you to become a master of few
share some low quality AI memes

❌ What this course won't do:

give you 10 years of experience in ML
turn you into a Deep Learning wizard
make you publish a paper at NeurIPS
build Skynet

🤔 Why this course?

What was once a niche research topic has blossomed into a mature engineering field. There are excellent online degrees that focus on ML theory, and great practical courses that cover frameworks. But the barrier to starting a career in AI remains high, as none of these teach you how to build ML products from scratch. This course's purpose is to enable all coders to get out there and solve the world's problems, one neural network at a time.

👽 Who is this for?

This course is perfect for the software engineer curious about ML technologies, the data scientist looking to move past Kaggle kernels, or the AI enthusiast wondering how systems are built in the real world.

Beginner programming skills are required. A little Python experience (can you define a function?) and some statistical basics (what's a standard deviation?) are recommended.
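As a rough self-check of both prerequisites, here is a small Python function that computes a standard deviation. This is a minimal sketch: it assumes the population definition (dividing by n rather than n - 1), and the function name `standard_deviation` and the sample data are illustrative only.

```python
import math
import statistics


def standard_deviation(values):
    """Population standard deviation: square root of the mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    # Average of squared distances from the mean (population variance)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))  # 2.0
# The standard library's population version agrees
print(statistics.pstdev(data))  # 2.0
```

If you can read and write code like this, you are ready to start. Note that `statistics.stdev` uses the sample (n - 1) definition instead, which gives a slightly larger value.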

👨‍🏫 Who is this by?

Camille Van Hoffelen has worked as an ML engineer for the past 6 years, with a focus on large scale Natural Language Processing systems. This course was taught at Ilia State University, Georgia, in 2020.

🚀 How should I use this course?

Here are a few ideas:

For each lecture, read the slides, then go through the notebooks. Complete the 💪 and 🧠 exercises, then flip through the additional resources.
Skip straight to a particular section/lecture if you have already taken 50 billion ML courses.
Forget about the slides, find the notebook for that method you can't remember how to use, and copy paste to your heart's content.
Test yourself with the assignments. Become the nerdiest Pokemon trainer there ever was.

Notebooks can be viewed on GitHub or in nbviewer, or run locally with Jupyter, with mybinder, or with Google Colab.

Syllabus

Introduction

Workstation Setup
slides notebook

Chapter 1: Data Engineering

1.1 Fundamentals of Data Engineering & Data Lakes
slides notebook
1.2 Data Pipelines
slides notebook
1.3 Data Warehouses
slides notebook
1.4 Effective Data Storage
slides notebook

Chapter 2: Data Exploration

2.1 Introduction to Numpy & Pandas
slides notebook
2.2 Tabular Data Pt.1
notebook
2.3 Tabular Data Pt.2
notebook
2.4 Time Series Data
slides notebook
2.5 Text & Image Data
notebook
2.6 Data Visualization
slides notebook

Chapter 3: Data Analysis

3.1 Introduction to Machine Learning & Clustering
slides notebook
3.2 Dimensionality Reduction
slides notebook
3.3 Anomaly Detection
slides notebook
3.4 Supervised Learning Fundamentals
slides notebook
3.5 Linear Regression
slides notebook
3.6 Logistic Regression
slides notebook
3.7 Learning Better Pt.1
slides notebook
3.8 Learning Better Pt.2
slides notebook
3.9 Support Vector Machines
slides notebook
3.10 Random Forests
slides notebook
3.11 Neural Networks Pt.1
slides notebook
3.12 Neural Networks Pt.2
slides notebook
3.13 Learning Better Pt.3
slides notebook
3.14 Introduction to Pytorch
slides notebook
3.15 Computer Vision Pt.1
slides notebook
3.16 Computer Vision Pt.2
slides notebook
3.17 Natural Language Processing Pt.1
3.18 Natural Language Processing Pt.2
3.19 ML Research

Chapter 4: ML Engineering

4.1 Fundamentals of ML Engineering
slides notebook
4.2 Evaluation Pt.1
slides notebook
4.3 Evaluation Pt.2
slides notebook
4.4 Error Analysis
4.5 Hyperparameter Optimization
4.6 ML Challenges
4.7 ML Labs
4.8 Datasets & Labeling
4.9 ML Architecture
4.10 ML Deployment
4.11 Data Privacy & Security
4.12 ML Ethics

Assignments

Learning Goals

This course is designed to step through the journey of building a production data application, as reflected in the following learning goals:

Storing and Managing Data
Students can build data lakes and data repositories, as well as data pipelines to automate the gathering and processing of data.
Exploring and Analysing Data
Students can describe, visualise, and provide insights into tabular, time series, image, text, and geospatial data.
Building a Machine Learning Product
Students can build useful classification, regression, and clustering tools using Machine Learning methods.
Being Ethical and Secure
Students understand the importance of privacy in data-driven methods, and the responsibility of engineers for the impact of these technologies.

Assignments

The coursework is split between four small assignments, a final project, and a final presentation.

The small assignments help you synthesise the preceding course content on your own and put it into practice. They are all coding exercises: you will be given resources and/or code stubs, and will submit runnable code and some observations (see the assignments readme).

The final project tests everything that you have learnt in this course. It is a Python notebook report, like those data scientists write to share their experimental progress, and it tests your ability to design, carry out, and communicate machine learning experiments. It is complemented by the final presentation, a 15-minute talk to synthesise and discuss the results (see the final project readme).

License

This work is licensed under a Creative Commons Attribution 4.0 International License. See the LICENSE.txt file for details.
