Master the Toolkit of AI and Machine Learning. Mathematics for Machine Learning and Data Science is a beginner-friendly Specialization where you’ll learn the fundamental mathematics toolkit of machine learning: calculus, linear algebra, statistics, and probability.
Mathematics for Machine Learning and Data Science is a foundational online program created by DeepLearning.AI and taught by Luis Serrano. This beginner-friendly Specialization is where you’ll master the fundamental mathematics toolkit of machine learning.
Many machine learning engineers and data scientists need help with mathematics, and even experienced practitioners can feel held back by a lack of math skills. This Specialization uses innovative pedagogy in mathematics to help you learn quickly and intuitively, with courses that use easy-to-follow plugins and visualizations to help you see how the math behind machine learning actually works.
This is a beginner-friendly program, with a recommended background of at least high school mathematics. We also recommend a basic familiarity with Python, as labs use Python to demonstrate learning objectives in the environment where they’re most applicable to machine learning and data science.
By the end of this Specialization, you will be ready to:
- Represent data as vectors and matrices and identify their properties using concepts of singularity, rank, and linear independence
- Apply common vector and matrix algebra operations like dot product, inverse, and determinants
- Express certain types of matrix operations as linear transformations
- Apply concepts of eigenvalues and eigenvectors to machine learning problems
- Optimize different types of functions commonly used in machine learning
- Perform gradient descent in neural networks with different activation and cost functions
- Describe and quantify the uncertainty inherent in predictions made by machine learning models
- Understand the properties of commonly used probability distributions in machine learning and data science
- Apply common statistical methods like MLE and MAP
- Assess the performance of machine learning models using interval estimates and margin of errors
- Apply concepts of statistical hypothesis testing
After completing this course, learners will be able to:
- Represent data as vectors and matrices and identify their properties using concepts of singularity, rank, and linear independence, etc.
- Apply common vector and matrix algebra operations like dot product, inverse, and determinants
- Express certain types of matrix operations as linear transformations
- Apply concepts of eigenvalues and eigenvectors to machine learning problems
Matrices are commonly used in machine learning and data science to represent data and its transformations. In this week, you will learn how matrices naturally arise from systems of equations and how certain matrix properties can be thought in terms of operations on system of equations.
- Form and graphically interpret 2x2 and 3x3 systems of linear equations
- Determine the number of solutions to a 2x2 and 3x3 system of linear equations
- Distinguish between singular and non-singular systems of equations
- Determine the singularity of 2x2 and 3x3 system of equations by calculating the determinant
- Machine learning motivation
- Systems of sentences
- Systems of equations
- Systems of equations as lines
- A geometric notion of singularity
- Singular vs nonsingular matrices
- Linear dependence and independence
- The determinant
- Practice Quiz: Solving systems of linear equations
- Lab: Introduction to NumPy Arrays
- Systems of equations (3×3)
- Singular vs non-singular (3×3)
- Systems of equations as planes (3×3)
- Linear dependence and independence (3×3)
- The determinant (3×3)
- Quiz: Matrices
- Lab: Solving Linear Systems: 2 variables
In this week, you will learn how to solve a system of linear equations using the elimination method and the row echelon form. You will also learn about an important property of a matrix: the rank. The concept of the rank of a matrix is useful in computer vision for compressing images.
- Solve a system of linear equations using the elimination method.
- Use a matrix to represent a system of linear equations and solve it using matrix row reduction.
- Solve a system of linear equations by calculating the matrix in the row echelon form.
- Calculate the rank of a system of linear equations and use the rank to determine the number of solutions of the system.
- Machine learning motivation
- Solving non-singular systems of linear equations
- Solving singular systems of linear equations
- Solving systems of equations with more variables
- Matrix row-reduction
- Row operations that preserve singularity
- Practice Quiz: Method of Elimination
- Lab: Solving Linear Systems: 3 variables
- The rank of a matrix
- The rank of a matrix in general
- Row echelon form
- Row echelon form in general
- Reduced row echelon form
- Quiz: The Rank of a matrix
- Programming Assignment: System of Linear Equations
An individual instance (observation) of data is typically represented as a vector in machine learning. In this week, you will learn about properties and operations of vectors. You will also learn about linear transformations, matrix inverse, and one of the most important operations on matrices: the matrix multiplication. You will see how matrix multiplication naturally arises from composition of linear transformations. Finally, you will learn how to apply some of the properties of matrices and vectors that you have learned so far to neural networks.
- Perform common operations on vectors like sum, difference, and dot product.
- Multiply matrices and vectors.
- Represent a system of linear equations as a linear transformation on a vector.
- Calculate the inverse of a matrix, if it exists.
- Norm of a vector
- Sum and difference of vectors
- Distance between vectors
- Multiplying a vector by a scalar
- The dot product
- Geometric Dot Product
- Multiplying a matrix by a vector
- Practice Quiz: Vector operations: Sum, difference, multiplication, dot product
- Lab: Vector Operations: Scalar Multiplication, Sum and Dot Product of Vectors
- Matrices as linear transformations
- Linear transformations as matrices
- Matrix multiplication
- The identity matrix
- Matrix inverse
- Which matrices have an inverse?
- Neural networks and matrices
- Quiz: Vector and Matrix Operations, Types of Matrices
- Lab: Matrix Multiplication
- Lab: Linear Transformations
- Programming Assignment: Single Perceptron Neural Networks for Linear Regression
In this final week, you will take a deeper look at determinants. You will learn how determinants can be geometrically interpreted as an area and how to calculate determinant of product and inverse of matrices. We conclude this course with eigenvalues and eigenvectors. Eigenvectors are used in dimensionality reduction in machine learning. You will see how eigenvectors naturally follow from the concept of eigenbases.
- Interpret the determinant of a matrix as an area and calculate determinant of an inverse of a matrix and a product of matrices.
- Determine the bases and span of vectors.
- Find eigenbases for a special type of linear transformations commonly used in machine learning.
- Calculate the eignenvalues and eigenvectors of a linear transformation (matrix).
- Machine Learning Motivation
- Singularity and rank of linear transformation
- Determinant as an area
- Determinant of a product
- Determinants of inverses
- Practice Quiz: Determinants and Linear Transformations
- Bases in Linear Algebra
- Span in Linear Algebra
- Interactive visualization: Linear Span
- Eigenbases
- Eigenvalues and eigenvectors
- Quiz: Eigenvalues and Eigenvectors
- Programming Assignment: Eigenvalues and Eigenvectors
After completing this course, learners will be able to:
- Analytically optimize different types of functions commonly used in machine learning using properties of derivatives and gradients
- Approximately optimize different types of functions commonly used in machine learning using first-order (gradient descent) and second-order (Newton’s method) iterative methods
- Visually interpret differentiation of different types of functions commonly used in machine learning
- Perform gradient descent in neural networks with different activation and cost functions
- Example to motivate derivatives: Speedometer
- Derivative of common functions (c, x, x^2, 1/x)
- Meaning of e and the derivative of e^x
- Derivative of log x
- Existence of derivatives
- Properties of derivative
- Practice Quiz: Derivatives
- Lab: Differentiation in Python: Symbolic, Numerical and Automatic
- Intro to optimization: Temperature example
- Optimizing cost functions in ML: Squared loss
- Optimizing cost functions in ML: Log loss
- Quiz: Derivatives and Optimization
- Programming Assignment: Optimizing Functions of One Variable: Cost Minimization
- Intro to gradients
- Example to motivate gradients: Temperature
- Gradient notation
- Optimization using slope method: Linear regression
- Practice Quiz: Partial Derivatives and Gradient
- Optimization using gradient descent: 1 variable
- Optimization using gradient descent: 2 variable
- Gradient descent for linear regression
- Lab: Optimization Using Gradient Descent in One Variable
- Lab: Optimization Using Gradient Descent in Two Variables
- Quiz: Partial Derivatives and Gradient Descent
- Programming Assignment: Optimization Using Gradient Descent: Linear Regression
- Perceptron with no activation and squared loss (linear regression)
- Perceptron with sigmoid activation and log loss (classification)
- Two-layer neural network with sigmoid activation and log loss
- Mathematics of Backpropagation
- Lab: Regression with Perceptron
- Lab: Classification with Perceptron
- Practice Quiz: Optimization in Neural Networks
- Root finding with Newton’s method
- Adapting Newton’s method for optimization
- Second derivatives and Hessians
- Multivariate Newton’s method
- Lab: Optimization Using Newton's Method
- Quiz: Optimization in Neural Networks and Newton's Method
- Programming Assignment: Neural Network with Two Layers
After completing this course, learners will be able to:
- Analytically optimize different types of functions commonly used in machine learning using properties of derivatives and gradients
- Approximately optimize different types of functions commonly used in machine learning using first-order (gradient descent) and second-order (Newton’s method) iterative methods
- Visually interpret differentiation of different types of functions commonly used in machine learning
- Perform gradient descent in neural networks with different activation and cost functions
- Concept of probability: repeated random trials
- Conditional probability and independence
- Discriminative learning and conditional probability
- Bayes theorem
- Lab: Four Birthday Problems
- Lab: Monty Hall Problem
- Practice Quiz
- Random variables
- Cumulative distribution function
- Discrete random variables: Bernoulli distribution
- Discrete random variables: Binomial distribution
- Probability mass function
- Continuous random variables: Uniform distribution
- Continuous random variables: Gaussian distribution
- Continuous random variables: Chi squared distribution
- Probability distribution function
- Quiz
- Programming Assignment: Probability Distributions / Naive Bayes
- Measures of central tendency: mean, median, mode
- Expected values
- Quantiles and box-plots
- Measures of dispersion: variance, standard deviation
- Practice Quiz
- Joint distributions
- Marginal and conditional distributions
- Independence
- Measures of relatedness: covariance
- Multivariate normal distribution
- Lab: Summary statistics and visualization of data sets
- Quiz
- Lab: Simulating Dice Rolls with Numpy
- Programming Assignment: Loaded Dice
- Population vs. sample
- Describing samples: sample proportion and sample mean
- Distribution of sample mean and proportion: Central Limit Theorem
- Point estimates
- Biased vs Unbiased estimates
- Lab: Sampling data from different distribution and studying the distribution of sample mean
- Practice Quiz
- ML motivation example: Linear Discriminant Analysis
- Likelihood
- Intuition behind maximum likelihood estimation
- MLE: How to get the maximum using calculus
- ML motivation example: Naive Bayes
- Frequentist vs. Bayesian statistics
- A priori/ a posteriori distributions
- Bayesian estimators: posterior mean, posterior median, MAP
- Quiz
- Margin of error
- Interval estimation
- Confidence Interval for mean of population
- CI for parameters in linear regression
- Prediction Interval
- Practice Quiz
- ML Motivation: AB Testing
- Criminal trial
- Two types of errors
- Test for proportion and means
- Two sample inference for difference between groups
- ANOVA
- Power of a test
- Quiz
- Programming Assignment: A/B Testing