Notes on UC Berkeley CS267
Official Website: AI-Sys Sp22 (ucbrise.github.io)
- Lecture 1: Introduction & Overview
- Lecture 2: Memory Hierarchies and Matrix Multiplication
- Lecture 3: More MatMul and the Roofline Performance Model
- Lecture 4: Shared Memory Parallelism
- Lecture 5: Sources of Parallelism and Locality (Part 1)
- Lecture 6: Sources of Parallelism and Locality (Part 2)
- Lecture 6: Communication-Avoiding Matrix Multiplication
- Lecture 7: An Introduction to CUDA and Graphics Processors (GPUs)
- Lecture 8: Data Parallel Algorithms (aka, tricks with trees)
- Lecture 9: Distributed Memory Machines and Programming
- Lecture 10: Advanced MPI and Collective Communication Algorithms -> AI-Sys 4, 5
- Lecture 11: UPC++: Partitioned Global Address Space Languages
- Lecture 12a: Parallel Algorithms for De Novo Genome Assembly
- Lecture 12b: Communication-Avoiding Graph Neural Networks
- Lecture 12c: Distributed Computing with Ray and NumS
- Lecture 13: Parallel Matrix Multiply
- Lecture 14: Dense Linear Algebra
- Lecture 15: Structured Grids
- Lecture 16: Machine Learning Part 1 (Supervised Learning)
- Lecture 17: Machine Learning Part 2 (Unsupervised and semi-supervised learning)
- Lecture 18: Sparse-Matrix-Vector-Multiplication and Iterative Solvers
- Lecture 19: Fast Fourier Transform
- Lecture 20: Graph Algorithms
- Lecture 21: Cloud Computing and HPC
- Lecture 22a: Graph Partitioning
- Lecture 22b: Load Balancing with Work Stealing
- Lecture 23: Hierarchical Methods for the N-Body Problem
- Lecture 24: Sorting and Searching
- Lecture 25: Big Bang, Big Data, Big Iron
- Lecture 26: Computational Biology