- Read the papers: Gradient-Based Learning Applied to Document Recognition, ImageNet, AlexNet
- July goal: focus on convolutional neural networks
- Study how SyncBN works
- Read the POD source code
- Megatron-LM
- Illustrate the bubbles in pipeline parallelism, 1F1B, and the interleaved schedule (see the sketch after this list)
- Read the first Megatron paper; it helps with understanding the Transformer
- Work through OneFlow, Megatron, and ZeRO together with the source code (fairscale/layers) until the concepts stick. Over the weekend, put together a 4D-parallelism demo with an interleaved pipeline
- Read Checkmate: optimal tensor rematerialization (2019)
- SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks; vDNN (2016)
- Read the DeepSpeed implementation
- How are FLOPs, GPU memory, and communication each measured? (see the profiling sketch further down)
- How is data-loading time computed?
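A quick worked example for the pipeline-bubble item: per the Megatron-LM paper, with p stages and m microbatches the 1F1B bubble fraction is (p-1)/m, and an interleaved schedule with v model chunks per device divides that by v. A minimal sketch (the function name is mine):

```python
def bubble_fraction(p: int, m: int, v: int = 1) -> float:
    """Pipeline bubble time / ideal compute time.

    p: pipeline stages, m: microbatches per batch,
    v: interleaved model chunks per device (v=1 -> plain 1F1B).
    Formula from the Megatron-LM paper: (1/v) * (p - 1) / m.
    """
    return (p - 1) / (v * m)

# Example: 8 stages, 64 microbatches.
print(bubble_fraction(p=8, m=64))       # ~0.109 -> ~11% of time is bubble
print(bubble_fraction(p=8, m=64, v=2))  # interleaving with v=2 halves it
```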
30 min 10:35-11:05 Deep Residual Learning for Image Recognition, You Only Look Once: Unified, Real-Time Object Detection
20 min 9:00 pm - 9:20 pm MegEngine DTR
9:35 am - 9:50 am
1h 9:00 pm - 10:00 pm Reading memory management in DeepSpeed, MegEngine, and ActNN
20min read AlexNet, VGG, GoogLeNet
1h20min 9:40 pm - 11:10 pm Update DTR implementation
30 min 11:00 pm - 11:30 pm PyTorch TODO
15 min 9:15-9:30 pm
20min 6:40 pm - 7:00 pm DTR
20min 11:40 pm - 12:00 am Reading DTR source code
20min 11:40 pm - 12:00 am
10:05 pm - 10:40 pm Reading Tensor Comprehensions, my notes
30 min 11:00 pm - 11:30 pm PCIe, NVLink, GPU Utils
1h 11:00 pm - 12:00 am
1h30min 10:30 pm - 12:08 am Writing notes on Pollux
20min 11:40 pm - 12:00 am Reading Pollux
30 min 11:25 pm - 12:00 am Reading Adaptive DL
45 min 11:15 pm - 12:00 am Think about how to profile memory, computation, communication, and I/O (rough sketch below)
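A starting point for the profiling question, assuming PyTorch and a CUDA device: kernel time via CUDA events, peak memory via the caching allocator's counters. Data time can be bracketed the same way around the dataloader loop, and FLOPs are usually estimated analytically from the model rather than measured. A rough sketch (model and sizes are placeholders):

```python
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
x = torch.randn(32, 3, 224, 224, device="cuda")

# Computation time: CUDA events measure on-device time,
# not just the async kernel launch on the host.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
y = model(x)
end.record()
torch.cuda.synchronize()
print(f"forward: {start.elapsed_time(end):.1f} ms")

# Memory: peak bytes held by the allocator since the reset.
torch.cuda.reset_peak_memory_stats()
y = model(x)
torch.cuda.synchronize()
print(f"peak mem: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
```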
1h 9:40 pm - 10:40 pm Reading DeepSpeed-related docs and code
11:25 pm - 12:00 am ZeRO / efficient (Megatron) communication
20 min 10:45 pm - 11:05 pm Reading ZeRO
18 min 11:00 pm - 11:18 pm Finish Capuchin
15 min 9:00 pm - 9:15 pm Flexible systems are the next frontier of machine learning
11:00 pm - 11:10 pm Find Codahale metrics implemented in Python
9:00 - 9:30 Capuchin
1h30min 9:00 pm - 9:30 pm, 10:00 pm - 11:00 pm Reading Capuchin: Tensor-based GPU Memory Management for Deep Learning
9:15 pm - 10:22 pm Successfully compiled ActNN
2h 30 min 9:45 am - 11:00 am, 6:00 pm - 6:30 pm, 10:25 pm - 11:30 pm
- install ActNN on sh38
- ActNN practice
- characterizing
3h 30 min 3:10 pm - 6:10 pm, 10:25 pm - 10:55 pm Reading Characterizing Deep Learning Training Workloads on Alibaba-PAI
20 min Print the PAI paper; go through the materials Mr. Yang shared
30min 11:20 pm - 11:50 pm Try installing ActNN
20 min 10:40 pm - 11:00 pm Catch up on the OneFlow group chat
40 min 7:20 am - 8:00 am Read Rammer paper
4h 20min 10:30 am - 11:00 am, 2:00 pm - 4:00 pm, 9:00 pm - 11:50 pm
2h30min 10:00 pm - 12:30 am
8:00 pm - 9:00 pm Reading A Study of Persistent Thread Style GPU Programming for GPGPU Workloads
50min 7:20 am - 8:10 am
1h20min 10:00 pm - 11:20 pm
8:45 pm - 9:45 pm OneFlow Zhihu articles; 7:45 am - 9:00 am model/transformer.py
1h 6:20 pm - 6:50 pm, 10:30 pm - 11:00 pm Samples in DataLoader
2h20min 5:20 pm - 6:30 pm, 9:30 pm - 10:10 pm, 10:40 pm - 11:10 pm
1h40min 4:50 pm - 6:10 pm, 8:10 am - 8:30 am Draw 3D-parallelism diagram; Megatron-LM source code
2:00 pm - 2:30 pm Find JIT usage and P2P communication in Megatron-LM
10:15 pm - 11:30 pm
1h 10:00 pm - 11:00 pm Read Efficient Large-Scale Language Model Training; my notes
Look at the implementation in Megatron, together with the OneFlow articles, to understand why this is hard; or work through ZeRO and its implementation (rough numbers in the sketch below)
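While working through ZeRO: the per-GPU memory arithmetic from the ZeRO paper is easy to reproduce. With Ψ parameters and mixed-precision Adam, each replica holds 2Ψ bytes of fp16 weights, 2Ψ of fp16 gradients, and K·Ψ = 12Ψ of optimizer state (fp32 master weights, momentum, variance); each ZeRO stage partitions one more of these across the data-parallel group. A small calculator (my own helper; constants from the paper):

```python
def zero_mem_per_gpu(psi: float, n: int, stage: int = 0, k: int = 12) -> float:
    """Bytes per GPU for mixed-precision Adam under ZeRO.

    psi: parameter count, n: data-parallel degree.
    Stage 0 is the unpartitioned baseline (2 + 2 + K bytes per param);
    stage 1 partitions optimizer states, 2 also gradients, 3 also params.
    """
    params, grads, opt = 2 * psi, 2 * psi, k * psi
    if stage >= 1:
        opt /= n
    if stage >= 2:
        grads /= n
    if stage >= 3:
        params /= n
    return params + grads + opt

gb = 1e9
for s in range(4):
    print(f"ZeRO-{s}: {zero_mem_per_gpu(7.5e9, n=64, stage=s) / gb:.1f} GB")
# 7.5B params on 64 GPUs: 120.0 -> 31.4 -> 16.6 -> 1.9 GB,
# matching the figures in the ZeRO paper.
```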
1h30min 10:30 pm - 12:00 am
25 min 11:45 pm - 12:10 am Reading qscheme.py: compute_quantization_bits; layers.py: track_running_stats
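Note while reading compute_quantization_bits: underneath the per-group bit allocation, the quantize/dequantize core is plain affine (min-max) rounding. A toy version to keep the idea handy; this is my sketch, not ActNN's kernel, which adds stochastic rounding and adaptive bit allocation on top:

```python
import torch

def quantize_group(x: torch.Tensor, bits: int = 2):
    """Affine quantization of one activation group.

    Returns integer codes plus the (scale, offset) needed to dequantize.
    """
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-8) / levels
    q = torch.round((x - lo) / scale).to(torch.uint8)  # codes in [0, levels]
    return q, scale, lo

def dequantize_group(q, scale, lo):
    return q.float() * scale + lo

x = torch.randn(256)
q, scale, lo = quantize_group(x, bits=2)
x_hat = dequantize_group(q, scale, lo)
print((x - x_hat).abs().max())  # worst-case rounding error ~ scale / 2
```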
1h40min 10:00 pm - 11:40 pm
Chat with friends about ML
Reading ActNN source code
10:30 pm - 12:00 am Reading ActNN source code; train POD; review MR
10h 2:20 pm - 7:00 pm, 9:40 pm - 12:00 am
2h15m 11:20 am - 1:35 pm Reading ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
6:35 am - 7:10 am Reading PyTorch autograd docs; reading Megatron-LM: Efficient Large-Scale Language Model Training on GPU Clusters
1h 45 min 10:15 pm - 12:00 am
1h 40 min 7:20 am - 8:00 am, 9:40 pm - 10:40 pm fairscale checkpointing with test
1h 20min 10:10 pm - 11:00 pm
7:30 - 8:00 Sublinear memory optimization: how activation checkpointing is implemented in OneFlow; explore gradient checkpointing in PyTorch (see the sketch below); look at fairscale's memory-optimization work alongside it
Dynamic Tensor Rematerialization (2020-06-17); MegEngine has implemented it
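For the gradient-checkpointing item: the built-in entry point is torch.utils.checkpoint. A wrapped segment frees its intermediate activations after the forward pass and recomputes them during backward, trading compute for memory. Minimal usage on a toy stack of blocks (the model is a placeholder; use_reentrant=False needs a reasonably recent PyTorch):

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# A stack of blocks; checkpointing each block keeps only its input
# alive and recomputes the block's activations during backward.
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(8)
)

x = torch.randn(64, 1024, requires_grad=True)
h = x
for block in blocks:
    h = checkpoint(block, h, use_reentrant=False)
h.sum().backward()
print(x.grad.shape)  # gradients flow through the recomputed segments
```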
1h 10min 10:30 pm - 11:20 pm
7:20 - 7:40 Paper: ImageNet Classification with Deep Convolutional Neural Networks; my notes
4h20min
10:30 am - 12:00 pm Train ResNet-101, batch size 1k, 100 epochs
13:00 - 13:20 Read POD; prototype DistModule; record questions
13:40 (-30) - 4:00 pm Read PyTorch DDP source code: distributed.py
9:00 pm - 1:14 am Draw DDP keynote; write blog
6h 14:20 - 20:00 Animated walkthrough of PyTorch DistributedDataParallel
2h 10:30 pm - 12:00 am, 7:15 am - 7:40 am
- How exactly is DDP implemented in PyTorch? (see the toy sketch after this list)
- PyTorch performance tuning guide
- How to make an animated diagram of DDP
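On the first question above, the DDP paper (linked in the next entry) describes the core as: broadcast parameters at construction, then let autograd hooks all-reduce gradients, bucketed and overlapped with the backward pass. A stripped-down hook-per-parameter toy, assuming the process group is already initialized (e.g. via torchrun); real DDP buckets and overlaps, this sketch does neither:

```python
import torch.distributed as dist
from torch import nn

def naive_ddp(model: nn.Module) -> nn.Module:
    """Toy data parallelism: sync initial weights, then average
    each gradient across ranks as autograd produces it."""
    world = dist.get_world_size()
    for p in model.parameters():
        dist.broadcast(p.data, src=0)  # every rank starts identical

        def hook(grad):
            out = grad.clone()
            dist.all_reduce(out)   # sum the gradient over all ranks
            return out / world     # autograd keeps the averaged grad

        p.register_hook(hook)
    return model

# usage after dist.init_process_group(...):
#   model = naive_ddp(MyModel().cuda())
```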
2.5 h 10:00 pm - 12:09 am [Read PyTorch DDP paper](./papers/papers/PyTorch Distributed-Data Parallel Training.md) and draw an illustration of DDP
7:30 - 8:00 am Read PyTorch DataLoader source code
2 h 11:05 pm - 12:00 am Multiprocessing in PyTorch
6:40 - 7:50 am Read DataLoader usage in prototype and POD
2h 40min 6:40 - 7:20, 10:20 - 12:30 PyTorch DataLoader (minimal usage sketch below)
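For the DataLoader reading: the main moving part in the source is that num_workers > 0 spawns worker processes that prefetch batches while the main process consumes them. Minimal usage with a stand-in dataset:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    def __len__(self):
        return 1024

    def __getitem__(self, idx):
        # Stand-in for expensive decoding/augmentation work.
        return torch.randn(3, 224, 224), idx % 10

if __name__ == "__main__":  # guard is required when workers are spawned
    loader = DataLoader(
        ToyDataset(),
        batch_size=32,
        shuffle=True,
        num_workers=4,    # worker processes prefetch batches in parallel
        pin_memory=True,  # page-locked buffers speed up host-to-GPU copies
    )
    for images, labels in loader:
        pass  # training step goes here
```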
2h 6:10 - 7:20 am, 10:20 - 12:11 Reading GShard
[CS 294 AI-Sys: Lecture 1](./courses/ucberkely-cs294-ai-sys l1.md)
3.5h 7:55 - 8:30 am, 8:30 - 9:00 pm, 9:30 - 11:00 pm
- Read GSPMD
- Read GShard
2h 8:30 pm - 9:30 pm, 11:00 pm - 12:12 am
- Read CSE 599W: Systems for ML, Lecture 1 (Distributed Training and Communication Protocols), Lecture 5 (GPU)
- LaunchPad