samyam's estimation notes
stas00 committed Jun 17, 2021
1 parent cc009e8 commit c2d4d9d
Showing 2 changed files with 39 additions and 0 deletions.
26 changes: 26 additions & 0 deletions finetune/README.md
@@ -0,0 +1,26 @@
# Finetuning

Notes on the plans to do finetuning with the pre-trained model

# Large Model on smaller hardware setup

- fine-tuning a 150-200B model with fewer GPUs than the pre-training setup

## a. Fine-tuning requiring only the model weights from pre-training, with freshly initialized optimizer states

Solution: This can be done using ZeRO-Infinity.

Hardware Requirements: This would require about 2.5-5 TB of aggregate memory for a 100-200B model. The memory can be CPU or NVMe, within a single node or across nodes. A single-node server with enough CPU or NVMe memory can work, if speed is not an issue.
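
As a sanity check on that range, a back-of-the-envelope sketch of the model-state memory (assuming the usual mixed-precision Adam accounting of roughly 16 bytes per parameter; the 2.5-5 TB above leaves extra headroom for buffers and activation checkpoints):

```python
# Model states for mixed-precision Adam training, per parameter (assumed accounting):
#   2 B fp16 params + 2 B fp16 grads
# + 4 B fp32 master params + 4 B momentum + 4 B variance = 16 B/param
def model_state_tb(n_params, bytes_per_param=16):
    return n_params * bytes_per_param / 1e12

for billions in (100, 200):
    print(f"{billions}B params -> ~{model_state_tb(billions * 1e9):.1f} TB of model states")
# 100B params -> ~1.6 TB of model states
# 200B params -> ~3.2 TB of model states
```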

Estimated Work: We can do this with ZeRO-Infinity. It seems @Shaden Smith already has the code to load the model parameter checkpoints from Megatron+DeepSpeed 3D into Megatron+DeepSpeed ZeRO-Infinity.
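
In practice such a run is driven by a DeepSpeed config enabling ZeRO stage 3 with optimizer and parameter offload; a minimal sketch, expressed as a Python dict (placeholder paths and settings, not the project's actual config):

```python
# A minimal ZeRO-Infinity style DeepSpeed config (a sketch; /local_nvme is a
# placeholder path, and "nvme" can be swapped for "cpu" if there is enough
# aggregate CPU memory).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # ZeRO stage 3: partition params, grads and optimizer states
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}
# The dict would then be handed to deepspeed.initialize(..., config=ds_config).
```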

## b. Continued-Training requiring both the model weights and optimizer states after pre-training

Solution: This can be done using Megatron+DeepSpeed 3D with ZeRO CPU Offload.

Hardware Requirements: For a 100-200B model, this option will require 2-4 TB of aggregate CPU memory to store the optimizer states and 600-1200 GB of aggregate GPU memory to store the parameters, gradients and activations.

This reduces the number of GPUs required by 4x compared to the pre-training setup. It will run on 32-64 GPUs, i.e. 4-8 nodes, each with 8x V100 and 768 GB of RAM.
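
A rough accounting of where that CPU/GPU split comes from (a back-of-the-envelope sketch under the usual mixed-precision Adam assumptions, not figures taken from this note; the ranges above leave extra headroom for fp32 gradient copies, activations and buffers):

```python
import math

GPU_BYTES_PER_PARAM = 2 + 2      # fp16 params + fp16 grads stay on GPU
CPU_BYTES_PER_PARAM = 4 + 4 + 4  # fp32 master params + Adam momentum + variance on CPU
V100_MEM_GB = 32

for billions in (100, 200):
    n = billions * 1e9
    gpu_tb = n * GPU_BYTES_PER_PARAM / 1e12
    cpu_tb = n * CPU_BYTES_PER_PARAM / 1e12
    min_gpus = math.ceil(n * GPU_BYTES_PER_PARAM / (V100_MEM_GB * 1e9))
    print(f"{billions}B: ~{gpu_tb:.1f} TB on GPUs (+ activations), "
          f"~{cpu_tb:.1f} TB on CPU, >= {min_gpus} V100s for params+grads alone")
# 100B: ~0.4 TB on GPUs (+ activations), ~1.2 TB on CPU, >= 13 V100s for params+grads alone
# 200B: ~0.8 TB on GPUs (+ activations), ~2.4 TB on CPU, >= 25 V100s for params+grads alone
```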

Estimated Work: The current code already supports this.
13 changes: 13 additions & 0 deletions inference/README.md
@@ -0,0 +1,13 @@
# Inference

Notes on the plans to do inference with the pre-trained model

# Large Model on limited hardware

- running inference and tinkering on a single host (150-200B model)

Solution: We can do this with ZeRO-Infinity. It seems @Shaden Smith already has the code to load the model parameter checkpoints from Megatron+DeepSpeed 3D into Megatron+DeepSpeed ZeRO-Infinity. The remaining work is to add an inference-only mode to ZeRO-Infinity that drops all the non-parameter states.

Hardware Requirements: This would require about 500-1000 GB of memory (CPU, GPU or NVMe). A single node with enough CPU or NVMe memory should work here.
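
For reference, the parameter-only memory behind that estimate (a quick sketch; the 500-1000 GB range above leaves room for fp32 weights and working buffers):

```python
# Inference needs only the weights once optimizer and gradient states are dropped.
for billions in (100, 200):
    n_params = billions * 1e9
    print(f"{billions}B params: ~{n_params * 2 / 1e9:.0f} GB in fp16, "
          f"~{n_params * 4 / 1e9:.0f} GB in fp32")
# 100B params: ~200 GB in fp16, ~400 GB in fp32
# 200B params: ~400 GB in fp16, ~800 GB in fp32
```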

Estimated Work: If all works as expected, 1-3 weeks depending on bandwidth availability. Tuning for the best performance might take another week or so, but that won't block the availability of the functionality.
