# Finetuning

Notes on the plans for finetuning the pre-trained model.

# Large Model on a smaller hardware setup

- fine-tuning a 150-200B parameter model with fewer GPUs than the pre-training setup
## a. Fine-Tuning requiring only the pre-trained model weights, with freshly initialized optimizer states

Solution: This can be done using ZeRO-Infinity.

Hardware Requirements: This would require about 2.5-5 TB of aggregate memory for a 100-200B parameter model. It can be either CPU memory or NVMe memory, and it can be within a single node or across nodes. A single-node server with enough CPU or NVMe memory can work, if speed is not an issue.

Estimated Work: We can do this with ZeRO-Infinity. It seems that @Shaden Smith already has the code to load model parameter checkpoints from Megatron+DeepSpeed 3D into Megatron+DeepSpeed ZeRO-Infinity.
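To make option (a) concrete, here is a minimal sketch of a DeepSpeed config for ZeRO-Infinity fine-tuning, assuming ZeRO stage 3 with parameter and optimizer offload to NVMe. The batch sizes, learning rate, NVMe path, and the tiny stand-in model are illustrative assumptions, not values from this repo; the real run would load the converted Megatron+DeepSpeed 3D checkpoint into the full model.

```python
# Sketch only: ZeRO-Infinity (ZeRO stage 3 + NVMe offload) fine-tuning that needs
# just the pre-trained weights; optimizer states are created fresh and offloaded.
import deepspeed
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # illustrative
    "gradient_accumulation_steps": 16,     # illustrative
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,
        "offload_param":     {"device": "nvme", "nvme_path": "/local_nvme", "pin_memory": True},
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Tiny stand-in for the real 150-200B Megatron model, whose pre-trained weights
# would be loaded from the converted Megatron+DeepSpeed 3D checkpoint.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
# Only the weights come from pre-training; ZeRO-Infinity builds and partitions
# the new optimizer states across CPU/NVMe.
```

Launched with the `deepspeed` launcher, this would fit on a single node with enough NVMe, just slowly, which matches the hardware note above.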
## b. Continued-Training requiring both the model weights and optimizer states from pre-training

Solution: This can be done using Megatron+DeepSpeed 3D with ZeRO CPU Offload.
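For reference, a sketch of the `zero_optimization` block this option implies, assuming optimizer-state offload to CPU; the ZeRO stage is an assumption (pipeline parallelism in the 3D setup is typically paired with ZeRO-1, though offload support per stage depends on the DeepSpeed version), and the rest of the Megatron+DeepSpeed config would stay as in pre-training.

```python
# Sketch only: ZeRO CPU Offload for continued training under 3D parallelism.
# Stage 1 is assumed (pipeline parallelism usually pairs with ZeRO-1); depending
# on the DeepSpeed version, offload may instead require stage 2. The Adam states
# move to CPU RAM while fp16 parameters and gradients stay on the GPUs.
zero_cpu_offload = {
    "zero_optimization": {
        "stage": 1,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": True},
}
```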
Hardware Requirements: This option will require 2-4 TB of aggregate CPU memory to store the optimizer states, and 600-1200 GB of aggregate GPU memory to store parameters, gradients and activations for a 100-200B parameter model.

This reduces the number of GPUs required by 4x: it will run on 32-64 GPUs across 4-8 nodes, each with 8x V100 GPUs and 768 GB of RAM.
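A rough back-of-envelope check of those numbers, under the usual mixed-precision Adam assumption: roughly 12 bytes/param of fp32 optimizer state offloaded to CPU, and 4 bytes/param of fp16 parameters plus gradients on GPU, before activations. The byte counts are assumptions, not measurements from this setup.

```python
# Back-of-envelope memory estimate for option (b); byte counts are assumptions
# (fp32 Adam states offloaded to CPU, fp16 params + grads resident on GPU).
def estimate_memory(n_params_billion):
    n = n_params_billion * 1e9
    cpu_tb = n * 12 / 1e12   # fp32 master weights + Adam momentum + variance
    gpu_gb = n * 4 / 1e9     # fp16 parameters + fp16 gradients (activations extra)
    return cpu_tb, gpu_gb

for billions in (100, 200):
    cpu_tb, gpu_gb = estimate_memory(billions)
    print(f"{billions}B params: ~{cpu_tb:.1f} TB CPU, ~{gpu_gb:.0f} GB GPU before activations")
# ~1.2-2.4 TB CPU and ~400-800 GB GPU, i.e. the same order of magnitude as the
# 2-4 TB / 600-1200 GB ranges above once activations and working buffers are added.
```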
Estimated Work: The current code already supports it.
# Inference

Notes on the plans for doing inference with the pre-trained model.

# Large Model on limited hardware

- running inference and tinkering on a single host (150-200B parameter model)
Solution: We can do this with ZeRO-Infinity. It seems that @Shaden Smith already has the code to load model parameter checkpoints from Megatron+DeepSpeed 3D into Megatron+DeepSpeed ZeRO-Infinity. The remaining work is to add an inference-only mode to ZeRO-Infinity that drops all the non-parameter states.
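Since that inference-only mode is still to be written, the snippet below only sketches the intended flow, assuming the engine can be built without an optimizer and with parameters offloaded to CPU (or NVMe). The stand-in model, tensor sizes, and offload device are illustrative, not from this repo.

```python
# Sketch only: intended single-host inference flow with ZeRO stage 3 parameter
# offload. No optimizer is created, so no optimizer or gradient state is kept;
# the inference-only mode described above would formalize this path.
import deepspeed
import torch
import torch.nn as nn

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},  # or "nvme" plus an nvme_path
    },
}

# Tiny stand-in for the real 150-200B model loaded from the converted checkpoint.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()

with torch.no_grad():
    x = torch.randn(1, 1024, dtype=torch.half, device=engine.device)
    y = engine(x)  # parameters are fetched/offloaded as needed; nothing else is stored
```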
Hardware Requirements: This would require about 500-1000 GB of memory (which can be CPU, GPU or NVMe). A single node with enough CPU or NVMe memory should work here.
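A quick sanity check of that figure, counting parameters only (no gradients or optimizer states); the per-parameter byte counts are assumptions.

```python
# Parameters-only memory for a 150-200B model at common precisions; assumptions,
# not measurements. Working buffers and headroom push the total toward the
# 500-1000 GB range quoted above.
for n_billion in (150, 200):
    for label, bytes_per_param in (("fp16", 2), ("fp32", 4)):
        print(f"{n_billion}B @ {label}: ~{n_billion * bytes_per_param:.0f} GB")
```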
Estimated Work: If all works as expected, 1-3 weeks depending on bandwidth availability. Tuning for the best performance might take another week or so, but that won't block the availability of the functionality.