diff --git a/README.md b/README.md
index 60013df93..116722aa6 100644
--- a/README.md
+++ b/README.md
@@ -808,6 +808,8 @@ CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora ...
 
 ## Common Errors 🧰
 
+See also the [FAQs](./docs/faq.md).
+
 > If you encounter a 'Cuda out of memory' error, it means your GPU ran out of memory during the training process. Here's how to resolve it:
 
 Please reduce any below
diff --git a/docs/faq.md b/docs/faq.md
new file mode 100644
index 000000000..e5b729e26
--- /dev/null
+++ b/docs/faq.md
@@ -0,0 +1,14 @@
+# Axolotl FAQs
+
+
+> The trainer stopped and hasn't progressed in several minutes.
+
+This is usually an issue with the GPUs communicating with each other. See the [NCCL doc](../docs/nccl.md).
+
+> Exitcode -9
+
+This usually happens when you run out of system RAM.
+
+> Exitcode -7 while using deepspeed
+
+Try upgrading deepspeed with: `pip install -U deepspeed`