From 842b7013da5446804b0f50ef4e9bd5586449951d Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Wed, 14 Apr 2021 11:23:49 -0700
Subject: [PATCH] document how to override ~/.cache/torch_extensions

---
 docs/_tutorials/advanced-install.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/_tutorials/advanced-install.md b/docs/_tutorials/advanced-install.md
index 4ff02bf7ec84..3b9449d368fc 100644
--- a/docs/_tutorials/advanced-install.md
+++ b/docs/_tutorials/advanced-install.md
@@ -103,6 +103,23 @@ script in the repo. This will build a python wheel locally and copy it to all
 the nodes listed in your hostfile (either given via --hostfile, or defaults to
 /job/hostfile).
 
+The first time code that uses DeepSpeed runs, it automatically builds only the CUDA extensions
+required for that run and, by default, places them under
+`~/.cache/torch_extensions/`. The next time the same program is executed, these now precompiled
+extensions are loaded from that directory.
+
+If you use multiple virtual environments this can be a problem: by default there is only one
+extensions directory, but different virtual environments may use different setups (e.g. different
+Python or CUDA versions), and loading a CUDA extension built by another environment will then
+fail. If you need to, you can override the default location with the
+`TORCH_EXTENSIONS_DIR` environment variable. In each virtual environment you can point it to a
+unique directory, and DeepSpeed will use it to save and load CUDA extensions.
+
+You can also change it just for a specific run with:
+
+```bash
+TORCH_EXTENSIONS_DIR=./torch-extensions deepspeed ...
+```
 
 ## Building for the correct architectures