
Commit

Update distributed docs to torchrun
EricMarcus-ai committed Feb 16, 2024
1 parent 001614e commit b5d8829
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/distributed.rst
@@ -15,11 +15,11 @@ Since Ahcore is based on Lightning, we can change the configuration for the Trainer
     strategy: ddp
     precision: 32

-which will execute on 2 GPUs (devices=2) on 1 node using Distributed Data Parallel (DDP). Launching from the command line can be done using, e.g., torch distributed launch:
+which will execute on 2 GPUs (devices=2) on 1 node using Distributed Data Parallel (DDP). Launching from the command line can be done using, e.g., torchrun:

 .. code-block:: bash

-    python -m torch.distributed.launch --nproc_per_node=2 --use_env /.../ahcore/tools/train.py data_description=something lit_module=your_module trainer=default_ddp
+    torchrun --nproc_per_node=2 /.../ahcore/tools/train.py data_description=something lit_module=your_module trainer=default_ddp

 Note that a simple command without a distributed launch might only detect 1 GPU!
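For context (not part of this commit): a multi-node launch follows the same pattern. The sketch below assumes the standard torchrun rendezvous options; node0.example.com:29500 is a placeholder endpoint, and the Lightning Trainer's num_nodes would presumably need to be raised to match the node count.

.. code-block:: bash

    # Sketch of a hypothetical 2-node, 2-GPU-per-node launch (not part of this commit).
    # node0.example.com:29500 is a placeholder rendezvous endpoint.
    torchrun \
        --nnodes=2 \
        --nproc_per_node=2 \
        --rdzv_backend=c10d \
        --rdzv_endpoint=node0.example.com:29500 \
        /.../ahcore/tools/train.py \
        data_description=something lit_module=your_module trainer=default_ddp

With the c10d rendezvous backend, torchrun assigns node ranks at rendezvous time, so the same command can be run verbatim on every participating node.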
