From 2de21693eedb63a0144d921e5000bcf72076abf1 Mon Sep 17 00:00:00 2001
From: Jim Burtoft <39492751+jimburtoft@users.noreply.github.com>
Date: Mon, 16 Dec 2024 14:12:44 -0500
Subject: [PATCH] Dead link in training-troubleshooting.rst

Added link to a specific version to match
https://github.com/aws-neuron/aws-neuron-sdk/blob/e6832861f084a9e71532a4c7b8d6cfa5e9be5b0c/frameworks/torch/torch-neuronx/api-reference-guide/training/torch-neuron-envvars.rst#L157
and
https://github.com/aws-neuron/aws-neuron-sdk/blob/e6832861f084a9e71532a4c7b8d6cfa5e9be5b0c/conf.py#L300
---
 frameworks/torch/torch-neuronx/training-troubleshooting.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/frameworks/torch/torch-neuronx/training-troubleshooting.rst b/frameworks/torch/torch-neuronx/training-troubleshooting.rst
index edf6efca..289dc0c4 100644
--- a/frameworks/torch/torch-neuronx/training-troubleshooting.rst
+++ b/frameworks/torch/torch-neuronx/training-troubleshooting.rst
@@ -20,7 +20,7 @@ For setting up EFA that is needed for multi-node training, please see :ref:`setu
 For XLA-related troubleshooting notes see :ref:`How to debug models in
 PyTorch Neuron ` and `PyTorch-XLA troubleshooting
-guide `__.
+guide `__.
 
 If your multi-worker training run is interrupted, you may need to kill
 all the python processes (WARNING: this kills all python processes and