
Commit 2655d7a

tlrmchlsmth and yewentao256 authored and committed

[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning (#25532)

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>

1 parent 91d4299 commit 2655d7a

File tree

1 file changed (+0, -8 lines)

vllm/v1/worker/gpu_worker.py

Lines changed: 0 additions & 8 deletions
@@ -155,14 +155,6 @@ def initialize_cache(self, num_gpu_blocks: int,
 
     def init_device(self):
         if self.device_config.device.type == "cuda":
-            # torch.distributed.all_reduce does not free the input tensor until
-            # the synchronization point. This causes the memory usage to grow
-            # as the number of all_reduce calls increases. This env var disables
-            # this behavior.
-            # Related issue:
-            # https://discuss.pytorch.org/t/cuda-allocation-lifetime-for-inputs-to-distributed-all-reduce/191573
-            os.environ["TORCH_NCCL_AVOID_RECORD_STREAMS"] = "1"
-
             # This env var set by Ray causes exceptions with graph building.
             os.environ.pop("NCCL_ASYNC_ERROR_HANDLING", None)
             self.device = torch.device(f"cuda:{self.local_rank}")
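For context, here is a minimal Python sketch (not part of this commit, and not vLLM's API) of what the removed lines did: set TORCH_NCCL_AVOID_RECORD_STREAMS before the NCCL process group is created, so that all_reduce input tensors are not kept alive via record_stream until the next synchronization point. Newer PyTorch releases make this behavior the default and emit a warning when the variable is set explicitly, which is presumably the warning this commit squashes. The init_distributed helper name below is hypothetical.

import os

import torch
import torch.distributed as dist


def init_distributed(local_rank: int) -> torch.device:
    """Hypothetical helper illustrating the removed env-var setup (sketch only)."""
    # The variable only takes effect if set before the NCCL communicator is
    # created, which is why it was placed in init_device() ahead of any
    # collective calls.
    os.environ["TORCH_NCCL_AVOID_RECORD_STREAMS"] = "1"

    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)
    dist.init_process_group(backend="nccl")
    return device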

0 commit comments
