
[executor] init local_rank as device index #13027

Merged: 1 commit merged into vllm-project:main on Feb 11, 2025

Conversation

@MengqingCao (Contributor) commented on Feb 10, 2025

What does this PR do

This PR initializes local_rank from the device index given in the device argument, if one is specified.
FIX #12967
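
Roughly, the change amounts to deriving the worker's local_rank from the device index when an explicit device such as "cuda:1" is configured, instead of always defaulting to 0. A minimal sketch of the idea (not the exact code touched by this PR; the helper name is hypothetical):

import torch

def local_rank_from_device(device: str, default: int = 0) -> int:
    # torch.device("cuda:1").index == 1, while torch.device("cuda").index is None
    idx = torch.device(device).index
    return idx if idx is not None else default

assert local_rank_from_device("cuda:1") == 1
assert local_rank_from_device("cuda") == 0

With local_rank set this way, per-worker tensors end up on the same card as the configured device.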

Bug description

When the device is set to a card other than card 0, as in the following code, a device conflict occurs because tensors such as attn_bias are placed on card 0 by default.

from vllm import LLM
llm = LLM("facebook/opt-125m", device="cuda:1")
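
For context, a minimal standalone illustration of why the conflict happens (assuming a machine with at least two CUDA devices; this snippet is an illustration, not part of the repro above): query tensors follow the configured cuda:1, while a helper tensor allocated on the bare "cuda" device lands on the default card 0.

import torch

query = torch.randn(4, 8, device="cuda:1")    # model/query tensors live on card 1
attn_bias = torch.zeros(4, 4, device="cuda")  # bare "cuda" resolves to the default card 0
print(query.device, attn_bias.device)         # cuda:1 cuda:0
# Any kernel that combines the two raises a device-mismatch error,
# analogous to the xformers ValueError in the traceback below.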

Before this PR

INFO 02-10 17:10:10 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 17:10:17 config.py:542] This model supports multiple tasks: {'embed', 'score', 'classify', 'generate', 'reward'}. Defaulting to 'generate'.
WARNING 02-10 17:10:17 cuda.py:95] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
WARNING 02-10 17:10:17 config.py:678] Async output processing is not supported on the current platform type cuda.
INFO 02-10 17:10:17 llm_engine.py:234] Initializing a V0 LLM engine (v0.6.4.post2.dev395+g02222a02.d20241217) with config: model='/home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6', speculative_config=None, tokenizer='/home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda:1, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False, 
INFO 02-10 17:10:18 cuda.py:179] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 02-10 17:10:18 cuda.py:227] Using XFormers backend.
INFO 02-10 17:10:19 model_runner.py:1109] Starting to load model /home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6...
Loading pt checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.71it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.71it/s]

INFO 02-10 17:10:20 model_runner.py:1114] Loading model weights took 0.0000 GB
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/cmq/code/vllm/pipeline.py", line 48, in <module>
[rank0]:     llm = LLM("/home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6",
[rank0]:   File "/home/cmq/code/vllm/vllm/utils.py", line 1051, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/entrypoints/llm.py", line 242, in __init__
[rank0]:     self.llm_engine = self.engine_class.from_engine_args(
[rank0]:   File "/home/cmq/code/vllm/vllm/engine/llm_engine.py", line 484, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/home/cmq/code/vllm/vllm/engine/llm_engine.py", line 276, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/home/cmq/code/vllm/vllm/engine/llm_engine.py", line 416, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:   File "/home/cmq/code/vllm/vllm/executor/executor_base.py", line 101, in determine_num_available_blocks
[rank0]:     results = self.collective_rpc("determine_num_available_blocks")
[rank0]:   File "/home/cmq/code/vllm/vllm/executor/uniproc_executor.py", line 55, in collective_rpc
[rank0]:     answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/utils.py", line 2220, in run_method
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/worker/worker.py", line 229, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/worker/model_runner.py", line 1234, in profile_run
[rank0]:     self._dummy_run(max_num_batched_tokens, max_num_seqs)
[rank0]:   File "/home/cmq/code/vllm/vllm/worker/model_runner.py", line 1345, in _dummy_run
[rank0]:     self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/worker/model_runner.py", line 1718, in execute_model
[rank0]:     hidden_or_intermediate_states = model_executable(
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/model_executor/models/opt.py", line 370, in forward
[rank0]:     hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]:   File "/home/cmq/code/vllm/vllm/compilation/decorators.py", line 172, in __call__
[rank0]:     return self.forward(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/model_executor/models/opt.py", line 325, in forward
[rank0]:     return self.decoder(input_ids,
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/model_executor/models/opt.py", line 282, in forward
[rank0]:     hidden_states = layer(hidden_states,
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/model_executor/models/opt.py", line 175, in forward
[rank0]:     hidden_states = self.self_attn(hidden_states=hidden_states,
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/model_executor/models/opt.py", line 115, in forward
[rank0]:     attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/home/cmq/code/vllm/vllm/attention/layer.py", line 201, in forward
[rank0]:     return torch.ops.vllm.unified_attention(
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
[rank0]:     return self._op(*args, **(kwargs or {}))
[rank0]:   File "/home/cmq/code/vllm/vllm/attention/layer.py", line 307, in unified_attention
[rank0]:     return self.impl.forward(self, query, key, value, kv_cache, attn_metadata)
[rank0]:   File "/home/cmq/code/vllm/vllm/attention/backends/xformers.py", line 558, in forward
[rank0]:     out = self._run_memory_efficient_xformers_forward(
[rank0]:   File "/home/cmq/code/vllm/vllm/attention/backends/xformers.py", line 730, in _run_memory_efficient_xformers_forward
[rank0]:     out = xops.memory_efficient_attention_forward(
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 376, in memory_efficient_attention_forward
[rank0]:     return _memory_efficient_attention_forward(
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 483, in _memory_efficient_attention_forward
[rank0]:     inp.validate_inputs()
[rank0]:   File "/home/cmq/miniconda3/envs/vllm/lib/python3.10/site-packages/xformers/ops/fmha/common.py", line 145, in validate_inputs
[rank0]:     raise ValueError(
[rank0]: ValueError: Attention bias and Query/Key/Value should be on the same device
[rank0]:   query.device: cuda:1
[rank0]:   attn_bias   : cuda:0

[rank0]:[W210 17:10:21.389456654 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())

After this PR

INFO 02-10 17:12:24 __init__.py:190] Automatically detected platform cuda.
INFO 02-10 17:12:31 config.py:542] This model supports multiple tasks: {'score', 'generate', 'classify', 'embed', 'reward'}. Defaulting to 'generate'.
WARNING 02-10 17:12:31 cuda.py:95] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
WARNING 02-10 17:12:31 config.py:678] Async output processing is not supported on the current platform type cuda.
INFO 02-10 17:12:31 llm_engine.py:234] Initializing a V0 LLM engine (v0.6.4.post2.dev395+g02222a02.d20241217) with config: model='/home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6', speculative_config=None, tokenizer='/home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda:1, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False, 
INFO 02-10 17:12:32 cuda.py:179] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 02-10 17:12:32 cuda.py:227] Using XFormers backend.
INFO 02-10 17:12:32 model_runner.py:1109] Starting to load model /home/cmq/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6...
Loading pt checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  3.86it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  3.86it/s]

INFO 02-10 17:12:33 model_runner.py:1114] Loading model weights took 0.2389 GB
INFO 02-10 17:12:34 worker.py:267] Memory profiling takes 0.63 seconds
INFO 02-10 17:12:34 worker.py:267] the current vLLM instance can use total_gpu_memory (14.57GiB) x gpu_memory_utilization (0.90) = 13.11GiB
INFO 02-10 17:12:34 worker.py:267] model weights take 0.24GiB; non_torch_memory takes 0.03GiB; PyTorch activation peak memory takes 0.47GiB; the rest of the memory reserved for KV Cache is 12.37GiB.
INFO 02-10 17:12:34 executor_base.py:110] # CUDA blocks: 22526, # CPU blocks: 7281
INFO 02-10 17:12:34 executor_base.py:115] Maximum concurrency for 2048 tokens per request: 175.98x
INFO 02-10 17:12:38 llm_engine.py:431] init engine (profile, create kv cache, warmup model) took 4.57 seconds
Processed prompts: 100%|████████████████████████████████████████████████| 6/6 [00:05<00:00,  1.13it/s, est. speed input: 7.18 toks/s, output: 290.34 toks/s]
Prompt: 'The president of the United States is', Generated text: " not a racist. He is a racist.\nHe's a racist because he's a racist.                                                                                                                                                                                                                                            "
Prompt: 'Hello, my name is', Generated text: ' J.C. and I am a student at the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California, Berkeley, and a graduate of the University of California, Berkeley. I am a graduate of the University of California'
Prompt: 'The future of AI is', Generated text: ' in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of'
Prompt: 'The capital of France is', Generated text: ' the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the'
Prompt: 'The future of AI is', Generated text: ' in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of the people.\n\nThe future of AI is in the hands of'
Prompt: 'Hello, I come from', Generated text: ' a family of 4 and I am a single mom. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom of 3. I am a single mom'
[rank0]:[W210 17:12:45.595748139 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())

Signed-off-by: Mengqing Cao <cmq0113@163.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@jeejeelee added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Feb 11, 2025

@tlrmchlsmth (Collaborator) left a comment

Thanks for the fix!

(cc @youkaichao in case you see any gotchas)

@youkaichao (Member) left a comment

I think the device argument is not really intended to be a way to specify the device index, but I have no objections to it.
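
For reference, the more conventional way to pin a single-GPU run to a specific card is to restrict device visibility through the environment rather than the device argument. A brief illustration, not something prescribed by this PR (the variable must be set before CUDA is initialized):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # set before any CUDA initialization

from vllm import LLM
llm = LLM("facebook/opt-125m")  # the process now sees physical GPU 1 as cuda:0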

@youkaichao merged commit 9cf4759 into vllm-project:main on Feb 11, 2025
49 checks passed
SzymonOzog pushed a commit to SzymonOzog/vllm that referenced this pull request Feb 12, 2025
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
kwang1012 pushed a commit to kwang1012/vllm that referenced this pull request Feb 12, 2025
Signed-off-by: Mengqing Cao <cmq0113@163.com>
panf2333 pushed a commit to yottalabsai/vllm that referenced this pull request Feb 18, 2025
Signed-off-by: Mengqing Cao <cmq0113@163.com>
kerthcet pushed a commit to kerthcet/vllm that referenced this pull request Feb 21, 2025
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Triton error when initializing LLM(...) when enable_lora=True and cuda device != cuda:0
4 participants