Run:ai model streamer add GCS package support #24909
Conversation
Code Review
This pull request updates the runai-model-streamer dependency to version 0.14.0 and adds runai-model-streamer-gcs to enable loading models from Google Cloud Storage. The changes in the requirements files and documentation are consistent with this goal. I've found one high-severity issue related to packaging: the optional dependency group [runai] likely needs to be updated to include the new GCS package to make the feature available to users. Please see the detailed comment.
Package group was updated in #23845
requirements/test.in
With runai-model-streamer==0.14.0 I cannot download a model from S3; I'm not sure whether it's my environment or something else.
ERROR 09-16 02:53:54 [v1/engine/core.py:712] Exception: Could not receive runai_response from libstreamer due to: b'File access error'
(EngineCore_DP0 pid=375431) Process EngineCore_DP0:
(EngineCore_DP0 pid=375431) Traceback (most recent call last):
(EngineCore_DP0 pid=375431) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=375431) self.run()
(EngineCore_DP0 pid=375431) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=375431) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/v1/engine/core.py", line 716, in run_engine_core
(EngineCore_DP0 pid=375431) raise e
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/v1/engine/core.py", line 703, in run_engine_core
(EngineCore_DP0 pid=375431) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/v1/engine/core.py", line 502, in __init__
(EngineCore_DP0 pid=375431) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/v1/engine/core.py", line 81, in __init__
(EngineCore_DP0 pid=375431) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/executor/executor_base.py", line 55, in __init__
(EngineCore_DP0 pid=375431) self._init_executor()
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=375431) self.collective_rpc("load_model")
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=375431) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/utils/__init__.py", line 3067, in run_method
(EngineCore_DP0 pid=375431) return func(*args, **kwargs)
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/v1/worker/gpu_worker.py", line 214, in load_model
(EngineCore_DP0 pid=375431) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/v1/worker/gpu_model_runner.py", line 2390, in load_model
(EngineCore_DP0 pid=375431) self.model = model_loader.load_model(
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=375431) self.load_weights(model, model_config)
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/model_loader/runai_streamer_loader.py", line 103, in load_weights
(EngineCore_DP0 pid=375431) model.load_weights(
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/models/qwen3.py", line 344, in load_weights
(EngineCore_DP0 pid=375431) return loader.load_weights(weights)
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/models/utils.py", line 291, in load_weights
(EngineCore_DP0 pid=375431) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/models/utils.py", line 249, in _load_module
(EngineCore_DP0 pid=375431) yield from self._load_module(prefix,
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/models/utils.py", line 222, in _load_module
(EngineCore_DP0 pid=375431) loaded_params = module_load_weights(weights)
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/models/qwen2.py", line 392, in load_weights
(EngineCore_DP0 pid=375431) for name, loaded_weight in weights:
(EngineCore_DP0 pid=375431) ^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/models/utils.py", line 136, in <genexpr>
(EngineCore_DP0 pid=375431) for parts, weights_data in group),
(EngineCore_DP0 pid=375431) ^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/models/utils.py", line 127, in <genexpr>
(EngineCore_DP0 pid=375431) for weight_name, weight_data in weights)
(EngineCore_DP0 pid=375431) ^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/models/utils.py", line 288, in <genexpr>
(EngineCore_DP0 pid=375431) weights = ((name, weight) for name, weight in weights
(EngineCore_DP0 pid=375431) ^^^^^^^
(EngineCore_DP0 pid=375431) File "/root/code/vllm/vllm/model_executor/model_loader/weight_utils.py", line 595, in runai_safetensors_weights_iterator
(EngineCore_DP0 pid=375431) yield from tensor_iter
(EngineCore_DP0 pid=375431) File "/usr/local/lib/python3.12/dist-packages/tqdm/std.py", line 1181, in __iter__
(EngineCore_DP0 pid=375431) for obj in iterable:
(EngineCore_DP0 pid=375431) ^^^^^^^^
(EngineCore_DP0 pid=375431) File "/usr/local/lib/python3.12/dist-packages/runai_model_streamer/safetensors_streamer/safetensors_streamer.py", line 84, in get_tensors
(EngineCore_DP0 pid=375431) for file_path, ready_chunk_index, buffer in self.file_streamer.get_chunks():
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/usr/local/lib/python3.12/dist-packages/runai_model_streamer/file_streamer/file_streamer.py", line 116, in get_chunks
(EngineCore_DP0 pid=375431) yield from self.request_ready_chunks()
(EngineCore_DP0 pid=375431) File "/usr/local/lib/python3.12/dist-packages/runai_model_streamer/file_streamer/file_streamer.py", line 137, in request_ready_chunks
(EngineCore_DP0 pid=375431) file_relative_index, chunk_relative_index = runai_response(self.streamer)
(EngineCore_DP0 pid=375431) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=375431) File "/usr/local/lib/python3.12/dist-packages/runai_model_streamer/libstreamer/libstreamer.py", line 93, in runai_response
(EngineCore_DP0 pid=375431) raise Exception(
(EngineCore_DP0 pid=375431) Exception: Could not receive runai_response from libstreamer due to: b'File access error'
Can you provide more details on how you encountered this error (e.g. the command you ran that resulted in it)? And did the same command work successfully with an older version of the model streamer? From the error, this looks like an authentication/permission problem.
With runai-model-streamer==0.13.0 vLLM runs fine, but after upgrading to 0.14.0 I get this error.
I can access MinIO, as described in #23845 (comment). Not sure if it is related to run-ai/runai-model-streamer#81.
Make sure you upgrade runai-model-streamer-s3 to version 0.14.0 as well
Sorry, I didn't make that clear.
With runai-model-streamer==0.13.0 + runai-model-streamer-s3==0.14.0 everything runs successfully, but with runai-model-streamer==0.14.0 + runai-model-streamer-s3==0.14.0 I get the error b'File access error'.
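A quick way to check for the mixed-version situation described in this sub-thread is to list the installed streamer packages and upgrade them together. This is a generic pip workflow sketch, not a step from the PR; output format varies by pip version:

```shell
# List installed runai streamer packages; core and backend versions should match.
pip list | grep runai

# If they differ, upgrade both to the same release together.
pip install --upgrade "runai-model-streamer==0.14.0" "runai-model-streamer-s3==0.14.0"
```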
This pull request has merge conflicts that must be resolved before it can be merged.
tests/model_executor/model_loader/runai_model_streamer/test_runai_utils.py
@22quinn could you please give this a review? Thanks!
Sorry, missed this. LGTM.
Seems this test is currently not running in CI? Can we add it in: https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml
Separately, we should consolidate all model loader related tests.
@DarkLight1337 I rebased this PR with your changes from #25765. Thanks for simplifying and wiring up the tests!
Purpose
Install the runai-model-streamer[gcs] pip package by default, to enable GCS support in the nightly / published image.
Test Plan / Test Result
Validated locally by building the Docker image and running it locally (see updated documentation).
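The validation flow described above might look roughly like the following. This is a hedged sketch: the bucket name, model path, and credentials file are placeholders, and it assumes vLLM's documented Run:ai streamer interface (`--load-format runai_streamer`) accepts GCS paths once the GCS backend package is installed:

```shell
# Hypothetical usage sketch; bucket and model names are placeholders.
pip install "vllm[runai]" "runai-model-streamer-gcs==0.14.0"

# Standard GCS auth via a service account key (assumption; other auth methods exist).
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Stream model weights directly from a GCS bucket.
vllm serve gs://my-bucket/models/Qwen3-8B --load-format runai_streamer
```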