I tried to deploy llama-3.1-8b-instruct:1.1.1 with KServe and modelcar on OpenShift AI.

What I have done?
1. Created a model store with the model files downloaded from NGC:
podman run --rm -e NGC_API_KEY=<API_KEY> -v /models:/opt/nim/.cache nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.1 create-model-store --profile <PROFILE> --model-store /opt/nim/.cache
2. Built a modelcar container image containing the model files from /models.
3. Deployed the ServingRuntime CR and set the NIM_MODEL_NAME environment variable to /mnt/models/, which is the path where the model files from the modelcar container are mounted.
4. Deployed the InferenceService CR and set the storageUri to use the modelcar image created in step 2.
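For context, the wiring looks roughly like this (a sketch only, not my exact manifests; the resource name, model format, runtime reference, and image registry below are placeholders):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-1-8b-instruct        # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: nim-llm                # placeholder; must match the ServingRuntime's supportedModelFormats
      runtime: nim-serving-runtime   # placeholder; name of the ServingRuntime CR from step 3
      # Modelcar image built in step 2; KServe exposes its /models content at /mnt/models
      storageUri: oci://registry.example.com/llama-3.1-8b-instruct-modelcar:1.1.1
```

With these applied, the NIM container failed on startup with the following traceback: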
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/nim/llm/vllm_nvext/entrypoints/openai/api_server.py", line 702, in <module>
    engine = AsyncLLMEngineFactory.from_engine_args(engine_args, usage_context=UsageContext.OPENAI_API_SERVER)
  File "/opt/nim/llm/vllm_nvext/engine/async_trtllm_engine_factory.py", line 33, in from_engine_args
    engine = engine_cls.from_engine_args(engine_args, start_engine_loop, usage_context)
  File "/opt/nim/llm/vllm_nvext/engine/async_trtllm_engine.py", line 304, in from_engine_args
    return cls(
  File "/opt/nim/llm/vllm_nvext/engine/async_trtllm_engine.py", line 278, in __init__
    self.engine: _AsyncTRTLLMEngine = self._init_engine(*args, **kwargs)
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 505, in _init_engine
    return engine_class(*args, **kwargs)
  File "/opt/nim/llm/vllm_nvext/engine/async_trtllm_engine.py", line 136, in __init__
    self._tllm_engine = TrtllmModelRunner(
  File "/opt/nim/llm/vllm_nvext/engine/trtllm_model_runner.py", line 275, in __init__
    self._tllm_exec, self._cfg = self._create_engine(
  File "/opt/nim/llm/vllm_nvext/engine/trtllm_model_runner.py", line 569, in _create_engine
    return create_trt_executor(
  File "/opt/nim/llm/vllm_nvext/trtllm/utils.py", line 283, in create_trt_executor
    engine_size_bytes = _get_rank_engine_file_size_bytes(profile_dir)
  File "/opt/nim/llm/vllm_nvext/trtllm/utils.py", line 226, in _get_rank_engine_file_size_bytes
    engine_size_bytes = rank0_engine.stat().st_size
  File "/usr/lib/python3.10/pathlib.py", line 1097, in stat
    return self._accessor.stat(self, follow_symlinks=follow_symlinks)
FileNotFoundError: [Errno 2] No such file or directory: '/models/trtllm_engine/rank0.engine'
Issue:
The directory containing the model files in the sidecar (modelcar) container is correctly mounted into the NIM container via a symlink:
(Commands executed in a terminal inside the NIM container)
$ ls -al /mnt/models
lrwxrwxrwx. 1 1001090000 1001090000 20 Aug 7 20:34 /mnt/models -> /proc/76/root/models
$ ls -al /proc/76/root/models/trtllm_engine/rank0.engine
-rw-r--r--. 1 root root 16218123260 Jul 30 18:18 /proc/76/root/models/trtllm_engine/rank0.engine
The NIM container's code invokes the function _get_rank_engine_file_size_bytes in vllm_nvext/trtllm/utils.py, which calls Path.resolve() to resolve the symlink.
Path.resolve() follows symlinks in userspace via readlink(), and readlink() on the magic symlink /proc/76/root returns /. As a result, the path of the rank engine file (i.e. /proc/76/root/models/trtllm_engine/rank0.engine) is resolved to /models/trtllm_engine/rank0.engine, which does not exist in the NIM container's mount namespace.
The code therefore could not find the file /models/trtllm_engine/rank0.engine to get its size, and threw the error above.
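The failure mode can be reproduced with a few lines of Python (a hypothetical repro, assuming the sidecar's PID is 76 as in the listing above; run inside the NIM container):

```python
from pathlib import Path

link = Path("/mnt/models")

# The symlink target points into the sidecar's mount namespace.
print(link.readlink())  # /proc/76/root/models

# Following the symlink in the kernel works: the /proc/<pid>/root magic
# link is traversed into the sidecar's namespace during path lookup.
engine = link / "trtllm_engine" / "rank0.engine"
print(engine.stat().st_size)  # 16218123260

# Userspace resolution does not work: readlink("/proc/76/root") returns "/",
# so resolve() collapses the path to /models/..., which does not exist
# inside the NIM container.
print(engine.resolve())  # /models/trtllm_engine/rank0.engine
engine.resolve().stat()  # raises FileNotFoundError
```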
What I expect?
The NIM container should properly handle the symlink to the directory containing the model files, i.e. access the files through the symlink instead of resolving it to a path that does not exist in its own mount namespace.
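One possible direction for a fix (a minimal sketch only, assuming the file layout shown in the traceback; the actual helper in vllm_nvext/trtllm/utils.py contains more logic) is to stat() the engine file through the symlink and let the kernel follow /proc/<pid>/root, rather than resolving the path in userspace first:

```python
from pathlib import Path

def _get_rank_engine_file_size_bytes(profile_dir: Path) -> int:
    # Sketch: do NOT call profile_dir.resolve() here. stat() follows the
    # /proc/<pid>/root magic symlink in the kernel, so the size can be read
    # even though userspace resolution of the same path would fail.
    rank0_engine = profile_dir / "trtllm_engine" / "rank0.engine"
    return rank0_engine.stat().st_size
```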
@xieshenzh thanks for reporting this, I'm trying to do the exact same thing. I followed your procedure and got the same results with the nvidia-nim-llama-3.1-8b-instruct-1.1.2 image.
My overall thought is to pre-cache new NIM models as modelcars on each of my OpenShift nodes using the image puller, and let KServe do its thing for faster scale-up when necessary.