
Error deploying llama-3.1-8b-instruct:1.1.1 using a downloaded model repository with modelcar and KServe #64

Open
xieshenzh opened this issue Aug 7, 2024 · 2 comments

Comments

@xieshenzh

I tried to deploy llama-3.1-8b-instruct:1.1.1 with KServe and modelcar on OpenShift AI.

What I have done:

  1. Downloaded the model files: podman run --rm -e NGC_API_KEY=<API_KEY> -v /models:/opt/nim/.cache nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.1 create-model-store --profile <PROFILE> --model-store /opt/nim/.cache
  2. Built a modelcar image by copying the model files, using this Dockerfile (a build-and-push sketch follows below):
FROM --platform=linux/amd64 busybox
RUN mkdir /models && chmod 775 /models
COPY /models/ /models/
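For reference, a typical build-and-push sequence for this Dockerfile might look like the following; the registry, repository name, and tag are placeholders:

# build for the same platform the Dockerfile targets, then push to your registry
podman build --platform linux/amd64 -t <registry>/<repository>/llama-3.1-8b-instruct-modelcar:<tag> .
podman push <registry>/<repository>/llama-3.1-8b-instruct-modelcar:<tag>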
  3. Set up the environment based on the guide.
  4. Deployed the ServingRuntime CR and set the NIM_MODEL_NAME environment variable to /mnt/models/, which is the path where the model files from the modelcar container are mounted:
---
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: nvidia-nim-llama-3.1-8b-instruct-1.1.1
spec:
  annotations:
    prometheus.kserve.io/path: /metrics
    prometheus.kserve.io/port: '8000'
    serving.kserve.io/enable-metric-aggregation: 'true'
    serving.kserve.io/enable-prometheus-scraping: 'true'
  containers:
    - env:
        - name: NIM_MODEL_NAME
          value: /mnt/models/
        - name: NIM_SERVED_MODEL_NAME
          value: meta/llama3-8b-instruct
        - name: NGC_API_KEY
          valueFrom:
            secretKeyRef:
              key: NGC_API_KEY
              name: nvidia-nim-secrets
      image: 'nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.1'
      name: kserve-container
      ports:
        - containerPort: 8000
          protocol: TCP
      resources:
        limits:
          cpu: '12'
          memory: 32Gi
        requests:
          cpu: '12'
          memory: 32Gi
      volumeMounts:
        - mountPath: /dev/shm
          name: dshm
  imagePullSecrets:
    - name: ngc-secret
  protocolVersions:
    - v2
    - grpc-v2
  supportedModelFormats:
    - autoSelect: true
      name: nvidia-nim-llama-3.1-8b-instruct
      priority: 1
      version: 1.1.1
  volumes:
    - emptyDir:
        medium: Memory
        sizeLimit: 25Gi
      name: dshm
  5. Deployed the InferenceService CR and set the storageUri to the modelcar image built in step 2 (see the note on modelcar support after the YAML):
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    autoscaling.knative.dev/target: '10'
  name: llama-3-1-8b-instruct-1xgpu
spec:
  predictor:
    minReplicas: 1
    model:
      modelFormat:
        name: nvidia-nim-llama-3.1-8b-instruct
      name: ''
      resources:
        limits:
          nvidia.com/gpu: '1'
        requests:
          nvidia.com/gpu: '1'
      runtime: nvidia-nim-llama-3.1-8b-instruct-1.1.1
      storageUri: 'oci://<modelcar image registry and name>:<tag>'
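Note: for KServe to handle an oci:// storageUri as a modelcar, the storage initializer needs modelcar support enabled. Depending on the KServe/OpenShift AI version, this is typically configured through the storageInitializer entry of the inferenceservice-config ConfigMap; the snippet below is only an illustrative sketch (namespace, image tag, and resource values are placeholders):

apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve
data:
  storageInitializer: |-
    {
      "image": "kserve/storage-initializer:v0.13.0",
      "enableModelcar": true,
      "cpuModelcar": "10m",
      "memoryModelcar": "15Mi"
    }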
  6. The Pod failed to start with the following error:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/nim/llm/vllm_nvext/entrypoints/openai/api_server.py", line 702, in <module>
    engine = AsyncLLMEngineFactory.from_engine_args(engine_args, usage_context=UsageContext.OPENAI_API_SERVER)
  File "/opt/nim/llm/vllm_nvext/engine/async_trtllm_engine_factory.py", line 33, in from_engine_args
    engine = engine_cls.from_engine_args(engine_args, start_engine_loop, usage_context)
  File "/opt/nim/llm/vllm_nvext/engine/async_trtllm_engine.py", line 304, in from_engine_args
    return cls(
  File "/opt/nim/llm/vllm_nvext/engine/async_trtllm_engine.py", line 278, in __init__
    self.engine: _AsyncTRTLLMEngine = self._init_engine(*args, **kwargs)
  File "/opt/nim/llm/.venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 505, in _init_engine
    return engine_class(*args, **kwargs)
  File "/opt/nim/llm/vllm_nvext/engine/async_trtllm_engine.py", line 136, in __init__
    self._tllm_engine = TrtllmModelRunner(
  File "/opt/nim/llm/vllm_nvext/engine/trtllm_model_runner.py", line 275, in __init__
    self._tllm_exec, self._cfg = self._create_engine(
  File "/opt/nim/llm/vllm_nvext/engine/trtllm_model_runner.py", line 569, in _create_engine
    return create_trt_executor(
  File "/opt/nim/llm/vllm_nvext/trtllm/utils.py", line 283, in create_trt_executor
    engine_size_bytes = _get_rank_engine_file_size_bytes(profile_dir)
  File "/opt/nim/llm/vllm_nvext/trtllm/utils.py", line 226, in _get_rank_engine_file_size_bytes
    engine_size_bytes = rank0_engine.stat().st_size
  File "/usr/lib/python3.10/pathlib.py", line 1097, in stat
    return self._accessor.stat(self, follow_symlinks=follow_symlinks)
FileNotFoundError: [Errno 2] No such file or directory: '/models/trtllm_engine/rank0.engine'

Issue:
The directory containing the model files in the sidecar container is correctly mounted into the NIM container via a symlink:

(Commands executed in the terminal of the NIM container)

$ ls -al /mnt/models
lrwxrwxrwx. 1 1001090000 1001090000 20 Aug  7 20:34 /mnt/models -> /proc/76/root/models
$ ls -al /proc/76/root/models/trtllm_engine/rank0.engine 
-rw-r--r--. 1 root root 16218123260 Jul 30 18:18 /proc/76/root/models/trtllm_engine/rank0.engine

The NIM container's code invokes the function _get_rank_engine_file_size_bytes in vllm_nvext/trtllm/utils.py, which calls Path.resolve() to resolve the symlink.
As a result, the path of the rank engine file (i.e. /proc/76/root/models/trtllm_engine/rank0.engine) is resolved to /models/trtllm_engine/rank0.engine, which does not exist inside the NIM container.
The code therefore cannot find /models/trtllm_engine/rank0.engine to get its file size, and throws the error above.
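The behavior can be reproduced with a few lines of Python run from the NIM container's terminal; this is only a sketch of the path resolution, not the NIM code itself, and the sidecar PID (76) and file size are taken from the listing above:

from pathlib import Path

mounted = Path("/mnt/models/trtllm_engine/rank0.engine")

# resolve() expands each symlink component textually: /mnt/models reads as
# /proc/76/root/models, and /proc/76/root reads as "/", so the path collapses
# to /models/... inside *this* container, where no such file exists.
print(mounted.resolve())           # /models/trtllm_engine/rank0.engine
print(mounted.resolve().exists())  # False

# stat() on the unresolved path lets the kernel follow the /proc/76/root
# magic link into the sidecar's mount namespace, so the file is found.
print(mounted.stat().st_size)      # 16218123260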

What I expect:
The NIM container should properly resolve the symlink to the directory containing the model files.

@mpaulgreen

@supertetelman can you take a look at this issue?

@mosfeets

@xieshenzh thanks for reporting this; I'm trying to do the exact same thing. I followed your procedure and got the same result with the nvidia-nim-llama-3.1-8b-instruct-1.1.2 image.

My overall thought is to pre-cache new NIM models with modelcars on each of my OpenShift nodes using image puller and let KServe do its thing for faster scale up when necessary.
