We followed the steps documented here: https://docs.nvidia.com/nim/large-language-models/latest/deploy-helm.html
After running `helm install ...`, the pod keeps crash-looping.
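For context, the flow from the linked guide can be sketched roughly as below. The secret names match `custom-value.yaml` further down; the release name and chart reference are illustrative assumptions, not taken verbatim from the guide.

```shell
# Sketch of the NIM Helm deploy flow; names are illustrative.
export NGC_API_KEY="<your key>"   # assumption: key already generated on ngc.nvidia.com

# Secret referenced by model.ngcAPISecret in custom-value.yaml:
kubectl create secret generic ngc-api \
  --from-literal=NGC_API_KEY="$NGC_API_KEY"

# Image-pull secret referenced by imagePullSecrets:
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY"

# Install with the custom values (chart reference is an assumption):
helm install my-nim nim-llm/ -f custom-value.yaml
```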
kubectl logs my-nim-01 --previous
...
{"level": "None", "time": "None", "file_name": "None", "file_path": "None", "line_number": "-1", "message": "[09-25 19:03:45.989 ERROR nim_sdk::hub::repo rust/nim-sdk/src/hub/repo.rs:119] error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-70b-instruct/hf-1d54af3-nim1.2/files)", "exc_info": "None", "stack_info": "None"}
{"level": "ERROR", "time": "None", "file_name": "None", "file_path": "None", "line_number": "-1", "message": "", "exc_info": "Traceback (most recent call last):\n File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main\n return _run_code(code, main_globals, None,\n File "/usr/lib/python3.10/runpy.py", line 86, in _run_code\n exec(code, run_globals)\n File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 99, in \n main()\n File "/opt/nim/llm/vllm_nvext/entrypoints/launch.py", line 42, in main\n inference_env = prepare_environment()\n File "/opt/nim/llm/vllm_nvext/entrypoints/args.py", line 155, in prepare_environment\n engine_args, extracted_name = inject_ngc_hub(engine_args)\n File "/opt/nim/llm/vllm_nvext/hub/ngc_injector.py", line 247, in inject_ngc_hub\n cached = repo.get_all()\nException: error sending request for url (https://api.ngc.nvidia.com/v2/org/nim/team/meta/models/llama-3_1-70b-instruct/hf-1d54af3-nim1.2/files)", "stack_info": "None"}
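The failing request in the log above can be rebuilt and retried by hand (e.g. from inside the pod) to separate network problems from credential problems. A minimal sketch; note the Bearer-style header is an assumption for illustration — the real NIM client performs an NGC token exchange rather than sending the raw API key:

```python
import urllib.request

NGC_BASE = "https://api.ngc.nvidia.com/v2"

def build_files_request(org, team, model, version, api_key):
    # Mirrors the URL structure shown in the NIM error log so the same
    # request can be attempted manually with curl or urllib.
    url = f"{NGC_BASE}/org/{org}/team/{team}/models/{model}/{version}/files"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )

# Values taken from the error log above:
req = build_files_request(
    "nim", "meta", "llama-3_1-70b-instruct", "hf-1d54af3-nim1.2",
    "NGC_API_KEY_VALUE",  # placeholder, not a real key
)
print(req.full_url)
```

If `urllib.request.urlopen(req)` (or the equivalent curl) also fails from inside the pod, the problem is egress or DNS rather than the chart configuration.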
kubectl describe pod my-nim-01
...
Events:
Type     Reason   Age                    From     Message
Warning  BackOff  5m3s (x90 over 102m)   kubelet  Back-off restarting failed container nim-llm in pod my-nim-0_default(ce8f1e3a-f0e6-4a95-9086-2901091b7a57)
Normal   Pulled   4m52s (x15 over 116m)  kubelet  Container image "nvcr.io/nim/meta/llama-3.1-70b-instruct:latest" already present on machine
kubectl get pods -A
NAMESPACE   NAME       READY   STATUS    RESTARTS         AGE
default     my-nim-0   0/1     Running   14 (6m46s ago)   117m
vim custom-value.yaml
image:
  repository: "nvcr.io/nim/meta/llama-3.1-70b-instruct" # container location
  tag: latest # NIM version you want to deploy
model:
  ngcAPISecret: ngc-api # name of a secret in the cluster that includes a key named NGC_API_KEY and is an NGC API key
imagePullSecrets:
  - name: ngc-secret # name of a secret used to pull nvcr.io images, see https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
persistence:
  enabled: true
  size: 800Gi
  accessMode: ReadWriteMany
  storageClass: ""
  annotations:
    helm.sh/resource-policy: "keep"
livenessProbe:
  initialDelaySeconds: 600
  periodSeconds: 60
  timeoutSeconds: 10
startupProbe:
  initialDelaySeconds: 600
  periodSeconds: 60
  timeoutSeconds: 10
  failureThreshold: 1500
resources:
  limits:
    nvidia.com/gpu: 4 # much more GPU memory is required
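Since the error is a failed request to api.ngc.nvidia.com, the two secrets this values file references are worth verifying first. A sketch (secret names come from `custom-value.yaml` above; the probe pod image is illustrative):

```shell
# Confirm the secret behind model.ngcAPISecret exists and holds a plausible key
# (prints only the first characters, to avoid leaking the key):
kubectl get secret ngc-api -o jsonpath='{.data.NGC_API_KEY}' | base64 -d | cut -c1-8; echo

# Confirm the image-pull secret is a docker-registry secret for nvcr.io:
kubectl get secret ngc-secret -o jsonpath='{.type}'; echo

# Check outbound reachability to the NGC API from inside the cluster:
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -sSI https://api.ngc.nvidia.com/v2
```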