
[Bug] TGI doesn't start due to permission denied #861

Open
ksandowi opened this issue Sep 22, 2024 · 5 comments
Labels: bug (Something isn't working), Dev Kube

@ksandowi

ksandowi commented Sep 22, 2024

Priority

P1-Stopper

OS type

Ubuntu

Hardware type

Xeon-other (Please let us know in description)

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

tag v1.0

Description

The issue affects Xeon on both SPR and EMR.
After modifying chatqna.yaml to run all services in TDs (protected by TDX), all services start successfully except the TGI service, which fails during model download. It worked fine in previous versions (v0.9 and v0.8).

Reproduce steps

On a platform with TDX enabled, modify ~/GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna.yaml so that it runs all services in TDs, by adding `runtimeClassName: kata-qemu-tdx` to each pod spec next to the existing `securityContext`:

```yaml
spec:
  runtimeClassName: kata-qemu-tdx
  securityContext:
    # (existing settings unchanged)
```
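For context, a minimal sketch of where that line ends up in one of the Deployments; the Deployment name is inferred from the failing pod name in the log below, and the labels, container name, and image tag are placeholders rather than values copied from chatqna.yaml:

```yaml
# Illustrative fragment only, not copied verbatim from chatqna.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatqna-tgi                            # inferred from pod chatqna-tgi-69d8bd845-xpnvj
spec:
  selector:
    matchLabels:
      app: chatqna-tgi
  template:
    metadata:
      labels:
        app: chatqna-tgi
    spec:
      runtimeClassName: kata-qemu-tdx          # the added line: run this pod in a TD
      containers:
        - name: tgi                            # placeholder container name
          image: ghcr.io/huggingface/text-generation-inference:2.2.0   # placeholder tag
          securityContext:                     # the manifest's securityContext stays as-is
            runAsNonRoot: true
            runAsUser: 1000
```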
    

Raw log

│ {"timestamp":"2024-09-22T09:31:24.526652Z","level":"INFO","fields":{"message":"Args {\n    model_id: \"Intel/neural-chat-7b-v3-3\",\n    revision: None,\n    validation_workers: 2,\n    sharded: None,\n  │
│    num_shard: None,\n    quantize: None,\n    speculate: None,\n    dtype: None,\n    trust_remote_code: false,\n    max_concurrent_requests: 128,\n    max_best_of: 2,\n    max_stop_sequences: 4,\n    ma │
│ x_top_n_tokens: 5,\n    max_input_tokens: None,\n    max_input_length: None,\n    max_total_tokens: None,\n    waiting_served_ratio: 0.3,\n    max_batch_prefill_tokens: None,\n    max_batch_total_tokens: │
│  None,\n    max_waiting_tokens: 20,\n    max_batch_size: None,\n    cuda_graphs: Some(\n        [\n            0,\n        ],\n    ),\n    hostname: \"chatqna-tgi-69d8bd845-xpnvj\",\n    port: 2080,\n    │
│  shard_uds_path: \"/tmp/text-generation-server\",\n    master_addr: \"localhost\",\n    master_port: 29500,\n    huggingface_hub_cache: Some(\n        \"/data\",\n    ),\n    weights_cache_override: None │
│ ,\n    disable_custom_kernels: false,\n    cuda_memory_fraction: 1.0,\n    rope_scaling: None,\n    rope_factor: None,\n    json_output: true,\n    otlp_endpoint: None,\n    otlp_service_name: \"text-gen │
│ eration-inference.router\",\n    cors_allow_origin: [],\n    api_key: None,\n    watermark_gamma: None,\n    watermark_delta: None,\n    ngrok: false,\n    ngrok_authtoken: None,\n    ngrok_edge: None,\n │
│     tokenizer_config_path: None,\n    disable_grammar_support: false,\n    env: false,\n    max_client_batch_size: 4,\n    lora_adapters: None,\n    usage_stats: On,\n}"},"target":"text_generation_launch │
│ er"}                                                                                                                                                                                                        │
│ {"timestamp":"2024-09-22T09:31:24.527102Z","level":"INFO","fields":{"message":"Token file not found \"/tmp/.cache/huggingface/token\"","log.target":"hf_hub","log.module_path":"hf_hub","log.file":"/usr/lo │
│ cal/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs","log.line":55},"target":"hf_hub"}                                                                                          │
│ {"timestamp":"2024-09-22T09:31:24.527452Z","level":"INFO","fields":{"message":"Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts  │
│ in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`."},"target":"text_generation_launc │
│ her"}                                                                                                                                                                                                       │
│ {"timestamp":"2024-09-22T09:31:24.527476Z","level":"INFO","fields":{"message":"Default `max_input_tokens` to 4095"},"target":"text_generation_launcher"}                                                    │
│ {"timestamp":"2024-09-22T09:31:24.527490Z","level":"INFO","fields":{"message":"Default `max_total_tokens` to 4096"},"target":"text_generation_launcher"}                                                    │
│ {"timestamp":"2024-09-22T09:31:24.527504Z","level":"INFO","fields":{"message":"Default `max_batch_prefill_tokens` to 4145"},"target":"text_generation_launcher"}                                            │
│ {"timestamp":"2024-09-22T09:31:24.528473Z","level":"INFO","fields":{"message":"Starting check and download process for Intel/neural-chat-7b-v3-3"},"target":"text_generation_launcher","span":{"name":"down │
│ load"},"spans":[{"name":"download"}]}                                                                                                                                                                       │
│ {"timestamp":"2024-09-22T09:31:28.345023Z","level":"ERROR","fields":{"message":"Download encountered an error: \nOpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k\nOp │
│ enBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k\nTraceback (most recent call last):\n  File \"/opt/conda/bin/text-generation-server\", line 5, in <module>\n    from t │
│ ext_generation_server.cli import app\n  File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 10, in <module>\n    from text_generation_server.utils.adapter import parse_lo │
│ ra_adapters\n  File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/__init__.py\", line 2, in <module>\n    from text_generation_server.utils.dist import initialize_torch_distribut │
│ ed\n  File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/dist.py\", line 6, in <module>\n    from text_generation_server.utils.import_utils import SYSTEM\n  File \"/opt/conda/lib │
│ /python3.10/site-packages/text_generation_server/utils/import_utils.py\", line 59, in <module>\n    import intel_extension_for_pytorch  # noqa: F401\n  File \"/opt/conda/lib/python3.10/site-packages/inte │
│ l_extension_for_pytorch/__init__.py\", line 122, in <module>\n    from . import xpu\n  File \"/opt/conda/lib/python3.10/site-packages/intel_extension_for_pytorch/xpu/__init__.py\", line 20, in <module>\n │
│     from .utils import *\n  File \"/opt/conda/lib/python3.10/site-packages/intel_extension_for_pytorch/xpu/utils.py\", line 6, in <module>\n    from .. import frontend\n  File \"/opt/conda/lib/python3.10 │
│ /site-packages/intel_extension_for_pytorch/frontend.py\", line 9, in <module>\n    from .nn import utils\n  File \"/opt/conda/lib/python3.10/site-packages/intel_extension_for_pytorch/nn/__init__.py\", li │
│ ne 1, in <module>\n    from .modules import FrozenBatchNorm2d\n  File \"/opt/conda/lib/python3.10/site-packages/intel_extension_for_pytorch/nn/modules/__init__.py\", line 8, in <module>\n    from ...cpu. │
│ nn.linear_fuse_eltwise import IPEXLinearEltwise\n  File \"/opt/conda/lib/python3.10/site-packages/intel_extension_for_pytorch/cpu/nn/linear_fuse_eltwise.py\", line 3, in <module>\n    from intel_extensio │
│ n_for_pytorch.nn.utils._weight_prepack import (\n  File \"/opt/conda/lib/python3.10/site-packages/intel_extension_for_pytorch/nn/utils/__init__.py\", line 1, in <module>\n    from intel_extension_for_pyt │
│ orch.nn.utils import _weight_prepack\n  File \"/opt/conda/lib/python3.10/site-packages/intel_extension_for_pytorch/nn/utils/_weight_prepack.py\", line 5, in <module>\n    import pkg_resources\n  File \"/ │
│ opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 3663, in <module>\n    def _initialize_master_working_set():\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__ini │
│ t__.py\", line 3646, in _call_aside\n    f(*args, **kwargs)\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 3675, in _initialize_master_working_set\n    working_set =  │
│ _declare_state('object', 'working_set', WorkingSet._build_master())\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 653, in _build_master\n    ws = cls()\n  File \"/op │
│ t/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 646, in __init__\n    self.add_entry(entry)\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 702, │
│  in add_entry\n    for dist in find_distributions(entry, True):\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 2333, in find_on_path\n    yield from factory(fullpath) │
│ \n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 2397, in distributions_from_metadata\n    yield Distribution.from_location(\n  File \"/opt/conda/lib/python3.10/site-p │
│ ackages/pkg_resources/__init__.py\", line 2947, in from_location\n    )._reload_version()\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 3367, in _reload_version\n    │
│  md_version = self._get_version()\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 3133, in _get_version\n    return _version_from_file(lines)\n  File \"/opt/conda/lib/ │
│ python3.10/site-packages/pkg_resources/__init__.py\", line 2892, in _version_from_file\n    line = next(iter(version_lines), '')\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.p │
│ y\", line 3129, in _get_metadata\n    yield from self.get_metadata_lines(name)\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 2209, in get_metadata_lines\n    return  │
│ yield_lines(self.get_metadata(name))\n  File \"/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py\", line 2196, in get_metadata\n    with open(self.path, encoding='utf-8', errors=\"replace │
\") as f:\nPermissionError: [Errno 13] Permission denied: '/opt/conda/lib/python3.10/site-packages/.wh.certifi-2022.12.7-py3.11.egg-info'"},"target":"text_generation_launcher","span":{"name":"download"}, │
│ "spans":[{"name":"download"}]}                                                                                                                                                                              │
│ Error: DownloadError                                                                                                                                                                                        │
@yongfengdu
Collaborator

It looks like TGI is trying to access more locations after TDX is enabled.
Could you try commenting out these lines for TGI (lines 1289-1298)?
https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna.yaml#L1289
```yaml
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
```
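Commented out in the manifest, the TGI container section would end up roughly like this (a sketch; the container name and the surrounding indentation depend on the actual manifest):

```yaml
# Workaround sketch: container-level securityContext disabled for the TGI container
containers:
  - name: tgi                        # placeholder container name
    # securityContext:
    #   allowPrivilegeEscalation: false
    #   capabilities:
    #     drop:
    #       - ALL
    #   readOnlyRootFilesystem: true
    #   runAsNonRoot: true
    #   runAsUser: 1000
    #   seccompProfile:
    #     type: RuntimeDefault
```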

@ksandowi
Author

That is a good workaround. The TGI service downloaded all required files and started successfully.

@yongfengdu
Collaborator

In the 1.0 release, we enabled securityContext by default (#258). This runs the pod as a non-root user, with the root filesystem read-only.

When the TGI pod starts, it downloads the required model to an emptyDir-mounted volume at /data.
Without TDX/Kata this works fine; from the log, I can't tell whether the permission issue is on /data or on other parts of the filesystem.
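For reference, the model-cache mount described above looks roughly like this (a sketch; volume and container names are placeholders, not copied from chatqna.yaml):

```yaml
# Sketch: TGI caches the model under /data, backed by an emptyDir volume
containers:
  - name: tgi                  # placeholder container name
    volumeMounts:
      - name: model-volume     # placeholder volume name
        mountPath: /data       # matches huggingface_hub_cache: Some("/data") in the log
volumes:
  - name: model-volume
    emptyDir: {}
```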

An easy fix would be to disable the securityContext, but whether that's a good way to go needs discussion.

Until there is an official fix, you can use the above workaround when enabling TDX/Kata.

@ksandowi
Author

Thanks a lot. The workaround works for me.
One more piece of info:
The TGI service starts successfully if I deploy it with Kubernetes using runc or as kata-qemu (runtimeClass=kata-qemu). It only fails when it is run as kata-qemu-tdx (runtimeClass=kata-qemu-tdx).

@eero-t
Contributor

eero-t commented Oct 2, 2024

@ksandowi Could you track down which exact part of the SecurityContext conflicts with TDX, by removing security settings piece by piece and checking whether TGI still fails?

(I would assume it to be either the `runAs*`, `seccompProfile`, or `readOnlyRootFilesystem` settings.)
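For example, one could re-enable the block but relax a single field per test run (a sketch; all values except the toggled one are taken from the block quoted earlier):

```yaml
# Test 1 of a possible bisection: keep everything except readOnlyRootFilesystem
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: false   # relaxed only for this test
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
```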

@joshuayao added the bug (Something isn't working) label on Oct 30, 2024