[Bug] Better developer experience for bringing up TGI-Service #706

Closed
2 of 6 tasks
arun-gupta opened this issue Aug 30, 2024 · 8 comments

Priority

Undecided

OS type

Ubuntu

Hardware type

Xeon-SPR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

0.9

Description

The instructions at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/xeon need a better user experience.

The instructions for testing the LLM service say:

In first startup, this service will take more time to download the LLM file. After it's finished, the service will be ready.

Use docker logs CONTAINER_ID to check if the download is finished.

The container has been running for four hours now, and connecting to the service still gives the following error:

ubuntu@ip-172-31-37-13:~$ curl http://${host_ip}:9009/generate   -X POST   -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'   -H 'Content-Type: application/json'
curl: (7) Failed to connect to 172.31.37.13 port 9009 after 0 ms: Couldn't connect to server

There should be a clear indication of how the developer can tell that the download has finished. Also, the container name is tgi-service, so the instructions should state that name rather than the generic CONTAINER_ID.
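For example, the docs could include a small wait loop along these lines (just a sketch, assuming the container is named tgi-service and mapped to port 9009 as in the compose file, and that the TGI router's /health endpoint is reachable once the model is loaded):

# Poll the TGI health endpoint until the model download/load has finished.
# Tail the tgi-service logs in the meantime so progress stays visible.
until curl -sf http://${host_ip}:9009/health > /dev/null; do
    echo "tgi-service is not ready yet; last log lines:"
    sudo docker logs tgi-service --tail 5
    sleep 30
done
echo "tgi-service is ready to serve /generate requests"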

Reproduce steps

The steps are documented at https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7

Raw log

ubuntu@ip-172-31-37-13:~$ sudo docker compose logs -f
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
redis-vector-db              | 9:C 30 Aug 2024 18:59:12.941 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis-vector-db              | 9:C 30 Aug 2024 18:59:12.941 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
chatqna-xeon-backend-server  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
chatqna-xeon-backend-server  | 
chatqna-xeon-backend-server  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
chatqna-xeon-backend-server  |   warnings.warn(
chatqna-xeon-backend-server  | [2024-08-30 18:59:14,995] [    INFO] - Base service - CORS is enabled.
chatqna-xeon-backend-server  | [2024-08-30 18:59:14,996] [    INFO] - Base service - Setting up HTTP server
chatqna-xeon-backend-server  | [2024-08-30 18:59:14,997] [    INFO] - Base service - Uvicorn server setup on port 8888
chatqna-xeon-backend-server  | INFO:     Waiting for application startup.
chatqna-xeon-backend-server  | INFO:     Application startup complete.
redis-vector-db              | 9:C 30 Aug 2024 18:59:12.941 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=9, just started
chatqna-xeon-ui-server       | 
redis-vector-db              | 9:C 30 Aug 2024 18:59:12.941 * Configuration loaded
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.941 * monotonic clock: POSIX clock_gettime
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.942 * Running mode=standalone, port=6379.
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.942 * Module 'RedisCompat' loaded from /opt/redis-stack/lib/rediscompat.so
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> Redis version found by RedisSearch : 7.2.4 - oss
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> RediSearch version 2.8.12 (Git=2.8-32fdaca)
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> Low level api version 1 initialized successfully
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> concurrent writes: OFF, gc: ON, prefix min length: 2, prefix max expansions: 200, query timeout (ms): 500, timeout policy: return, cursor read size: 1000, cursor max idle (ms): 300000, max doctable size: 1000000, max number of search results:  10000, search pool size: 20, index pool size: 8, 
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> Initialized thread pools!
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> Enabled role change notification
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * Module 'search' loaded from /opt/redis-stack/lib/redisearch.so
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> RedisTimeSeries version 11011, git_sha=0299ac12a6bf298028859c41ba0f4d8dc842726b
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Redis version found by RedisTimeSeries : 7.2.4 - oss
chatqna-xeon-backend-server  | INFO:     Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
chatqna-xeon-ui-server       | > sveltekit-auth-example@0.0.1 preview
chatqna-xeon-backend-server  | [2024-08-30 18:59:15,010] [    INFO] - Base service - HTTP server setup successful
chatqna-xeon-ui-server       | > vite preview --port 5173 --host 0.0.0.0
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> loaded default CHUNK_SIZE_BYTES policy: 4096
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> loaded server DUPLICATE_POLICY: block
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Setting default series ENCODING to: compressed
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Detected redis oss
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * Module 'timeseries' loaded from /opt/redis-stack/lib/redistimeseries.so
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Created new data type 'ReJSON-RL'
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> version: 20609 git sha: unknown branch: unknown
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V1 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V2 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V3 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V4 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V5 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Enabled diskless replication
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * Module 'ReJSON' loaded from /opt/redis-stack/lib/rejson.so
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <search> Acquired RedisJSON_V5 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <bf> RedisBloom version 2.6.12 (Git=unknown)
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * Module 'bf' loaded from /opt/redis-stack/lib/redisbloom.so
chatqna-xeon-ui-server       | 
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <redisgears_2> Created new data type 'GearsType'
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <redisgears_2> Detected redis oss
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 # <redisgears_2> could not initialize RedisAI_InitError
redis-vector-db              | 
chatqna-xeon-ui-server       | 
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> Failed loading RedisAI API.
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> RedisGears v2.0.19, sha='671030bbcb7de4582d00575a0902f826da3efe73', build_type='release', built_for='Linux-ubuntu22.04.x86_64'.
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> Registered backend: js.
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 * Module 'redisgears_2' loaded from /opt/redis-stack/lib/redisgears.so
chatqna-xeon-ui-server       |   ➜  Local:   http://localhost:5173/
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.948 * Server initialized
chatqna-xeon-ui-server       |   ➜  Network: http://172.18.0.12:5173/
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.948 * Ready to accept connections tcp
reranking-tei-xeon-server    | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
reranking-tei-xeon-server    | 
reranking-tei-xeon-server    | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
reranking-tei-xeon-server    |   warnings.warn(
reranking-tei-xeon-server    | [2024-08-30 18:59:16,271] [    INFO] - Base service - CORS is enabled.
reranking-tei-xeon-server    | [2024-08-30 18:59:16,271] [    INFO] - Base service - Setting up HTTP server
retriever-redis-server       | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
retriever-redis-server       |   warnings.warn(
llm-tgi-server               | Defaulting to user installation because normal site-packages is not writeable
llm-tgi-server               | Collecting langserve (from -r requirements-runtime.txt (line 1))
llm-tgi-server               |   Downloading langserve-0.2.2-py3-none-any.whl.metadata (39 kB)
reranking-tei-xeon-server    | [2024-08-30 18:59:16,272] [    INFO] - Base service - Uvicorn server setup on port 8000
reranking-tei-xeon-server    | INFO:     Waiting for application startup.
reranking-tei-xeon-server    | INFO:     Application startup complete.
retriever-redis-server       | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
retriever-redis-server       | 
reranking-tei-xeon-server    | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
reranking-tei-xeon-server    | [2024-08-30 18:59:16,280] [    INFO] - Base service - HTTP server setup successful
llm-tgi-server               | Requirement already satisfied: httpx>=0.23.0 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.27.0)
retriever-redis-server       | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
retriever-redis-server       |   warnings.warn(
retriever-redis-server       | [2024-08-30 18:59:16,210] [    INFO] - Base service - CORS is enabled.
retriever-redis-server       | [2024-08-30 18:59:16,211] [    INFO] - Base service - Setting up HTTP server
retriever-redis-server       | [2024-08-30 18:59:16,212] [    INFO] - Base service - Uvicorn server setup on port 7000
retriever-redis-server       | INFO:     Waiting for application startup.
llm-tgi-server               | Requirement already satisfied: langchain-core<0.3,>=0.1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.1.7)
retriever-redis-server       | INFO:     Application startup complete.
retriever-redis-server       | INFO:     Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
tei-reranking-server         | 2024-08-30T18:59:12.958211Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "7fd37f4b6037", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
retriever-redis-server       | [2024-08-30 18:59:16,214] [    INFO] - Base service - HTTP server setup successful
embedding-tei-server         | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
embedding-tei-server         |   warnings.warn(
embedding-tei-server         | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
embedding-tei-server         | 
embedding-tei-server         | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
embedding-tei-server         |   warnings.warn(
embedding-tei-server         | [2024-08-30 18:59:16,190] [    INFO] - Base service - CORS is enabled.
embedding-tei-server         | [2024-08-30 18:59:16,191] [    INFO] - Base service - Setting up HTTP server
embedding-tei-server         | [2024-08-30 18:59:16,191] [    INFO] - Base service - Uvicorn server setup on port 6000
embedding-tei-server         | INFO:     Waiting for application startup.
embedding-tei-server         | INFO:     Application startup complete.
embedding-tei-server         | INFO:     Uvicorn running on http://0.0.0.0:6000 (Press CTRL+C to quit)
tei-reranking-server         | 2024-08-30T18:59:12.958284Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
tei-reranking-server         | 2024-08-30T18:59:13.004747Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
tei-reranking-server         | 2024-08-30T18:59:13.155043Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
tei-reranking-server         | 2024-08-30T18:59:13.171879Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-reranking-server         | 2024-08-30T18:59:13.171889Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
tei-reranking-server         | 2024-08-30T18:59:13.225855Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
dataprep-redis-server        | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
dataprep-redis-server        | 
tgi-service                  | 2024-08-30T18:59:12.959781Z  INFO text_generation_launcher: Args {
retriever-redis-server       | /home/user/.local/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:141: LangChainDeprecationWarning: The class `HuggingFaceHubEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEndpointEmbeddings`.
retriever-redis-server       |   warn_deprecated(
retriever-redis-server       | INFO:     172.31.37.13:43390 - "POST /v1/retrieval HTTP/1.1" 200 OK
tgi-service                  |     model_id: "Intel/neural-chat-7b-v3-3",
tgi-service                  |     revision: None,
tgi-service                  |     validation_workers: 2,
tgi-service                  |     sharded: None,
tgi-service                  |     num_shard: None,
tgi-service                  |     quantize: None,
tgi-service                  |     speculate: None,
tgi-service                  |     dtype: None,
tgi-service                  |     trust_remote_code: false,
tgi-service                  |     max_concurrent_requests: 128,
tgi-service                  |     max_best_of: 2,
tgi-service                  |     max_stop_sequences: 4,
tgi-service                  |     max_top_n_tokens: 5,
tgi-service                  |     max_input_tokens: None,
tgi-service                  |     max_input_length: None,
tgi-service                  |     max_total_tokens: None,
tgi-service                  |     waiting_served_ratio: 0.3,
tgi-service                  |     max_batch_prefill_tokens: None,
tgi-service                  |     max_batch_total_tokens: None,
tgi-service                  |     max_waiting_tokens: 20,
tgi-service                  |     max_batch_size: None,
tgi-service                  |     cuda_graphs: Some(
tgi-service                  |         [
tgi-service                  |             0,
tgi-service                  |         ],
tgi-service                  |     ),
tgi-service                  |     hostname: "a0a208e32895",
tgi-service                  |     port: 80,
tgi-service                  |     shard_uds_path: "/tmp/text-generation-server",
tgi-service                  |     master_addr: "localhost",
tgi-service                  |     master_port: 29500,
tgi-service                  |     huggingface_hub_cache: Some(
tgi-service                  |         "/data",
tgi-service                  |     ),
tgi-service                  |     weights_cache_override: None,
tgi-service                  |     disable_custom_kernels: false,
tei-reranking-server         | 2024-08-30T18:59:13.582628Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tei-reranking-server         | 2024-08-30T18:59:13.602241Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-base/resolve/main/model.onnx)
tei-reranking-server         | 2024-08-30T18:59:13.602265Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
tei-reranking-server         | 2024-08-30T18:59:14.691879Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 1.519996915s
tei-reranking-server         | 2024-08-30T18:59:15.270436Z  WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
tei-reranking-server         | 2024-08-30T18:59:15.270455Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tei-reranking-server         | 2024-08-30T18:59:15.270766Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-reranking-server         | 2024-08-30T18:59:17.123235Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-reranking-server         | 2024-08-30T18:59:18.915992Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tei-reranking-server         | 2024-08-30T18:59:18.916011Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
embedding-tei-server         | [2024-08-30 18:59:16,194] [    INFO] - Base service - HTTP server setup successful
llm-tgi-server               | Requirement already satisfied: orjson>=2 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (3.10.7)
embedding-tei-server         | [2024-08-30 18:59:16,248] [    INFO] - embedding_tei_langchain - TEI Gaudi Embedding initialized.
tgi-service                  |     cuda_memory_fraction: 1.0,
llm-tgi-server               | Requirement already satisfied: pydantic>=1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (2.5.3)
llm-tgi-server               | Collecting pyproject-toml<0.0.11,>=0.0.10 (from langserve->-r requirements-runtime.txt (line 1))
tgi-service                  |     rope_scaling: None,
tgi-service                  |     rope_factor: None,
tgi-service                  |     json_output: false,
tgi-service                  |     otlp_endpoint: None,
tgi-service                  |     otlp_service_name: "text-generation-inference.router",
tgi-service                  |     cors_allow_origin: [],
tgi-service                  |     api_key: None,
tgi-service                  |     watermark_gamma: None,
tgi-service                  |     watermark_delta: None,
tgi-service                  |     ngrok: false,
tgi-service                  |     ngrok_authtoken: None,
tgi-service                  |     ngrok_edge: None,
tgi-service                  |     tokenizer_config_path: None,
tgi-service                  |     disable_grammar_support: false,
tgi-service                  |     env: false,
llm-tgi-server               |   Downloading pyproject_toml-0.0.10-py3-none-any.whl.metadata (642 bytes)
llm-tgi-server               | Requirement already satisfied: anyio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (4.2.0)
llm-tgi-server               | Requirement already satisfied: certifi in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (2023.11.17)
llm-tgi-server               | Requirement already satisfied: httpcore==1.* in /home/user/.local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.0.5)
tgi-service                  |     max_client_batch_size: 4,
dataprep-redis-server        | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
tgi-service                  |     lora_adapters: None,
dataprep-redis-server        |   warnings.warn(
embedding-tei-server         | INFO:     172.31.37.13:48430 - "POST /v1/embeddings HTTP/1.1" 200 OK
llm-tgi-server               | Requirement already satisfied: idna in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (3.6)
llm-tgi-server               | Requirement already satisfied: sniffio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.3.0)
llm-tgi-server               | Requirement already satisfied: h11<0.15,>=0.13 in /home/user/.local/lib/python3.11/site-packages (from httpcore==1.*->httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (0.14.0)
llm-tgi-server               | Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (6.0.1)
tgi-service                  |     usage_stats: On,
llm-tgi-server               | Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (1.33)
tgi-service                  | }
tgi-service                  | 2024-08-30T18:59:12.959986Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"    
tei-reranking-server         | 2024-08-30T18:59:18.916130Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
dataprep-redis-server        | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing LLMChain from langchain root module is no longer supported. Please use langchain.chains.LLMChain instead.
dataprep-redis-server        |   warnings.warn(
dataprep-redis-server        | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing PromptTemplate from langchain root module is no longer supported. Please use langchain_core.prompts.PromptTemplate instead.
dataprep-redis-server        |   warnings.warn(
tei-reranking-server         | 2024-08-30T18:59:18.917526Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tei-embedding-server         | 2024-08-30T18:59:12.924208Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "2f27cdb160ff", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
tei-embedding-server         | 2024-08-30T18:59:12.924290Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
tei-embedding-server         | 2024-08-30T18:59:12.973834Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
tei-embedding-server         | 2024-08-30T18:59:13.069091Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
tei-embedding-server         | 2024-08-30T18:59:13.138839Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-embedding-server         | 2024-08-30T18:59:13.138853Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
llm-tgi-server               | Requirement already satisfied: langsmith<0.1.0,>=0.0.63 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (0.0.77)
llm-tgi-server               | Requirement already satisfied: packaging<24.0,>=23.2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (23.2)
llm-tgi-server               | Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.31.0)
llm-tgi-server               | Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (8.2.3)
llm-tgi-server               | Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (0.6.0)
llm-tgi-server               | Requirement already satisfied: pydantic-core==2.14.6 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (2.14.6)
tei-embedding-server         | 2024-08-30T18:59:13.166225Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-embedding-server         | 2024-08-30T18:59:13.242409Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tei-embedding-server         | 2024-08-30T18:59:13.261180Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/model.onnx)
tei-embedding-server         | 2024-08-30T18:59:13.261201Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
tei-embedding-server         | 2024-08-30T18:59:14.089828Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 950.976918ms
tei-embedding-server         | 2024-08-30T18:59:14.102507Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tei-embedding-server         | 2024-08-30T18:59:14.102781Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-embedding-server         | 2024-08-30T18:59:14.144519Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-embedding-server         | 2024-08-30T18:59:15.160364Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
llm-tgi-server               | Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (4.9.0)
tei-reranking-server         | 2024-08-30T18:59:18.917532Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
tgi-service                  | 2024-08-30T18:59:13.012501Z  INFO text_generation_launcher: Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`.
tgi-service                  | 2024-08-30T18:59:13.012520Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
tgi-service                  | 2024-08-30T18:59:13.012522Z  INFO text_generation_launcher: Default `max_total_tokens` to 4096
tgi-service                  | 2024-08-30T18:59:13.012523Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
tgi-service                  | 2024-08-30T18:59:13.012616Z  INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
tgi-service                  | 2024-08-30T18:59:17.123445Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Downloading PyTorch weights.
tgi-service                  | 2024-08-30T18:59:17.156988Z  INFO text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
tgi-service                  | 2024-08-30T18:59:46.974574Z  INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00001-of-00002.bin in 0:00:29.
tgi-service                  | 2024-08-30T18:59:46.974598Z  INFO text_generation_launcher: Download: [1/2] -- ETA: 0:00:29
tgi-service                  | 2024-08-30T18:59:46.974779Z  INFO text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin
tgi-service                  | 2024-08-30T19:00:17.058207Z  INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00002-of-00002.bin in 0:00:30.
tgi-service                  | 2024-08-30T19:00:17.058225Z  INFO text_generation_launcher: Download: [2/2] -- ETA: 0
tgi-service                  | 2024-08-30T19:00:17.058238Z  WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
tgi-service                  | 2024-08-30T19:00:17.058243Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
tgi-service                  | Error: DownloadError
tgi-service                  | 2024-08-30T19:01:00.144114Z ERROR download: text_generation_launcher: Download encountered an error: 
tgi-service                  | The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
tgi-service                  | 2024-08-30 18:59:16.457 | INFO     | text_generation_server.utils.import_utils:<module>:75 - Detected system ipex
tgi-service                  | /opt/conda/lib/python3.10/site-packages/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
tgi-service                  |   warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
llm-tgi-server               | Requirement already satisfied: setuptools>=42 in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (65.5.1)
llm-tgi-server               | Requirement already satisfied: wheel in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.42.0)
llm-tgi-server               | Collecting toml (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1))
llm-tgi-server               |   Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
tgi-service                  | ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
tgi-service                  | │ /opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py:324 in
tgi-service                  | │ download_weights                                                             │
tgi-service                  | │                                                                              │
tgi-service                  | │   321 │   │   except Exception:                                              │
tgi-service                  | │   322 │   │   │   discard_names = []                                         │
tgi-service                  | │   323 │   │   # Convert pytorch weights to safetensors                       │
tgi-service                  | │ ❱ 324 │   │   utils.convert_files(local_pt_files, local_st_files, discard_na │
tgi-service                  | │   325                                                                        │
tgi-service                  | │   326                                                                        │
llm-tgi-server               | Requirement already satisfied: jsonschema in /home/user/.local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (4.23.0)
llm-tgi-server               | Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.11/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.4)
llm-tgi-server               | Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (3.3.2)
llm-tgi-server               | Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.1.0)
llm-tgi-server               | Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (23.2.0)
llm-tgi-server               | Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (2023.12.1)
tei-embedding-server         | 2024-08-30T18:59:15.160393Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
tei-embedding-server         | 2024-08-30T18:59:15.160518Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
tei-embedding-server         | 2024-08-30T18:59:15.162024Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tei-embedding-server         | 2024-08-30T18:59:15.162038Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
tei-embedding-server         | 2024-08-30T19:03:19.614297Z  INFO embed{total_time="10.586548ms" tokenization_time="185.813µs" queue_time="499.042µs" inference_time="9.826847ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success
llm-tgi-server               | Requirement already satisfied: referencing>=0.28.4 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.35.1)
llm-tgi-server               | Requirement already satisfied: rpds-py>=0.7.1 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.20.0)
llm-tgi-server               | Downloading langserve-0.2.2-py3-none-any.whl (1.2 MB)
llm-tgi-server               |    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 114.0 MB/s eta 0:00:00
llm-tgi-server               | Downloading pyproject_toml-0.0.10-py3-none-any.whl (6.9 kB)
llm-tgi-server               | Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
llm-tgi-server               | Installing collected packages: toml, pyproject-toml, langserve
llm-tgi-server               | Successfully installed langserve-0.2.2 pyproject-toml-0.0.10 toml-0.10.2
llm-tgi-server               | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
llm-tgi-server               |   warnings.warn(
llm-tgi-server               | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
llm-tgi-server               | 
llm-tgi-server               | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
llm-tgi-server               |   warnings.warn(
llm-tgi-server               | [2024-08-30 18:59:16,096] [    INFO] - Base service - CORS is enabled.
llm-tgi-server               | [2024-08-30 18:59:16,097] [    INFO] - Base service - Setting up HTTP server
llm-tgi-server               | [2024-08-30 18:59:16,097] [    INFO] - Base service - Uvicorn server setup on port 9000
llm-tgi-server               | INFO:     Waiting for application startup.
llm-tgi-server               | INFO:     Application startup complete.
llm-tgi-server               | INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
llm-tgi-server               | [2024-08-30 18:59:16,100] [    INFO] - Base service - HTTP server setup successful
dataprep-redis-server        | [2024-08-30 18:59:17,957] [    INFO] - Base service - CORS is enabled.
dataprep-redis-server        | [2024-08-30 18:59:17,958] [    INFO] - Base service - Setting up HTTP server
dataprep-redis-server        | [2024-08-30 18:59:17,959] [    INFO] - Base service - Uvicorn server setup on port 6007
dataprep-redis-server        | INFO:     Waiting for application startup.
dataprep-redis-server        | INFO:     Application startup complete.
dataprep-redis-server        | INFO:     Uvicorn running on http://0.0.0.0:6007 (Press CTRL+C to quit)
dataprep-redis-server        | [2024-08-30 18:59:17,961] [    INFO] - Base service - HTTP server setup successful
error from daemon in stream: Error grabbing logs: unexpected EOF
@arun-gupta (Contributor, Author) commented:

The service is still giving the same error after five hours:

ubuntu@ip-172-31-37-13:~$ curl http://${host_ip}:9009/generate   -X POST   -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'   -H 'Content-Type: application/json'
curl: (7) Failed to connect to 172.31.37.13 port 9009 after 0 ms: Couldn't connect to server
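
One way to check the container state directly (a rough sketch, assuming the default service/container names from the compose file):

# Is the tgi-service container still up, and what did it last log?
sudo docker compose ps tgi-service
sudo docker logs tgi-service --tail 20
# Is anything listening on the mapped host port 9009?
sudo ss -tlnp | grep 9009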

Here are the logs:

ubuntu@ip-172-31-37-13:~$ sudo docker compose logs 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
reranking-tei-xeon-server  | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
tei-embedding-server       | 2024-08-30T18:59:12.924208Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "2f27cdb160ff", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
tei-reranking-server       | 2024-08-30T18:59:12.958211Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "7fd37f4b6037", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
reranking-tei-xeon-server  | 
reranking-tei-xeon-server  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
reranking-tei-xeon-server  |   warnings.warn(
tei-reranking-server       | 2024-08-30T18:59:12.958284Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
tei-reranking-server       | 2024-08-30T18:59:13.004747Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
tei-embedding-server       | 2024-08-30T18:59:12.924290Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
reranking-tei-xeon-server  | [2024-08-30 18:59:16,271] [    INFO] - Base service - CORS is enabled.
tei-embedding-server       | 2024-08-30T18:59:12.973834Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
tei-reranking-server       | 2024-08-30T18:59:13.155043Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
tei-reranking-server       | 2024-08-30T18:59:13.171879Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-embedding-server       | 2024-08-30T18:59:13.069091Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
reranking-tei-xeon-server  | [2024-08-30 18:59:16,271] [    INFO] - Base service - Setting up HTTP server
reranking-tei-xeon-server  | [2024-08-30 18:59:16,272] [    INFO] - Base service - Uvicorn server setup on port 8000
tei-embedding-server       | 2024-08-30T18:59:13.138839Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-embedding-server       | 2024-08-30T18:59:13.138853Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
tei-embedding-server       | 2024-08-30T18:59:13.166225Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-embedding-server       | 2024-08-30T18:59:13.242409Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tei-embedding-server       | 2024-08-30T18:59:13.261180Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/model.onnx)
tei-embedding-server       | 2024-08-30T18:59:13.261201Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
reranking-tei-xeon-server  | INFO:     Waiting for application startup.
tei-embedding-server       | 2024-08-30T18:59:14.089828Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 950.976918ms
tei-embedding-server       | 2024-08-30T18:59:14.102507Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tgi-service                | 2024-08-30T18:59:12.959781Z  INFO text_generation_launcher: Args {
tgi-service                |     model_id: "Intel/neural-chat-7b-v3-3",
tei-embedding-server       | 2024-08-30T18:59:14.102781Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-embedding-server       | 2024-08-30T18:59:14.144519Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-embedding-server       | 2024-08-30T18:59:15.160364Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tei-embedding-server       | 2024-08-30T18:59:15.160393Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
reranking-tei-xeon-server  | INFO:     Application startup complete.
reranking-tei-xeon-server  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
reranking-tei-xeon-server  | [2024-08-30 18:59:16,280] [    INFO] - Base service - HTTP server setup successful
tgi-service                |     revision: None,
tgi-service                |     validation_workers: 2,
tgi-service                |     sharded: None,
tgi-service                |     num_shard: None,
tgi-service                |     quantize: None,
tgi-service                |     speculate: None,
chatqna-xeon-ui-server     | 
chatqna-xeon-ui-server     | > sveltekit-auth-example@0.0.1 preview
tei-embedding-server       | 2024-08-30T18:59:15.160518Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
tei-embedding-server       | 2024-08-30T18:59:15.162024Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tei-embedding-server       | 2024-08-30T18:59:15.162038Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
tei-embedding-server       | 2024-08-30T19:03:19.614297Z  INFO embed{total_time="10.586548ms" tokenization_time="185.813µs" queue_time="499.042µs" inference_time="9.826847ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success
chatqna-xeon-ui-server     | > vite preview --port 5173 --host 0.0.0.0
chatqna-xeon-ui-server     | 
chatqna-xeon-ui-server     | 
chatqna-xeon-ui-server     |   ➜  Local:   http://localhost:5173/
chatqna-xeon-ui-server     |   ➜  Network: http://172.18.0.12:5173/
redis-vector-db            | 9:C 30 Aug 2024 18:59:12.941 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis-vector-db            | 9:C 30 Aug 2024 18:59:12.941 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-vector-db            | 9:C 30 Aug 2024 18:59:12.941 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=9, just started
redis-vector-db            | 9:C 30 Aug 2024 18:59:12.941 * Configuration loaded
redis-vector-db            | 9:M 30 Aug 2024 18:59:12.941 * monotonic clock: POSIX clock_gettime
tgi-service                |     dtype: None,
tgi-service                |     trust_remote_code: false,
tgi-service                |     max_concurrent_requests: 128,
tgi-service                |     max_best_of: 2,
tgi-service                |     max_stop_sequences: 4,
embedding-tei-server       | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
tei-reranking-server       | 2024-08-30T18:59:13.171889Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
redis-vector-db            | 9:M 30 Aug 2024 18:59:12.942 * Running mode=standalone, port=6379.
redis-vector-db            | 9:M 30 Aug 2024 18:59:12.942 * Module 'RedisCompat' loaded from /opt/redis-stack/lib/rediscompat.so
redis-vector-db            | 9:M 30 Aug 2024 18:59:12.944 * <search> Redis version found by RedisSearch : 7.2.4 - oss
embedding-tei-server       |   warnings.warn(
embedding-tei-server       | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
embedding-tei-server       | 
embedding-tei-server       | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
embedding-tei-server       |   warnings.warn(
embedding-tei-server       | [2024-08-30 18:59:16,190] [    INFO] - Base service - CORS is enabled.
embedding-tei-server       | [2024-08-30 18:59:16,191] [    INFO] - Base service - Setting up HTTP server
embedding-tei-server       | [2024-08-30 18:59:16,191] [    INFO] - Base service - Uvicorn server setup on port 6000
embedding-tei-server       | INFO:     Waiting for application startup.
retriever-redis-server     | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
embedding-tei-server       | INFO:     Application startup complete.
retriever-redis-server     |   warnings.warn(
retriever-redis-server     | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
retriever-redis-server     | 
retriever-redis-server     | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
retriever-redis-server     |   warnings.warn(
retriever-redis-server     | [2024-08-30 18:59:16,210] [    INFO] - Base service - CORS is enabled.
retriever-redis-server     | [2024-08-30 18:59:16,211] [    INFO] - Base service - Setting up HTTP server
retriever-redis-server     | [2024-08-30 18:59:16,212] [    INFO] - Base service - Uvicorn server setup on port 7000
tei-reranking-server       | 2024-08-30T18:59:13.225855Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-reranking-server       | 2024-08-30T18:59:13.582628Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tgi-service                |     max_top_n_tokens: 5,
llm-tgi-server             | Defaulting to user installation because normal site-packages is not writeable
llm-tgi-server             | Collecting langserve (from -r requirements-runtime.txt (line 1))
tgi-service                |     max_input_tokens: None,
tgi-service                |     max_input_length: None,
tgi-service                |     max_total_tokens: None,
tgi-service                |     waiting_served_ratio: 0.3,
tgi-service                |     max_batch_prefill_tokens: None,
tgi-service                |     max_batch_total_tokens: None,
tgi-service                |     max_waiting_tokens: 20,
tgi-service                |     max_batch_size: None,
tgi-service                |     cuda_graphs: Some(
tei-reranking-server       | 2024-08-30T18:59:13.602241Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-base/resolve/main/model.onnx)
tei-reranking-server       | 2024-08-30T18:59:13.602265Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
tei-reranking-server       | 2024-08-30T18:59:14.691879Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 1.519996915s
tei-reranking-server       | 2024-08-30T18:59:15.270436Z  WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
tei-reranking-server       | 2024-08-30T18:59:15.270455Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tei-reranking-server       | 2024-08-30T18:59:15.270766Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-reranking-server       | 2024-08-30T18:59:17.123235Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-reranking-server         | 2024-08-30T18:59:18.915992Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tei-reranking-server         | 2024-08-30T18:59:18.916011Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
chatqna-xeon-backend-server  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
tgi-service                |         [
chatqna-xeon-backend-server  | 
tgi-service                  |             0,
llm-tgi-server             |   Downloading langserve-0.2.2-py3-none-any.whl.metadata (39 kB)
chatqna-xeon-backend-server  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
llm-tgi-server               | Requirement already satisfied: httpx>=0.23.0 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.27.0)
llm-tgi-server               | Requirement already satisfied: langchain-core<0.3,>=0.1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.1.7)
tei-reranking-server         | 2024-08-30T18:59:18.916130Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
chatqna-xeon-backend-server  |   warnings.warn(
chatqna-xeon-backend-server  | [2024-08-30 18:59:14,995] [    INFO] - Base service - CORS is enabled.
chatqna-xeon-backend-server  | [2024-08-30 18:59:14,996] [    INFO] - Base service - Setting up HTTP server
chatqna-xeon-backend-server  | [2024-08-30 18:59:14,997] [    INFO] - Base service - Uvicorn server setup on port 8888
chatqna-xeon-backend-server  | INFO:     Waiting for application startup.
chatqna-xeon-backend-server  | INFO:     Application startup complete.
chatqna-xeon-backend-server  | INFO:     Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
chatqna-xeon-backend-server  | [2024-08-30 18:59:15,010] [    INFO] - Base service - HTTP server setup successful
llm-tgi-server               | Requirement already satisfied: orjson>=2 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (3.10.7)
llm-tgi-server               | Requirement already satisfied: pydantic>=1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (2.5.3)
retriever-redis-server     | INFO:     Waiting for application startup.
tgi-service                  |         ],
retriever-redis-server       | INFO:     Application startup complete.
retriever-redis-server       | INFO:     Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
retriever-redis-server       | [2024-08-30 18:59:16,214] [    INFO] - Base service - HTTP server setup successful
dataprep-redis-server      | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
dataprep-redis-server        | 
dataprep-redis-server        | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
dataprep-redis-server        |   warnings.warn(
dataprep-redis-server        | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing LLMChain from langchain root module is no longer supported. Please use langchain.chains.LLMChain instead.
dataprep-redis-server        |   warnings.warn(
dataprep-redis-server        | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing PromptTemplate from langchain root module is no longer supported. Please use langchain_core.prompts.PromptTemplate instead.
dataprep-redis-server        |   warnings.warn(
dataprep-redis-server        | [2024-08-30 18:59:17,957] [    INFO] - Base service - CORS is enabled.
tgi-service                  |     ),
tgi-service                  |     hostname: "a0a208e32895",
tgi-service                  |     port: 80,
tgi-service                  |     shard_uds_path: "/tmp/text-generation-server",
tgi-service                  |     master_addr: "localhost",
tgi-service                  |     master_port: 29500,
llm-tgi-server               | Collecting pyproject-toml<0.0.11,>=0.0.10 (from langserve->-r requirements-runtime.txt (line 1))
redis-vector-db            | 9:M 30 Aug 2024 18:59:12.944 * <search> RediSearch version 2.8.12 (Git=2.8-32fdaca)
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> Low level api version 1 initialized successfully
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> concurrent writes: OFF, gc: ON, prefix min length: 2, prefix max expansions: 200, query timeout (ms): 500, timeout policy: return, cursor read size: 1000, cursor max idle (ms): 300000, max doctable size: 1000000, max number of search results:  10000, search pool size: 20, index pool size: 8, 
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> Initialized thread pools!
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * <search> Enabled role change notification
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.944 * Module 'search' loaded from /opt/redis-stack/lib/redisearch.so
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> RedisTimeSeries version 11011, git_sha=0299ac12a6bf298028859c41ba0f4d8dc842726b
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Redis version found by RedisTimeSeries : 7.2.4 - oss
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> loaded default CHUNK_SIZE_BYTES policy: 4096
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> loaded server DUPLICATE_POLICY: block
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Setting default series ENCODING to: compressed
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * <timeseries> Detected redis oss
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.945 * Module 'timeseries' loaded from /opt/redis-stack/lib/redistimeseries.so
tei-reranking-server         | 2024-08-30T18:59:18.917526Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tgi-service                  |     huggingface_hub_cache: Some(
embedding-tei-server       | INFO:     Uvicorn running on http://0.0.0.0:6000 (Press CTRL+C to quit)
embedding-tei-server         | [2024-08-30 18:59:16,194] [    INFO] - Base service - HTTP server setup successful
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Created new data type 'ReJSON-RL'
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> version: 20609 git sha: unknown branch: unknown
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V1 API
tgi-service                  |         "/data",
tgi-service                  |     ),
tgi-service                  |     weights_cache_override: None,
tgi-service                  |     disable_custom_kernels: false,
tgi-service                  |     cuda_memory_fraction: 1.0,
tgi-service                  |     rope_scaling: None,
tgi-service                  |     rope_factor: None,
dataprep-redis-server        | [2024-08-30 18:59:17,958] [    INFO] - Base service - Setting up HTTP server
llm-tgi-server               |   Downloading pyproject_toml-0.0.10-py3-none-any.whl.metadata (642 bytes)
llm-tgi-server               | Requirement already satisfied: anyio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (4.2.0)
llm-tgi-server               | Requirement already satisfied: certifi in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (2023.11.17)
llm-tgi-server               | Requirement already satisfied: httpcore==1.* in /home/user/.local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.0.5)
llm-tgi-server               | Requirement already satisfied: idna in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (3.6)
llm-tgi-server               | Requirement already satisfied: sniffio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.3.0)
llm-tgi-server               | Requirement already satisfied: h11<0.15,>=0.13 in /home/user/.local/lib/python3.11/site-packages (from httpcore==1.*->httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (0.14.0)
llm-tgi-server               | Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (6.0.1)
llm-tgi-server               | Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (1.33)
llm-tgi-server               | Requirement already satisfied: langsmith<0.1.0,>=0.0.63 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (0.0.77)
llm-tgi-server               | Requirement already satisfied: packaging<24.0,>=23.2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (23.2)
llm-tgi-server               | Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.31.0)
llm-tgi-server               | Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (8.2.3)
llm-tgi-server               | Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (0.6.0)
llm-tgi-server               | Requirement already satisfied: pydantic-core==2.14.6 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (2.14.6)
llm-tgi-server               | Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (4.9.0)
llm-tgi-server               | Requirement already satisfied: setuptools>=42 in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (65.5.1)
llm-tgi-server               | Requirement already satisfied: wheel in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.42.0)
llm-tgi-server               | Collecting toml (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1))
llm-tgi-server               |   Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
llm-tgi-server               | Requirement already satisfied: jsonschema in /home/user/.local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (4.23.0)
llm-tgi-server               | Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.11/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.4)
llm-tgi-server               | Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (3.3.2)
llm-tgi-server               | Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.1.0)
llm-tgi-server               | Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (23.2.0)
llm-tgi-server               | Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (2023.12.1)
llm-tgi-server               | Requirement already satisfied: referencing>=0.28.4 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.35.1)
llm-tgi-server               | Requirement already satisfied: rpds-py>=0.7.1 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.20.0)
llm-tgi-server               | Downloading langserve-0.2.2-py3-none-any.whl (1.2 MB)
llm-tgi-server               |    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 114.0 MB/s eta 0:00:00
llm-tgi-server               | Downloading pyproject_toml-0.0.10-py3-none-any.whl (6.9 kB)
llm-tgi-server               | Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
llm-tgi-server               | Installing collected packages: toml, pyproject-toml, langserve
llm-tgi-server               | Successfully installed langserve-0.2.2 pyproject-toml-0.0.10 toml-0.10.2
llm-tgi-server               | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
llm-tgi-server               |   warnings.warn(
llm-tgi-server               | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
tgi-service                  |     json_output: false,
tgi-service                  |     otlp_endpoint: None,
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V2 API
tgi-service                  |     otlp_service_name: "text-generation-inference.router",
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V3 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V4 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Exported RedisJSON_V5 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <ReJSON> Enabled diskless replication
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * Module 'ReJSON' loaded from /opt/redis-stack/lib/rejson.so
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <search> Acquired RedisJSON_V5 API
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <bf> RedisBloom version 2.6.12 (Git=unknown)
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * Module 'bf' loaded from /opt/redis-stack/lib/redisbloom.so
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <redisgears_2> Created new data type 'GearsType'
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.946 * <redisgears_2> Detected redis oss
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 # <redisgears_2> could not initialize RedisAI_InitError
redis-vector-db              | 
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> Failed loading RedisAI API.
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> RedisGears v2.0.19, sha='671030bbcb7de4582d00575a0902f826da3efe73', build_type='release', built_for='Linux-ubuntu22.04.x86_64'.
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 * <redisgears_2> Registered backend: js.
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.947 * Module 'redisgears_2' loaded from /opt/redis-stack/lib/redisgears.so
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.948 * Server initialized
redis-vector-db              | 9:M 30 Aug 2024 18:59:12.948 * Ready to accept connections tcp
dataprep-redis-server        | [2024-08-30 18:59:17,959] [    INFO] - Base service - Uvicorn server setup on port 6007
dataprep-redis-server        | INFO:     Waiting for application startup.
tei-reranking-server         | 2024-08-30T18:59:18.917532Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
retriever-redis-server       | /home/user/.local/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:141: LangChainDeprecationWarning: The class `HuggingFaceHubEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEndpointEmbeddings`.
retriever-redis-server       |   warn_deprecated(
embedding-tei-server         | [2024-08-30 18:59:16,248] [    INFO] - embedding_tei_langchain - TEI Gaudi Embedding initialized.
embedding-tei-server         | INFO:     172.31.37.13:48430 - "POST /v1/embeddings HTTP/1.1" 200 OK
dataprep-redis-server        | INFO:     Application startup complete.
dataprep-redis-server        | INFO:     Uvicorn running on http://0.0.0.0:6007 (Press CTRL+C to quit)
dataprep-redis-server        | [2024-08-30 18:59:17,961] [    INFO] - Base service - HTTP server setup successful
tgi-service                  |     cors_allow_origin: [],
llm-tgi-server               | 
llm-tgi-server               | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
llm-tgi-server               |   warnings.warn(
llm-tgi-server               | [2024-08-30 18:59:16,096] [    INFO] - Base service - CORS is enabled.
llm-tgi-server               | [2024-08-30 18:59:16,097] [    INFO] - Base service - Setting up HTTP server
llm-tgi-server               | [2024-08-30 18:59:16,097] [    INFO] - Base service - Uvicorn server setup on port 9000
llm-tgi-server               | INFO:     Waiting for application startup.
llm-tgi-server               | INFO:     Application startup complete.
llm-tgi-server               | INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
llm-tgi-server               | [2024-08-30 18:59:16,100] [    INFO] - Base service - HTTP server setup successful
retriever-redis-server       | INFO:     172.31.37.13:43390 - "POST /v1/retrieval HTTP/1.1" 200 OK
tgi-service                  |     api_key: None,
tgi-service                  |     watermark_gamma: None,
tgi-service                  |     watermark_delta: None,
tgi-service                  |     ngrok: false,
tgi-service                  |     ngrok_authtoken: None,
tgi-service                  |     ngrok_edge: None,
tgi-service                  |     tokenizer_config_path: None,
tgi-service                  |     disable_grammar_support: false,
tgi-service                  |     env: false,
tgi-service                  |     max_client_batch_size: 4,
tgi-service                  |     lora_adapters: None,
tgi-service                  |     usage_stats: On,
tgi-service                  | }
tgi-service                  | 2024-08-30T18:59:12.959986Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"    
tgi-service                  | 2024-08-30T18:59:13.012501Z  INFO text_generation_launcher: Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`.
tgi-service                  | 2024-08-30T18:59:13.012520Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
tgi-service                  | 2024-08-30T18:59:13.012522Z  INFO text_generation_launcher: Default `max_total_tokens` to 4096
tgi-service                  | 2024-08-30T18:59:13.012523Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
tgi-service                  | 2024-08-30T18:59:13.012616Z  INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
tgi-service                  | 2024-08-30T18:59:17.123445Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Downloading PyTorch weights.
tgi-service                  | 2024-08-30T18:59:17.156988Z  INFO text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
tgi-service                  | 2024-08-30T18:59:46.974574Z  INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00001-of-00002.bin in 0:00:29.
tgi-service                  | 2024-08-30T18:59:46.974598Z  INFO text_generation_launcher: Download: [1/2] -- ETA: 0:00:29
tgi-service                  | 2024-08-30T18:59:46.974779Z  INFO text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin
tgi-service                  | 2024-08-30T19:00:17.058207Z  INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00002-of-00002.bin in 0:00:30.
tgi-service                  | 2024-08-30T19:00:17.058225Z  INFO text_generation_launcher: Download: [2/2] -- ETA: 0
tgi-service                  | 2024-08-30T19:00:17.058238Z  WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
tgi-service                  | 2024-08-30T19:00:17.058243Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
tgi-service                  | Error: DownloadError
tgi-service                  | 2024-08-30T19:01:00.144114Z ERROR download: text_generation_launcher: Download encountered an error: 
tgi-service                  | The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
tgi-service                  | 2024-08-30 18:59:16.457 | INFO     | text_generation_server.utils.import_utils:<module>:75 - Detected system ipex
tgi-service                  | /opt/conda/lib/python3.10/site-packages/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
tgi-service                  |   warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
tgi-service                  | ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
tgi-service                  | │ /opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py:324 in │
tgi-service                  | │ download_weights                                                             │
tgi-service                  | │                                                                              │
tgi-service                  | │   321 │   │   except Exception:                                              │
tgi-service                  | │   322 │   │   │   discard_names = []                                         │
tgi-service                  | │   323 │   │   # Convert pytorch weights to safetensors                       │
tgi-service                  | │ ❱ 324 │   │   utils.convert_files(local_pt_files, local_st_files, discard_na │
tgi-service                  | │   325                                                                        │
tgi-service                  | │   326                                                                        │
tgi-service                  | │   327 @app.command()                                                         │
tgi-service                  | │                                                                              │
tgi-service                  | │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
tgi-service                  | │ │      architecture = 'MistralForCausalLM'                                 │ │
tgi-service                  | │ │      auto_convert = True                                                 │ │
tgi-service                  | │ │     base_model_id = None                                                 │ │
tgi-service                  | │ │            class_ = <class                                               │ │
tgi-service                  | │ │                     'transformers.models.mistral.modeling_mistral.Mistr… │ │
tgi-service                  | │ │            config = {                                                    │ │
tgi-service                  | │ │                     │   '_name_or_path': './neural-chat-7b-v3-9',        │ │
tgi-service                  | │ │                     │   'architectures': ['MistralForCausalLM'],         │ │
tgi-service                  | │ │                     │   'bos_token_id': 1,                               │ │
tgi-service                  | │ │                     │   'eos_token_id': 2,                               │ │
tgi-service                  | │ │                     │   'hidden_act': 'silu',                            │ │
tgi-service                  | │ │                     │   'hidden_size': 4096,                             │ │
tgi-service                  | │ │                     │   'initializer_range': 0.02,                       │ │
tgi-service                  | │ │                     │   'intermediate_size': 14336,                      │ │
tgi-service                  | │ │                     │   'max_position_embeddings': 32768,                │ │
tgi-service                  | │ │                     │   'model_type': 'mistral',                         │ │
tgi-service                  | │ │                     │   ... +11                                          │ │
tgi-service                  | │ │                     }                                                    │ │
tgi-service                  | │ │   config_filename = '/data/models--Intel--neural-chat-7b-v3-3/snapshots… │ │
tgi-service                  | │ │     discard_names = ['lm_head.weight']                                   │ │
tgi-service                  | │ │         extension = '.safetensors'                                       │ │
tgi-service                  | │ │                 f = <_io.TextIOWrapper                                   │ │
tgi-service                  | │ │                     name='/data/models--Intel--neural-chat-7b-v3-3/snap… │ │
tgi-service                  | │ │                     mode='r' encoding='UTF-8'>                           │ │
tgi-service                  | │ │    is_local_model = False                                                │ │
tgi-service                  | │ │              json = <module 'json' from                                  │ │
tgi-service                  | │ │                     '/opt/conda/lib/python3.10/json/__init__.py'>        │ │
tgi-service                  | │ │       json_output = True                                                 │ │
tgi-service                  | │ │    local_pt_files = [                                                    │ │
tgi-service                  | │ │                     │                                                    │ │
tgi-service                  | │ │                     PosixPath('/data/models--Intel--neural-chat-7b-v3-3… │ │
tgi-service                  | │ │                     │                                                    │ │
tgi-service                  | │ │                     PosixPath('/data/models--Intel--neural-chat-7b-v3-3… │ │
tgi-service                  | │ │                     ]                                                    │ │
tgi-service                  | │ │    local_st_files = [                                                    │ │
tgi-service                  | │ │                     │                                                    │ │
error from daemon in stream: Error grabbing logs: unexpected EOF

@letonghan
Copy link
Collaborator

Hi @arun-gupta, the ChatQnA pipeline, including the TGI service, can be started successfully on our Xeon server.
[screenshot attached]

The root cause of your issue is a stale Transformers model cache, flagged by the "The cache for model files in Transformers v4.22.0 has been updated" message in your logs.
Please check the error messages below, remove the cache, and try again.

2024-08-30T19:00:17.058243Z WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
tgi-service | Error: DownloadError
tgi-service | 2024-08-30T19:01:00.144114Z ERROR download: text_generation_launcher: Download encountered an error:
tgi-service | The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache().
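A minimal sketch of clearing the cache and retrying, assuming the compose file mounts a local ./data directory into the containers as the Hugging Face cache (check the volumes section of your compose file and adjust the paths if yours differ):

cd <directory-with-the-ChatQnA-xeon-compose-file>
docker compose down

# Remove the cached copy of the model so TGI re-downloads and converts it cleanly.
# NOTE: ./data is an assumption based on the default volume mapping to /data.
sudo rm -rf ./data/models--Intel--neural-chat-7b-v3-3

docker compose up -d
docker logs -f tgi-service   # wait for the download and safetensors conversion to finish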

@arun-gupta
Copy link
Contributor Author

@letonghan my steps are available at https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7. I can only use the published Docker images. What should I do differently?
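For reference, this is roughly how I would expect to confirm that the model download has finished before running the test (a minimal sketch; the /health endpoint and the "Connected" log line are assumptions based on standard TGI behavior, and <tgi-host-port> stands for whatever host port the compose file maps to the tgi-service container):

# Watch only the TGI container; text_generation_launcher logs the download steps
# and the router should log "Connected" once it is ready to serve requests.
docker logs -f tgi-service

# Or poll the health endpoint until it returns success.
until curl -sf http://localhost:<tgi-host-port>/health; do
  echo "tgi-service not ready yet..."
  sleep 10
done
echo "tgi-service is ready"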

Tried with v0.8 of the Docker images and got a similar error. Here are the detailed logs:

ubuntu@ip-172-31-73-49:~$ sudo docker compose logs
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string. 
WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string. 
chatqna-xeon-backend-server  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
chatqna-xeon-backend-server  | 
chatqna-xeon-backend-server  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
chatqna-xeon-backend-server  |   warnings.warn(
chatqna-xeon-backend-server  | [2024-09-03 16:35:14,437] [    INFO] - Base service - CORS is enabled.
chatqna-xeon-backend-server  | [2024-09-03 16:35:14,438] [    INFO] - Base service - Setting up HTTP server
chatqna-xeon-backend-server  | [2024-09-03 16:35:14,438] [    INFO] - Base service - Uvicorn server setup on port 8888
chatqna-xeon-backend-server  | INFO:     Waiting for application startup.
chatqna-xeon-backend-server  | INFO:     Application startup complete.
chatqna-xeon-backend-server  | INFO:     Uvicorn running on http://0.0.0.0:8888 (Press CTRL+C to quit)
chatqna-xeon-backend-server  | [2024-09-03 16:35:14,447] [    INFO] - Base service - HTTP server setup successful
tgi-service                  | 2024-09-03T16:35:12.583742Z  INFO text_generation_launcher: Args {
tgi-service                  |     model_id: "Intel/neural-chat-7b-v3-3",
tgi-service                  |     revision: None,
tgi-service                  |     validation_workers: 2,
tgi-service                  |     sharded: None,
tgi-service                  |     num_shard: None,
tgi-service                  |     quantize: None,
tgi-service                  |     speculate: None,
tgi-service                  |     dtype: None,
tgi-service                  |     trust_remote_code: false,
tgi-service                  |     max_concurrent_requests: 128,
tgi-service                  |     max_best_of: 2,
tgi-service                  |     max_stop_sequences: 4,
tgi-service                  |     max_top_n_tokens: 5,
tgi-service                  |     max_input_tokens: None,
retriever-redis-server       | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
retriever-redis-server       |   warnings.warn(
retriever-redis-server       | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
retriever-redis-server       | 
tgi-service                  |     max_input_length: None,
dataprep-redis-server        | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
tgi-service                  |     max_total_tokens: None,
tgi-service                  |     waiting_served_ratio: 0.3,
tgi-service                  |     max_batch_prefill_tokens: None,
tgi-service                  |     max_batch_total_tokens: None,
tgi-service                  |     max_waiting_tokens: 20,
tgi-service                  |     max_batch_size: None,
tgi-service                  |     cuda_graphs: Some(
tgi-service                  |         [
tgi-service                  |             0,
tgi-service                  |         ],
tgi-service                  |     ),
tgi-service                  |     hostname: "1b133a5060d8",
tgi-service                  |     port: 80,
tgi-service                  |     shard_uds_path: "/tmp/text-generation-server",
retriever-redis-server       | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
dataprep-redis-server        | 
retriever-redis-server       |   warnings.warn(
embedding-tei-server         | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
redis-vector-db              | 9:C 03 Sep 2024 16:35:12.563 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis-vector-db              | 9:C 03 Sep 2024 16:35:12.563 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-vector-db              | 9:C 03 Sep 2024 16:35:12.563 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=9, just started
redis-vector-db              | 9:C 03 Sep 2024 16:35:12.563 * Configuration loaded
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.564 * monotonic clock: POSIX clock_gettime
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.564 * Running mode=standalone, port=6379.
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.564 * Module 'RedisCompat' loaded from /opt/redis-stack/lib/rediscompat.so
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.565 * <search> Redis version found by RedisSearch : 7.2.4 - oss
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.565 * <search> RediSearch version 2.8.12 (Git=2.8-32fdaca)
chatqna-xeon-ui-server       | 
chatqna-xeon-ui-server       | > sveltekit-auth-example@0.0.1 preview
chatqna-xeon-ui-server       | > vite preview --port 5173 --host 0.0.0.0
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.565 * <search> Low level api version 1 initialized successfully
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.565 * <search> concurrent writes: OFF, gc: ON, prefix min length: 2, prefix max expansions: 200, query timeout (ms): 500, timeout policy: return, cursor read size: 1000, cursor max idle (ms): 300000, max doctable size: 1000000, max number of search results:  10000, search pool size: 20, index pool size: 8, 
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.566 * <search> Initialized thread pools!
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.566 * <search> Enabled role change notification
tei-reranking-server         | 2024-09-03T16:35:12.584082Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "429aafe43aba", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
llm-tgi-server               | Defaulting to user installation because normal site-packages is not writeable
llm-tgi-server               | Collecting langserve (from -r requirements-runtime.txt (line 1))
tei-reranking-server         | 2024-09-03T16:35:12.584166Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
tei-reranking-server         | 2024-09-03T16:35:12.644558Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
chatqna-xeon-ui-server       | 
chatqna-xeon-ui-server       | 
chatqna-xeon-ui-server       |   ➜  Local:   http://localhost:5173/
chatqna-xeon-ui-server       |   ➜  Network: http://172.18.0.12:5173/
embedding-tei-server         |   warnings.warn(
embedding-tei-server         | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
reranking-tei-xeon-server    | /home/user/.local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:161: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
reranking-tei-xeon-server    | 
reranking-tei-xeon-server    | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
reranking-tei-xeon-server    |   warnings.warn(
reranking-tei-xeon-server    | [2024-09-03 16:35:16,214] [    INFO] - CORS is enabled.
reranking-tei-xeon-server    | [2024-09-03 16:35:16,215] [    INFO] - Setting up HTTP server
reranking-tei-xeon-server    | [2024-09-03 16:35:16,216] [    INFO] - Uvicorn server setup on port 8000
reranking-tei-xeon-server    | INFO:     Waiting for application startup.
embedding-tei-server         | 
llm-tgi-server               |   Downloading langserve-0.2.3-py3-none-any.whl.metadata (39 kB)
tei-reranking-server         | 2024-09-03T16:35:12.838269Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.566 * Module 'search' loaded from /opt/redis-stack/lib/redisearch.so
embedding-tei-server         | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
embedding-tei-server         |   warnings.warn(
embedding-tei-server         | [2024-09-03 16:35:16,068] [    INFO] - CORS is enabled.
tgi-service                  |     master_addr: "localhost",
embedding-tei-server         | [2024-09-03 16:35:16,069] [    INFO] - Setting up HTTP server
embedding-tei-server         | [2024-09-03 16:35:16,069] [    INFO] - Uvicorn server setup on port 6000
embedding-tei-server         | INFO:     Waiting for application startup.
llm-tgi-server               | Requirement already satisfied: httpx>=0.23.0 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.27.0)
llm-tgi-server               | Requirement already satisfied: langchain-core<0.3,>=0.1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (0.1.7)
dataprep-redis-server        | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
reranking-tei-xeon-server    | INFO:     Application startup complete.
dataprep-redis-server        |   warnings.warn(
reranking-tei-xeon-server    | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
reranking-tei-xeon-server    | [2024-09-03 16:35:16,219] [    INFO] - HTTP server setup successful
reranking-tei-xeon-server    | INFO:     172.31.73.49:47394 - "POST /v1/reranking HTTP/1.1" 200 OK
tgi-service                  |     master_port: 29500,
embedding-tei-server         | INFO:     Application startup complete.
tgi-service                  |     huggingface_hub_cache: Some(
embedding-tei-server         | INFO:     Uvicorn running on http://0.0.0.0:6000 (Press CTRL+C to quit)
tgi-service                  |         "/data",
retriever-redis-server       | [2024-09-03 16:35:16,011] [    INFO] - CORS is enabled.
embedding-tei-server         | [2024-09-03 16:35:16,078] [    INFO] - HTTP server setup successful
retriever-redis-server       | [2024-09-03 16:35:16,012] [    INFO] - Setting up HTTP server
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> RedisTimeSeries version 11011, git_sha=0299ac12a6bf298028859c41ba0f4d8dc842726b
tgi-service                  |     ),
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> Redis version found by RedisTimeSeries : 7.2.4 - oss
retriever-redis-server       | [2024-09-03 16:35:16,013] [    INFO] - Uvicorn server setup on port 7000
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> loaded default CHUNK_SIZE_BYTES policy: 4096
retriever-redis-server       | INFO:     Waiting for application startup.
retriever-redis-server       | INFO:     Application startup complete.
retriever-redis-server       | INFO:     Uvicorn running on http://0.0.0.0:7000 (Press CTRL+C to quit)
retriever-redis-server       | [2024-09-03 16:35:16,021] [    INFO] - HTTP server setup successful
retriever-redis-server       | INFO:     172.31.73.49:50014 - "POST /v1/retrieval HTTP/1.1" 200 OK
llm-tgi-server               | Requirement already satisfied: orjson>=2 in /home/user/.local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (3.10.7)
tei-embedding-server         | 2024-09-03T16:35:12.584463Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "b562c4d7638f", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
tei-embedding-server         | 2024-09-03T16:35:12.584558Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
llm-tgi-server               | Requirement already satisfied: pydantic>=1 in /usr/local/lib/python3.11/site-packages (from langserve->-r requirements-runtime.txt (line 1)) (2.5.3)
embedding-tei-server         | TEI Gaudi Embedding initialized.
embedding-tei-server         | INFO:     172.31.73.49:47900 - "POST /v1/embeddings HTTP/1.1" 200 OK
tgi-service                  |     weights_cache_override: None,
tei-embedding-server         | 2024-09-03T16:35:12.636839Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
tei-embedding-server         | 2024-09-03T16:35:12.732054Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
tei-embedding-server         | 2024-09-03T16:35:12.763631Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-embedding-server         | 2024-09-03T16:35:12.763643Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
tei-embedding-server         | 2024-09-03T16:35:12.809072Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-embedding-server         | 2024-09-03T16:35:12.850503Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tei-embedding-server         | 2024-09-03T16:35:12.866721Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/model.onnx)
tei-embedding-server         | 2024-09-03T16:35:12.866734Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
tei-embedding-server         | 2024-09-03T16:35:14.488973Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 1.725340147s
tei-embedding-server         | 2024-09-03T16:35:14.500268Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tei-embedding-server         | 2024-09-03T16:35:14.500593Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-embedding-server         | 2024-09-03T16:35:14.539989Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-embedding-server         | 2024-09-03T16:35:15.530685Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tgi-service                  |     disable_custom_kernels: false,
tgi-service                  |     cuda_memory_fraction: 1.0,
tgi-service                  |     rope_scaling: None,
tgi-service                  |     rope_factor: None,
tgi-service                  |     json_output: false,
tgi-service                  |     otlp_endpoint: None,
tgi-service                  |     otlp_service_name: "text-generation-inference.router",
tgi-service                  |     cors_allow_origin: [],
dataprep-redis-server        | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing LLMChain from langchain root module is no longer supported. Please use langchain.chains.LLMChain instead.
llm-tgi-server               | Collecting pyproject-toml<0.0.11,>=0.0.10 (from langserve->-r requirements-runtime.txt (line 1))
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> loaded server DUPLICATE_POLICY: block
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> Setting default series ENCODING to: compressed
tei-reranking-server         | 2024-09-03T16:35:12.859642Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
tei-reranking-server         | 2024-09-03T16:35:12.859654Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
tei-reranking-server         | 2024-09-03T16:35:12.901154Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
tei-reranking-server         | 2024-09-03T16:35:13.064387Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
tei-reranking-server         | 2024-09-03T16:35:13.081150Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-base/resolve/main/model.onnx)
tei-reranking-server         | 2024-09-03T16:35:13.081171Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
tei-reranking-server         | 2024-09-03T16:35:16.162648Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 3.303005165s
tei-reranking-server         | 2024-09-03T16:35:16.639814Z  WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <timeseries> Detected redis oss
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * Module 'timeseries' loaded from /opt/redis-stack/lib/redistimeseries.so
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <ReJSON> Created new data type 'ReJSON-RL'
dataprep-redis-server        |   warnings.warn(
tei-reranking-server         | 2024-09-03T16:35:16.639830Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
tei-reranking-server         | 2024-09-03T16:35:16.640052Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
tei-reranking-server         | 2024-09-03T16:35:18.474694Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
tei-reranking-server         | 2024-09-03T16:35:20.263187Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
tei-reranking-server         | 2024-09-03T16:35:20.263205Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <ReJSON> version: 20609 git sha: unknown branch: unknown
tei-embedding-server         | 2024-09-03T16:35:15.530759Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
tgi-service                  |     api_key: None,
tgi-service                  |     watermark_gamma: None,
tgi-service                  |     watermark_delta: None,
tei-reranking-server         | 2024-09-03T16:35:20.263314Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
tei-reranking-server         | 2024-09-03T16:35:20.264715Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tei-reranking-server         | 2024-09-03T16:35:20.264722Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
tei-reranking-server         | 2024-09-03T16:41:33.132290Z  INFO rerank{total_time="20.070554ms" tokenization_time="576.473µs" queue_time="891.339µs" inference_time="18.483195ms"}: text_embeddings_router::http::server: router/src/http/server.rs:455: Success
tei-reranking-server         | 2024-09-03T16:41:45.597461Z  INFO rerank{total_time="25.077603ms" tokenization_time="288.427µs" queue_time="6.776727ms" inference_time="12.031357ms"}: text_embeddings_router::http::server: router/src/http/server.rs:455: Success
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.567 * <ReJSON> Exported RedisJSON_V1 API
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Exported RedisJSON_V2 API
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Exported RedisJSON_V3 API
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Exported RedisJSON_V4 API
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Exported RedisJSON_V5 API
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <ReJSON> Enabled diskless replication
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * Module 'ReJSON' loaded from /opt/redis-stack/lib/rejson.so
dataprep-redis-server        | /home/user/.local/lib/python3.11/site-packages/langchain/__init__.py:30: UserWarning: Importing PromptTemplate from langchain root module is no longer supported. Please use langchain_core.prompts.PromptTemplate instead.
dataprep-redis-server        |   warnings.warn(
dataprep-redis-server        | [2024-09-03 16:35:17,740] [    INFO] - CORS is enabled.
llm-tgi-server               |   Downloading pyproject_toml-0.0.10-py3-none-any.whl.metadata (642 bytes)
llm-tgi-server               | Requirement already satisfied: anyio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (4.2.0)
dataprep-redis-server        | [2024-09-03 16:35:17,741] [    INFO] - Setting up HTTP server
dataprep-redis-server        | [2024-09-03 16:35:17,742] [    INFO] - Uvicorn server setup on port 6007
dataprep-redis-server        | INFO:     Waiting for application startup.
dataprep-redis-server        | INFO:     Application startup complete.
dataprep-redis-server        | INFO:     Uvicorn running on http://0.0.0.0:6007 (Press CTRL+C to quit)
dataprep-redis-server        | [2024-09-03 16:35:17,744] [    INFO] - HTTP server setup successful
dataprep-redis-server        | [2024-09-03 16:35:17,751] [    INFO] - CORS is enabled.
dataprep-redis-server        | [2024-09-03 16:35:17,751] [    INFO] - CORS is enabled.
dataprep-redis-server        | [2024-09-03 16:35:17,752] [    INFO] - Setting up HTTP server
dataprep-redis-server        | [2024-09-03 16:35:17,752] [    INFO] - Setting up HTTP server
dataprep-redis-server        | [2024-09-03 16:35:17,752] [    INFO] - Uvicorn server setup on port 6008
dataprep-redis-server        | [2024-09-03 16:35:17,752] [    INFO] - Uvicorn server setup on port 6008
dataprep-redis-server        | INFO:     Waiting for application startup.
dataprep-redis-server        | INFO:     Application startup complete.
llm-tgi-server               | Requirement already satisfied: certifi in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (2023.11.17)
llm-tgi-server               | Requirement already satisfied: httpcore==1.* in /home/user/.local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.0.5)
llm-tgi-server               | Requirement already satisfied: idna in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (3.6)
llm-tgi-server               | Requirement already satisfied: sniffio in /usr/local/lib/python3.11/site-packages (from httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (1.3.0)
llm-tgi-server               | Requirement already satisfied: h11<0.15,>=0.13 in /home/user/.local/lib/python3.11/site-packages (from httpcore==1.*->httpx>=0.23.0->langserve->-r requirements-runtime.txt (line 1)) (0.14.0)
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <search> Acquired RedisJSON_V5 API
tei-embedding-server         | 2024-09-03T16:35:15.530888Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
tei-embedding-server         | 2024-09-03T16:35:15.532414Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:80
tei-embedding-server         | 2024-09-03T16:35:15.532453Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
tei-embedding-server         | 2024-09-03T16:39:35.319143Z  INFO embed{total_time="12.188539ms" tokenization_time="720.001µs" queue_time="640.359µs" inference_time="10.74443ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success
tei-embedding-server         | 2024-09-03T16:41:11.040334Z  INFO embed{total_time="9.505833ms" tokenization_time="401.51µs" queue_time="499.94µs" inference_time="8.531757ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success
tgi-service                  |     ngrok: false,
tgi-service                  |     ngrok_authtoken: None,
llm-tgi-server               | Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (6.0.1)
llm-tgi-server               | Requirement already satisfied: jsonpatch<2.0,>=1.33 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (1.33)
llm-tgi-server               | Requirement already satisfied: langsmith<0.1.0,>=0.0.63 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (0.0.77)
llm-tgi-server               | Requirement already satisfied: packaging<24.0,>=23.2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (23.2)
tgi-service                  |     ngrok_edge: None,
tgi-service                  |     tokenizer_config_path: None,
tgi-service                  |     disable_grammar_support: false,
tgi-service                  |     env: false,
tgi-service                  |     max_client_batch_size: 4,
tgi-service                  |     lora_adapters: None,
tgi-service                  |     usage_stats: On,
tgi-service                  | }
tgi-service                  | 2024-09-03T16:35:12.583910Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"    
llm-tgi-server               | Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.31.0)
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <bf> RedisBloom version 2.6.12 (Git=unknown)
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * Module 'bf' loaded from /opt/redis-stack/lib/redisbloom.so
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <redisgears_2> Created new data type 'GearsType'
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <redisgears_2> Detected redis oss
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 # <redisgears_2> could not initialize RedisAI_InitError
redis-vector-db              | 
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <redisgears_2> Failed loading RedisAI API.
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.568 * <redisgears_2> RedisGears v2.0.19, sha='671030bbcb7de4582d00575a0902f826da3efe73', build_type='release', built_for='Linux-ubuntu22.04.x86_64'.
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.569 * <redisgears_2> Registered backend: js.
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.569 * Module 'redisgears_2' loaded from /opt/redis-stack/lib/redisgears.so
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.569 * Server initialized
redis-vector-db              | 9:M 03 Sep 2024 16:35:12.570 * Ready to accept connections tcp
tgi-service                  | 2024-09-03T16:35:12.637897Z  INFO text_generation_launcher: Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`.
dataprep-redis-server        | INFO:     Uvicorn running on http://0.0.0.0:6008 (Press CTRL+C to quit)
llm-tgi-server               | Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.11/site-packages (from langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (8.2.3)
tgi-service                  | 2024-09-03T16:35:12.637917Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
llm-tgi-server               | Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (0.6.0)
tgi-service                  | 2024-09-03T16:35:12.637920Z  INFO text_generation_launcher: Default `max_total_tokens` to 4096
tgi-service                  | 2024-09-03T16:35:12.637922Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
tgi-service                  | 2024-09-03T16:35:12.638062Z  INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
tgi-service                  | 2024-09-03T16:35:16.967885Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Downloading PyTorch weights.
tgi-service                  | 2024-09-03T16:35:16.999570Z  INFO text_generation_launcher: Download file: pytorch_model-00001-of-00002.bin
tgi-service                  | 2024-09-03T16:35:44.225639Z  INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00001-of-00002.bin in 0:00:27.
tgi-service                  | 2024-09-03T16:35:44.225659Z  INFO text_generation_launcher: Download: [1/2] -- ETA: 0:00:27
tgi-service                  | 2024-09-03T16:35:44.225982Z  INFO text_generation_launcher: Download file: pytorch_model-00002-of-00002.bin
tgi-service                  | 2024-09-03T16:36:10.045527Z  INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00002-of-00002.bin in 0:00:25.
tgi-service                  | 2024-09-03T16:36:10.045548Z  INFO text_generation_launcher: Download: [2/2] -- ETA: 0
tgi-service                  | 2024-09-03T16:36:10.045563Z  WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
tgi-service                  | 2024-09-03T16:36:10.045793Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
tgi-service                  | Error: DownloadError
tgi-service                  | 2024-09-03T16:37:00.978463Z ERROR download: text_generation_launcher: Download encountered an error: 
tgi-service                  | The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
tgi-service                  | 2024-09-03 16:35:16.253 | INFO     | text_generation_server.utils.import_utils:<module>:75 - Detected system ipex
tgi-service                  | /opt/conda/lib/python3.10/site-packages/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
tgi-service                  |   warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
tgi-service                  | ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
tgi-service                  | │ /opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py:324 in │
tgi-service                  | │ download_weights                                                             │
tgi-service                  | │                                                                              │
tgi-service                  | │   321 │   │   except Exception:                                              │
tgi-service                  | │   322 │   │   │   discard_names = []                                         │
tgi-service                  | │   323 │   │   # Convert pytorch weights to safetensors                       │
llm-tgi-server               | Requirement already satisfied: pydantic-core==2.14.6 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (2.14.6)
llm-tgi-server               | Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.11/site-packages (from pydantic>=1->langserve->-r requirements-runtime.txt (line 1)) (4.9.0)
llm-tgi-server               | Requirement already satisfied: setuptools>=42 in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (65.5.1)
llm-tgi-server               | Requirement already satisfied: wheel in /usr/local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.42.0)
llm-tgi-server               | Collecting toml (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1))
llm-tgi-server               |   Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
llm-tgi-server               | Requirement already satisfied: jsonschema in /home/user/.local/lib/python3.11/site-packages (from pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (4.23.0)
llm-tgi-server               | Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.11/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.4)
llm-tgi-server               | Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (3.3.2)
llm-tgi-server               | Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/site-packages (from requests<3,>=2->langchain-core<0.3,>=0.1->langserve->-r requirements-runtime.txt (line 1)) (2.1.0)
llm-tgi-server               | Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (23.2.0)
llm-tgi-server               | Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (2023.12.1)
llm-tgi-server               | Requirement already satisfied: referencing>=0.28.4 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.35.1)
llm-tgi-server               | Requirement already satisfied: rpds-py>=0.7.1 in /home/user/.local/lib/python3.11/site-packages (from jsonschema->pyproject-toml<0.0.11,>=0.0.10->langserve->-r requirements-runtime.txt (line 1)) (0.20.0)
llm-tgi-server               | Downloading langserve-0.2.3-py3-none-any.whl (1.2 MB)
llm-tgi-server               |    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 64.8 MB/s eta 0:00:00
llm-tgi-server               | Downloading pyproject_toml-0.0.10-py3-none-any.whl (6.9 kB)
llm-tgi-server               | Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
llm-tgi-server               | Installing collected packages: toml, pyproject-toml, langserve
llm-tgi-server               | Successfully installed langserve-0.2.3 pyproject-toml-0.0.10 toml-0.10.2
llm-tgi-server               | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:184: UserWarning: Field name "downstream_black_list" shadows an attribute in parent "TopologyInfo"; 
llm-tgi-server               |   warnings.warn(
llm-tgi-server               | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_name_or_path" has conflict with protected namespace "model_".
llm-tgi-server               | 
llm-tgi-server               | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
dataprep-redis-server        | [2024-09-03 16:35:17,753] [    INFO] - HTTP server setup successful
dataprep-redis-server        | [2024-09-03 16:35:17,753] [    INFO] - HTTP server setup successful
dataprep-redis-server        | [2024-09-03 16:35:17,753] [    INFO] - CORS is enabled.
dataprep-redis-server        | [2024-09-03 16:35:17,753] [    INFO] - CORS is enabled.
dataprep-redis-server        | [2024-09-03 16:35:17,753] [    INFO] - CORS is enabled.
dataprep-redis-server        | [2024-09-03 16:35:17,754] [    INFO] - Setting up HTTP server
dataprep-redis-server        | [2024-09-03 16:35:17,754] [    INFO] - Setting up HTTP server
dataprep-redis-server        | [2024-09-03 16:35:17,754] [    INFO] - Setting up HTTP server
dataprep-redis-server        | [2024-09-03 16:35:17,754] [    INFO] - Uvicorn server setup on port 6009
dataprep-redis-server        | [2024-09-03 16:35:17,754] [    INFO] - Uvicorn server setup on port 6009
dataprep-redis-server        | [2024-09-03 16:35:17,754] [    INFO] - Uvicorn server setup on port 6009
dataprep-redis-server        | INFO:     Waiting for application startup.
dataprep-redis-server        | INFO:     Application startup complete.
dataprep-redis-server        | INFO:     Uvicorn running on http://0.0.0.0:6009 (Press CTRL+C to quit)
dataprep-redis-server        | [2024-09-03 16:35:17,755] [    INFO] - HTTP server setup successful
tgi-service                  | │ ❱ 324 │   │   utils.convert_files(local_pt_files, local_st_files, discard_na │
tgi-service                  | │   325                                                                        │
tgi-service                  | │   326                                                                        │
tgi-service                  | │   327 @app.command()                                                         │
tgi-service                  | │                                                                              │
tgi-service                  | │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
tgi-service                  | │ │      architecture = 'MistralForCausalLM'                                 │ │
tgi-service                  | │ │      auto_convert = True                                                 │ │
tgi-service                  | │ │     base_model_id = None                                                 │ │
tgi-service                  | │ │            class_ = <class                                               │ │
tgi-service                  | │ │                     'transformers.models.mistral.modeling_mistral.Mistr… │ │
tgi-service                  | │ │            config = {                                                    │ │
tgi-service                  | │ │                     │   '_name_or_path': './neural-chat-7b-v3-9',        │ │
tgi-service                  | │ │                     │   'architectures': ['MistralForCausalLM'],         │ │
tgi-service                  | │ │                     │   'bos_token_id': 1,                               │ │
tgi-service                  | │ │                     │   'eos_token_id': 2,                               │ │
tgi-service                  | │ │                     │   'hidden_act': 'silu',                            │ │
llm-tgi-server               |   warnings.warn(
llm-tgi-server               | [2024-09-03 16:35:15,842] [    INFO] - CORS is enabled.
llm-tgi-server               | [2024-09-03 16:35:15,842] [    INFO] - Setting up HTTP server
llm-tgi-server               | [2024-09-03 16:35:15,843] [    INFO] - Uvicorn server setup on port 9000
llm-tgi-server               | INFO:     Waiting for application startup.
llm-tgi-server               | INFO:     Application startup complete.
llm-tgi-server               | INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)
llm-tgi-server               | [2024-09-03 16:35:15,852] [    INFO] - HTTP server setup successful
dataprep-redis-server        | [2024-09-03 16:35:17,755] [    INFO] - HTTP server setup successful
dataprep-redis-server        | [2024-09-03 16:35:17,755] [    INFO] - HTTP server setup successful
tgi-service                  | │ │                     │   'hidden_size': 4096,                             │ │
tgi-service                  | │ │                     │   'initializer_range': 0.02,                       │ │
tgi-service                  | │ │                     │   'intermediate_size': 14336,                      │ │
tgi-service                  | │ │                     │   'max_position_embeddings': 32768,                │ │
tgi-service                  | │ │                     │   'model_type': 'mistral',                         │ │
tgi-service                  | │ │                     │   ... +11                                          │ │
tgi-service                  | │ │                     }                                                    │ │
tgi-service                  | │ │   config_filename = '/data/models--Intel--neural-chat-7b-v3-3/snapshots… │ │
tgi-service                  | │ │     discard_names = ['lm_head.weight']                                   │ │
tgi-service                  | │ │         extension = '.safetensors'                                       │ │
tgi-service                  | │ │                 f = <_io.TextIOWrapper                                   │ │
tgi-service                  | │ │                     name='/data/models--Intel--neural-chat-7b-v3-3/snap… │ │
tgi-service                  | │ │                     mode='r' encoding='UTF-8'>                           │ │
tgi-service                  | │ │    is_local_model = False                                                │ │
tgi-service                  | │ │              json = <module 'json' from                                  │ │
tgi-service                  | │ │                     '/opt/conda/lib/python3.10/json/__init__.py'>        │ │
tgi-service                  | │ │       json_output = True                                                 │ │
tgi-service                  | │ │    local_pt_files = [                                                    │ │
tgi-service                  | │ │                     │                                                    │ │
tgi-service                  | │ │                     PosixPath('/data/models--Intel--neural-chat-7b-v3-3… │ │
tgi-service                  | │ │                     │                                                    │ │
error from daemon in stream: Error grabbing logs: unexpected EOF

@arun-gupta
Contributor Author

This error occurs only with Ubuntu 24.04 on AWS. I tested with both the 0.8 and 0.9 Docker images; it worked fine on the Amazon Linux 2023 AMI.
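For anyone hitting the same symptom, here is a quick way to check whether the model download actually finished (a minimal sketch; it assumes the compose service is named tgi-service as in the log above, and that the TGI router prints a "Connected" line once it is ready — the exact wording may differ across TGI versions):

# Look for either a download/conversion error or the router readiness message
docker logs tgi-service 2>&1 | grep -E "Error|Connected"

# Downloaded weights live under the container's /data volume; the host-side path
# depends on the volume mapping in your compose file (./data is just an example)
ls -lh ./data/models--Intel--neural-chat-7b-v3-3/snapshots/*/

In the log above this would surface the "Error: DownloadError" emitted during the safetensors conversion, which is why the service never came up.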

@letonghan
Collaborator

Hi @arun-gupta, since we don't have an AWS environment, it's currently hard to find the root cause of this issue.
If you're willing to share your AWS environment, we can help debug the problem!

@arun-gupta
Contributor Author

@letonghan sure, let me set up a time with you offline.

@yinghu5 yinghu5 added the Dev label Sep 4, 2024
@arun-gupta
Contributor Author

I tried this again on AWS with Ubuntu 24.04 and it is working fine. It also worked with Ubuntu 24.04 on GCP with the latest images. The GCP instructions are available at https://gist.github.com/arun-gupta/564c5334c62cf4ada3cbd3124a2defb7.

The bug can be closed.
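For reference, instead of polling manually, one can wait for TGI's health endpoint to respond before running the generate test (a sketch; it assumes the tgi-service container is mapped to port 9009 on the host and that host_ip is exported, as in the compose setup):

# Block until the TGI health endpoint responds; the router only starts serving
# HTTP once the model shards are loaded, so this doubles as a "download finished" signal
until curl -sf http://${host_ip}:9009/health > /dev/null; do
  echo "waiting for tgi-service to finish downloading/loading the model..."
  sleep 10
done
echo "tgi-service is ready"

Something along these lines in the README would give developers the clear readiness indication this issue asked for.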

@letonghan
Collaborator

Ok, will close this issue.
