While running the vLLM API server (v0.5.4) using Docker, the following error is encountered during initialization when trying to configure the otlp_traces_endpoint:
_ValueError: OpenTelemetry packages must be installed before configuring 'otlp_traces_endpoint'_
Docker logs
```
binishb.ttl@vzneuronsr01:~$ docker logs fc2b9c21e998
INFO 08-20 07:15:58 api_server.py:339] vLLM API server version 0.5.4
INFO 08-20 07:15:58 api_server.py:340] args: Namespace(host=None, port=8514, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, model='/root/Meta-Llama-3.1-8B-Instruct', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=64000, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=True, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=42, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint='grpc://xxxxxxxxxxx:4317', engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 08-20 07:15:58 config.py:1454] Casting torch.bfloat16 to torch.float16.
WARNING 08-20 07:15:58 arg_utils.py:776] The model has a long context length (64000). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 217, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, port)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/server.py", line 25, in __init__
self.engine = AsyncLLMEngine.from_engine_args(async_engine_args,
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 462, in from_engine_args
engine_config = engine_args.create_engine_config()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/arg_utils.py", line 852, in create_engine_config
observability_config = ObservabilityConfig(
File "<string>", line 4, in __init__
File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 1615, in __post_init__
raise ValueError("OpenTelemetry packages must be installed before "
ValueError: OpenTelemetry packages must be installed before configuring 'otlp_traces_endpoint'
```
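The missing dependency can be confirmed directly against the published image. This is just a quick diagnostic, assuming the image is available locally or on Docker Hub; the import path is one the OpenTelemetry SDK provides, so a failure here points at the package set rather than at vLLM itself:

```bash
# Try importing the OpenTelemetry SDK inside the stock v0.5.4 image;
# a ModuleNotFoundError here confirms the packages are not bundled.
docker run --rm --entrypoint python3 vllm/vllm-openai:v0.5.4 \
    -c "from opentelemetry.sdk.trace import TracerProvider"
```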
Potential Fix:
It seems the issue arises because the required OpenTelemetry packages are missing from the image. A possible solution could be either:
- Adding a check during startup to ensure the necessary packages are installed when otlp_traces_endpoint is configured.
- Automatically disabling observability features if the packages are not available, and logging a warning instead of raising an error (a minimal sketch follows this list).
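A minimal sketch of the second option: warn and drop tracing instead of raising. The class only mirrors the rough shape of vLLM's ObservabilityConfig; the `_otel_installed` helper and the exact behaviour are illustrative, not vLLM's actual implementation:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


def _otel_installed() -> bool:
    """Return True if the OpenTelemetry pieces needed for tracing import cleanly."""
    try:
        from opentelemetry.sdk.trace import TracerProvider  # noqa: F401
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (  # noqa: F401
            OTLPSpanExporter)
        return True
    except ImportError:
        return False


@dataclass
class ObservabilityConfig:
    otlp_traces_endpoint: Optional[str] = None

    def __post_init__(self):
        if self.otlp_traces_endpoint is not None and not _otel_installed():
            # Instead of raising ValueError, drop tracing and let the server start.
            logger.warning(
                "OpenTelemetry packages are not installed; ignoring "
                "otlp_traces_endpoint=%s and disabling tracing.",
                self.otlp_traces_endpoint)
            self.otlp_traces_endpoint = None
```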
Description: When spinning up a new Docker container with the command below, startup fails with the missing-package error shown above.
Docker run command:
```bash
docker run -d --runtime nvidia --gpus all \
  -v ~/Vipul/nltk_data:/home/user/nltk_data \
  -v /home/binishb.ttl/Meta-Llama-3.1-8B-Instruct/:/root/Meta-Llama-3.1-8B-Instruct \
  --env "HUGGING_FACE_HUB_TOKEN=xxxxxxxxxxxxxxxxx" \
  -p 8514:8514 --ipc=host \
  --env "CUDA_VISIBLE_DEVICES=1" \
  --entrypoint "python3" \
  vllm/vllm-openai:v0.5.4 \
  -m vllm.entrypoints.openai.api_server \
  --model /root/Meta-Llama-3.1-8B-Instruct \
  --gpu-memory-utilization 0.9 \
  --port 8514 \
  --max-model-len 64000 \
  --seed 42 \
  --otlp-traces-endpoint "grpc://xxxxxxxxxx:4317" \
  --enable-prefix-caching
```
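Until the published image ships with tracing dependencies, one workaround is to bake them into a derived image and point the run command at it. The package list below follows vLLM's OpenTelemetry example (verify it against your vLLM version), and the derived image tag is just an illustrative name:

```bash
# Build a derived image that adds the OpenTelemetry packages on top of the
# published vLLM image (Dockerfile supplied inline via stdin).
docker build -t vllm-openai-otel:v0.5.4 - <<'EOF'
FROM vllm/vllm-openai:v0.5.4
RUN pip install opentelemetry-sdk opentelemetry-api \
    opentelemetry-exporter-otlp opentelemetry-semantic-conventions-ai
EOF
```

The original docker run command should then work unchanged, with vllm/vllm-openai:v0.5.4 replaced by vllm-openai-otel:v0.5.4.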