Your current environment

The output of `python collect_env.py` was not provided.
🐛 Describe the bug
I am running the vLLM OpenAI-compatible server v0.8.1 on Kubernetes (see the example YAML below). After vLLM is running, I tried to fetch the metrics via:
```bash
kubectl port-forward <vllm pod> 8000
curl localhost:8000/metrics
```
With vLLM V1, `num_gpu_blocks` in `cache_config_info` is reported as `None` (see the scrape below), whereas after switching to vLLM V0 the metric shows the correct value. Is this a regression in V1?
```text
# TYPE vllm:cache_config_info gauge
vllm:cache_config_info{block_size="16",cache_dtype="auto",calculate_kv_scales="False",cpu_offload_gb="0",enable_prefix_caching="True",gpu_memory_utilization="0.9",is_attention_free="False",num_cpu_blocks="None",num_gpu_blocks="None",num_gpu_blocks_override="None",sliding_window="None",swap_space_bytes="4294967296"} 1.0
```
The relevant part of the Kubernetes deployment:

```yaml
containers:
      - args:
        - --port
        - "8000"
        - --max-num-seqs
        - "2048"
        - --max_model_len
        - "4096"
        - --compilation-config
        - "3"
        - --tensor-parallel-size
        - "1"
        - --model
        - "meta-llama/Llama-2-7b-hf"
        - "--enable-lora"
        - "--max-loras"
        - "10"
        - "--max-cpu-loras"
        - "12"
        command:
        - python3
        - -m
        - vllm.entrypoints.openai.api_server
        env:
        - name: PORT
          value: "8000"
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hf-token
        - name: VLLM_ALLOW_RUNTIME_LORA_UPDATING
          value: "true"
        - name: VLLM_USE_V1
          value: "1"
        image: vllm/vllm-openai:v0.8.1
```
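For reference, a minimal sketch of scripting the same check in Python. It assumes the `kubectl port-forward` to `localhost:8000` shown above is already running, and uses `requests` plus the `prometheus_client` text parser; the metric and label names are taken from the scrape shown earlier.

```python
# Sketch: fetch /metrics from the port-forwarded vLLM pod and print the
# cache_config_info labels, to see whether num_gpu_blocks is a number or "None".
# Assumes `kubectl port-forward <vllm pod> 8000` is running (see above) and that
# `requests` and `prometheus_client` are installed.
import requests
from prometheus_client.parser import text_string_to_metric_families

resp = requests.get("http://localhost:8000/metrics", timeout=10)
resp.raise_for_status()

for family in text_string_to_metric_families(resp.text):
    if family.name == "vllm:cache_config_info":
        for sample in family.samples:
            print("num_gpu_blocks =", sample.labels.get("num_gpu_blocks"))
            print("num_cpu_blocks =", sample.labels.get("num_cpu_blocks"))
```

With the V1 deployment above this prints the string `None` for `num_gpu_blocks`, matching the scrape, while on V0 it shows the computed block count as described in the report.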
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
 