
[Bug]: When using lora and setting num-scheduler-steps simultaneously, the output does not meet expectations. #11086

Closed
1 task done
luoling1993 opened this issue Dec 11, 2024 · 2 comments
Labels
bug Something isn't working

Comments


luoling1993 commented Dec 11, 2024

Your current environment

The output of `python collect_env.py` was not provided.

Model Input Dumps

No response

🐛 Describe the bug

vLLM version: 0.6.4.post1
I have trained a LoRA adapter based on Qwen2.5-7B-Instruct, and I start the vLLM OpenAI-compatible server via pm2 with the following configuration:

apps:
  - name: "vllm"
    script: "/home/lucas/envs/nlp-vllm/bin/python"
    args:
      - "-m"
      - "vllm.entrypoints.openai.api_server"
      - "--port=18101"  # 端口设置

      # # Meta-Llama-3.1-8B-Instruct
      # - "--served-model-name=Meta-Llama-3.1-8B-Instruct"
      # - "--model=/data/llms/Meta-Llama-3.1-8B-Instruct"
      # - "--tokenizer=/data/llms/Meta-Llama-3.1-8B-Instruct"

      # qwen
      - "--served-model-name=Qwen2.5-7B-Instruct"
      - "--model=/data/llms/Qwen2.5-7B-Instruct"
      - "--tokenizer=/data/llms/Qwen2.5-7B-Instruct"

      # - "--max-model-len=8192"  # 最大模型长度
      - "--max-model-len=4096"
      - "--gpu-memory-utilization=0.9"  # GPU内存利用率

      # speedup
      # - "--enable-chunked-prefill"  # NOTE: LoRA is not supported with chunked prefill yet
      - "--enable-prefix-caching"
      # - "--num-scheduler-steps=8" # NOTE: LoRA, will always use base model(BUG)

      - "--enable-lora"
      - "--max-lora-rank=64"
      - "--lora-modules"
      # - '{"name": "nl2filter", "path": "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Meta-Llama-3.1-8B-Instruct/lora/nl2filter-all", "base_model_name": "Meta-Llama-3.1-8B-Instruct"}'
      - '{"name": "nl2filter", "path": "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Qwen2.5-7B-Instruct/lora/nl2filter-all", "base_model_name": "Qwen2.5-7B-Instruct"}'
   
    env:
      CUDA_VISIBLE_DEVICES: "0"

    log_date_format: "YYYY-MM-DD HH:mm:ss"
    error_file: "/home/lucas/workspace/pm2_logs/error.log"
    out_file: "/home/lucas/workspace/pm2_logs/out.log"

When calling the API I set model_name=nl2filter. Everything works as expected when --num-scheduler-steps is not set. With --num-scheduler-steps=8, however, the service still starts normally and requests return, but the responses come from the base model (plain Qwen2.5-7B-Instruct, without the LoRA adapter applied) rather than from the LoRA model. There are no errors or warnings.
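
For reference, the request is made through the OpenAI-compatible endpoint roughly like this (a minimal sketch assuming the `openai` Python client and the port from the config above; the prompt and `api_key` value are placeholders):

```python
# Minimal sketch of the call against the pm2-managed server above.
# The api_key is a dummy value, since the vLLM server does not check it by default.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18101/v1", api_key="EMPTY")

# Request the LoRA adapter by its served name. With --num-scheduler-steps=8,
# the response comes back as if the base Qwen2.5-7B-Instruct had been used.
response = client.chat.completions.create(
    model="nl2filter",
    messages=[{"role": "user", "content": "convert this query to a filter ..."}],
    temperature=0.0,
)
print(response.choices[0].message.content)
```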

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
luoling1993 added the bug (Something isn't working) label Dec 11, 2024
Collaborator

jeejeelee commented Dec 11, 2024

` --enable-chunked-prefill" # NOTE: LoRA is not supported with chunked prefill yet ·

Chunked prefill with LoRA was just supported; it should be available in the next release, see #9057.

Collaborator

jeejeelee commented Dec 11, 2024

# - "--num-scheduler-steps=8" # NOTE: LoRA, will always use base model(BUG)

I remember there was a bug here and a PR for it. Found it, see #9689.
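
For anyone trying to reproduce this outside the API server, a minimal offline sketch of the same combination might look like the following; it assumes vLLM 0.6.4's `LLM`/`LoRARequest` API, and the model/adapter paths are the ones from the config above:

```python
# Hypothetical offline reproduction of LoRA + multi-step scheduling.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="/data/llms/Qwen2.5-7B-Instruct",
    enable_lora=True,
    max_lora_rank=64,
    num_scheduler_steps=8,  # the reported trigger for falling back to the base model
)

outputs = llm.generate(
    ["convert this query to a filter ..."],
    SamplingParams(temperature=0.0, max_tokens=256),
    lora_request=LoRARequest(
        "nl2filter",
        1,
        "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Qwen2.5-7B-Instruct/lora/nl2filter-all",
    ),
)
print(outputs[0].outputs[0].text)
```

If the bug behaves the same way offline, removing `num_scheduler_steps=8` should make the adapter's output appear, matching what was reported for the server.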

jeejeelee added a commit that referenced this issue Feb 1, 2025
…11161)

FIX issue #9688
#11086 #12487

---------

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: weilong.yu <weilong.yu@shopee.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>