[Bug]: `--enable-lora` raises error while trying to start api_server #405
Comments
@JHLEE17 Thanks for raising this issue; we will check it immediately. Can you please share which SynapseAI release you are using to run this test?

I ran it on version 1.17, but I got a similar error on version 1.18.

We seem to have a backward-compatibility issue: #382 works with the latest SynapseAI code (not yet released) but throws the above error with SynapseAI 1.18.0. We will work on fixing this and get back ASAP.

@michalkuligowski The issue occurs only with SynapseAI 1.18.0 + HabanaAI/vllm-fork (master_next branch), whereas SynapseAI 1.19.0 (not yet released) + HabanaAI/vllm-fork (master_next branch) works fine.
CUDA uses `capture` for warmup runs and `execute_model` for actual runs, and each phase calls `set_active_loras` only once. HPU uses `execute_model` for both warmup and actual runs, and since `execute_model` already calls `set_active_loras` internally, the extra warmup-time call is redundant and incorrect: it causes the out-of-bound slicing in the decode phase reported in #405. This PR removes the special handling of the `set_active_loras` call from warmup runs and resolves the issue in #405.
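To make the control flow concrete, here is a small self-contained toy (this is not vllm-fork code; the class and method names merely mirror those in the PR text) showing how a warmup-side call to `set_active_loras` doubles up when `execute_model` already performs it:

```python
# Toy model of the control flow described above. NOT vllm-fork code;
# all names are illustrative.

class ToyHpuRunner:
    def __init__(self) -> None:
        self.active_lora_calls = 0

    def set_active_loras(self, lora_requests: list[str]) -> None:
        # Stand-in for configuring LoRA adapters for the next step.
        self.active_lora_calls += 1

    def execute_model(self, lora_requests: list[str]) -> None:
        # On HPU, execute_model configures active LoRAs itself ...
        self.set_active_loras(lora_requests)
        # ... and then runs the forward pass (omitted in this toy).

    def warmup(self, lora_requests: list[str], pre_fix: bool) -> None:
        if pre_fix:
            # Pre-fix behaviour: an extra call mirroring CUDA's `capture`
            # path, redundant on HPU because execute_model calls it too.
            self.set_active_loras(lora_requests)
        self.execute_model(lora_requests)


runner = ToyHpuRunner()
runner.warmup(["lora-a"], pre_fix=True)
print(runner.active_lora_calls)  # 2: LoRA state set twice per warmup step
```

With `pre_fix=False` the counter ends at 1, matching the single call per phase that the CUDA path performs.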
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
I encountered an error while trying to start the api_server with Multi-LoRA.
The command I used is as follows (you can reproduce the error without the last three lines of the command):
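The original command did not survive in this copy of the report; a representative invocation for starting the OpenAI-compatible server with Multi-LoRA would look like the following (the model path, adapter path, and module name are placeholders, not the reporter's actual values):

```bash
# Placeholder reproduction command; paths and names are illustrative.
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/base-model \
    --enable-lora \
    --lora-modules my-lora=/path/to/lora-adapter \
    --max-loras 2 \
    --max-lora-rank 16
```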
However, when I run this command, the following error occurs:
Error logs
The issue occurs with both commits 9276ccc (habana_main) and d6bd375 (remove-lora-warmup-constraints). I've verified that the paths to the models are correct and that the models are accessible.
Any guidance on resolving this issue would be greatly appreciated.