[Model] LoRA Support for Ultravox model #11253
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Hi folks, I'm working with @petersalas on this. The PR is not complete yet, but I wanted to start the discussion since I have some open questions and need some help from the vLLM community.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 771484d to 64a664f
Signed-off-by: Sumit Vij <sumitvij11+github@gmail.com>
WIP: lora tests
Minor tweaks
Moar fixes
Temp changes
Cleanup
Add more debugging logs and packed modules
Signed-off-by: Sumit Vij <sumitvij11+github@gmail.com>
Remove stale comment
Add llama lora modules
Add llama test case
Add test case and log warning on missing lora modules
Rollback unwanted changes and format fixes
Signed-off-by: Sumit Vij <sumitvij11+github@gmail.com>
Force-pushed from 64a664f to 3f5996c
Can you refer to #10022 to minimize the changes?
The changes are in line with #10022, except for the test case and other minor logging changes. Do you have concerns about any particular change?
It looks like there are issues with both the added tests and the logs. We should only modify the Ultravox script, following the changes made in #10022.
What is/are the issue(s)? Maybe I'm missing something, but the tests are passing.
@jeejeelee let me know what your concerns are, please? Happy to address them. Having a test case was super helpful for making sure LoRA works as expected with Llama and Ultravox.
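For context, a smoke test along these lines can be written against vLLM's public LoRA API (`LLM(enable_lora=True)` plus `LoRARequest`). The sketch below is only illustrative: the model ID, prompt, and adapter path are placeholders, not the fixtures actually used in this PR's test.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model ID and adapter path; the real test provides its own fixtures.
llm = LLM(
    model="fixie-ai/ultravox-v0_3",
    enable_lora=True,
    max_loras=1,
    max_lora_rank=16,
)

outputs = llm.generate(
    ["Describe the audio clip."],  # placeholder prompt; the real test also feeds audio inputs
    SamplingParams(temperature=0.0, max_tokens=32),
    lora_request=LoRARequest("ultravox-lora", 1, "/path/to/lora/adapter"),
)
print(outputs[0].outputs[0].text)
```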
…-ultravox-lora-dec-16
@thedebugger We should only modify
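For readers following along, the #10022-style change typically amounts to marking the model class as LoRA-capable and declaring which modules LoRA can attach to. The outline below is a rough sketch under that assumption; the class shape, attribute values, and module names are illustrative rather than the exact diff in this PR.

```python
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsLoRA, SupportsMultiModal


class UltravoxModel(nn.Module, SupportsMultiModal, SupportsLoRA):
    # Fused projections and the sub-projections they pack together
    # (so LoRA weights trained against q/k/v map onto qkv_proj).
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    # Language-model modules that may carry LoRA weights; illustrative list.
    supported_lora_modules = [
        "qkv_proj",
        "o_proj",
        "gate_up_proj",
        "down_proj",
    ]
    embedding_modules = {}
    embedding_padding_modules = []
```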
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Let's keep the unit tests as they are for now. I'll look into how to train Ultravox LoRA and update in another PR
@jeejeelee I'll fix the test tomorrow. I pulled the latest code and it is failing for me with a CUDA OOM (different from the CI failure). I also ran the test again on 208e662 and it works, so something on master has likely changed and is causing the test to fail. AFAICT the failure is not related to LoRA, so training an Ultravox LoRA shouldn't have any impact. Let me troubleshoot this tomorrow and figure out what is going on before you spend time on it.
I spent more time looking at the failure. It happens during init when running Ultravox with dummy data, and I haven't been able to reproduce it locally (though locally the test fails with a CUDA OOM error for Llama on the latest version). I checked the other Ultravox test and it works fine. I'll look further tomorrow into why CI is seeing a device mismatch when running Whisper.
We can first remove the LoRA test-related code and merge this PR. I'll spend some time later training a LoRA model. What do you think? |
Force-pushed from d247036 to 1195ad8
Force-pushed from d5d023b to 09c8388
@jeejeelee can you trigger the LoRA test and add me to the Buildkite org (if you have perms) so that I can trigger the LoRA tests myself next time? I made a few tweaks and want to verify whether they work.
I've already started it. I suggest that if it still doesn't pass, we should remove the lora test. There's no need to waste time here.
""" | ||
#Check if set_default_device fixes the CI failure. Other lora tests set | ||
# device "to cuda which might be causing device mismatch in CI | ||
torch.set_default_device("cpu") |
That's too hacky, don't do it. This significantly increased the CI testing time.
I checked the time before and after: the delta was ~4 minutes. Now that I know this works, let me check whether I can find a better fix for it.
So it is likely a bug in transformers' Whisper. I've opened a PR to fix it.
The workaround isn't ideal, but the impact is limited IMO, given that the passing test takes roughly 2-3 minutes anyway. Moreover, the default device is only used when a device is not explicitly passed as a function parameter, which is why we aren't seeing much impact.
I also tried a few more options, like passing the device all the way through from vLLM -> Ultravox -> Whisper, but that is a lot more complicated and requires changes in a bunch of places. So I think it is okay to merge for now, and I can clean this up once it is fixed in transformers. Sounds okay?
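For anyone wondering why the workaround has a limited blast radius: `torch.set_default_device` only affects tensors created without an explicit `device=` argument, so call sites that pass a device keep working unchanged. A tiny illustration (the second line of course assumes a GPU is available):

```python
import torch

torch.set_default_device("cpu")

a = torch.zeros(2)                 # no explicit device -> lands on the default (cpu)
b = torch.zeros(2, device="cuda")  # explicit device wins -> cuda:0 (requires a GPU)

print(a.device)  # cpu
print(b.device)  # cuda:0
```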
@jeejeelee okay to merge if tests are passing?
Btw, I verified that the device bug exists upstream, and I'm working on getting it patched.
Hi @jeejeelee, is set_default_device a blocker? Let me know how you want to move forward.
Force-pushed from 7f010af to 2b8db17
- Reduce model len and max num seq to reduce memory
- Re-trigger tests
Signed-off-by: Sumit Vij <sumitvij11+github@gmail.com>
Force-pushed from 2b8db17 to 1976ee0
Signed-off-by: Felix Marty <felmarty@amd.com>
This should also work with Mistral models, which also use the LlamaForCausalLM architecture (see here).