
Fix lora requests when dp with vllm #2433

Merged

Conversation

ckgresla
Contributor

Following some investigation, I believe there is a fix to be made inside the VLLM class's _model_generate() method. I stumbled across an inconsistency when trying to evaluate some LoRA adapters with vLLM as the backend in lm_eval; see this issue for reference.

Specifically, the evaluation results were exactly the same (greedy decoding) for a trained LoRA adapter and its base model, yet when we merged the adapter into the base model we got quite different results. After digging around a bit, I believe the issue lies in how requests are sent to the vLLM model at test time: when data_parallel_size>1 is set together with a lora_request, the prior behavior issued plain base-model requests without using the adapter.

This PR fixes that issue and routes requests through the LoRA adapter even when using data parallelism.
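
For reference, here is a minimal sketch of the idea (names such as run_inference_one_model and model_args are illustrative assumptions, not the exact lm_eval code): when data_parallel_size>1, the per-shard worker must accept the LoRARequest and forward it to vllm.LLM.generate(), rather than only doing so in the single-process path.

```python
# Illustrative sketch, not the exact diff in this PR: the per-shard worker
# used for data parallelism has to receive and forward the LoRA adapter.
from typing import Optional

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest


def run_inference_one_model(
    model_args: dict,  # assumed kwargs for vllm.LLM; must include enable_lora=True
    sampling_params: SamplingParams,
    prompts: list[str],
    lora_request: Optional[LoRARequest] = None,
):
    # Each data-parallel worker constructs its own engine.
    llm = LLM(**model_args)
    # The adapter must be forwarded here as well; dropping lora_request in
    # the data-parallel branch is exactly the failure mode described above,
    # where every shard silently serves the base model.
    return llm.generate(
        prompts,
        sampling_params=sampling_params,
        lora_request=lora_request,
    )
```

With a request built as, e.g., LoRARequest("adapter", 1, "/path/to/adapter"), each shard then generates through the adapter instead of the base weights.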

@CLAassistant

CLAassistant commented Oct 28, 2024

CLA assistant check
All committers have signed the CLA.

@baberabb
Contributor

Thanks very much for the PR. LGTM; if you could run pre-commit to pass the formatting CI tests:

pip install pre-commit
pre-commit install
pre-commit run --all-files

@ckgresla
Contributor Author

Thank you for the helpful commands! Linted and ready for CI. @baberabb

@baberabb baberabb merged commit 838a3e0 into EleutherAI:main Oct 30, 2024
8 checks passed