Add vLLM e2e tests #117
Conversation
# Run vLLM with saved model
print("================= RUNNING vLLM =========================")
sampling_params = SamplingParams(temperature=0.80, top_p=0.95)
llm = LLM(model=self.save_dir)
Having a test for tp>1 (tensor parallelism) is also a good idea if we can.
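A rough sketch of what a tp>1 variant might look like inside the same test class, assuming the existing save_dir and prompts fixtures; tensor_parallel_size is vLLM's knob for sharding the model across GPUs:

from vllm import LLM, SamplingParams

# Hypothetical tp>1 variant of the existing test; requires at least 2 GPUs.
sampling_params = SamplingParams(temperature=0.80, top_p=0.95)
llm = LLM(model=self.save_dir, tensor_parallel_size=2)
outputs = llm.generate(self.prompts, sampling_params)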
Yeah, I think that'll be a follow-up test, since the structure will change a bit to handle tp>1 within the same process.
I do think that's more of a vLLM test. If anything, we could extend this to publish test models which are then pulled down for all vLLM tests.
tests/e2e/vLLM/test_vllm.py (Outdated)
llm = LLM(model=self.save_dir)
outputs = llm.generate(self.prompts, sampling_params)
print("================= vLLM GENERATION ======================")
print(outputs)
Logic for perplexity tests can be borrowed from https://github.com/vllm-project/llm-compressor/blob/main/tests/llmcompressor/transformers/compression/test_quantization.py
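For reference, a minimal perplexity check along those lines might look like the sketch below; save_dir, the sample text, and the threshold are placeholders, not values taken from the linked test:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal perplexity sketch over the saved compressed model.
model = AutoModelForCausalLM.from_pretrained(save_dir)
tokenizer = AutoTokenizer.from_pretrained(save_dir)
text = "The quick brown fox jumps over the lazy dog."  # placeholder sample
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # With labels supplied, the model returns the mean cross-entropy loss;
    # perplexity is exp(loss).
    loss = model(**inputs, labels=inputs["input_ids"]).loss
perplexity = torch.exp(loss).item()
assert perplexity < 50.0  # hypothetical threshold, would need tuning per model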
I would suggest running gsm8k on 200 samples
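One way to wire that up is lm-evaluation-harness's Python API; this is a sketch only, and the result keys and any pass/fail threshold are assumptions rather than part of the PR:

import lm_eval

# Hedged sketch: evaluate the saved model on 200 gsm8k samples via vLLM.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=f"pretrained={save_dir}",  # save_dir from the test fixture
    tasks=["gsm8k"],
    limit=200,  # 200 samples, per the review suggestion
)
print(results["results"]["gsm8k"])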
Looks great and covers all the main cases I can think of! Just had one note on validating output.
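For example, a basic validation pass over the vLLM results might look like this sketch, assuming vLLM's RequestOutput objects, where each entry carries the prompt and its completions:

# Sketch of output validation: every prompt should yield non-empty text.
for output in outputs:
    assert len(output.outputs) > 0
    generated_text = output.outputs[0].text
    assert generated_text.strip(), f"Empty generation for prompt: {output.prompt!r}"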
* add first test
* update tests
* update to use config files
* update test
* update to add int8 tests
* update
* fix condition
* fix typo
* add w8a16
* update
* update to clear session and delete dirs
* conditional import for vllm
* update
* update num samples
* add more test cases; add custom recipe support
* update model
* update recipe modifier
* Update fp8_weight_only.yaml
* add more test cases
* try a larger model
* revert
* add description; save model to hub post testing
Summary
Testing
llm-compressor-testing: https://github.com/neuralmagic/llm-compressor-testing/actions/runs/10568408144/workflow