Bump Flashinfer to 0.3.1 #24868
Conversation
Code Review
This pull request bumps the FlashInfer version to 0.3.1 in the Dockerfile. However, the corresponding dependency in setup.py has not been updated, which will lead to inconsistent versions being installed depending on the environment. This is a critical issue that needs to be addressed.
docker/Dockerfile
The comment on the preceding line indicates that this version should be synchronized with the flashinfer extra in setup.py. While this file is updated to v0.3.1, setup.py still specifies flashinfer-python==0.3.0. This discrepancy will cause different versions of FlashInfer to be installed depending on whether the project is built using Docker or installed via pip, which can lead to hard-to-debug issues. Please update setup.py to use version 0.3.1 as well.
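As an illustration of how this kind of drift could be caught automatically, here is a minimal sketch (not part of this PR; the file paths and version-pin regexes are assumptions) that compares the FlashInfer pin in docker/Dockerfile against the flashinfer-python pin in setup.py:

```python
# Hypothetical consistency check (not part of this PR): compare the FlashInfer
# version pinned in docker/Dockerfile with the one pinned in setup.py and fail
# if they diverge. The regexes assume pins of the form "v0.3.1" and
# "flashinfer-python==0.3.1".
import re
import sys
from pathlib import Path


def dockerfile_version(path: str = "docker/Dockerfile") -> str | None:
    text = Path(path).read_text()
    match = re.search(r"flashinfer.*?v?(\d+\.\d+\.\d+)", text, re.IGNORECASE)
    return match.group(1) if match else None


def setup_py_version(path: str = "setup.py") -> str | None:
    text = Path(path).read_text()
    match = re.search(r"flashinfer-python==(\d+\.\d+\.\d+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    docker_ver, setup_ver = dockerfile_version(), setup_py_version()
    if docker_ver != setup_ver:
        sys.exit(f"FlashInfer pin mismatch: Dockerfile={docker_ver}, setup.py={setup_ver}")
    print(f"FlashInfer pins are consistent: {docker_ver}")
```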
@bbartels Could you please add to the description the reason for the update?
Done @mgoin |
Passed the Blackwell test, LGTM!
Purpose
Bumps FlashInfer to v0.3.1. This version includes fixes for certain code paths not being AOT-compiled.
Test Plan
Run CI checks
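In addition to CI, a quick local sanity check (a sketch, assuming the wheel is installed under the flashinfer-python distribution name mentioned above) could confirm that the expected version is picked up:

```python
# Illustrative local check (not part of the PR's test plan): confirm the
# installed flashinfer-python wheel reports the expected version.
from importlib.metadata import version

assert version("flashinfer-python") == "0.3.1", "unexpected FlashInfer version"
print("flashinfer-python", version("flashinfer-python"))
```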
Test Result
CI checks passed
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.