[VLM] Post-layernorm override and quant config in vision encoder #9217
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
cc @litianjian
OK, I will update the llava models after this PR.
@ywang96 PTAL when you have time.
@DarkLight1337 Would you mind sharing the current progress?
Sorry for the delay, I have been focused on other stuff lately.
576c0f2 to ef4c253 (Compare)
LGTM. Thanks for enabling this!
```diff
-            MllamaVisionEncoderLayer(config, is_gated)
+            MllamaVisionEncoderLayer(config,
+                                     quant_config=quant_config,
             for _ in range(num_layers)
```
This change breaks partially quantized models such as https://huggingface.co/amd/Llama-3.2-11B-Vision-Instruct-FP8-KV.
Since the vision part is not quantized, this config now causes on-the-fly quantization to be applied (https://github.com/ROCm/vllm/blob/main/vllm/model_executor/layers/quantization/utils/w8a8_utils.py#L126), which cannot work with the vision MLP because its input is 3-dimensional. It eventually crashes at https://github.com/ROCm/vllm/blob/main/vllm/_custom_ops.py#L711.
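A minimal sketch of one way to guard against this, using simplified stand-in classes rather than vLLM's actual quantization API (`FakeQuantConfig`, `quant_config_for`, and `ignored_prefixes` are all illustrative names): if the checkpoint leaves the vision tower unquantized, drop the quant config before it reaches the vision encoder so no on-the-fly FP8 quantization is attempted there. The fixes proposed later in this thread may take a different approach.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative stand-in only; not a vLLM class.
@dataclass
class FakeQuantConfig:
    method: str = "fp8"
    # Module prefixes the checkpoint left unquantized.
    ignored_prefixes: List[str] = field(default_factory=lambda: ["vision_model."])

def quant_config_for(prefix: str,
                     quant_config: Optional[FakeQuantConfig]) -> Optional[FakeQuantConfig]:
    """Return the quant config only if the module at `prefix` is actually quantized."""
    if quant_config is None:
        return None
    if any(prefix.startswith(p) for p in quant_config.ignored_prefixes):
        # Partially quantized checkpoint (e.g. FP8 language model + FP16
        # vision tower): pass None so no on-the-fly quantization happens.
        return None
    return quant_config

cfg = FakeQuantConfig()
print(quant_config_for("vision_model.encoder.layers.0", cfg))  # -> None
print(quant_config_for("language_model.layers.0", cfg))        # -> FakeQuantConfig(...)
```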
@mgoin can you help with this so we can handle both partially and fully quantized models?
Proposed a fix in #9800
@DarkLight1337 I am working on this for as many models as possible in #9772
Also, `QuantizationConfig` and the associated `prefix` argument are now passed to the vision towers for consistency. However, since the vision tower is not yet quantized by existing methods, we ignore it there and add a code comment explaining this.

FIX #9186
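A rough sketch of the pattern described above, with illustrative signatures rather than the exact vLLM code: the vision tower accepts `quant_config` and `prefix` so call sites stay uniform, but it keeps using unquantized layers and documents why the arguments are unused.

```python
from typing import Optional

import torch.nn as nn

class VisionTowerSketch(nn.Module):
    """Illustrative only; not the actual vLLM vision tower."""

    def __init__(self,
                 hidden_size: int,
                 intermediate_size: int,
                 quant_config: Optional[object] = None,
                 prefix: str = "") -> None:
        super().__init__()
        # NOTE: quant_config/prefix are accepted to keep the constructor
        # signature consistent with the language model, but the vision tower
        # is not quantized by existing methods yet, so they are
        # intentionally unused here.
        del quant_config, prefix
        self.fc1 = nn.Linear(hidden_size, intermediate_size)
        self.fc2 = nn.Linear(intermediate_size, hidden_size)
```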