[torchao] Add support for ModuleFqnToConfig using regex #26001
Conversation
        # we'll apply the first matched pattern
        c = module_fqn_to_config[maybe_module_fqn_pattern]
        break
    else:
This looks like an indentation error.
This is intended, actually: the `else` branch of a `for` loop executes only when the loop finishes without hitting `break`, so it gives us a default config when no pattern matched.
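To illustrate the `for`/`else` pattern discussed above, here is a minimal sketch (not the actual torchao code; `resolve_config` and its arguments are hypothetical names):

```python
import re

def resolve_config(module_fqn, module_fqn_to_config, default_config=None):
    """Pick the config for a module FQN; first matched regex pattern wins."""
    for pattern in module_fqn_to_config:
        if re.fullmatch(pattern, module_fqn):
            # we'll apply the first matched pattern
            c = module_fqn_to_config[pattern]
            break
    else:
        # reached only when the loop completed WITHOUT break:
        # no pattern matched, so fall back to a default config
        c = default_config
    return c
```

The `else` here belongs to the `for`, not to the `if`, which is why the indentation in the diff is correct.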
@houseroad this is ready to review btw, we have landed the corresponding PR in torchao: pytorch/ao#3084
This pull request has merge conflicts that must be resolved before it can be merged.
Looks good.
Summary:
As titled: we add regex support to ModuleFqnToConfig to simplify the config, and enable it in both transformers and vLLM to make sure the regex config works everywhere.
torchao PR that adds the functionality to the quantize_ API: pytorch/ao#3084
transformers PR: huggingface/transformers#41242
Test Plan:
We save the model with the regex config in transformers; in vLLM we just make sure we can load the model:
pytest tests/quantization/test_torchao.py -k test_opt_125m_module_fqn_to_config_regex_model_loading_with_params
model: https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev
Output:
Reviewers:
Subscribers:
Tasks:
Tags:
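For context on what a regex-keyed mapping buys us, here is an illustrative sketch of a ModuleFqnToConfig-style dict that routes groups of modules to quantization configs. The pattern keys, config names, and `match_config` helper are hypothetical and only demonstrate the first-match-wins resolution; the real API is torchao's `quantize_` / `ModuleFqnToConfig`.

```python
import re

# Hypothetical mapping: one regex covers every attention projection,
# another covers every feed-forward linear, instead of listing each
# module's fully-qualified name (FQN) individually.
module_fqn_to_config = {
    r"model\.decoder\.layers\.\d+\.self_attn\..*": "int8_weight_only",
    r"model\.decoder\.layers\.\d+\.fc\d": "int4_weight_only",
}

def match_config(fqn):
    # dict insertion order determines precedence: the first pattern
    # that fully matches the FQN wins, mirroring the PR's matching loop
    for pattern, cfg in module_fqn_to_config.items():
        if re.fullmatch(pattern, fqn):
            return cfg
    return None  # unmatched modules keep their default (unquantized) config
```

Because the first match wins, more specific patterns should be inserted before broader ones when their coverage overlaps.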