
Commit 4cfc0e6

toncao, cpatonn, and brian-dellabetta authored
ignore _update_mamba_mask for AWQ sequential tracing (#1925)
SUMMARY:
In models with mamba-2 layers, e.g. [nvidia/NVIDIA-Nemotron-Nano-12B-v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2) and [Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct), tracing `_update_mamba_mask` fails with:

```
  File "NemotronHModel_8045287568680_autowrapped", line 57, in forward
  File "/mnt/LinuxDrive/huggingface/modules/transformers_modules/NVIDIA_hyphen_Nemotron_hyphen_Nano_hyphen_12B_hyphen_v2/modeling_nemotron_h.py", line 1461, in _update_mamba_mask
    if cache_position[0] > 0 or (attention_mask is not None and torch.all(attention_mask == 1)):
       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/toncao/anaconda3/envs/llm-compressor_v1/lib/python3.12/site-packages/transformers/utils/fx.py", line 674, in __bool__
    return super().__bool__()
           ^^^^^^^^^^^^^^^^^^
  File "/home/toncao/anaconda3/envs/llm-compressor_v1/lib/python3.12/site-packages/torch/fx/proxy.py", line 577, in __bool__
    return self.tracer.to_bool(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/toncao/anaconda3/envs/llm-compressor_v1/lib/python3.12/site-packages/torch/fx/proxy.py", line 388, in to_bool
    raise TraceError(
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
```

raised from the function:

```
def _update_mamba_mask(self, attention_mask, cache_position):
    """
    No need for zeroing states when
        1. Cached forward
        2. Attending to all inputs
    """
    mamba_mask = attention_mask
    if cache_position[0] > 0 or (attention_mask is not None and torch.all(attention_mask == 1)):
        mamba_mask = None
    return mamba_mask
```

Adding `_update_mamba_mask` to the tracing ignore list therefore makes AWQ sequential tracing work.

TEST PLAN:
Local `make test` results:

```
===================================================== short test summary info =====================================================
FAILED tests/llmcompressor/modeling/test_calib_deepseek_v3.py::test_calib_deepseekv3_module - torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 56.00 MiB. GPU 0 has a total capacity of 23.57 GiB of which 14.1...
FAILED tests/llmcompressor/utils/test_helpers.py::test_disable_cache[MllamaForConditionalGeneration-meta-llama/Llama-3.2-11B-Vision-Instruct] - huggingface_hub.errors.GatedRepoError: 403 Client Error. (Request ID: Root=1-68ee275c-378c35b1649b823602164fc0;24ebe331-9031-4...
FAILED tests/lmeval/test_lmeval.py::TestLMEval::test_lm_eval[None] - TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'NoneType'
====================================== 3 failed, 242 passed, 4 skipped in 129.47s (0:02:09) =======================================
```

Co-authored-by: toncao <cpatonn@gmail.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
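For context, here is a minimal sketch (not part of this commit) that reproduces the same class of failure with plain `torch.fx`: symbolic tracing swaps tensors for `Proxy` objects, so any data-dependent Python branch forces `Proxy.__bool__`, exactly like the `cache_position[0] > 0` check above. Ignoring such methods during tracing sidesteps the data-dependent branch entirely.

```
# Minimal repro sketch, not from this commit: a data-dependent `if` on a
# traced tensor triggers Proxy.__bool__, which raises TraceError.
import torch
import torch.fx
from torch.fx.proxy import TraceError


class Gate(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fine in eager mode; fatal under symbolic tracing, just like the
        # `if cache_position[0] > 0 ...` branch in _update_mamba_mask.
        if x[0] > 0:
            return x * 2
        return x


try:
    torch.fx.symbolic_trace(Gate())
except TraceError as err:
    print(err)  # "symbolically traced variables cannot be used as inputs to control flow"
```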
1 parent: d7d1b45 · commit: 4cfc0e6

File tree

1 file changed: +1 -0 lines changed


src/llmcompressor/args/dataset_arguments.py

Lines changed: 1 addition & 0 deletions
```
@@ -194,6 +194,7 @@ class DatasetArguments(CustomDatasetArguments):
         default_factory=lambda: [
             "_update_causal_mask",
             "create_causal_mask",
+            "_update_mamba_mask",
             "make_causal_mask",
             "get_causal_mask",
             "mask_interface",
```
