SmoothQuant Mappings Only Work When Defined in a Recipe String #37

Closed
Satrat opened this issue Jul 24, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@Satrat
Contributor

Satrat commented Jul 24, 2024

Describe the bug
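When a oneshot recipe is defined programmatically (as a list of modifier objects), the SmoothQuant mappings are not parsed correctly: the recipe is dumped to a string in which each mapping carries a !!python/tuple tag, and parsing that string fails. The bug does not occur when the recipe is specified as a YAML string or file.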

Expected behavior
SmoothQuantModifier is correctly initialized, and oneshot runs to completion.

Environment
Include all relevant environment information:

  1. OS [e.g. Ubuntu 18.04]: Ubuntu
  2. Python version [e.g. 3.7]: 3.10.12
  3. LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: main, 07c1fd7
  4. ML framework version(s) [e.g. torch 1.7.1]: torch 2.3.1, transformers 4.42.4
  5. Other Python package versions [e.g. SparseZoo, DeepSparse, numpy, ONNX]: n/a
  6. Other relevant environment information [e.g. hardware, CUDA version]: CUDA 12.3

To Reproduce
Example script:

from llmcompressor.modifiers.smoothquant.base import DEFAULT_SMOOTHQUANT_MAPPINGS
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.modifiers.quantization.gptq import GPTQModifier

from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8, mappings=DEFAULT_SMOOTHQUANT_MAPPINGS),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"], sequential_update=False),
]

model_stub = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
model = SparseAutoModelForCausalLM.from_pretrained(model_stub, device_map="auto", torch_dtype="auto")

dataset = "ultrachat-200k"
output_dir = "./test_output"
splits = {"calibration": "train_gen[:5%]"}
max_seq_length = 2048
pad_to_max_length = False
num_calibration_samples = 8

oneshot(
    model=model,
    dataset=dataset,
    recipe=recipe,
    output_dir=output_dir,
    splits=splits,
    max_seq_length=max_seq_length,
    pad_to_max_length=pad_to_max_length,
    num_calibration_samples=num_calibration_samples,
    save_compressed=True
)

Errors
oneshot fails with the following error:

Could not parse recipe from string DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      index: null
      group: null
      start: -1
      end: -1
      update: null
      initialized_structure_: false
      initialized_: false
      finalized_: false
      started_: false
      ended_: false
      smoothing_strength: 0.8
      mappings:
      - !!python/tuple
        - - re:.*q_proj
          - re:.*k_proj
          - re:.*v_proj
        - re:.*input_layernorm
      - !!python/tuple
        - - re:.*gate_proj
          - re:.*up_proj
        - re:.*post_attention_layernorm
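
The !!python/tuple tags in the output above are a likely cause of the parse failure. The following is a minimal sketch of that failure mode using PyYAML alone, assuming the recipe string is read back with a safe YAML loader (this thread does not confirm which loader llmcompressor uses). The mappings are written out by hand to match the shape shown in the error output.

```python
import yaml  # PyYAML

# Mappings shaped like the ones in the error output: a list of (layers, norm) tuples.
mappings = [
    (["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"),
    (["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"),
]

# PyYAML's default dumper tags Python tuples as !!python/tuple.
dumped = yaml.dump({"mappings": mappings})
print(dumped)

# A safe loader has no constructor for that Python-specific tag, so loading fails.
# (Assumption: the recipe parser uses a safe loader; not confirmed in this issue.)
try:
    yaml.safe_load(dumped)
    safe_load_failed = False
except yaml.YAMLError as err:
    safe_load_failed = True
    print(f"safe_load failed: {type(err).__name__}")
```

If the mappings are nested lists instead of tuples, the dump contains no Python-specific tags, which matches the YAML string form that works.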

Additional context
Updating the recipe to the equivalent YAML string fixes the issue:

recipe = """
DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8
      mappings:
      - - ['re:.*q_proj', 're:.*k_proj', 're:.*v_proj']
        - re:.*input_layernorm
      - - ['re:.*gate_proj', 're:.*up_proj']
        - re:.*post_attention_layernorm
    GPTQModifier:
      sequential_update: false
      targets: Linear
      scheme: W8A8
"""

@rahul-tuli
Collaborator

Resolved by PR #48

@HelloCard

With the latest 0.3.1 version, this annoying problem still exists:

recipe = [
    SmoothQuantModifier(smoothing_strength=0.85,
    mappings=[
      [["re:.*qkv_proj"], "re:.*input_layernorm"],
      [["re:.*gate_up_proj"], "re:.*post_attention_layernorm"],
    ]),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"], sequential_update=True),
]

fail:

raise ValueError(f"Could not parse recipe from string {content}") from err
ValueError: Could not parse recipe from string DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.85
      mappings:
      - !!python/tuple
        - - re:.*qkv_proj
        - re:.*input_layernorm
      - !!python/tuple
        - - re:.*gate_up_proj
        - re:.*post_attention_layernorm
    GPTQModifier:
      sequential_update: true
      targets: Linear
      ignore:
      - lm_head
      scheme: W8A8

recipe = """
DEFAULT_stage:
  DEFAULT_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.85
      mappings:
      - - ['re:.*qkv_proj']
        - re:.*input_layernorm
      - - ['re:.*gate_up_proj']
        - re:.*post_attention_layernorm
    GPTQModifier:
      sequential_update: false
      targets: Linear
      scheme: W8A8
"""

succeed!
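
The failing recipe above passes mappings as nested lists, yet the dumped recipe still contains !!python/tuple. That suggests the tuples are introduced inside the modifier rather than by the caller (an inference from the output, not something confirmed in this thread). One way a serializer could avoid the tag is to convert tuples to lists before dumping. The helper below, tuples_to_lists, is a hypothetical name used for illustration; it is not part of llmcompressor and is not necessarily what PR #48 does.

```python
from typing import Any


def tuples_to_lists(obj: Any) -> Any:
    """Recursively replace tuples with lists so the result holds only plain YAML types."""
    if isinstance(obj, (tuple, list)):
        return [tuples_to_lists(item) for item in obj]
    if isinstance(obj, dict):
        return {key: tuples_to_lists(value) for key, value in obj.items()}
    return obj


# Mappings in the tuple form seen in the error output above.
mappings = [
    (["re:.*qkv_proj"], "re:.*input_layernorm"),
    (["re:.*gate_up_proj"], "re:.*post_attention_layernorm"),
]

normalized = tuples_to_lists(mappings)
print(normalized)
```

The normalized structure has the same nesting as the mappings in the working YAML string, so dumping it would produce plain sequences with no Python-specific tags.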
