[Bug] Fix validation errors for smoothquant modifier + update examples #19

Status: Merged (3 commits, Jul 8, 2024)
12 changes: 8 additions & 4 deletions examples/quantization_w8a8_int8/README.md
@@ -90,10 +90,14 @@ We first select the quantization algorithm. For W8A8, we want to:

```python
from llmcompressor.transformers import oneshot
-from llmcompressor.modifiers.quantization import QuantizationModifier
-
-# Configure the quantization algorithm to run. This more complex scheme requires a YAML based recipe.
-recipe = "./recipe.yaml"
+from llmcompressor.modifiers.quantization import GPTQModifier
+from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
+
+# Configure the quantization algorithms to run.
+recipe = [
+    SmoothQuantModifier(smoothing_strength=0.8),
+    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
+]

# Apply quantization.
oneshot(
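Taken together, the README change amounts to the flow below. This is a minimal sketch assuming a Llama-3 model and the open_platypus calibration set; the model ID, dataset, and calibration sizes are illustrative choices, not values fixed by this PR:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

# Assumed model; any causal LM supported by llmcompressor should work similarly.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

# The recipe introduced by this PR: smooth activations, then GPTQ-quantize.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

oneshot(
    model=model,
    dataset="open_platypus",  # assumed calibration dataset name
    recipe=recipe,
    max_seq_length=2048,  # assumed; match your model's context budget
    num_calibration_samples=512,  # assumed; more samples improve calibration
)
```

SmoothQuant runs first to migrate activation outliers into the weights, so GPTQ then quantizes a better-conditioned problem.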
8 changes: 6 additions & 2 deletions examples/quantization_w8a8_int8/llama3_example.py
@@ -1,6 +1,8 @@
from datasets import load_dataset
from transformers import AutoTokenizer

+from llmcompressor.modifiers.quantization import GPTQModifier
+from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

# Select model and load it.
@@ -55,9 +57,11 @@ def tokenize(sample):
# * apply SmoothQuant to make the activations easier to quantize
# * quantize the weights to int8 with GPTQ (static per channel)
# * quantize the activations to int8 (dynamic per token)
-# Note: this scheme currently requires a more complex yaml recipe
# Note: set sequential_update: true in the recipe to reduce memory
-recipe = "./recipe.yaml"
+recipe = [
+    SmoothQuantModifier(smoothing_strength=0.8),
+    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
+]

# Apply algorithms.
oneshot(
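As background for `smoothing_strength=0.8`: SmoothQuant rescales each input channel by s_j = max|X_j|^alpha / max|W_j|^(1-alpha), with alpha the smoothing strength, so activation outliers shrink while the product X @ W is preserved. A toy sketch of that formula (illustrative only, not the modifier's internal code):

```python
import torch

def smoothquant_scales(
    act_absmax: torch.Tensor, w_absmax: torch.Tensor, alpha: float = 0.8
) -> torch.Tensor:
    """Per-channel scales s_j = act_max_j**alpha / w_max_j**(1 - alpha)."""
    return act_absmax.pow(alpha) / w_absmax.pow(1.0 - alpha)

# Toy numbers: channel 0 has a large activation outlier.
act_absmax = torch.tensor([20.0, 1.0, 1.0])  # per-channel max |activation|
w_absmax = torch.tensor([0.5, 0.5, 0.5])     # per-channel max |weight|
s = smoothquant_scales(act_absmax, w_absmax, alpha=0.8)
print(s)  # channel 0 gets the largest scale
# Dividing activations by s and multiplying weights by s leaves X @ W
# unchanged while flattening the activation outliers for int8 quantization.
```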
26 changes: 0 additions & 26 deletions examples/quantization_w8a8_int8/recipe.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion src/llmcompressor/modifiers/smoothquant/base.py
@@ -98,7 +98,7 @@ class SmoothQuantModifier(Modifier):
    num_calibration_steps: Optional[int] = None
    calibration_function: Optional[Callable] = None

-    hooks_: List = None
+    hooks_: Optional[List] = None
    resolved_mappings_: Optional[List] = None
    scales_: Optional[Dict] = None

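The `hooks_` change is the validation fix named in the PR title: `Modifier` fields are validated by pydantic, and a bare `List` annotation rejects `None` whenever the value is actually validated. A minimal reproduction, assuming pydantic v2 behavior (the exact failure path inside llmcompressor may differ):

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError

class Bad(BaseModel):
    hooks_: List = None  # annotated as List, defaulted to None

class Good(BaseModel):
    hooks_: Optional[List] = None  # None is an accepted value

Bad()  # slips through: pydantic does not validate defaults by default
try:
    Bad(hooks_=None)  # an explicit None is validated and rejected
except ValidationError as err:
    print(err)  # reports that hooks_ must be a valid list

print(Good(hooks_=None))  # hooks_=None
```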
@@ -38,7 +38,7 @@ def save_pretrained_compressed(save_pretrained_method):
def save_pretrained_wrapper(
    save_directory: str,
    sparsity_config: Optional[SparsityCompressionConfig] = None,
-    quantization_format: str = None,
+    quantization_format: Optional[str] = None,
    save_compressed: bool = False,
    skip_compression_stats: bool = False,
    **kwargs,
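The `quantization_format` fix is the same `Optional` correction, applied to the compressed-save wrapper that llmcompressor patches onto `model.save_pretrained`. A hedged usage sketch, reusing the `model` from the earlier example (the save directory name is an assumption):

```python
# After oneshot(...) has run, save_pretrained is the wrapped version above,
# so it accepts the compression kwargs in addition to the usual HF arguments.
SAVE_DIR = "Meta-Llama-3-8B-Instruct-W8A8"  # assumed output path
model.save_pretrained(SAVE_DIR, save_compressed=True)
```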