Enable specifying alpha for SQ (#9423)
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
janekl authored Jun 11, 2024
1 parent ebba8b1 commit 97aa732
Showing 2 changed files with 4 additions and 0 deletions.
@@ -26,6 +26,7 @@ quantization:
   calib_dataset: cnn_dailymail # wikitext, cnn_dailymail, or a local dataset
   num_calib_size: 512 # number of samples used for calibration
   awq_block_size: 128 # block size for scaling factors in AWQ algorithm
+  alpha: 1.0 # alpha parameter in SmoothQuant algorithm

 export:
   decoder_type: llama # gptnext, gpt2, llama
nemo/export/quantize/quantizer.py: 3 additions, 0 deletions
@@ -116,6 +116,9 @@ def __init__(
         "axis": None,
         "enable": enable_quant_kv_cache,
     }
+    if quantization_config.algorithm == "int8_sq":
+        logging.info(f"Using int8_sq alpha = {quantization_config.alpha}")
+        quant_cfg["algorithm"] = {"method": "smoothquant", "alpha": quantization_config.alpha}

     self.quant_cfg = quant_cfg
 else:
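For context on what the new `alpha` option controls: in SmoothQuant, each input channel j gets a smoothing factor s_j = max|X_j|^alpha / max|W_j|^(1-alpha); activations are divided by s_j and weights multiplied by it, so alpha = 1.0 migrates all of the activation outlier magnitude into the weights. A minimal NumPy sketch of that scale computation follows; it is illustrative only, assuming (tokens, in_channels) activations and (in_channels, out_channels) weights, and is not the TensorRT Model Optimizer implementation that `quantizer.py` delegates to.

```python
import numpy as np

def smoothquant_scales(acts: np.ndarray, weight: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Per-channel smoothing factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    acts:   (tokens, in_channels) calibration activations
    weight: (in_channels, out_channels) linear-layer weight
    """
    act_max = np.abs(acts).max(axis=0)    # max |X_j| over calibration tokens
    w_max = np.abs(weight).max(axis=1)    # max |W_j| over output channels
    s = act_max ** alpha / w_max ** (1.0 - alpha)
    return np.clip(s, 1e-5, None)         # guard against all-zero channels

# With alpha = 1.0 the denominator term is w_max ** 0 == 1, so the
# factor reduces to max|X_j|: quantization difficulty moves entirely
# from activations to weights.
```

Lower alpha values (e.g. 0.5, the common default in the SmoothQuant paper) split the difficulty more evenly between activations and weights; the commit simply exposes this trade-off as a config knob instead of a hard-coded value.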
