Commit cc35151

Make SmoothQuant more General (#2728)
* Make SmoothQuant more General

  Summary:
  - Added SmoothQuantConfig as a base config and made corresponding changes in other parts of the flow

  Test Plan:
  - Qwen 3-8B with example.py and unittest
  - Additional test plans required

  ETC
  - Fix typo in README.md for SmoothQuant

* refactor: use predefined ToyLinearModel
* fix incorrect parameters
* add type hint for dataclass
* use Quantization API for more generalized SmoothQuant API
* add PREPARE_FOR_LOADING mode for loading quantized weight
* update example and doc for updated SmoothQuant API
* remove overused/misunderstood parameters
* remove unused variable from SmoothQuant
* update SmoothQuant docs for user guide
* add benchmark comparison: base vs smoothquant
* add benchmark: w4a8-dynamic
* update docs for a4w8 benchmark
* replace Sec/Tokens with Tokens/Sec for metrics
* update docs for SmoothQuant experiment
* fix typo in README
* rename parser: repo to model
* fix incorrect id: w4a8 -> w8a8
* remove args: precision dtype, `torch.compile`
* rename: precision -> precision dtype in benchmark table
* add args: bias
* fix typo: W4A8 -> W8A8
* fix ci after adding is_bias args
* remove dead annotations in args: smoothing_factor
* remove torch.compile from unittests
* refactor: use ToyLinearModel in AWQ
* remove unused test case: dtype, alpha
* refactor: parametrize `base_config`
* add TODO for future update
* update integration test for new SmoothQuant API
* add unittest: sanity check for smoothquant acc
* bugfix: ImportError for `ToyLinearModel`
* revert: smoothquant unit test name
* revert: integration test
* update docs
* update docs
* add skiptest: no cuda case
1 parent 186aeb0 commit cc35151

6 files changed: +574 -750 lines changed