Commit cc35151
Make SmoothQuant more General (#2728)
* Make SmoothQuant more General
Summary:
- Added SmoothQuantConfig as a base config and made corresponding changes in other parts of the flow
Test Plan:
- Qwen 3-8B with example.py and unittest
- Additional test plans required
ETC
- Fix typo in README.md for SmoothQuant
* refactor: use predefined ToyLinearModel
* fix incorrect parameters
* add type hint for dataclass
* use Quantization API for more generalized SmoothQuant API
* add PREPARE_FOR_LOADING mode for loading quantized weight
* update example and doc for updated SmoothQuant API
* remove overused/misunderstood parameters
* remove unused variable from SmoothQuant
* update SmoothQuant docs for user guide
* add benchmark comparison: base vs smoothquant
* add benchmark: w4a8-dynamic
* update docs for a4w8 benchmark
* replace Sec/Tokens with Tokens/Sec for metrics
* update docs for SmoothQuant experiment
* fix typo in README
* rename parser: repo to model
* fix incorrect id: w4a8 -> w8a8
* remove args: precision dtype, `torch.compile`
* rename: precision -> precision dtype in benchmark table
* add args: bias
* fix typo: W4A8 -> W8A8
* fix ci after adding is_bias args
* remove dead annotations in args: smoothing_factor
* remove torch.compile from unittests
* refactor: use ToyLinearModel in AWQ
* remove unused test case: dtype, alpha
* refactor: parametrize `base_config`
* add TODO for future update
* update integration test for new SmoothQuant API
* add unittest: sanity check for smoothquant acc
* bugfix: ImportError for `ToyLinearModel`
* revert: smoothquant unit test name
* revert: integration test
* update docs
* update docs
* add skiptest: no cuda case
1 parent 186aeb0 commit cc35151
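For context on the technique this commit generalizes, here is a minimal sketch of the SmoothQuant smoothing step in plain Python. It is not the torchao implementation and uses none of its API. The helper names (`column_abs_max`, `smoothing_factors`, `smooth`, `matmul`) are hypothetical, and the sketch assumes every channel has a nonzero activation and weight maximum. It shows how per-channel factors `s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)` move outlier magnitude from activations into weights while leaving the product `X @ W` unchanged.

```python
# Illustrative sketch only: plain-Python SmoothQuant smoothing, not torchao code.
# Assumes every input channel has a nonzero activation max and weight max.

def column_abs_max(rows):
    """Per-column max of absolute values for a list-of-rows matrix."""
    return [max(abs(row[j]) for row in rows) for j in range(len(rows[0]))]


def smoothing_factors(x, w, alpha=0.5):
    """Per-input-channel factors s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha).

    x: activations with shape (tokens, in_features)
    w: weights with shape (in_features, out_features)
    """
    act_max = column_abs_max(x)
    w_max = [max(abs(v) for v in row) for row in w]
    return [a ** alpha / m ** (1.0 - alpha) for a, m in zip(act_max, w_max)]


def smooth(x, w, alpha=0.5):
    """Divide activation channel j by s_j and multiply weight row j by s_j."""
    s = smoothing_factors(x, w, alpha)
    x_s = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * s[k] for v in row] for k, row in enumerate(w)]
    return x_s, w_s, s


def matmul(a, b):
    """Naive matrix product for list-of-rows matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Channel 1 of the activations is an outlier channel.
x = [[1.0, 100.0], [-2.0, 80.0]]
w = [[0.5, -0.25], [0.1, 0.2]]
x_s, w_s, s = smooth(x, w, alpha=0.5)

# The smoothed pair gives the same output as the original pair (up to rounding).
y_ref = matmul(x, w)
y_smooth = matmul(x_s, w_s)
```

With `alpha=0.5` the per-channel maxima of the smoothed activations and smoothed weights are equal, which is what makes both tensors easier to quantize to 8 bits. Other `alpha` values shift more of the difficulty to one side or the other.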
6 files changed: +574 -750 lines changed
- test/prototype
- torchao/prototype/smoothquant