[GPTQ] Iterative Parameter Updating (#863)
* Implement iterative parameter updating

* [Bugfix] Use weight parameter of linear layer (#836)
  * use weight parameter of linear layer
  * add weight attribute check

* [Bugfix] Rename files to remove colons (#846)

* [Bugfix] Workaround tied tensors bug (#659)
  * load offload state dict; add test; remove merge duplication
  * prepare to fix tie_word_embeddings; add full tests; patch second bug
  * comment out failing tests, point to next PR; link to issue
  * accommodate offloaded models in test; add back passing test
  * add error if not in expected list; update passing/failing list
  * add shared tensors tests; add comment with link
  * make failing tests a todo, then remove failing tests
  * explicitly set safe_serialization; separate out GPU tests; apply style

* only untie word embeddings (#839)

* check for config hidden size (#840)

* Use float32 for Hessian dtype (#847)
  * use float32 for Hessian dtype; explicitly set input dtype as well
  * float precision for OBCQ Hessian

* GPTQ: Deprecate non-sequential update option (#762)
  * remove from GPTQ; remove instances of the sequential_update argument in GPTQ tests
  * update examples, example tests, and documentation; revert back to auto type; apply style

* Typehint nits (#826)

* [DOC] Remove version restrictions in W8A8 example (#849)
  The latest compressed-tensors 0.8.0 removed some APIs
  (https://github.com/neuralmagic/compressed-tensors/pull/156/files).
  With an older llmcompressor installed from pip, it would throw an error like:
  ```
  ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
  ```

* Fix inconsistency (#80)
  Use group strategy with 128 group size instead of channel.

* 2of4; revert change to unrelated example; rename test file

* fix fwd func call (#845)

* cover all 3.9-3.12 in commit testing (#864)

* Add marlin-24 recipe/configs for e2e testing (#866)

* [Bugfix] onload during sparsity calculation (#862)
  * onload during sparsity calculation; fix sparsity

* Fix HFTrainer overloads (#869)
  * add missing arguments; use named args all around

* Support Model Offloading Tied Tensors Patch (#872)
  * update parameter of offloaded modules; use in-place function

* add advice about dealing with non-invertible Hessians (#875)

* seed commit workflow (#877)
  * adjust trigger; delete unneeded workflow

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)
  * clean up imports; fix condition for dynamic quantization
  * update for kv_cache and get kv_cache to work
  * add observer tests; apply updated MSE fixes
  * move hook check to observer call; separate out calibration step

* WIP, observer; use minmax observer

* Bugfix: get observer from name (#883)

* BugFix: Fix Sparsity Reload Testing (#882)
  * fix remaining test cases; add comments

* Use custom unique test names for e2e tests (#892)
  * include `testconfig_path` in parsed config data
  * use custom unique names for e2e tests

* Revert "Use custom unique test names for e2e tests (#892)" (#893)
  This reverts commit 10facf2.

* Move config["testconfig_path"] assignment (#895)
  * reapply #892, then its revert (#893)
  * move config["testconfig_path"] assignment
  * use a function name generator for e2e test names

* cap accelerate version to avoid bug (#897)

* Fix observing offloaded weight (#896)
  * load weight within onloading
  * remove moving activation to execution device; activation calibration already happens within the forward pass

* Update image in README.md (#861)

* use user-specified observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
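The Hessian-related entries above (#847, #875) touch a common numerical issue in GPTQ-style quantization: the Hessian approximation `H = X^T X` built from calibration activations is singular whenever there are fewer samples than columns, so it cannot be inverted directly. A standard remedy, sketched below with NumPy, is to add a small damping term to the diagonal before inverting; this is a minimal illustration of the general technique, not the exact code used by the project.

```python
import numpy as np

# Rank-deficient Hessian: 8 calibration samples, 64 weight columns,
# so H = X^T X is 64x64 but has rank at most 8 (singular).
X = np.random.default_rng(0).standard_normal((8, 64)).astype(np.float32)
H = X.T @ X

# Damp the diagonal by a small fraction of its mean before inverting.
# This makes H positive definite and keeps the inverse numerically stable.
damp = 0.01 * np.mean(np.diag(H))
H_damped = H + damp * np.eye(64, dtype=np.float32)
Hinv = np.linalg.inv(H_damped)  # would fail or blow up on the raw H
```

Note the float32 dtype: accumulating `X^T X` in a lower-precision dtype such as float16 risks overflow and precision loss, which is the motivation behind the dtype change in #847.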