
Fix HFTrainer overloads #869

Merged (5 commits, Oct 28, 2024)

Conversation

kylesayrs (Collaborator)

Purpose

  • Fix failing tests in tests/llmcompressor/transformers/finetune/test_finetune_no_recipe_custom_dataset.py

Background

  • The HFTrainer function contracts were changed in the most recent transformers release 4.46.0 (offending commit)

Changes

  • Change mixin overloads to use the same function contract as the original functions
  • This change remains backward compatible with earlier versions of transformers, since the new arguments are passed as keyword arguments
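The fix can be sketched as follows. This is a minimal illustration, not the actual llm-compressor code: the method name `training_step` and the `num_items_in_batch` parameter are assumptions standing in for whichever contracts changed in transformers 4.46.0.

```python
class BaseTrainer:
    """Stand-in for transformers' Trainer with the new-style contract:
    an extra keyword argument was added to the method signature."""

    def training_step(self, model, inputs, num_items_in_batch=None):
        return {"model": model, "inputs": inputs, "n": num_items_in_batch}


class SessionMixin:
    """Mixin overload using the same function contract as the original.
    Because the new argument has a default, the overload also works when
    called by older transformers releases that never pass it."""

    def training_step(self, model, inputs, num_items_in_batch=None):
        # ... compressor session hooks would run here ...
        return super().training_step(
            model, inputs, num_items_in_batch=num_items_in_batch
        )


class Trainer(SessionMixin, BaseTrainer):
    pass


trainer = Trainer()
# Old-style caller (pre-4.46.0): the new argument simply defaults.
old = trainer.training_step("m", {"x": 1})
# New-style caller (4.46.0) passes the extra argument through the mixin.
new = trainer.training_step("m", {"x": 1}, num_items_in_batch=8)
```

Matching the upstream signature (rather than renaming or dropping parameters) is what keeps the mixin's `super()` call valid across both contract versions.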

Testing

  • tests/llmcompressor/transformers/finetune/test_finetune_no_recipe_custom_dataset.py now passes


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs kylesayrs force-pushed the kylesayrs/fix-trainer-function-contracts branch from 9d51505 to 9a7711e on October 27, 2024 17:19
mgoin previously approved these changes Oct 27, 2024
Review comment on src/llmcompressor/transformers/finetune/session_mixin.py (outdated, resolved)
rahul-tuli previously approved these changes Oct 27, 2024
@kylesayrs kylesayrs dismissed stale reviews from rahul-tuli and mgoin via 2db27c5 October 28, 2024 04:47
dsikka previously approved these changes Oct 28, 2024
rahul-tuli previously approved these changes Oct 28, 2024
@rahul-tuli (Collaborator)

Should we add *args, **kwargs and pass them to the super functions, to protect against future changes like this?

mgoin previously approved these changes Oct 28, 2024
Review comment on src/llmcompressor/transformers/finetune/session_mixin.py (outdated, resolved)
@kylesayrs (Collaborator, Author)

@rahul-tuli Since both functions require some information from the inputs, I think I'd prefer to expose all the arguments. There are tradeoffs either way with respect to how loudly the code errors when upstream arguments change.
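The tradeoff being discussed can be illustrated with a hedged sketch (hypothetical names, not llm-compressor's API): a `*args, **kwargs` pass-through survives future upstream signature changes but hides the inputs the mixin needs, while an explicit signature exposes the arguments directly but fails fast if the contract changes again.

```python
class Base:
    """Stand-in for an upstream Trainer method."""

    def compute_loss(self, model, inputs, num_items_in_batch=None):
        return len(inputs)


class PassThroughMixin(Base):
    # Resilient to future upstream changes, but opaque: the mixin cannot
    # easily read `inputs` without inspecting positions in args/kwargs.
    def compute_loss(self, *args, **kwargs):
        return super().compute_loss(*args, **kwargs)


class ExplicitMixin(Base):
    # Mirrors the upstream contract, so `inputs` is directly available;
    # raises a TypeError the moment upstream adds a required argument,
    # which surfaces contract drift immediately instead of silently.
    def compute_loss(self, model, inputs, num_items_in_batch=None):
        assert inputs is not None  # mixin logic can use inputs directly
        return super().compute_loss(model, inputs, num_items_in_batch)


assert PassThroughMixin().compute_loss("m", [1, 2]) == 2
assert ExplicitMixin().compute_loss("m", [1, 2, 3]) == 3
```

Since the mixin's own logic needs values from the arguments, the explicit form chosen in this PR trades some future-proofing for direct, readable access to them.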

@kylesayrs kylesayrs dismissed stale reviews from mgoin, rahul-tuli, and dsikka via b6b54fb October 28, 2024 17:14
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@kylesayrs kylesayrs force-pushed the kylesayrs/fix-trainer-function-contracts branch from b6b54fb to 84be172 on October 28, 2024 17:15
@dsikka dsikka merged commit c0c23b1 into main Oct 28, 2024
6 of 7 checks passed
@dsikka dsikka deleted the kylesayrs/fix-trainer-function-contracts branch October 28, 2024 19:00
kylesayrs added a commit that referenced this pull request Nov 7, 2024
* add missing arguments

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* names

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* style

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* named args all around

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
dsikka added a commit that referenced this pull request Nov 7, 2024
* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [DOC] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors 0.8.0 removed some APIs
(https://github.com/neuralmagic/compressed-tensors/pull/156/files).
If an older llmcompressor is installed from pip, it throws an
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* 2of4

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* rename test file

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* names

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* style

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* named args all around

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* in place function

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add advice about dealing with non-invertible Hessians (#875)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* tickle

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* let's give it a try

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* whitespace

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* delete unneeded workflow

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* adjust trigger

Signed-off-by: andy-neuma <andy@neuralmagic.com>

---------

Signed-off-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* WIP, observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use minmax observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove moving activation to the execution device; this is unnecessary since activation calibration always happens within the forward pass

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use user-specified observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
kylesayrs added a commit that referenced this pull request Nov 19, 2024
mgoin added a commit that referenced this pull request Nov 19, 2024
* set targets default earlier, remove QuantizationScheme.default_scheme

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* clearer warning

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix typo

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* update docstring, use default factory for mutable default

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use Linear default

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* update accelerate version (#899)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [GPTQ] Iterative Parameter Updating (#863)

* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accomodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* GPTQ: Depreciate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [ DOC ] Remove version restrictions in W8A8 exmaple (#849)

The latest compressored-tensor 0.8.0 removed some API,
https://github.com/neuralmagic/compressed-tensors/pull/156/files
If installed the older llmcompressor from pip, it would throw the
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix inconsistence (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* 2of4

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* rename test file

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* names

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* style

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* named args all around

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* in place function

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add advice about dealing with non-invertable hessians (#875)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* tickle

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* let's give it a try

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* whitespace

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* delete unneeded workflow

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* adjust trigger

Signed-off-by: andy-neuma <andy@neuralmagic.com>

---------

Signed-off-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* WIP, observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use minmax observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
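A hedged sketch of what a min-max observer computes (an illustration of the concept, not the observer API added in this PR): track the observed min/max and derive an asymmetric uint8 scale and zero point from them.

```python
import numpy as np

# Min-max observation over a calibration tensor (illustrative values).
x = np.array([-2.0, 0.0, 6.0])
x_min, x_max = float(x.min()), float(x.max())

# Asymmetric 8-bit quantization parameters derived from the range.
scale = (x_max - x_min) / 255.0
zero_point = int(round(-x_min / scale))

q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
print(q.tolist())  # [0, 64, 255]
```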

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove moving activation to execution device; this is already handled because activation calibration always happens within the forward pass

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use user-specified observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Small fixes for release (#901)

* fix device map

* expose one gpu for finetune; update to use a better model and show generation for completeness

* more fixes

* typo fix

* dont just run unit tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use smaller portion of dataset (#902)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update example to not fail hessian inversion (#904)

* update

Signed-off-by: Dipika <dipikasikka1@gmail.com>

* quality

---------

Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* bump version (#907)

Signed-off-by: Dipika <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add default mappings (#906)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)

* src and tests updates

* save model if output_dir is provided

* save model if provided as a string

* typo

* save if model was provided as a string or custom output_dir was set

* comments

* save tokenizer also if model passed as a string or custom output_dir provided

* revert to True

* merge main

* merge main

* fix transformers tests

* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* lint:

* fix bug

* fix bug

* comments

* comments

* fix saving bug on example script and comments

* fix test failure

* comments

* comments

* comments

* lint

* fix test_quantization.py

* fix bugs

* revert to default

* revert to default

* draft

* fix test

* logging output fix

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* correct typo (#888)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use default factory, since a plain default does not trigger the field validator

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
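The default-factory point generalizes beyond the validator detail: a shared mutable default leaks state across instances, so a factory that builds a fresh object per instance is the safe pattern. A sketch with stdlib dataclasses (the PR itself concerns a pydantic field; names here are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class SchemeArgs:
    # default_factory builds a fresh list per instance; a bare mutable
    # default would be shared (dataclasses forbid it outright).
    targets: list = field(default_factory=lambda: ["Linear"])

a, b = SchemeArgs(), SchemeArgs()
a.targets.append("Embedding")
print(b.targets)  # ['Linear'] -- unaffected by the mutation of a
```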

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Signed-off-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: George <george@neuralmagic.com>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
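A hedged illustration of why the Hessian accumulates in float32: float16 saturates around 65504, so even a modest inner product overflows to inf when kept in half precision.

```python
import numpy as np

# 64 entries of 40.0: the true inner product is 64 * 1600 = 102400,
# which exceeds the float16 maximum (~65504) and becomes inf...
x16 = np.full(64, 40.0, dtype=np.float16)
h16 = x16 @ x16

# ...while the same product in float32 is exact.
h32 = x16.astype(np.float32) @ x16.astype(np.float32)
print(np.isinf(h16), h32)
```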

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors 0.8.0 removed some APIs,
https://github.com/neuralmagic/compressed-tensors/pull/156/files
If an older llmcompressor was installed from pip, it would throw an
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
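A version-tolerant import guard is a common defence against this class of breakage; a minimal sketch (the symbol is the one named in the error above; the fallback behaviour is hypothetical):

```python
try:
    # Removed from compressed-tensors in 0.8.0 (see its PR #156).
    from compressed_tensors.quantization import (
        update_layer_weight_quant_params,
    )
except ImportError:
    # Fall back or feature-gate when running against newer releases
    # (or when the package is not installed at all).
    update_layer_weight_quant_params = None

print(update_layer_weight_quant_params is None
      or callable(update_layer_weight_quant_params))
```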

* Fix inconsistency (#80)

Use the group strategy with group size 128 instead of the channel strategy

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* 2of4

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* rename test file

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs added a commit that referenced this pull request Nov 21, 2024
* set targets default earlier, remove QuantizationScheme.default_scheme

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* clearer warning

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix typo

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>


* update docstring, use default factory for mutable default

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use Linear default

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>


* update accelerate version (#899)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Small fixes for release (#901)

* fix device map

* expose one gpu for finetune; update to use a better model and show generation for completeness

* more fixes

* typo fix

* dont just run unit tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use smaller portion of dataset (#902)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update example to not fail hessian inversion (#904)

* update

Signed-off-by: Dipika <dipikasikka1@gmail.com>

* quality

---------

Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* bump version (#907)

Signed-off-by: Dipika <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add default mappings (#906)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)

* src and tests updates

* save model if output_dir is provided

* save model if provided as a string

* typo

* save if model was provided as a string or custom output_dir was set

* comments

* save tokenizer also if model passed as a string or custom outputdir provided

* revert to True

* merge main

* merge main

* fix transformers tests

* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* lint

* fix bug

* fix bug

* comments

* comments

* fix saving bug on example script and comments

* fix test failure

* comments

* comments

* comments

* lint

* fix test_quantization.py

* fix bugs

* revert to default

* revert to default

* draft

* fix test

* logging output fix

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* correct typo (#888)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use default factory, since default does not trigger field validator

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Signed-off-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: George <george@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* add missing arguments

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* names

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* style

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* named args all around

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
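Commit #839 narrows the untying to the word embeddings. The underlying issue is aliasing: a tied LM head reuses the embedding matrix's storage, so mutating or saving one silently affects the other. A small illustration with NumPy views standing in for tied tensors (illustrative, not the actual model code):

```python
import numpy as np

embed = np.ones((4, 2), dtype=np.float32)

lm_head_tied = embed          # tied: both names point at the same buffer
lm_head_untied = embed.copy()  # untied: the head gets its own storage

embed[0, 0] = 5.0              # update the embedding matrix

# the tied head sees the change; the untied head does not
print(lm_head_tied[0, 0], lm_head_untied[0, 0])  # 5.0 1.0
```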

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
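The float32 change in #847 amounts to upcasting activations before accumulating the Hessian, so half-precision inputs do not overflow or lose precision before inversion. A hedged sketch of a GPTQ-style accumulation (shapes and names are illustrative, not llm-compressor's implementation):

```python
import numpy as np

def accumulate_hessian(batches, n_cols):
    # accumulate H = 2/N * X^T X in float32, regardless of input dtype
    H = np.zeros((n_cols, n_cols), dtype=np.float32)
    n = 0
    for x in batches:                 # x: (batch, n_cols), possibly float16
        x32 = x.astype(np.float32)    # upcast before the matmul
        H += 2.0 * (x32.T @ x32)
        n += x.shape[0]
    return H / max(n, 1)

x = np.ones((4, 3), dtype=np.float16)
H = accumulate_hessian([x], 3)
print(H.dtype, H.shape)  # float32 (3, 3)
```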

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release (0.8.0) removed some APIs
(https://github.com/neuralmagic/compressed-tensors/pull/156/files).
With an older llmcompressor installed from pip, this throws an
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* 2of4

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* rename test file

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* names

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* style

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* named args all around

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
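The pattern behind #869 ("add missing arguments", "named args all around") is to match the upstream Trainer contract and forward new keyword arguments, so the mixin override works both before and after the transformers 4.46.0 signature change. A hedged sketch with a stand-in base class (`num_items_in_batch` mirrors the real new kwarg; everything else is illustrative):

```python
class BaseTrainer:
    # stand-in for transformers' Trainer; newer releases added
    # `num_items_in_batch` to this contract
    def compute_loss(self, model, inputs, return_outputs=False,
                     num_items_in_batch=None):
        loss = sum(inputs.values())
        if num_items_in_batch is not None:
            loss = loss / num_items_in_batch
        return (loss, None) if return_outputs else loss

class SessionMixin(BaseTrainer):
    # accept-and-forward **kwargs so the override survives base-class
    # signature changes in either direction (new kwargs pass through,
    # older bases simply never send them)
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        loss = super().compute_loss(
            model, inputs, return_outputs=return_outputs, **kwargs
        )
        # ...session/modifier hooks would run here...
        return loss

trainer = SessionMixin()
print(trainer.compute_loss(None, {"a": 6.0}, num_items_in_batch=3))  # 2.0
```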

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* in place function

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add advice about dealing with non-invertible hessians (#875)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
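The usual remedy referenced by #875 is diagonal dampening: add a small fraction of the mean diagonal back onto the Hessian's diagonal before inverting, which makes a singular matrix (e.g. from too few calibration samples) invertible. A minimal sketch, with `dampening_frac` as an illustrative name:

```python
import numpy as np

def dampen(H, dampening_frac=0.01):
    # add a fraction of the mean diagonal onto the diagonal
    damp = dampening_frac * float(np.mean(np.diag(H)))
    return H + damp * np.eye(H.shape[0], dtype=H.dtype)

H = np.array([[1.0, 1.0], [1.0, 1.0]], dtype=np.float32)  # rank-1, singular
Hd = dampen(H)
print(np.linalg.det(Hd) > 0)  # True; H alone cannot be inverted
```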

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* tickle

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* let's give it a try

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* whitespace

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* delete unneeded workflow

Signed-off-by: andy-neuma <andy@neuralmagic.com>

* adjust trigger

Signed-off-by: andy-neuma <andy@neuralmagic.com>

---------

Signed-off-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
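The observer restructure in #837 separates a `calibration` step (observers update statistics each forward pass) from a `frozen` step (updates stop and quantization parameters are fixed). A hedged sketch of a min-max observer under that lifecycle (a toy version, not the library's observer API):

```python
class MinMaxObserver:
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")
        self.frozen = False

    def observe(self, values):
        # calibration step: track running min/max; frozen step: no-op
        if self.frozen:
            return
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def qparams(self, n_bits=8):
        # symmetric quantization: scale maps |max| onto the int range
        amax = max(abs(self.min_val), abs(self.max_val))
        return amax / (2 ** (n_bits - 1) - 1)

obs = MinMaxObserver()
obs.observe([-2.0, 1.0, 3.0])   # calibration
obs.frozen = True
obs.observe([100.0])            # ignored once frozen
print(round(obs.qparams(), 6))  # 0.023622
```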

* WIP, observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use minmax observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Move config["testconfig_path"] assignment (#895)

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove moving activations to the execution device, since activation calibration already happens within the forward pass

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use user-specified observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
dsikka added a commit that referenced this pull request Nov 25, 2024
* no cache context

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* support mllamaconfig

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix typo

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add docstring

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* make docstring runnable

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* update accelerate version (#899)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [GPTQ] Iterative Parameter Updating (#863)


Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>

---------

Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove moving activation to execution device, since activation calibration already happens within the forward pass

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use user-specified observer

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Small fixes for release (#901)

* fix device map

* expose one GPU for finetune; update to use a better model and show generation for completeness

* more fixes

* typo fix

* don't just run unit tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use smaller portion of dataset (#902)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update example to not fail hessian inversion (#904)

* update

Signed-off-by: Dipika <dipikasikka1@gmail.com>

* quality

---------

Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* bump version (#907)

Signed-off-by: Dipika <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add default mappings (#906)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)

* src and tests updates

* save model if output_dir is provided

* save model if provided as a string

* typo

* save if model was provided as a string or custom output_dir was set

* comments

* save tokenizer also if model passed as a string or custom output_dir provided

* revert to True

* merge main

* merge main

* fix transformers tests

* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* lint

* fix bug

* fix bug

* comments

* comments

* fix saving bug on example script and comments

* fix test failure

* comments

* comments

* comments

* lint

* fix test_quantization.py

* fix bugs

* revert to default

* revert to default

* draft

* fix test

* logging output fix

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* correct typo (#888)

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* print config for better debugging

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Kyle Sayers <kyle@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Jincheng Miao <jincheng.miao@intel.com>
Co-authored-by: 黄石 <yzlnew@gmail.com>
Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: George <george@neuralmagic.com>