[Observer Restructure]: Remove MemoryLess Observer; use helper function for dynamic quantization #187

dsikka · 2024-10-08T14:59:32Z

Summary

As part of the observer restructure, this is the first in a series of PRs that move observers out of compressed-tensors.
Removes the MemoryLessObserver and replaces it with a simple helper function that determines the zero points and scales dynamically during the forward pass.
Observers are now None when running dynamic quantization, and the config is updated so that null appears in the observer field.
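
To illustrate what computing quantization parameters during the forward pass can look like, here is a minimal sketch in plain Python. It is not the helper added in this PR: the names `dynamic_qparams_per_token` and `fake_quantize_row` are hypothetical, plain lists stand in for tensors, and integer min-max quantization is assumed for simplicity.

```python
from typing import List, Tuple


def dynamic_qparams_per_token(
    activations: List[List[float]],
    num_bits: int = 8,
    symmetric: bool = True,
) -> List[Tuple[float, int]]:
    """Return one (scale, zero_point) pair per token (row), computed on the fly.

    Hypothetical illustration only; no state is kept between calls, which is
    why no observer object is needed.
    """
    q_min = -(2 ** (num_bits - 1))   # -128 for 8 bits
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    eps = 1e-12                      # avoids a zero scale for all-zero rows

    qparams = []
    for row in activations:
        # Include 0.0 in the range so that zero stays exactly representable.
        lo = min(min(row), 0.0)
        hi = max(max(row), 0.0)
        if symmetric:
            scale = max(max(abs(lo), abs(hi)) / q_max, eps)
            zero_point = 0
        else:
            scale = max((hi - lo) / (q_max - q_min), eps)
            zero_point = int(round(q_min - lo / scale))
            zero_point = max(q_min, min(q_max, zero_point))
        qparams.append((scale, zero_point))
    return qparams


def fake_quantize_row(
    row: List[float], scale: float, zero_point: int, num_bits: int = 8
) -> List[float]:
    """Quantize then dequantize one row using the given parameters."""
    q_min = -(2 ** (num_bits - 1))
    q_max = 2 ** (num_bits - 1) - 1
    out = []
    for x in row:
        q = max(q_min, min(q_max, round(x / scale) + zero_point))
        out.append((q - zero_point) * scale)
    return out
```

Because the parameters depend only on the current input, they can be recomputed on every call and nothing needs to be stored in the checkpoint, which is consistent with the observer being null for dynamic activations.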

Testing

Tested the llm-compressor int8 and fp8 W8A8 examples. Observers are no longer attached for dynamic quantization, and the resulting config looks like the excerpt below:

"quantization_config": {
    "config_groups": {
      "group_0": {
        "input_activations": {
          "actorder": null,
          "block_structure": null,
          "dynamic": true,
          "group_size": null,
          "num_bits": 8,
          "observer": null,
          "observer_kwargs": {},
          "strategy": "token",
          "symmetric": true,
          "type": "float"
        },
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": null,
          "num_bits": 8,
          "observer": "minmax",
          "observer_kwargs": {},
          "strategy": "channel",
          "symmetric": true,
          "type": "float"
        }
      }
    },

Tested compatibility with existing models (can still be loaded and run as expected)

mgoin

Need to update observer: str = Field( to observer: Optional[str] = Field( in src/compressed_tensors/quantization/quant_args.py
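
The effect of the suggested annotation change can be sketched with a standard-library dataclass. `ArgsSketch` is a hypothetical stand-in, not the library's arguments class, and it uses `dataclasses.field` rather than the `Field(` call quoted above so that the example has no third-party dependency.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ArgsSketch:  # hypothetical stand-in for the real arguments class
    num_bits: int = 8
    dynamic: bool = False
    # Optional[str] documents that None is a legal value. A validating model
    # would reject None under a plain `str` annotation; dataclasses do not
    # enforce annotations, so here the type is documentation only.
    observer: Optional[str] = "minmax"
    observer_kwargs: Dict[str, Any] = field(default_factory=dict)


# A dynamic configuration carries no observer, so the field is None ...
dynamic_args = ArgsSketch(dynamic=True, observer=None)

# ... and None serializes to JSON null, matching the config excerpt.
serialized = json.dumps(asdict(dynamic_args), sort_keys=True)
print(serialized)
```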

src/compressed_tensors/quantization/quant_args.py

src/compressed_tensors/quantization/observers/helpers.py

kylesayrs

lgtm, would like to see more validation/error catching happen earlier in the config
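
One way to catch inconsistent settings when the config object is built, rather than later during the forward pass, is sketched below. This is an assumption about what earlier validation could look like, not the check that was merged: `ValidatedArgsSketch`, the choice to raise an error, and the default of "minmax" for static quantization are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidatedArgsSketch:  # hypothetical; not the library's class
    dynamic: bool = False
    observer: Optional[str] = None

    def __post_init__(self) -> None:
        if self.dynamic:
            # Dynamic quantization computes scales and zero points per call,
            # so a stored observer would never be used.
            if self.observer is not None:
                raise ValueError(
                    f"observer={self.observer!r} is not allowed when dynamic=True"
                )
        elif self.observer is None:
            # Assumed default for static quantization in this sketch.
            self.observer = "minmax"
```

Failing at construction time means a bad combination surfaces when the config is loaded instead of partway through calibration or inference.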

src/compressed_tensors/quantization/quant_args.py

rahul-tuli

remove memoryless observer; use helper function for dynamic quantization

9ebfbaf

dsikka changed the title ~~Observer Restructure: Reemove MemoryLess Observer; use helper function for dynamic quantization~~ Observer Restructure: Remove MemoryLess Observer; use helper function for dynamic quantization Oct 8, 2024

update init

e6ea41b

dsikka changed the title ~~Observer Restructure: Remove MemoryLess Observer; use helper function for dynamic quantization~~ [Observer Restructure]: Remove MemoryLess Observer; use helper function for dynamic quantization Oct 8, 2024

dsikka added 2 commits October 8, 2024 19:19

clean-up

6f4cae7

update test case

52e9a45

mgoin reviewed Oct 8, 2024

dsikka marked this pull request as ready for review October 9, 2024 15:24

fix arg

0e52e66

rahul-tuli previously approved these changes Oct 10, 2024

dsikka requested a review from mgoin October 10, 2024 20:41

kylesayrs reviewed Oct 10, 2024

src/compressed_tensors/quantization/quant_args.py (resolved)

kylesayrs reviewed Oct 10, 2024

src/compressed_tensors/quantization/observers/helpers.py (resolved)

kylesayrs reviewed Oct 10, 2024

src/compressed_tensors/quantization/observers/helpers.py (outdated, resolved)

kylesayrs requested changes Oct 10, 2024

validation + update name

2d5f667

dsikka dismissed rahul-tuli’s stale review via 2d5f667 October 11, 2024 01:14

dsikka requested review from kylesayrs and rahul-tuli October 11, 2024 01:15

kylesayrs requested changes Oct 11, 2024

src/compressed_tensors/quantization/quant_args.py (resolved)

update preset schemes; swap condition check

1d488e7

dsikka requested a review from kylesayrs October 11, 2024 13:39

kylesayrs approved these changes Oct 11, 2024

rahul-tuli approved these changes Oct 11, 2024

dsikka merged commit b2abe72 into main Oct 11, 2024
1 check passed

dsikka deleted the update-observers branch October 11, 2024 18:14
