[Observers] group size + channel wise + per token #32

horheynm · 2024-04-19T20:17:11Z

Starter script:

```python
from copy import deepcopy

import torch
from torch.nn import Linear

from compressed_tensors.quantization.lifecycle.calibration import set_module_for_calibration
from compressed_tensors.quantization.lifecycle.frozen import freeze_module_quantization
from compressed_tensors.quantization.lifecycle.initialize import (
    initialize_module_for_quantization,
)
from compressed_tensors.quantization.quant_args import QuantizationArgs
from compressed_tensors.quantization.quant_scheme import QuantizationScheme

num_bits = 8

quantization_scheme = QuantizationScheme(
    # Adjust group_size here: a positive value (e.g. 4) for group-wise,
    # -1 for channel-wise
    # input_activations=QuantizationArgs(num_bits=num_bits, symmetric=False, group_size=4),
    input_activations=QuantizationArgs(num_bits=num_bits, symmetric=False, group_size=-1),
    weights=QuantizationArgs(num_bits=num_bits, symmetric=True),
    targets=["*"],
)

layer = Linear(4, 4)
layer.weight.data *= 100

# Overwrite the forward pass and register zero_point and scale
initialize_module_for_quantization(layer, quantization_scheme)

set_module_for_calibration(layer)

layer(torch.randn(2, 4, 4))

initialized_layer = deepcopy(layer)

# Calibrate: scale/zero_point are updated on each forward pass
for _ in range(10):
    layer(torch.randn(4, 4))

# Freeze: no further updates after any forward pass
freeze_module_quantization(layer)
for _ in range(10):
    layer(torch.randn(4, 4))
```
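The script walks a layer through three stages: initialization, calibration (forward passes update `scale` and `zero_point`) and freezing (forward passes no longer update them). A minimal pure-Python sketch of that lifecycle for a single per-tensor quantizer follows. It assumes asymmetric quantization to the unsigned range `[0, 2**num_bits - 1]` and a running min/max observer; the library's observers may use different statistics or integer ranges, and `FakeQuantSketch` is a hypothetical name, not part of compressed-tensors.

```python
class FakeQuantSketch:
    """Hypothetical per-tensor fake quantizer: calibrate -> freeze -> fixed qparams.

    Assumes asymmetric quantization to [0, 2**num_bits - 1] and a running
    min/max observer (not necessarily what compressed-tensors does).
    """

    def __init__(self, num_bits: int = 8):
        self.qmax = 2 ** num_bits - 1
        self.min_val = 0.0  # range starts at 0 so zero is always representable
        self.max_val = 0.0
        self.frozen = False

    def qparams(self):
        # Fall back to scale 1.0 before any non-zero data has been observed
        scale = (self.max_val - self.min_val) / self.qmax or 1.0
        zero_point = round(-self.min_val / scale)
        return scale, zero_point

    def __call__(self, xs):
        if not self.frozen:
            # Calibration: widen the observed range with this batch
            self.min_val = min([self.min_val, *xs])
            self.max_val = max([self.max_val, *xs])
        scale, zero_point = self.qparams()
        out = []
        for x in xs:
            q = min(max(round(x / scale) + zero_point, 0), self.qmax)  # quantize, clamp
            out.append((q - zero_point) * scale)  # dequantize
        return out

    def freeze(self):
        # After freezing, forward passes no longer update min/max
        self.frozen = True


fq = FakeQuantSketch()
for batch in ([-1.0, 0.5], [2.0, -0.25]):
    fq(batch)  # calibration passes
fq.freeze()
frozen_qparams = fq.qparams()
fq([-10.0, 10.0])  # out-of-range values are clamped; qparams stay fixed
assert fq.qparams() == frozen_qparams
```

Keeping 0 inside the observed range means the zero point always lands inside the integer range, so no extra clamping of the zero point is needed.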

src/compressed_tensors/quantization/lifecycle/forward.py

src/compressed_tensors/quantization/observers/base.py

Satrat

The main thing I see missing here is that we aren't actually using the strategy field of QuantizationArgs. It makes sense to support group_size=-1 as channel-wise, but I think the code would be more readable if, instead of checking the group size, we checked the QuantizationArgs.strategy enum. This would make it easier to extend when we add the token strategy too.

Maybe we could add a validator to QuantizationArgs so that if the user specifies group_size=-1, we automatically set the strategy to channel.
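A minimal sketch of this suggestion. `Strategy`, `ArgsSketch` and `num_qparams` are hypothetical stand-ins, not the compressed-tensors API, and a plain dataclass `__post_init__` takes the place of a pydantic validator. The validator infers the strategy from `group_size` (with -1 mapping to channel), and downstream code dispatches on the enum instead of inspecting `group_size`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strategy(str, Enum):
    """Hypothetical stand-in for the QuantizationArgs.strategy enum."""
    TENSOR = "tensor"
    CHANNEL = "channel"
    GROUP = "group"
    TOKEN = "token"


@dataclass
class ArgsSketch:
    """Hypothetical stand-in for QuantizationArgs (not the library class)."""
    num_bits: int = 8
    symmetric: bool = True
    group_size: Optional[int] = None
    strategy: Optional[Strategy] = None

    def __post_init__(self):
        # "Validator": infer the strategy from group_size when none is given
        if self.strategy is None:
            if self.group_size is None:
                self.strategy = Strategy.TENSOR
            elif self.group_size == -1:
                self.strategy = Strategy.CHANNEL
            elif self.group_size > 0:
                self.strategy = Strategy.GROUP
            else:
                raise ValueError(f"invalid group_size: {self.group_size}")
        if self.strategy is Strategy.GROUP and (self.group_size is None or self.group_size <= 0):
            raise ValueError("the group strategy requires a positive group_size")


def num_qparams(args: ArgsSketch, rows: int, cols: int) -> int:
    """Number of scale/zero-point pairs for a 2-D tensor, dispatching on the enum."""
    if args.strategy is Strategy.TENSOR:
        return 1
    if args.strategy in (Strategy.CHANNEL, Strategy.TOKEN):
        return rows  # one per output channel (weights) or per token (activations)
    if args.strategy is Strategy.GROUP:
        return rows * -(-cols // args.group_size)  # ceil(cols / group_size) per row
    raise ValueError(f"unsupported strategy: {args.strategy}")
```

Adding the token strategy then only touches the enum and the dispatch branches, not every place that used to compare `group_size` against -1.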

bfineran

Let's also add at least a simple test for each strategy that validates that a forward pass runs and that the scales/zero points have the expected shape.
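A sketch of the per-strategy shape check, written in pure Python so it runs without compressed-tensors or torch. It does not exercise the library's forward pass: `observe` is a hypothetical min-max observer, and the expected shapes, `(1,)` for tensor, `(rows, 1)` for channel and token, and `(rows, cols // group_size)` for group, are assumed conventions that a real test should take from the library itself.

```python
import random
from typing import List, Optional, Tuple


def _qparams(values: List[float], num_bits: int, symmetric: bool) -> Tuple[float, int]:
    """Min-max scale and zero point for one slice (assumed convention)."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0 representable
    if symmetric:
        scale = max(-lo, hi) / (2 ** (num_bits - 1) - 1)  # signed range, zero point 0
        return (scale or 1.0), 0
    scale = (hi - lo) / (2 ** num_bits - 1) or 1.0  # unsigned range
    return scale, round(-lo / scale)


def observe(x, strategy: str, group_size: Optional[int] = None,
            num_bits: int = 8, symmetric: bool = False):
    """Hypothetical min-max observer over a 2-D list; returns (scales, zero_points)."""
    if strategy == "tensor":
        s, z = _qparams([v for row in x for v in row], num_bits, symmetric)
        return [s], [z]
    if strategy in ("channel", "token"):
        # rows are output channels (weights) or tokens (activations)
        pairs = [_qparams(row, num_bits, symmetric) for row in x]
        return [[s] for s, _ in pairs], [[z] for _, z in pairs]
    if strategy == "group":
        if group_size is None or group_size <= 0:
            raise ValueError("the group strategy requires a positive group_size")
        pairs = [[_qparams(row[i:i + group_size], num_bits, symmetric)
                  for i in range(0, len(row), group_size)] for row in x]
        return ([[s for s, _ in p] for p in pairs],
                [[z for _, z in p] for p in pairs])
    raise ValueError(f"unknown strategy: {strategy}")


def shape(nested) -> Tuple[int, ...]:
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def test_qparam_shapes():
    rng = random.Random(0)
    x = [[rng.uniform(-5.0, 5.0) for _ in range(8)] for _ in range(4)]  # 4 x 8
    expected = {"tensor": (1,), "channel": (4, 1), "token": (4, 1), "group": (4, 2)}
    for strategy, want in expected.items():
        scales, zero_points = observe(x, strategy, group_size=4)
        assert shape(scales) == want, (strategy, shape(scales))
        assert shape(zero_points) == want, (strategy, shape(zero_points))


test_qparam_shapes()
```

In the real suite, the same `expected` table would be compared against the zero_point and scale that the starter script registers on the layer, after running a forward pass for each strategy.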

Satrat

LGTM once the test failures are fixed!

horheynm added 3 commits April 19, 2024 20:16

group size

f2189c7

add logic in base observer

81954b6

group size full lifecycle run

803f495

bfineran reviewed Apr 23, 2024

src/compressed_tensors/quantization/lifecycle/forward.py (outdated)

src/compressed_tensors/quantization/lifecycle/forward.py (outdated)

horheynm added 2 commits April 24, 2024 15:53

before vectorize the for loop

cda1c48

comments, todo add channelwise

3cc730d

bfineran reviewed Apr 24, 2024

horheynm added 2 commits April 24, 2024 19:00

chan wise impl

bd67232

comments

5bf66ad

bfineran reviewed Apr 25, 2024

src/compressed_tensors/quantization/lifecycle/forward.py

src/compressed_tensors/quantization/observers/base.py (outdated)

fix channel wise

666adea

bfineran previously approved these changes Apr 25, 2024

Merge branch 'main' into group-size

c19a599

horheynm dismissed bfineran’s stale review via c19a599 April 25, 2024 18:09

horheynm changed the title from "group size" to "[Observers] group size + channel wise quantization" Apr 25, 2024

Satrat suggested changes Apr 25, 2024

src/compressed_tensors/quantization/lifecycle/forward.py (outdated)

src/compressed_tensors/quantization/observers/base.py (outdated)

horheynm added 3 commits April 25, 2024 18:48

comments, validators

407ab02

Merge branch 'group-size' of github.com:neuralmagic/compressed-tensors into group-size

8547d50

fix typo

309ebe2

bfineran reviewed Apr 29, 2024

src/compressed_tensors/quantization/quant_args.py

horheynm added 5 commits April 29, 2024 15:16

Merge branch 'main' of github.com:neuralmagic/compressed-tensors into group-size

8a2224f

tensor return error fix

d3f0803

fix sparseml-side of code and add per channel

182195f

pyndatic defaults

f35e4c9

token wise quant

f26d7f8

horheynm changed the title from "[Observers] group size + channel wise quantization" to "[Observers] group size + channel wise + per token" Apr 29, 2024

bfineran reviewed Apr 29, 2024

src/compressed_tensors/quantization/lifecycle/forward.py (outdated)

src/compressed_tensors/quantization/quant_args.py (outdated)

bfineran reviewed Apr 29, 2024

src/compressed_tensors/quantization/observers/base.py (outdated)

horheynm and others added 2 commits April 29, 2024 15:04

Update src/compressed_tensors/quantization/quant_args.py

98a0f8b

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>

comments'

176713a

horheynm added 3 commits April 29, 2024 19:11

Merge branch 'group-size' of github.com:neuralmagic/compressed-tensors into group-size

e889c5a

update dim

5067146

shape consistency

0fd1c8d

bfineran reviewed May 2, 2024

horheynm and others added 2 commits May 2, 2024 12:31

Update src/compressed_tensors/quantization/lifecycle/forward.py

e62de87

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>

comments

e929df2

bfineran previously approved these changes May 2, 2024

Satrat reviewed May 3, 2024

pass test_quant_args

1229c5a

horheynm dismissed bfineran’s stale review via 1229c5a May 3, 2024 15:36

dsikka approved these changes May 3, 2024

Satrat approved these changes May 3, 2024

Satrat merged commit 05c1487 into main May 3, 2024
2 checks passed

Satrat deleted the group-size branch May 3, 2024 17:59

rahul-tuli restored the group-size branch May 6, 2024 15:17

bfineran deleted the group-size branch May 8, 2024 20:18
