In `_choose_qparams_affine` (for int), we use `keepdim=False`:
`ao/torchao/quantization/quant_primitives.py`, lines 1552 to 1553 in b4ec4cb:

```python
min_val = torch.amin(input, dim=reduction_dims, keepdim=False)
max_val = torch.amax(input, dim=reduction_dims, keepdim=False)
```
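For context, here is a minimal standalone sketch (not torchao code; the `(4, 64)` input and `(1, 32)` block size are made up) of what `keepdim` changes about the reduced shapes:

```python
import torch

x = torch.randn(4, 64)
# Hypothetical example: block_size = (1, 32), so view x as (4, 2, 32) and
# reduce each block of 32 elements along the last dim.
blocked = x.reshape(4, 2, 32)

min_dropped = torch.amin(blocked, dim=-1, keepdim=False)  # shape (4, 2)
min_kept = torch.amin(blocked, dim=-1, keepdim=True)      # shape (4, 2, 1)
print(min_dropped.shape, min_kept.shape)
```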
I think we can change `_choose_qparams_affine` to use `keepdim=True` and make sure the scale/zero_point dimensions match the input's, so we don't need to do this:
`ao/torchao/quantization/quantize_/workflows/intx/intx_unpacked_to_int8_tensor.py`, lines 247 to 254 in 0ffbac1:

```python
# Reshape scale and zero_point to be compatible with block_size
# This is asserted in IntxUnpackedToInt8Tensor's __init__
n_blocks = []
for i in range(len(block_size)):
    assert qdata.shape[i] % block_size[i] == 0
    n_blocks.append(qdata.shape[i] // block_size[i])
scale = scale.reshape(*n_blocks)
zero_point = zero_point.reshape(*n_blocks)
```
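A minimal sketch of the proposed direction, assuming the qparams op keeps one entry per block along every dimension; the helper name `choose_qparams_per_block` and the symmetric int8 math are purely illustrative, not the existing torchao signature:

```python
import torch

def choose_qparams_per_block(x: torch.Tensor, block_size: tuple[int, ...]):
    assert x.ndim == len(block_size)
    n_blocks = [s // b for s, b in zip(x.shape, block_size)]

    # View x as (n_blocks[0], block_size[0], n_blocks[1], block_size[1], ...)
    # and reduce over the per-block dims, keeping them so the rank is stable.
    shape_for_reduction, reduction_dims = [], []
    for i, (n, b) in enumerate(zip(n_blocks, block_size)):
        shape_for_reduction += [n, b]
        reduction_dims.append(2 * i + 1)
    blocked = x.reshape(shape_for_reduction)

    min_val = torch.amin(blocked, dim=reduction_dims, keepdim=True)
    max_val = torch.amax(blocked, dim=reduction_dims, keepdim=True)

    # Symmetric int8 qparams, purely for illustration.
    scale = (torch.maximum(max_val.abs(), min_val.abs()) / 127.0).clamp(min=1e-8)
    zero_point = torch.zeros_like(scale)

    # One entry per block along every dim: shape == n_blocks, same rank as x,
    # so callers like IntxUnpackedToInt8Tensor would not need to reshape.
    return scale.reshape(n_blocks), zero_point.reshape(n_blocks)

x = torch.randn(4, 64)
scale, zero_point = choose_qparams_per_block(x, (1, 32))
print(scale.shape)  # torch.Size([4, 2])
```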
We could also remove the `block_size` argument from `quantize_affine` and `dequantize_affine` afterwards, since the ranks are already aligned.
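To illustrate why the argument becomes redundant, a hedged sketch (not the torchao API; `dequantize_blockwise` is a made-up name) of recovering the block size from the shape ratio once the ranks match:

```python
import torch

def dequantize_blockwise(qdata, scale, zero_point):
    assert qdata.ndim == scale.ndim == zero_point.ndim
    # With aligned ranks, each block size is just the per-dim shape ratio.
    block_size = [q // s for q, s in zip(qdata.shape, scale.shape)]
    # Expand each qparam entry over its block so everything broadcasts elementwise.
    for dim, b in enumerate(block_size):
        if b > 1:
            scale = scale.repeat_interleave(b, dim=dim)
            zero_point = zero_point.repeat_interleave(b, dim=dim)
    return (qdata.to(torch.float32) - zero_point) * scale

qdata = torch.randint(-8, 8, (4, 64), dtype=torch.int8)
scale = torch.rand(4, 2)        # one scale per (1, 32) block
zero_point = torch.zeros(4, 2)
print(dequantize_blockwise(qdata, scale, zero_point).shape)  # torch.Size([4, 64])
```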
We can also add some docs to both ops afterwards to clarify the expected scale/zero_point shapes.
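For the docs, one possible wording of the shared shape contract (a sketch only, not the current docstring text):

```python
def qparams_shape_docstring_sketch():
    """Suggested wording for the quantize_affine / dequantize_affine docs.

    scale, zero_point:
        Tensors with the same rank as the (de)quantized tensor. Along each
        dimension i their size is input.shape[i] // block_size[i], i.e. one
        entry per quantization block; the block size can therefore be
        recovered from this shape ratio instead of being passed explicitly.
    """
```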