In `_choose_qparams_affine` (for int), we use `keepdim=False`:
`ao/torchao/quantization/quant_primitives.py`, lines 1552 to 1553 in b4ec4cb:

```python
min_val = torch.amin(input, dim=reduction_dims, keepdim=False)
max_val = torch.amax(input, dim=reduction_dims, keepdim=False)
```
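For context, here is a minimal standalone sketch (not torchao code; the `(4, 64)` input and `(1, 32)` block size are made up) of what `keepdim` changes about the reduced shapes:

```python
import torch

x = torch.randn(4, 64)
# Hypothetical example: block_size = (1, 32), so view x as (4, 2, 32) and
# reduce each block of 32 elements along the last dim.
blocked = x.reshape(4, 2, 32)

min_dropped = torch.amin(blocked, dim=-1, keepdim=False)  # shape (4, 2)
min_kept = torch.amin(blocked, dim=-1, keepdim=True)      # shape (4, 2, 1)
print(min_dropped.shape, min_kept.shape)
```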
I think we can change `_choose_qparams_affine` to use `keepdim=True` and make sure the scale/zero_point dimensions match the input's, so we don't need to do this:
`ao/torchao/quantization/quantize_/workflows/intx/intx_unpacked_to_int8_tensor.py`, lines 247 to 254 in 0ffbac1:

```python
# Reshape scale and zero_point to be compatible with block_size
# This is asserted in IntxUnpackedToInt8Tensor's __init__
n_blocks = []
for i in range(len(block_size)):
    assert qdata.shape[i] % block_size[i] == 0
    n_blocks.append(qdata.shape[i] // block_size[i])
scale = scale.reshape(*n_blocks)
zero_point = zero_point.reshape(*n_blocks)
```
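A minimal sketch of the proposed direction, assuming the qparams op keeps one entry per block along every dimension; the helper name `choose_qparams_per_block` and the symmetric int8 math are purely illustrative, not the existing torchao signature:

```python
import torch

def choose_qparams_per_block(x: torch.Tensor, block_size: tuple[int, ...]):
    assert x.ndim == len(block_size)
    n_blocks = [s // b for s, b in zip(x.shape, block_size)]

    # View x as (n_blocks[0], block_size[0], n_blocks[1], block_size[1], ...)
    # and reduce over the per-block dims, keeping them so the rank is stable.
    shape_for_reduction, reduction_dims = [], []
    for i, (n, b) in enumerate(zip(n_blocks, block_size)):
        shape_for_reduction += [n, b]
        reduction_dims.append(2 * i + 1)
    blocked = x.reshape(shape_for_reduction)

    min_val = torch.amin(blocked, dim=reduction_dims, keepdim=True)
    max_val = torch.amax(blocked, dim=reduction_dims, keepdim=True)

    # Symmetric int8 qparams, purely for illustration.
    scale = (torch.maximum(max_val.abs(), min_val.abs()) / 127.0).clamp(min=1e-8)
    zero_point = torch.zeros_like(scale)

    # One entry per block along every dim: shape == n_blocks, same rank as x,
    # so callers like IntxUnpackedToInt8Tensor would not need to reshape.
    return scale.reshape(n_blocks), zero_point.reshape(n_blocks)

x = torch.randn(4, 64)
scale, zero_point = choose_qparams_per_block(x, (1, 32))
print(scale.shape)  # torch.Size([4, 2])
```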
We could also remove the `block_size` argument from `quantize_affine` and `dequantize_affine` afterwards, since the ranks are already aligned.
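To illustrate why the argument becomes redundant, a hedged sketch (not the torchao API; `dequantize_blockwise` is a made-up name) of recovering the block size from the shape ratio once the ranks match:

```python
import torch

def dequantize_blockwise(qdata, scale, zero_point):
    assert qdata.ndim == scale.ndim == zero_point.ndim
    # With aligned ranks, each block size is just the per-dim shape ratio.
    block_size = [q // s for q, s in zip(qdata.shape, scale.shape)]
    # Expand each qparam entry over its block so everything broadcasts elementwise.
    for dim, b in enumerate(block_size):
        if b > 1:
            scale = scale.repeat_interleave(b, dim=dim)
            zero_point = zero_point.repeat_interleave(b, dim=dim)
    return (qdata.to(torch.float32) - zero_point) * scale

qdata = torch.randint(-8, 8, (4, 64), dtype=torch.int8)
scale = torch.rand(4, 2)        # one scale per (1, 32) block
zero_point = torch.zeros(4, 2)
print(dequantize_blockwise(qdata, scale, zero_point).shape)  # torch.Size([4, 64])
```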
We can also add some docs to both ops afterwards to clarify the expected scale/zero_point shapes.
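For the docs, one possible wording of the shared shape contract (a sketch only, not the current docstring text):

```python
def qparams_shape_docstring_sketch():
    """Suggested wording for the quantize_affine / dequantize_affine docs.

    scale, zero_point:
        Tensors with the same rank as the (de)quantized tensor. Along each
        dimension i their size is input.shape[i] // block_size[i], i.e. one
        entry per quantization block; the block size can therefore be
        recovered from this shape ratio instead of being passed explicitly.
    """
```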