[BUG] Issue processing NF4 double quantization #183

## Describe the bug

I am trying to use NF4 real quantization and hit an error because the number of scales is not divisible by the scale block size. The scales should be padded so that they can be quantized with block quantization. This can be achieved by calling the `reduce_block_padding` function inside the `double_quantization` function.

## Suggested change
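```python
# Flatten the scales and pad them so block quantization with
# scale_block_size no longer fails on a non-divisible length.
scales = reduce_block_padding(
    scales.view(-1), block_sizes={-1: scale_block_size}
)
```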
This would be added at the following line of [code](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/quantization/qtensor/nf4_tensor.py#L128).
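
To illustrate why padding helps, here is a minimal pure-Python sketch of the idea. It is not ModelOpt's implementation: `pad_to_block_multiple` and `blockwise_absmax` are hypothetical helpers, and the scale values and block size are made up for the example. It assumes zero is an acceptable pad value, which holds for an absolute-max reduction because zeros never raise a block's maximum.

```python
def pad_to_block_multiple(values, block_size, pad_value=0.0):
    """Hypothetical helper: right-pad a flat list to a multiple of block_size."""
    remainder = len(values) % block_size
    if remainder == 0:
        return list(values)
    return list(values) + [pad_value] * (block_size - remainder)


def blockwise_absmax(values, block_size):
    """Hypothetical helper: max |x| per block; requires an evenly divisible length."""
    if len(values) % block_size != 0:
        raise ValueError(
            f"{len(values)} values are not divisible by block size {block_size}"
        )
    return [
        max(abs(v) for v in values[i : i + block_size])
        for i in range(0, len(values), block_size)
    ]


# Illustrative numbers only: 10 first-level scales, second-level block size 4.
scales = [0.5, 1.25, 0.75, 2.0, 0.1, 3.5, 0.9, 1.1, 0.3, 0.6]
scale_block_size = 4

# Without padding, blockwise_absmax(scales, scale_block_size) raises ValueError.
padded = pad_to_block_multiple(scales, scale_block_size)
block_amax = blockwise_absmax(padded, scale_block_size)
```

If padding is applied this way, the padded entries would need to be dropped again after dequantizing the scales so that they line up with the original weight blocks.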

## Version

nvidia_modelopt == 0.27.1
