Closed
Labels
bug: Something isn't working
Description
Describe the bug
I am trying to use NF4 real quantization and hit an error because the number of scales is not divisible by the scale block size. We should pad the scales so that they can be block-quantized, which avoids the error. This can be done by calling the reduce_block_padding function inside the double_quantization function.
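To illustrate the failure mode, here is a minimal sketch in plain Python. It is not modelopt code: `pad_to_block_multiple` and `split_into_blocks` are hypothetical helpers, and the sizes (10 scales, block size 4) are made up for the example. It only shows why a flat list of scales has to be padded to a multiple of the block size before it can be split into blocks.

```python
def pad_to_block_multiple(values, block_size, pad_value=0.0):
    """Return a copy of `values` padded so its length is a multiple of `block_size`."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Round the length up to the next multiple of block_size.
    padded_len = (len(values) + block_size - 1) // block_size * block_size
    return list(values) + [pad_value] * (padded_len - len(values))


def split_into_blocks(values, block_size):
    """Split `values` into equal blocks; fail if the length is not divisible."""
    if len(values) % block_size != 0:
        raise ValueError(
            f"{len(values)} values are not divisible by block size {block_size}"
        )
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


# Hypothetical sizes: 10 scales cannot be split into blocks of 4 without padding.
scales = [0.5] * 10
scale_block_size = 4

padded = pad_to_block_multiple(scales, scale_block_size)
blocks = split_into_blocks(padded, scale_block_size)
print(len(padded), len(blocks))
```

Calling `split_into_blocks(scales, scale_block_size)` on the unpadded list raises the divisibility error. After padding, the split succeeds, and the padded entries would need to be dropped again when the scales are dequantized.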
Suggested change
scales = reduce_block_padding(
    scales.view(-1), block_sizes={-1: scale_block_size}
)
in the following line of code
Version
nvidia_modelopt == 0.27.1