Add Int4TilePackedTo4dTensor #2791
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2791
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 22b937f with merge base 15a6de6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
    """
    tile_packed_to_4d is referring to the format used by tensor core tiled kernels for int4 quantization
    """
    TILE_PACKED_TO_4D = "tile_packed_to_4d"
LGTM, but in a separate PR it would be good to delete this enum, since we determined that PLAIN is the only format that is reused and all the others are tensor-specific.
yeah will do
We also want to delete the global PackingFormat, right?
    )

    original_shape = hp_tensor.shape
    # use a fixed value to simplify api
what does this mean?
This uses a fixed inner_k_tiles to keep the argument list shorter; I didn't see anyone change it anywhere.
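To make the effect of fixing inner_k_tiles concrete, here is a rough sketch of how the packed 4D shape could be derived from a 2D weight of shape (n, k). It assumes the layout documented for PyTorch's tinygemm int4 packing, [n / 8][k / (inner_k_tiles * 16)][32][inner_k_tiles / 2], and it assumes a fixed value of 8; neither the layout nor the value is stated in this thread. The function name tile_packed_4d_shape is hypothetical and is not part of torchao.

```python
from typing import Tuple

# Assumption: the fixed value is 8. This thread only says the value is fixed.
INNER_K_TILES = 8


def tile_packed_4d_shape(
    n: int, k: int, inner_k_tiles: int = INNER_K_TILES
) -> Tuple[int, int, int, int]:
    """Return the assumed 4D packed shape for an (n, k) int4 weight.

    Hypothetical helper for illustration only. It rejects shapes that do
    not divide evenly; real code may pad the weight instead.
    """
    if inner_k_tiles not in (2, 4, 8):
        raise ValueError("inner_k_tiles is assumed to be 2, 4, or 8")
    if n % 8 != 0:
        raise ValueError("n must be a multiple of 8 in this sketch")
    if k % (inner_k_tiles * 16) != 0:
        raise ValueError("k must be a multiple of inner_k_tiles * 16")
    # Each packed element is assumed to be an int32 holding eight int4 values,
    # so the packed element count is n * k / 8.
    return (n // 8, k // (inner_k_tiles * 16), 32, inner_k_tiles // 2)


if __name__ == "__main__":
    print(tile_packed_4d_shape(4096, 4096))
```

With the value fixed, callers only supply the weight; the trade-off is that anyone who needs a different tiling has no argument to change.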
… 4d packing

This commit introduces Int4TilePackedTo4dTensor, a new tensor subclass for int4 weight-only quantization using tensor core tiled packing format.

Key features:
- Implements tensor core tiled packing for efficient computation on tensor cores
- Supports PackingFormat.TILE_PACKED_TO_4D in Int4WeightOnlyConfig version 2
- Optimized for tinygemm int4mm kernel (_weight_int4pack_mm)
- Includes comprehensive test suite

The implementation follows the same pattern as other int4 tensor subclasses but uses a specialized packing format optimized for tensor core matrix multiplication performance.

Changes:
- Add Int4TilePackedTo4dTensor implementation
- Update Int4WeightOnlyConfig version 2 to support TILE_PACKED_TO_4D packing format
- Add TILE_PACKED_TO_4D to PackingFormat enum
- Add comprehensive tests including serialization, different group sizes, and error conditions
- Update __init__.py files to export new tensor class

Test:
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py
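Since the tests cover different group sizes, a small pure-Python sketch of group-wise int4 quantization may help show what a group size controls. This is a generic asymmetric min/max scheme chosen for illustration, not the code in this PR, and it says nothing about the tiled packing itself. All function names here are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (int4 codes, scale, minimum)


def quantize_group_int4(values: List[float]) -> Group:
    """Map one group of floats to codes in [0, 15] with a shared scale."""
    lo, hi = min(values), max(values)
    # Fall back to 1.0 so a constant group does not divide by zero.
    scale = (hi - lo) / 15 or 1.0
    codes = [max(0, min(15, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group_int4(codes: List[int], scale: float, lo: float) -> List[float]:
    """Approximate the original floats from the codes."""
    return [c * scale + lo for c in codes]


def quantize_row(row: List[float], group_size: int) -> List[Group]:
    """Quantize a row in consecutive groups of group_size values."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group_int4(row[i : i + group_size])
        for i in range(0, len(row), group_size)
    ]


if __name__ == "__main__":
    row = [0.0, 1.5, 3.0, -1.0, 10.0, 10.5, 11.0, 12.0]
    for codes, scale, lo in quantize_row(row, group_size=4):
        print(codes, dequantize_group_int4(codes, scale, lo))
```

Smaller groups give each scale a narrower range to cover, which reduces rounding error at the cost of storing more scales and minimums.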