@jerryzh168 jerryzh168 commented Aug 18, 2025

This commit introduces Int4TilePackedTo4dTensor, a new tensor subclass for int4 weight-only quantization using tensor core tiled packing format.

Key features:

  • Implements tensor core tiled packing for efficient computation on tensor cores
  • Supports PackingFormat.TILE_PACKED_TO_4D in Int4WeightOnlyConfig version 2
  • Optimized for tinygemm int4mm kernel (_weight_int4pack_mm)
  • Includes comprehensive test suite

The implementation follows the same pattern as other int4 tensor subclasses but uses
a specialized packing format optimized for tensor core matrix multiplication performance.
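
For readers unfamiliar with int4 weight-only quantization, here is a minimal, generic sketch of the idea: each group of weights is mapped to 16 levels, and two 4-bit codes are stored per byte. This is only an illustration. It does not reproduce the tile-packed 4D layout or the tinygemm kernel, and every function name below is hypothetical.

```python
from typing import List, Tuple


def quantize_group_int4(values: List[float]) -> Tuple[List[int], float, float]:
    """Asymmetric 4-bit quantization of one group (illustrative only).

    Returns (codes, scale, minimum); every code lies in 0..15.
    """
    lo, hi = min(values), max(values)
    # 4 bits give 16 levels; fall back to 1.0 when the group is constant.
    scale = (hi - lo) / 15 or 1.0
    codes = [max(0, min(15, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group_int4(codes: List[int], scale: float, minimum: float) -> List[float]:
    """Map 4-bit codes back to approximate floating-point values."""
    return [minimum + c * scale for c in codes]


def pack_nibbles(codes: List[int]) -> bytes:
    """Store two 4-bit codes per byte: even index in the low nibble, odd in the high."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    return bytes(low | (high << 4) for low, high in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed: bytes) -> List[int]:
    """Inverse of pack_nibbles."""
    out: List[int] = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


# One group of eight weights becomes four packed bytes plus a scale and a minimum.
group = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.25, -0.25, 0.75]
codes, scale, minimum = quantize_group_int4(group)
packed = pack_nibbles(codes)
```

A smaller group size means more scale/minimum pairs to store but a tighter fit per group, which is the trade-off behind testing different group sizes.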

Changes:

  • Add Int4TilePackedTo4dTensor implementation
  • Update Int4WeightOnlyConfig version 2 to support TILE_PACKED_TO_4D packing format
  • Add TILE_PACKED_TO_4D to PackingFormat enum
  • Add comprehensive tests including serialization, different group sizes, and error conditions
  • Update __init__.py files to export new tensor class

Test:
python test/quantization/quantize_/workflows/int4/test_int4_tile_packed_to_4d_tensor.py

pytorch-bot bot commented Aug 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2791

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 22b937f with merge base 15a6de6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Aug 18, 2025
@jerryzh168 jerryzh168 changed the title from "Add Int4TensorCoreTilePackedTensor for tensor core tiled int4 quantiz…" to "Add Int4TensorCoreTilePackedTensor" Aug 18, 2025
@jerryzh168 jerryzh168 added the topic: not user facing label Aug 18, 2025
@jerryzh168 jerryzh168 requested a review from Xia-Weiwen August 18, 2025 21:45
@jerryzh168 jerryzh168 force-pushed the int4wo-tinygemm branch 2 times, most recently from ed800a1 to b878ee5 August 20, 2025 03:08
@jerryzh168 jerryzh168 force-pushed the int4wo-tinygemm branch 7 times, most recently from 1be7de0 to 4046cc0 August 26, 2025 01:02
@jerryzh168 jerryzh168 changed the title from "Add Int4TensorCoreTilePackedTensor" to "Add Int4TilePackedTo4dTensor" Aug 26, 2025
"""
tile_packed_to_4d is referring to the format used by tensor core tiled kernels for int4 quantization
"""
TILE_PACKED_TO_4D = "tile_packed_to_4d"
Contributor

lgtm, but in a separate PR it would be good to delete this enum, since we determined that PLAIN is the only format that is reused and all the others are tensor-specific

Contributor Author

yeah will do

Contributor Author

we also want to delete the global PackingFormat, right?
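
For context, a string-valued enum like the one in the snippet under review can be sketched as follows. This is an illustration under assumptions, not the torchao definition: the `str` base class, the `"plain"` value, and the choice of members are guesses. Only the `TILE_PACKED_TO_4D` assignment is quoted from the diff, and `PLAIN` is named only in the review comment.

```python
from enum import Enum


class PackingFormat(str, Enum):
    """Hypothetical, reduced stand-in for the enum discussed in this thread."""

    # Assumed value; the thread only names PLAIN as the one shared format.
    PLAIN = "plain"
    # Quoted from the diff under review.
    TILE_PACKED_TO_4D = "tile_packed_to_4d"


# A string from a config can be turned back into the member by value.
fmt = PackingFormat("tile_packed_to_4d")
```

If the enum is removed as proposed, tensor-specific formats such as this one would presumably live next to the tensor subclass that uses them.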

)

original_shape = hp_tensor.shape
# use a fixed value to simplify api
Contributor

what does this mean?

Contributor Author

@jerryzh168 jerryzh168 Aug 27, 2025

use a fixed inner_k_tiles here to keep the arg list shorter; I didn't see people change it anywhere
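
To illustrate what a fixed inner_k_tiles implies for the packed weight, here is a small sketch that computes a 4D packed shape from an (N, K) weight. The layout [N / 8, K / (inner_k_tiles * 16), 32, inner_k_tiles / 2], the eight int4 values per 32-bit element, the allowed values 2, 4, 8, and the default of 8 are all assumptions about the tinygemm packing convention. None of them is stated in this PR, and the helper name is hypothetical.

```python
from typing import Tuple

# Assumed default; the PR only says that a fixed value is used.
INNER_K_TILES = 8


def tile_packed_4d_shape(
    n: int, k: int, inner_k_tiles: int = INNER_K_TILES
) -> Tuple[int, int, int, int]:
    """Shape of a hypothetical tile-packed int4 weight for an (n, k) matrix.

    Assumed layout: [n / 8, k / (inner_k_tiles * 16), 32, inner_k_tiles / 2],
    where each element is a 32-bit word holding eight int4 values.
    """
    if inner_k_tiles not in (2, 4, 8):  # assumed set of supported values
        raise ValueError("inner_k_tiles must be 2, 4, or 8")
    if n % 8 != 0 or k % (inner_k_tiles * 16) != 0:
        raise ValueError("n and k must be multiples of the tile sizes")
    return (n // 8, k // (inner_k_tiles * 16), 32, inner_k_tiles // 2)


shape = tile_packed_4d_shape(256, 1024)
```

Under these assumptions the element count is N * K / 8 regardless of inner_k_tiles, so fixing the value changes only the tiling, not the storage size.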

@jerryzh168 jerryzh168 force-pushed the int4wo-tinygemm branch 4 times, most recently from 6411b23 to b2885aa August 27, 2025 20:42
@jerryzh168 jerryzh168 requested review from metascroy and vkuzo August 28, 2025 20:17
@jerryzh168 jerryzh168 merged commit 6176322 into pytorch:main Aug 29, 2025
18 checks passed