Add BSR subclass +torch.compile and clean up superblock #680

jcaip · 2024-08-14T21:00:31Z

This PR adds in torch.compile support for block sparsity.

In a custom op, we create the sprase_bsr_tensor from the explicit crow_indices, col_indices, values tensors that are passed in to the custom op.

I also created a tensor subclass which holds these same values.

At dispatch, when we see a torch.nn.functional.linear call, we dispatch into our custom op torch.ops.blocksparse.linear, using the tensors stored in the subclass.

This will allow us to add a public API similar to semi_sparse_weight(), which I plan to do in a future PR.

This PR also cleans up the superblock prototype implementation, as there was a lot of repeated code, and also adds in kernel tuning for BSR.

For bfloat16 I see the following numbers, for a 1.23x gain:

New compile baseline: 63.431 ms
New compile + bsr: 53.514 ms 
New compile + bsr + tuning: 51.485 ms

pytorch-bot · 2024-08-14T21:00:34Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/680

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d14e320 with merge base accbdba ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 · 2024-08-15T03:22:57Z

torchao/sparsity/prototype/superblock/utils.py

@@ -12,6 +12,25 @@
 import torch
 import torch.distributed as dist

+### IMAGENET UTILS
+@torch.inference_mode
+def benchmark_inference(warmup, iters, f, *args, **kwargs):


why not use

ao/torchao/utils.py

Line 51 in 5998389

def benchmark_model(model, num_runs, args=(), kwargs=None, device_type=None):

? we can add a warmup as well

It was giving me different timings from my function. I'll try and debug the delta later, if I can find out whats causes it I'll switch over.

jcaip added 2 commits August 12, 2024 14:49

add bsr autotuning

7898f93

modified bsr

a1ea2e9

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 14, 2024

jcaip added 12 commits August 14, 2024 15:48

checkpoint

841e51d

working now

1fc9edf

working now

1256d54

finished

509d3f8

updated

93e60d6

updated

40e70a2

cleaned up benchmark

0085032

added kernel tuning params

08cb2cf

moved benchmark fn

578c608

cleaned

3a82adc

update

c68822f

cleaned up blocksparse

d14e320

jcaip changed the title ~~[wip] add bsr compile support~~ Add BSR subclass and clean up superblock Aug 15, 2024

jcaip changed the title ~~Add BSR subclass and clean up superblock~~ Add BSR subclass +torch.compile and clean up superblock Aug 15, 2024

jerryzh168 reviewed Aug 15, 2024

View reviewed changes

jerryzh168 approved these changes Aug 15, 2024

View reviewed changes

jcaip merged commit b16f0dc into main Aug 15, 2024
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add BSR subclass +torch.compile and clean up superblock #680

Add BSR subclass +torch.compile and clean up superblock #680

jcaip commented Aug 14, 2024 •

edited

Loading

pytorch-bot bot commented Aug 14, 2024 •

edited

Loading

jerryzh168 Aug 15, 2024

jcaip Aug 15, 2024

Add BSR subclass +torch.compile and clean up superblock #680

Add BSR subclass +torch.compile and clean up superblock #680

Conversation

jcaip commented Aug 14, 2024 • edited Loading

pytorch-bot bot commented Aug 14, 2024 • edited Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/680

✅ No Failures

jerryzh168 Aug 15, 2024

Choose a reason for hiding this comment

jcaip Aug 15, 2024

Choose a reason for hiding this comment

jcaip commented Aug 14, 2024 •

edited

Loading

pytorch-bot bot commented Aug 14, 2024 •

edited

Loading