[IR] Support integer subbyte #403
Conversation
xiaocenxiaocen
commented
Jan 2, 2024
- Support sub-byte integers in Hidet
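For context on what sub-byte integer support involves at the storage level, here is a minimal, illustrative pure-Python sketch of packing signed 4-bit integers two-per-byte and unpacking them again. The helper names are hypothetical and this is not Hidet's actual implementation, just the general technique:

```python
def pack_int4(values):
    """Pack a list of signed 4-bit ints (-8..7) into bytes, two per byte."""
    out = bytearray()
    for i in range(0, len(values), 2):
        lo = values[i] & 0xF                                  # low nibble
        hi = (values[i + 1] & 0xF) if i + 1 < len(values) else 0  # high nibble
        out.append(lo | (hi << 4))
    return bytes(out)


def unpack_int4(data, count):
    """Unpack `count` signed 4-bit ints from packed byte storage."""
    vals = []
    for b in data:
        for nibble in (b & 0xF, b >> 4):
            # sign-extend: nibbles 8..15 represent -8..-1
            vals.append(nibble - 16 if nibble >= 8 else nibble)
    return vals[:count]
```

A compiler-level implementation additionally has to handle vectorized loads/stores and alignment of the packed storage, but the nibble arithmetic is the same.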
Hi @xiaocenxiaocen, let me know when the PR is ready to be reviewed, thanks!
Sure. I will work on this over this week and the next.
a2b7795
to
803b1c2
Compare
Hi, @yaoyaoding. This PR is ready for review. Please take a look at it. Thanks.
hidet-ci launch
Thanks @xiaocenxiaocen for the support of the sub-byte integer type!
It looks good to me overall, and I left some minor suggestions to make some parts more consistent with the existing implementation (like the data type).
Feel free to merge this PR by yourself after you resolve those comments.
$hidet-ci launch
$hidet-ci launch
25e9a56
to
87cc2b7
Compare
$hidet-ci launch
87cc2b7
to
7121c88
Compare
$hidet-ci launch
This PR:
1. Added `torch.Tensor.as_strided` and `torch.flip`
2. Added support for `rounding_mode == 'trunc'` in `torch.divide`
3. Registered `torch.new_ones`

Longformer model compilation fails with:
```
RuntimeError: cudaDeviceSynchronize failed with error: cudaErrorMisalignedAddress
```
after running the `fused_matmul_f16_pk_cute_rearrange_add` kernel. Nvidia Nsight Compute also shows that the matmul kernel fails to launch. This PR contains all the changes needed to reproduce the issue. To reproduce:
1. Check out the `zhumakhan/longformer` branch
2. Run `python3 tests/benchmarks/bench_transformer.py longformer`

---------
Co-authored-by: Zhumakhan <nazirzhumakhan@gmail.com>
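For reference, `rounding_mode='trunc'` in `torch.divide` rounds the quotient toward zero, unlike Python's `//`, which rounds toward negative infinity. A minimal pure-Python sketch of the distinction (illustrative only; `divide_trunc` is a hypothetical helper, not the registered Hidet operator):

```python
def divide_trunc(a, b):
    """Integer division rounding toward zero, matching C semantics and
    torch.divide(..., rounding_mode='trunc')."""
    q = abs(a) // abs(b)
    # negate the magnitude when the operands have opposite signs
    return q if (a >= 0) == (b >= 0) else -q


# Floor division and truncated division differ only on negative quotients:
# -7 // 2 == -4 (floor), while trunc(-7 / 2) == -3.
```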