[IR] Support integer subbyte #403

Merged: xiaocenxiaocen merged 4 commits into hidet-org:main from support-integer-subbyte on Jan 26, 2024

xiaocenxiaocen
Collaborator

  • Support sub-byte integers in Hidet:
a = register_tensor("int4b", [4, 4])
b = a[0, 2]
a[2, 2] = int4b(-5)
ptr = &a[0, 0]
ptr = ptr + 8
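
For readers unfamiliar with sub-byte storage, here is a minimal sketch in plain Python of how signed 4-bit values can be packed two per byte and accessed by element index. It is an illustration only, not Hidet's implementation: the helper names (`pack_int4`, `load_int4`, `store_int4`), the row-major layout, and the low-nibble-first ordering are all assumptions.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) into bytes, two per byte.

    Assumption: the element with the even index occupies the low nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of int4 values")
    buf = bytearray(len(values) // 2)
    for i, v in enumerate(values):
        store_int4(buf, i, v)
    return buf


def load_int4(buf, index):
    """Read the int4 element at `index` and sign-extend it to a Python int."""
    nibble = (buf[index // 2] >> (4 * (index % 2))) & 0xF
    return nibble - 16 if nibble >= 8 else nibble


def store_int4(buf, index, value):
    """Write `value` into the int4 element at `index`, leaving its neighbour intact."""
    if not -8 <= value <= 7:
        raise ValueError(f"{value} does not fit in a signed 4-bit integer")
    shift = 4 * (index % 2)
    keep = ~(0xF << shift) & 0xFF          # mask that preserves the other nibble
    buf[index // 2] = (buf[index // 2] & keep) | ((value & 0xF) << shift)


# A 4x4 int4 tensor stored row-major needs 16 * 4 bits = 8 bytes.
rows, cols = 4, 4
a = pack_int4([0] * (rows * cols))
store_int4(a, 2 * cols + 2, -5)            # analogous to a[2, 2] = int4b(-5)
b = load_int4(a, 0 * cols + 2)             # analogous to b = a[0, 2]
```

Under these assumptions, an offset of 8 elements corresponds to 4 bytes, which is why sub-byte element indices have to be translated into a byte index plus a bit shift.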

@yaoyaoding
Member

Hi @xiaocenxiaocen, let me know when the PR is ready to be reviewed, thanks!

@yaoyaoding yaoyaoding changed the title [Ir] support integer subbyte [IR] Support integer subbyte Jan 9, 2024
@xiaocenxiaocen
Collaborator Author

> Hi @xiaocenxiaocen, let me know when the PR is ready to be reviewed, thanks!

Sure. I will work on this during this week and next week.

@xiaocenxiaocen xiaocenxiaocen force-pushed the support-integer-subbyte branch from a2b7795 to 803b1c2 Compare January 20, 2024 15:47
@xiaocenxiaocen
Collaborator Author

Hi, @yaoyaoding. This PR is ready for review. Please take a look at it. Thanks.

@xiaocenxiaocen
Collaborator Author

hidet-ci launch

Member

@yaoyaoding yaoyaoding left a comment

Thanks @xiaocenxiaocen for adding support for sub-byte integer types!

It looks good to me overall. I left some minor suggestions to make some parts more consistent with the existing implementation (such as the data type).

Feel free to merge this PR yourself after you resolve those comments.

Resolved review threads (all outdated):
python/hidet/ir/type.py (6 threads)
python/hidet/transforms/lower_integer_subbyte.py (2 threads)
tests/ir/test_int_subbyte.py (2 threads)
@xiaocenxiaocen
Collaborator Author

$hidet-ci launch

1 similar comment
@hjjq
Member

hjjq commented Jan 25, 2024

$hidet-ci launch

@xiaocenxiaocen xiaocenxiaocen force-pushed the support-integer-subbyte branch 2 times, most recently from 25e9a56 to 87cc2b7 Compare January 25, 2024 22:42
@xiaocenxiaocen
Collaborator Author

$hidet-ci launch

@xiaocenxiaocen xiaocenxiaocen force-pushed the support-integer-subbyte branch from 87cc2b7 to 7121c88 Compare January 26, 2024 01:23
@xiaocenxiaocen
Collaborator Author

$hidet-ci launch

@xiaocenxiaocen xiaocenxiaocen merged commit 8befb62 into hidet-org:main Jan 26, 2024
2 checks passed
vadiklyutiy pushed a commit that referenced this pull request Dec 19, 2024
1. Added `torch.Tensor.as_strided` and `torch.flip`
2. Added support for `rounding_mode == 'trunc'` in `torch.divide`
3. Registered `torch.new_ones`
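
As a reminder of what the `'trunc'` rounding mode means, the scalar sketch below contrasts it with `'floor'`; the two differ only for negative quotients. The helper `divide_scalar` is hypothetical and written in plain Python for illustration. It is not the code added by this commit and it does not call PyTorch.

```python
import math


def divide_scalar(a, b, rounding_mode=None):
    """Scalar model of division with an optional rounding mode (illustrative only)."""
    q = a / b
    if rounding_mode is None:
        return q                        # true division
    if rounding_mode == "trunc":
        return float(math.trunc(q))     # round toward zero
    if rounding_mode == "floor":
        return float(math.floor(q))     # round toward negative infinity
    raise ValueError(f"unsupported rounding_mode: {rounding_mode!r}")


true_q = divide_scalar(-7, 2)
trunc_q = divide_scalar(-7, 2, "trunc")
floor_q = divide_scalar(-7, 2, "floor")
```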

Longformer model compilation fails with:
```
RuntimeError: cudaDeviceSynchronize failed with error: cudaErrorMisalignedAddress
```
after running the `fused_matmul_f16_pk_cute_rearrange_add` kernel. Also,
NVIDIA Nsight Compute shows that the matmul kernel fails to launch. This PR
contains all the changes needed to reproduce this issue.

To reproduce:
1. Check out the `zhumakhan/longformer` branch.
2. Run `python3 tests/benchmarks/bench_transformer.py longformer`.

---------

Co-authored-by: Zhumakhan <nazirzhumakhan@gmail,.com>
vadiklyutiy pushed a commit that referenced this pull request Dec 20, 2024
vadiklyutiy pushed a commit that referenced this pull request Dec 26, 2024
3 participants