[Fixbug] Binary arithmetic ops raise error when one operand is a scalar on GPU #109

Merged
merged 1 commit into from
Feb 17, 2023

yaoyaoding
Member

Fix #95.

@yaoyaoding yaoyaoding merged commit 7820672 into hidet-org:main Feb 17, 2023
@yaoyaoding yaoyaoding deleted the fix-arth branch February 17, 2023 02:40
vadiklyutiy pushed a commit that referenced this pull request Jul 22, 2024
- Add tile-level operations such as `copy`, `mask`, `partition_src`, and
`partition_dst`.
- Add a pass to lower the tile-level operations to Hidet IR.
- Enhance the infrastructure to facilitate the lowering.

`copy`: Copy a tensor to another tensor.
`mask`: Create a mask tensor for the copy operation. Typically, this
operation is used when the tile size does not evenly divide the matrix shape.
`partition_src`, `partition_dst`: These two operations partition a
tensor held by the entire thread block into subtensors held by a single
thread. These operations allow us to move expressions related to
`threadIdx` and `blockIdx` outside the loop.
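
As a rough illustration of what `mask` and the partition operations describe, the plain-Python sketch below copies a matrix tile by tile when the tile size does not evenly divide the matrix shape. It does not use Hidet's API: `tile_mask`, `partition`, and `masked_tile_copy` are hypothetical helper names, and the round-robin assignment of elements to threads is an assumption made for simplicity.

```python
# Illustrative sketch only: plain Python, not Hidet's tile-level API.
# All function names here are hypothetical.

def tile_mask(row0, col0, tile_m, tile_n, m, n):
    """Return a tile_m x tile_n boolean mask.

    An entry is True when the corresponding element of the tile whose
    top-left corner is (row0, col0) lies inside the m x n matrix.
    """
    return [[(row0 + i) < m and (col0 + j) < n for j in range(tile_n)]
            for i in range(tile_m)]


def partition(tile_m, tile_n, num_threads, tid):
    """Return the tile-local (i, j) coordinates owned by thread `tid`.

    Assumption: elements are dealt out round-robin in row-major order.
    The result depends only on `tid`, not on the tile position, so it
    can be computed once outside the loop over tiles.
    """
    return [(k // tile_n, k % tile_n)
            for k in range(tid, tile_m * tile_n, num_threads)]


def masked_tile_copy(src, tile_m, tile_n, num_threads):
    """Copy `src` tile by tile, skipping out-of-bounds tile elements."""
    m, n = len(src), len(src[0])
    dst = [[None] * n for _ in range(m)]
    # Thread-dependent index computations are hoisted out of the tile loop.
    parts = [partition(tile_m, tile_n, num_threads, t)
             for t in range(num_threads)]
    for row0 in range(0, m, tile_m):
        for col0 in range(0, n, tile_n):
            mask = tile_mask(row0, col0, tile_m, tile_n, m, n)
            for tid in range(num_threads):  # threads simulated sequentially
                for i, j in parts[tid]:
                    if mask[i][j]:
                        dst[row0 + i][col0 + j] = src[row0 + i][col0 + j]
    return dst


# A 3 x 5 matrix copied with 2 x 4 tiles: neither dimension divides evenly.
src = [[r * 5 + c for c in range(5)] for r in range(3)]
dst = masked_tile_copy(src, tile_m=2, tile_n=4, num_threads=4)
```

In this sketch the mask is what keeps the edge tiles from reading or writing past the matrix bounds, and the per-thread coordinate lists play the role the commit message attributes to partitioning: the thread-dependent indexing is computed once, before the loop over tiles.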

---------

Co-authored-by: Xiao Zhang <xiao.zhang@centml.ai>
vadiklyutiy pushed a commit that referenced this pull request Jul 23, 2024
vadiklyutiy pushed a commit that referenced this pull request Dec 26, 2024
Development

Successfully merging this pull request may close these issues.

[Bug] binary arithmetic with CUDA scalar