-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TIR] fix storage rewrite index remap #8338
[TIR] fix storage rewrite index remap #8338
Conversation
@@ -504,8 +503,13 @@ class StoragePlanRewriter : public StmtExprMutator { | |||
return MergeNest(nest, body); | |||
} | |||
// Remap the index | |||
PrimExpr RemapIndex(DataType dtype, PrimExpr index, StorageEntry* e) { | |||
PrimExpr RemapIndex(PrimExpr index, StorageEntry* e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we removed the dtype here? perhaps we can limit the lanes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding the most correct dtype to use is the buffer data type itself. The dtype
param before is load_op->dtype or store_op->dtype from context.
If the buffer dtype has non-trivial lanes like tir.allocate([n], "float32*4")
(I'm not sure if it is possible), it seems not clear which factor to use based on original dtype param.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current setup(our code generation), the Load and Store follows the dtype specified by the instruction, regardless of the dtype specified in allocation. So it is better to use dtype in the Node. But you are right that we should not multiply by lanes when calculating the offset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right! I checked some mixed dtypes and found that the type casting is done before indexing for load/store. I repush the change which modify just one line of code about lanes, can you kindly review it again?
3a60a37
to
765f9d0
Compare
Thanks @wrongtest |
This is a fix trial for #8308 where the storage rewrite pass remap the index of a vectorized load.
The offset computation should be based on the actual allocated buffer's datatype rather than the load/store op's result datatype.
For example, if a float32*1024 ramp load is relocated to offset
4096b
of a float buffer, the offset expr should be+4096/bits_of(float32)
rather than+4096/bits_of(float32*1024)
cc @vinx13 @masahi