Hi there, we are working on vectorized computation in a tagged storage scope and ran into a problem with vectorization.
An example is shown below:
# before vectorize
for (i: int32, 0, 1024) {
  C[i] = ((float32*)A[i] + (float32*)B[i])
}
# after vectorize (still correct)
C[ramp(0, 1, 1024)] = ((float32x1024*)A[ramp(0, 1, 1024)] + (float32x1024*)B[ramp(0, 1, 1024)])
# after storage rewrite (problematic)
Merged[ramp(0, 1, 1024)] = ((float32x1024*)Merged[ramp(0, 1, 1024)] + (float32x1024*)Merged[ramp(1, 1, 1024)])
# after storage rewrite (correct version if we do not vectorize)
for (i: int32, 0, 1024) {
  Merged[i] = ((float32*)Merged[i] + (float32*)Merged[(i + 1024)])
}
Here we use a tagged storage scope, so A, B, and C can be merged into a single buffer Merged, with A and C sharing the same region:
A -> Merged[0:1024]
B -> Merged[1024:2048]
C -> Merged[0:1024]
It seems that after buffer merging and buffer index remapping, the ramp node is incorrectly rewritten to ramp(origin + 1, 1, 1024) instead of ramp(origin + 1024, 1, 1024).
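To make the effect concrete, here is a small plain-Python model of the merged buffer. It does not use TVM: `ramp` and `run_vectorized_add` are hypothetical helpers written for this illustration, and the input values are arbitrary. It only shows that a base of 1 makes the second operand read from A's region (shifted by one element) rather than from B's region.

```python
def ramp(base, stride, lanes):
    """Expand a TIR-style ramp(base, stride, lanes) into explicit indices."""
    return [base + stride * k for k in range(lanes)]


N = 1024
a = [float(i) for i in range(N)]      # A -> Merged[0:1024]
b = [float(2 * i) for i in range(N)]  # B -> Merged[1024:2048]


def run_vectorized_add(b_base):
    """Model Merged[ramp(0,1,N)] = Merged[ramp(0,1,N)] + Merged[ramp(b_base,1,N)]."""
    merged = a + b  # single merged allocation of 2 * N elements
    lhs = [merged[i] for i in ramp(0, 1, N)]       # vector load of A
    rhs = [merged[i] for i in ramp(b_base, 1, N)]  # vector load of "B"
    for idx, value in zip(ramp(0, 1, N), (x + y for x, y in zip(lhs, rhs))):
        merged[idx] = value                        # vector store into C (aliases A)
    return merged[:N]


expected = [x + y for x, y in zip(a, b)]
correct = run_vectorized_add(N)  # ramp(0 + 1024, 1, 1024)
buggy = run_vectorized_add(1)    # ramp(0 + 1, 1, 1024), as seen after storage rewrite

print(correct == expected)  # the element offset of B gives the right sum
print(buggy == expected)    # an offset of 1 reads A shifted by one element instead
```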
The related implementation seems to be here:
https://github.com/apache/tvm/blob/main/src/tir/transforms/storage_rewrite.cc#L507-L512
where the offset is divided by the number of lanes of the data type (1024 in our case).
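The arithmetic can be sketched as follows. This is a plain-Python model, not the TVM source: `remap_base` and its `divide_by_lanes` flag are hypothetical, and it assumes the remapped base is computed as the buffer's bit offset divided by the element size, where the element size either includes the lane count (the behavior we observe) or does not (the behavior we expect when the index is a ramp over scalar elements).

```python
def remap_base(bits_offset, dtype_bits, lanes, divide_by_lanes):
    """Return the index offset added to a buffer access after merging.

    divide_by_lanes=True models the behavior described in this issue:
    the element size is taken as dtype_bits * lanes.
    divide_by_lanes=False models the expected behavior for a ramp index,
    which addresses individual scalar elements.
    """
    elem_bits = dtype_bits * (lanes if divide_by_lanes else 1)
    assert bits_offset % elem_bits == 0, "offset must be element-aligned"
    return bits_offset // elem_bits


# B starts 1024 float32 elements into the merged buffer.
B_BITS_OFFSET = 1024 * 32

observed = remap_base(B_BITS_OFFSET, dtype_bits=32, lanes=1024, divide_by_lanes=True)
expected = remap_base(B_BITS_OFFSET, dtype_bits=32, lanes=1024, divide_by_lanes=False)

print(observed)  # base of the rewritten ramp in the problematic output
print(expected)  # base we expect: the scalar element offset of B
```

With these numbers the first call yields the `ramp(origin + 1, 1, 1024)` seen above, while the second yields `ramp(origin + 1024, 1, 1024)`.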
The code to reproduce the problem is as below: