However, because the reduction domain is mapped to the innermost loop, the loads inside that loop access data outside the tile, as shown below.
produce s32Index:
  for z.z_outer:
    for y.y_outer:
      for x.x_outer:
        for z.z_inner in [0, 3]:
          for y.y_inner in [0, 3]:
            for x.x_inner in [0, 3]:
              produce argmaxResult:
                argmaxResult(...) = ...
                for r:
                  argmaxResult(...) = ...
              consume argmaxResult:
                s32Index(...) = ...
To prevent this, I tried to move the "r" loop outside of "z_inner" by calling .reorder() on argmaxResult.update(), but it had no effect: the code compiled, but the loop order did not change. I also considered splitting "r" and including it in the tiles, but "r" is not visible from "s32Index", so I don't know how to write that.
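To make the two loop orders concrete, here is a minimal plain C++ sketch (not Halide code) of a tiled argmax with "r" innermost, as in the loop nest above, and with "r" hoisted outside the intra-tile loops. Everything in it is an assumption for illustration: the 2D output, the sizes, the input layout, and the names `at`, `make_input`, `argmax_r_innermost` and `argmax_r_hoisted` do not come from the original pipeline. The hoisted version needs one tile-sized scratch buffer per tile. In Halide that would presumably correspond to computing argmaxResult at a tile-level loop (for example with compute_at at x_outer) rather than at the innermost loop, but the original schedule is not shown, so this is only a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizes; TILE = 4 matches the [0, 3] inner extents above.
constexpr int W = 8, H = 8, R = 5, TILE = 4;

// Hypothetical input layout: one W x H plane per reduction index r.
inline std::size_t at(int x, int y, int r) {
    return (static_cast<std::size_t>(r) * H + y) * W + x;
}

// Deterministic test input (made up for this sketch).
std::vector<float> make_input() {
    std::vector<float> in(static_cast<std::size_t>(W) * H * R);
    for (int r = 0; r < R; ++r)
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x)
                in[at(x, y, r)] = static_cast<float>((x * 7 + y * 3 + r * 5) % 11);
    return in;
}

// Order from the post: r is the innermost loop, so every output pixel
// walks across all R input planes before the next pixel starts.
std::vector<std::int32_t> argmax_r_innermost(const std::vector<float>& in) {
    assert(in.size() == static_cast<std::size_t>(W) * H * R);
    std::vector<std::int32_t> out(static_cast<std::size_t>(W) * H);
    for (int yo = 0; yo < H; yo += TILE)
        for (int xo = 0; xo < W; xo += TILE)
            for (int yi = 0; yi < TILE; ++yi)
                for (int xi = 0; xi < TILE; ++xi) {
                    const int x = xo + xi, y = yo + yi;
                    std::int32_t best = 0;
                    for (int r = 1; r < R; ++r)
                        if (in[at(x, y, r)] > in[at(x, y, best)]) best = r;
                    out[static_cast<std::size_t>(y) * W + x] = best;
                }
    return out;
}

// Hoisted order: within a tile, r is the outermost loop, so each step
// reads one TILE x TILE patch of a single input plane.
std::vector<std::int32_t> argmax_r_hoisted(const std::vector<float>& in) {
    assert(in.size() == static_cast<std::size_t>(W) * H * R);
    std::vector<std::int32_t> out(static_cast<std::size_t>(W) * H);
    for (int yo = 0; yo < H; yo += TILE)
        for (int xo = 0; xo < W; xo += TILE) {
            float best_val[TILE][TILE];
            std::int32_t best_idx[TILE][TILE];
            // Pure definition: start from r = 0 for the whole tile.
            for (int yi = 0; yi < TILE; ++yi)
                for (int xi = 0; xi < TILE; ++xi) {
                    best_val[yi][xi] = in[at(xo + xi, yo + yi, 0)];
                    best_idx[yi][xi] = 0;
                }
            // Update definition: r outside the intra-tile loops.
            for (int r = 1; r < R; ++r)
                for (int yi = 0; yi < TILE; ++yi)
                    for (int xi = 0; xi < TILE; ++xi) {
                        const float v = in[at(xo + xi, yo + yi, r)];
                        if (v > best_val[yi][xi]) {
                            best_val[yi][xi] = v;
                            best_idx[yi][xi] = r;
                        }
                    }
            // Consume: write the indices for the tile.
            for (int yi = 0; yi < TILE; ++yi)
                for (int xi = 0; xi < TILE; ++xi)
                    out[static_cast<std::size_t>(yo + yi) * W + (xo + xi)] = best_idx[yi][xi];
        }
    return out;
}
```

Both functions use a strict comparison, so ties resolve to the smallest r and the two orders return identical indices; only the order of the loads differs.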
Hi,
I tried adding tiling to the argmax operation as follows.
Does anyone have a solution to this issue?
Thanks.