I am quite new to Halide and am used to scheduling my algorithms manually.
I have managed to reach a similar level of scheduling in Halide for many examples, but not for this one.
I am trying to implement the following integral+transpose function:
The code generation I would like to achieve is something like this:
Most likely I would also like to add some vectorization at the x level, but the most important goal is to reduce memory accesses and fuse the reduction stage with the input and output copies.
In all my trials, I ended up with a full copy from the input to the integral buffer, then a read and write of the integral buffer for each index, then a full copy to the output.
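For illustration only, here is a minimal sketch in plain C++ (not Halide, and not the code from the original post) of what a fused loop nest of this kind could look like. It assumes that "integral" means a running sum along x within each row and that the result is written transposed. The function name `integral_transpose`, the row-major layout, and the `int` element type are all hypothetical choices. Each input element is read once and each output element is written once, with the running sum kept in a local variable rather than in an intermediate buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption, not from the original post):
// `in` is a row-major image with `height` rows and `width` columns.
// The result holds, for each row, the running sum along x, stored
// transposed: out[x * height + y] = in(0, y) + ... + in(x, y).
std::vector<int> integral_transpose(const std::vector<int>& in,
                                    std::size_t width,
                                    std::size_t height) {
    std::vector<int> out(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        int acc = 0;  // running sum stays in a local, no integral buffer
        for (std::size_t x = 0; x < width; ++x) {
            acc += in[y * width + x];   // one read of the input
            out[x * height + y] = acc;  // one write, at the transposed index
        }
    }
    return out;
}
```

The Halide schedule being asked for would need to produce an equivalent access pattern. Which axis the reduction runs along is an assumption here and may differ from the original algorithm.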
Any ideas on how to write the desired schedule?
Thanks for any suggestions.