Replies: 2 comments
-
This is a cool idea! I think there is a there of optimising things using the “slack” of a par thread. If n is the time taken by the longest running thread, then n-(time take by thread) is defined as the slack of the thread. One way to think about this optimising is being slack-bounded, I.e. being able to do transformations that otherwise won’t make sense. |
Beta Was this translation helpful? Give feedback.
-
Thanks for the writeup, @calebmkim! Just to add a tiny bit more background to this, we talked synchronously about trying to decompose this general problem into two simpler problems:
Thinking of it this way means that:
|
Beta Was this translation helpful? Give feedback.
-
Here is a motivating example for what I'm thinking of:
Suppose we have (I'll represent statements by their static attribute) :
This could turn into:
Even though the seq block we create will take 18 cycles, it doesn't matter, since the
seq{10 10}
will always take 20 cycles.In other words, we can take the latency of the original par block, let's say n.
Then, we have to find some way of partitioning the original par block into the fewest possible number of groups such that each group can be transformed into a seq of pars that will take less than or equal to n cycles to complete.
However, @sampsyo and I already talked and this will probably be difficult to efficiently implement this, so for now I am not going to look for this situation.
I will just take the seq in the original par block with the most statements, and find all of the other seqs in the original par block that can be grouped together with it to turn it into a seq of pars while keeping the number of cycles the same. (edited: wording)
Beta Was this translation helpful? Give feedback.
All reactions