Idea for finding more situations to transform a Par of Seqs into Seq of Pars #1030

calebmkim · 2022-06-13T16:16:54Z · Collaborator

Here is a motivating example for what I'm thinking of:

Suppose we have the following (I'll represent each statement by its static attribute, i.e., its latency in cycles):

par {
    seq { 2 4 8 1 }
    seq { 2 7 5 }
    seq { 10 10 }
}

This could turn into:

par {
    seq {
        par { 2 2 }
        par { 4 7 }
        par { 8 5 }
        1
    }
    seq { 10 10 }
}

Even though the new seq block takes 18 cycles (longer than either of the original 15- and 14-cycle seqs), this doesn't matter: the seq { 10 10 } thread always takes 20 cycles, so the par block still finishes in 20 cycles.
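Assuming the usual static timing rules (a seq takes the sum of its children's latencies, a par takes the maximum), the cycle counts in the example work out as follows:

```latex
\begin{aligned}
\text{original par: } & \max(2+4+8+1,\; 2+7+5,\; 10+10) = \max(15, 14, 20) = 20 \\
\text{transformed par: } & \max\big(\max(2,2) + \max(4,7) + \max(8,5) + 1,\; 10+10\big) = \max(18, 20) = 20
\end{aligned}
```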

In other words, let n be the latency of the original par block. Then we need to partition the seqs of the original par block into the fewest possible groups such that each group can be transformed into a seq of pars that takes at most n cycles to complete.
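A minimal sketch of the per-group check in Rust. It assumes, as the post does, that each statement is represented only by its static latency, that a seq takes the sum of its children and a par the maximum, and that when the seqs in a group have different lengths, the k-th par holds the k-th statement of every seq that has one. The function names are hypothetical and not part of Calyx.

```rust
/// Latency of a seq: the sum of its statements' latencies.
fn seq_latency(stmts: &[u64]) -> u64 {
    stmts.iter().sum()
}

/// Latency n of a par of seqs: the slowest thread determines it.
fn par_of_seqs_latency(threads: &[Vec<u64>]) -> u64 {
    threads.iter().map(|t| seq_latency(t)).max().unwrap_or(0)
}

/// Latency after inverting a group of seqs into a seq of pars.
/// The k-th par runs the k-th statement of every seq long enough to have one,
/// so it takes as long as the slowest of those statements.
fn seq_of_pars_latency(group: &[&Vec<u64>]) -> u64 {
    let depth = group.iter().map(|t| t.len()).max().unwrap_or(0);
    (0..depth)
        .map(|k| group.iter().filter_map(|t| t.get(k)).copied().max().unwrap_or(0))
        .sum()
}

/// A group is acceptable if its inverted form finishes within n cycles.
fn fits(group: &[&Vec<u64>], n: u64) -> bool {
    seq_of_pars_latency(group) <= n
}

fn main() {
    // The motivating example: seq { 2 4 8 1 }, seq { 2 7 5 }, seq { 10 10 }.
    let threads = vec![vec![2, 4, 8, 1], vec![2, 7, 5], vec![10, 10]];
    let n = par_of_seqs_latency(&threads);
    let group = [&threads[0], &threads[1]];
    println!(
        "n = {}, inverted group = {}, fits = {}",
        n,
        seq_of_pars_latency(&group),
        fits(&group, n)
    );
}
```

Finding the fewest such groups is the hard part; this only answers whether one candidate group is acceptable.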

However, @sampsyo and I already talked, and this will probably be difficult to implement efficiently, so for now I am not going to look for this situation.

Instead, I will just take the seq in the original par block with the most statements and find all of the other seqs in that par block that can be grouped with it into a seq of pars while keeping the number of cycles the same.
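One possible reading of this simpler heuristic, sketched in Rust under stated assumptions: the other seqs are tried in index order, and a seq is kept whenever the grouped seq of pars still finishes within the original par latency n. The post does not specify the order, so the result is order-dependent and not guaranteed optimal. `greedy_group` and `seq_of_pars_latency` are hypothetical names.

```rust
/// Latency of the seq of pars obtained by inverting `group` (statements as latencies).
fn seq_of_pars_latency(group: &[&Vec<u64>]) -> u64 {
    let depth = group.iter().map(|t| t.len()).max().unwrap_or(0);
    (0..depth)
        .map(|k| group.iter().filter_map(|t| t.get(k)).copied().max().unwrap_or(0))
        .sum()
}

/// Greedy heuristic: anchor on the seq with the most statements, then add each
/// other seq (in index order) if the inverted group still fits within n.
/// Returns the indices of the seqs placed in the group.
fn greedy_group(threads: &[Vec<u64>]) -> Vec<usize> {
    if threads.is_empty() {
        return Vec::new();
    }
    // n: latency of the original par block, i.e. its slowest seq.
    let n: u64 = threads.iter().map(|t| t.iter().sum::<u64>()).max().unwrap();
    // Anchor on the longest seq (`max_by_key` returns the last one on ties).
    let anchor = (0..threads.len()).max_by_key(|&i| threads[i].len()).unwrap();
    let mut chosen = vec![anchor];
    for i in 0..threads.len() {
        if i == anchor {
            continue;
        }
        let mut candidate: Vec<&Vec<u64>> = chosen.iter().map(|&j| &threads[j]).collect();
        candidate.push(&threads[i]);
        if seq_of_pars_latency(&candidate) <= n {
            chosen.push(i);
        }
    }
    chosen
}

fn main() {
    let threads = vec![vec![2, 4, 8, 1], vec![2, 7, 5], vec![10, 10]];
    println!("{:?}", greedy_group(&threads));
}
```

On the motivating example this groups the first two seqs (18 cycles) and rejects seq { 10 10 }, since adding it would raise the inverted latency to 29 cycles.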

rachitnigam · 2022-06-13T22:49:02Z · Maintainer

This is a cool idea! I think there is a broader theme here of optimising things using the “slack” of a par thread. If n is the time taken by the longest-running thread, then n - (time taken by a thread) is defined as the slack of that thread. One way to think about this optimisation is as slack-bounded, i.e., the slack makes it possible to do transformations that otherwise wouldn't make sense.
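A small Rust sketch of the slack definition, assuming each thread is summarized by its static latency; `slacks` is a hypothetical helper. A transformation that lengthens a thread is free as long as the new latency does not exceed n, i.e. the thread's slack stays non-negative.

```rust
/// Slack of each par thread: n minus the thread's latency,
/// where n is the latency of the longest-running thread.
fn slacks(thread_latencies: &[u64]) -> Vec<u64> {
    let n = thread_latencies.iter().copied().max().unwrap_or(0);
    // Every latency is at most n, so the subtraction cannot underflow.
    thread_latencies.iter().map(|&t| n - t).collect()
}

fn main() {
    // Original threads from the motivating example: 15, 14 and 20 cycles.
    println!("{:?}", slacks(&[15, 14, 20]));
    // After merging the first two into an 18-cycle seq of pars.
    println!("{:?}", slacks(&[18, 20]));
}
```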

0 replies

sampsyo · 2022-06-15T21:49:01Z · Maintainer

Thanks for the writeup, @calebmkim! Just to add a tiny bit more background to this, we talked synchronously about trying to decompose this general problem into two simpler problems:

1. A basic version that only inverts one-level par/seq nests. That is, it takes programs of the form par { seq { ... }; seq { ... }; ... } and rearranges that level only into seq { par { ... }; par { ... }; ... }. The original snippet must have 1 par wrapping $N$ seqs; the transformed snippet must have exactly 1 seq wrapping $M$ pars.
2. A partitioning version that takes a par/seq nest, breaks it into several "chunks", and then invokes the basic version on each. That is, say you have par { S1 ; S2 ; S3 } where each $S_i$ is a seq block. We can first break this into the equivalent program par { par { S1 ; S2 }; S3 }, which entails partitioning the set of seqs. Now we can just run the algorithm from bullet 1 above on the inner par block we just created.
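The two-step decomposition can be sketched in Rust over a toy control AST. Everything here is hypothetical rather than Calyx's actual IR: `Control` is a stand-in enum with statements represented by their latencies, `invert` is the basic version (it returns `None` unless given exactly one par wrapping only seqs), and `partition_then_invert` is the partitioning version with the partition supplied by the caller. As in the motivating example, a par that would hold a single statement collapses to that statement. The sketch only performs the rewrite; it does not decide which groups are worth forming.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Control {
    Stmt(u64), // a statement, shown by its static latency
    Seq(Vec<Control>),
    Par(Vec<Control>),
}

/// Basic version: par { seq {..}; seq {..}; .. } => seq { par {..}; par {..}; .. }.
fn invert(par: &Control) -> Option<Control> {
    let threads = match par {
        Control::Par(ts) => ts,
        _ => return None,
    };
    let mut seqs: Vec<&Vec<Control>> = Vec::new();
    for t in threads {
        match t {
            Control::Seq(body) => seqs.push(body),
            _ => return None, // only a par wrapping seqs is handled
        }
    }
    let depth = seqs.iter().map(|s| s.len()).max().unwrap_or(0);
    let mut steps = Vec::new();
    for k in 0..depth {
        // The k-th par runs the k-th statement of every seq long enough to have one.
        let mut step: Vec<Control> = seqs.iter().filter_map(|s| s.get(k)).cloned().collect();
        steps.push(if step.len() == 1 { step.pop().unwrap() } else { Control::Par(step) });
    }
    Some(Control::Seq(steps))
}

/// Partitioning version: each entry of `groups` lists thread indices to gather
/// into an inner par and invert; threads in no group are left as they are.
/// Indices are assumed to be in range and to appear in at most one group.
fn partition_then_invert(par: &Control, groups: &[Vec<usize>]) -> Option<Control> {
    let threads = match par {
        Control::Par(ts) => ts,
        _ => return None,
    };
    let mut used = vec![false; threads.len()];
    let mut out = Vec::new();
    for g in groups {
        let inner = Control::Par(g.iter().map(|&i| threads[i].clone()).collect());
        out.push(invert(&inner)?);
        for &i in g {
            used[i] = true;
        }
    }
    for (i, t) in threads.iter().enumerate() {
        if !used[i] {
            out.push(t.clone());
        }
    }
    Some(Control::Par(out))
}

fn main() {
    use Control::{Par, Seq, Stmt};
    let prog = Par(vec![
        Seq(vec![Stmt(2), Stmt(4), Stmt(8), Stmt(1)]),
        Seq(vec![Stmt(2), Stmt(7), Stmt(5)]),
        Seq(vec![Stmt(10), Stmt(10)]),
    ]);
    println!("{:?}", partition_then_invert(&prog, &[vec![0, 1]]));
}
```

Grouping the first two seqs reproduces the transformed program from the original post. Keeping `invert` separate means a smarter partitioner only has to produce `groups` and can reuse the basic rewrite unchanged.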

Thinking of it this way means that:

- We can think carefully about optimality for both problems.
- We can do a good job on the first, simpler case without worrying about the second case.
- When we do improve the second case, it can reuse the implementation for the first case (it need not reinvent that machinery).

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The Calyx Infrastructure

Idea for finding more situations to transform a Par of Seqs into Seq of Pars #1030

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments

{{title}}

{{title}}

Select a reply

The Calyx Infrastructure

Idea for finding more situations to transform a Par of Seqs into Seq of Pars #1030

calebmkim Jun 13, 2022 Collaborator

Replies: 2 comments

rachitnigam Jun 13, 2022 Maintainer

sampsyo Jun 15, 2022 Maintainer

calebmkim
Jun 13, 2022
Collaborator

rachitnigam
Jun 13, 2022
Maintainer

sampsyo
Jun 15, 2022
Maintainer