Zero-cycle transitions from dynamic to static control #1828
The two relevant issues here both have to do with the same underlying limitation. The high-level problem is that, in general, we cannot transition from dynamic control into static control without spending a cycle. Another question worth asking is: can transitions *out of* static islands be zero-cycle as well? My guess is that the current answer is no, because we generate a dynamic wrapper for static islands, which means we inherit the 1-cycle delay of dynamic-to-dynamic transitions.
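Concretely, the shape in question looks something like this (group names are hypothetical):

```
seq {
  A;                 // dynamic: raises its done signal at some runtime-determined cycle n
  static seq { B; }  // static island: ideally begins at cycle n, but the generated
                     // dynamic wrapper makes it begin at n+1
}
```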
Yeah, I think this is right. Although to rephrase slightly: we are technically giving every static island a dynamic interface, which is where the extra cycle comes from.
I think this goes back to the previous comment. Suppose we have a static island nested under dynamic control. One thing I'm realizing is that, if we do this, maybe we should move the wrapping logic for static islands elsewhere.
This was my thinking too, but this is a fact about dynamic -> dynamic transitions as well. It seems that maybe we can unconditionally guarantee that transitions from dynamic to static will be zero-cycle, which might be enough for what the YXI team needs.
Yeah, this makes sense. There are a lot of optimizations here, around wrapper generation and transitions, that we haven't explored. I think these could be high-impact because FSMs are likely the worst offenders in terms of resource blow-up.
Really good points here. To summarize, we believe it is possible for the compiler to provide zero-cycle transitions in a couple of different ways.
I think one question we'd have to answer for option 2 is: what does the surface-level, syntactic rule that implies this guarantee look like? It could just be this:
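One hedged guess at what that rule could be: in any `seq`, a static child that immediately follows a dynamic child is guaranteed to begin in the same cycle that the dynamic child signals done. For example:

```
seq {
  A;                 // dynamic
  static seq { B; }  // under the rule: B begins in the cycle A raises done
}
```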
I don't like this very much because it means that even a fully dynamic `seq` would be subject to the rule. An alternative (which would be pretty heavyweight) would be to introduce a new, special control operator just for this purpose.
The opportunistic stuff is an optimization, so I won't say much about it. You're right about the trade-offs: no semantic guarantees, and therefore not particularly useful for timing-precise reasoning. I was thinking of a rule of the form "entering a static control program is guaranteed to be zero-cycle". However, I agree with you that this puts constraints on the way dynamic scheduling works. Unfortunately, I don't see a way out of it: if we want to provide guarantees about the timing interactions between dynamic and static control, we'll have to make statements about the scheduling policies for dynamic operators. We can try adding attributes, but that doesn't really address the problem.
I suppose my high-level point about the dilemma here is that we probably want to support both modes: both the fast-transition and the slow-transition behavior.
How do you feel about making the "slow transition" behavior opt-in, in the same way we've been discussing adding a new attribute for other opt-in behavior? I'm hesitant to introduce a new control operator with these semantics because then we have three different kinds of sequencing, and I don't know what the guideline for a "good default" will be. If we already know that the good default is "fast transition", then let's make that the default behavior of `seq`.
It could work! Yeah, I think what's confusing here is that I also don't know what the right default is: schedulable vs. save-a-cycle. There is a reasonable argument to be made that "schedulable" is the right default because it reflects "normal" computations that don't need to interface with the outside world. Especially if paired with a best-effort optimization that saves a cycle in most cases anyway?
I think defining which computations potentially interact with the outside world will become increasingly difficult, especially in the presence of abstraction. I would also reframe "save a cycle" as something like "dynamic-to-static transition guarantee" to convey that this isn't just an optimization; it's a guarantee of the compiler that user-level programs can rely on.
During a synchronous discussion, we were trying to generalize the problem expressed in the original post: in other words, to come up with other situations in which we need cycle-precise behavior for control across the static/dynamic boundary. For example, one scenario we were thinking of was the following. Suppose we had something like this:
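A hedged sketch of the kind of program meant here (names hypothetical): a loop whose body mixes a static child and a dynamic child, so the loop's back edge crosses the static/dynamic boundary.

```
while lt.out with cond {
  seq {
    static seq { S; }  // static child
    D;                 // dynamic child; the loop's back edge returns to S
  }
}
```

The backwards edge in question is the transition from the end of `D` in one iteration to the start of `S` in the next.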
What if we need the backwards edge from the dynamic child to the static child to be a 0-cycle transition? @nathanielnrn, can you think of any other scenarios in the yxi project where you would need 0-cycle transition behavior, besides the scenario we have already identified?
It seems to me like the dynamic-to-static case is the main one we need.
@nathanielnrn I think Caleb is asking if there are specific, concrete examples you can think of. For example, one possibility is a burst transfer where, once the burst begins, data beats need to be handled on consecutive cycles.
@calebmkim and I spoke synchronously about this. The scenarios we came up with involve dynamic control wrapped around cycle-sensitive static transfers. A reduced example would be something like the following.
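A hedged sketch of such a reduced example, in the spirit of the AXI scenario (names hypothetical):

```
seq {
  wait_for_valid;           // dynamic: spins until the remote side asserts valid
  static seq { transfer; }  // static: must begin in the very cycle the wait completes
}
```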
If we had some guarantee on the transition between the end of the dynamic part and the start of the static part, this kind of pattern would be safe to write.
This is great; thanks for the synopsis. And interesting that this example contains both static->dynamic and dynamic->static transitions!
Another bump on this issue because I'd like to make progress on it. @calebmkim, thoughts on taking the next open slot in the Calyx meeting to discuss this issue and maybe propose a concrete implementation strategy?
Just took the next open slot.
Based on the synchronous discussion today, it seems like we probably do not want to introduce a zero-cycle transition as a guarantee of the language, as it would infect the scheduling of `seq`s and would prevent things like compaction. The option that we had in mind was to add a minimally expressive extension as an attribute; call it `@fast` for the sake of discussion.
A->B->C->D must either be Static->Dynamic->Static->Dynamic or Dynamic->Static->Dynamic->Static. From the synchronous discussion, it seems like the specificity is the point: we normally don't want to use this, but in case you have specific timing constraints that need to be met, we offer this special attribute. @rachitnigam, do you have thoughts?
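Under this proposal, usage might look like the following (the spelling `@fast` is purely illustrative; the discussion does not fix a name):

```
@fast seq {
  A;                 // dynamic
  static seq { B; }  // static: entered with a 0-cycle transition
  C;                 // dynamic
  static seq { D; }  // static: entered with a 0-cycle transition
}
```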
We are limiting this to the alternating style because any non-alternating case can be crafted from the alternating style (above) plus calls to appropriate control operators. Say you need a sequence whose kinds do not alternate; too bad, you must write the alternating form, grouping adjacent same-kind children under an inner control operator.
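For instance (names hypothetical), if you need two adjacent dynamic children followed by a static one, you fold the dynamic pair into a nested `seq` so that the top level alternates:

```
@fast seq {          // @fast is an illustrative spelling for the attribute
  seq { D1; D2; }    // two adjacent dynamic children folded into one dynamic child
  static seq { S; }  // static child: 0-cycle entry
}
```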
This is neat because it crisps up what the promise of the attribute actually is.
Perhaps silly, but do we need to do anything to prevent:
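Presumably the worry is a degenerate program like this (hypothetical, if the compiler accepted it):

```
static<0> group nothing {
  // a zero-latency static group: claims to take no cycles at all
}
```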
Are static groups with latency 0 even allowed?
Zero-cycle groups, static or otherwise, are not allowed. If a program provides a zero-cycle group, it leads to undefined behavior (miscompilation, specifically).
I remain iffy about attributes as a mechanism to add correctness guarantees to the program. Specifically, if the compiler does not guarantee that it'll preserve or even respect attributes, I feel that the default behavior should be the one that preserves correctness.
I'm not convinced about this yet. Compaction only works for `static seq`s anyway.
CC @sampsyo on this last comment because I think we disagree on this point and maybe I'm missing something obvious |
@sampsyo can also give his own answer, but here is a minimal example of where I think the language-level guarantee would affect our compaction abilities. Suppose we have the following.
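A hedged reconstruction of the kind of example meant here (names hypothetical): a `seq` in which a static child follows a dynamic child but has no data dependency on it.

```
seq {
  A;                 // dynamic; writes register x
  static seq { B; }  // static; reads only register y, so no dependency on A
}
```

If the language guaranteed a 0-cycle transition from A into B, compaction could no longer schedule B independently of A (for example, overlapping it with other work), even though no dependency forces the ordering.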
In this case, the language-level guarantee would constrain where compaction is allowed to schedule the static child, even when no dependency forces the ordering.
So this is what I don't understand. If A does not have any dependency on the child before it, the guarantee places no constraint on compaction. Does that make sense? The rule is only meaningful when there is a dependency between children. If there is no dependency, the rule does not matter.
Ah, I see what you're saying. I still think there are some tricky cases here, though. For example, suppose we have:
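A hedged guess at the shape of such a case (names hypothetical): a dynamic child followed by two static children that depend on it but not on each other.

```
seq {
  A;                 // dynamic; finishes at some runtime cycle n
  static seq { B; }  // static, latency 1; the 0-cycle rule would require B to start at n
  static seq { C; }  // static, latency 10; depends on A but not on B
}
```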
In this case, compaction would schedule them like this:
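A hedged sketch of such a compacted schedule, assuming a dynamic A that finishes at cycle n, a static B (latency 1) and C (latency 10) that depend on A but not on each other, and a scheduler that serializes B after C so the two can share hardware:

```
cycles n .. n+9 : C runs     // scheduled immediately after A
cycle  n+10     : B runs     // delayed past C to reuse the same resources
```

Here B no longer starts in cycle n, so the transition out of A into B is no longer zero-cycle.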
Which would break the 0-cycle transition: B would no longer start in the cycle that A finishes. One "solution" would be to treat the static-dynamic pair as a "package" that must be treated as a single entity when performing compaction, but (a) this still limits compaction a bit, and (b) we still need to think about what the language-level guarantee would be. Another "solution" could be that the 0-cycle transition is only guaranteed when the compacted schedule happens to permit it.
Ah, this is a good example! Note that an as-late-as-possible (ALAP) schedule will still be able to satisfy the zero-cycle transition requirement on B. The relevant question is: are all schedules (ASAP, ALAP, etc.) about the same in hardware cost when performing compaction? If yes, then maybe the freedom of scheduling choices is irrelevant; if not, let's see why.
Truly awesome; thanks for making this concrete, @calebmkim. That issue, where the compiler would like to add latency between two children, is the key difficulty. @rachitnigam, in this example, you're right that an ALAP schedule could still respect 0-cycle transitions from dynamic to static. I think, however, that there is a complementary counter-example for ALAP if we want to support the opposite direction too (0-cycle transitions from static to dynamic). Slightly modifying @calebmkim's example:
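A hedged sketch of the modified example (the direction is flipped so that the 0-cycle edge of interest is static-to-dynamic; names hypothetical):

```
seq {
  static seq { B; }  // static, latency 1
  A;                 // dynamic; the guarantee requires A to start right as B ends
  static seq { C; }  // later static work that gives the scheduler slack
}
```

An ALAP schedule is free to push A later into that slack, opening a gap after B; only an ASAP placement keeps the B-to-A transition tight.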
Now ALAP violates the 0-cycle guarantee and you want ASAP in order to respect it. To make things worse, consider a static/dynamic/static "sandwich":
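A hedged sketch of the sandwich (names hypothetical):

```
seq {
  static seq { S1; }  // wants a 0-cycle transition into D
  D;                  // dynamic, data-dependent latency
  static seq { S2; }  // wants a 0-cycle transition out of D
}
```

Depending on how the surrounding static work is scheduled, placements of D that satisfy the S1-to-D constraint can conflict with the D-to-S2 constraint, and vice versa.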
Now, I think, there is no single choice of when to run the dynamic group in the middle that respects both transitions. To return from specifics to generalities: it seems like it is very valuable for a rescheduling optimizer to have lots of freedom to insert delays between group activations, not just to delete them. Inserting them can sometimes (paradoxically?) let you reduce delays elsewhere, and it can certainly allow area/latency trade-offs by reducing parallelism. So it would be a shame to give up that freedom wholesale.
@sampsyo okay, these examples are a nail in the coffin of the whole "language-level guarantee" pitch I was trying to go for. I think the attribute approach is fine for now, and we can eventually consider a proper qualifier.
Cool. That's a good plan… I share your aesthetic discomfort with semantics-bearing attributes, so I dig the idea of starting with one just because it's simpler to implement, and then probably turning that into a proper qualifier.
Okay, I'm going to mark this as "available" then! The plan is to implement the attribute first.
Who should I discuss the syntax/semantics of the new attribute with?
The issue describes the high-level stuff. Are there specific questions you have?
Hi Ethan-- thanks for picking this up. The most relevant comments regarding implementation are my comment on Feb 5 and Anshuman's comment on Feb 6. You will probably have to edit `tdcc`, the pass that compiles dynamic control into FSMs.
Thanks, Caleb! Are you free today to discuss this?
Sure-- I'm pretty free this afternoon; we can DM to find a time.
To slightly expand on this:
I actually think this is a super interesting problem that has implications for static/dynamic Calyx interaction. Ideally, we would like to be able to manage an AXI transaction with a control program like this:
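A hedged sketch of that control program (the group names `wait_for_valid` and `deassert_ready` appear later in this comment; `do_transfer` and the overall structure are guesses):

```
seq {
  wait_for_valid;    // dynamic: block until the remote side asserts valid
  static seq {
    do_transfer;     // static<1>: ready is high for exactly this one cycle
    deassert_ready;  // static: drop ready on the very next cycle
  }
}
```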
The problem is that AXI, of course, is super cycle-timing-sensitive. Remember that one of the core invariants is: a transfer occurs on every clock edge where both `valid` and `ready` are high.
So the remote side can schedule these transaction cycles back-to-back, i.e., 3 transactions can happen in exactly 3 cycles. Most saliently for us, say the remote side just keeps its `valid` signal high all the time and relies on us to assert/deassert `ready` to do transfers. Then, if we need to take some time to process a transfer, we have to be absolutely sure we only assert `ready` for one cycle at a time.

The problem with dynamic Calyx is that it offers no guarantees about how long it takes to go from `wait_for_valid` above to `deassert_ready`. And in practice, it actually does waste at least one cycle, meaning that the remote side sees us do two transfers when we only wanted one.

The big upshot of static Calyx is that it does offer these guarantees, so we would like to be able to just sprinkle `static` everywhere to get the timing we want. There is a critical problem, however: the `wait_for_valid` group is fundamentally dynamic. Our "dynamic islands" approach does not support sequencing a dynamic group with a static one with a zero-cycle transition latency.

So the big design question that I think this opens up is: is there some rule we can make up that would allow this kind of sequencing? That is, to allow some version of `seq { a; b }` where `a` is dynamic and `b` is static, and where we can guarantee that `b` runs in the same cycle where `a` says it is done. It seems hard to do in general, but maybe there is something we can do for restricted cases?

Originally posted by @sampsyo in #1820 (comment)