Determining the static interface for calyx components #1725

Closed
paili0628 opened this issue Sep 18, 2023 · 20 comments
Labels
C: Calyx (Extension or change to the Calyx IL), Calyx 2.0 (Things that move us towards Calyx 2.0)

Comments

@paili0628
Contributor

We should make a decision on exactly what the interface for the static component would look like. In particular, there are 3 options:

  1. have a go signal that triggers for exactly 1 cycle. The component will be 'ready' to accept inputs again after n cycles (i.e., the static latency refers to the II). (No done signal)
  2. have neither a go nor a done interface. The output ports hold the outputs relevant to the value on the input ports exactly n cycles ago. In this case the component will have to be 'ready' to accept inputs every cycle, since there's no way to distinguish between valid and invalid signals.
  3. have a go signal that has to be continuously triggered until a done signal is up (the static latency refers to how many cycles have to elapse since the first cycle the go signal is triggered until the done signal is up).

Correspondingly, there are also two options to define the dynamic interface:

  1. The classic calyx go-done interface with no pipelining.
  2. a go-ready-done interface, where the go still has the same semantic meaning, the done means that a valid output is produced on the output ports, and the ready means that the component is now 'ready' to accept inputs (i.e., the component only 'accepts' inputs when go==1'd1 & ready==1'd1).
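
A minimal Python sketch of the go-ready-done handshake, assuming a component with a fixed II and latency. The class name `GoReadyDone`, its `step` method, and the doubling computation are hypothetical and chosen only for illustration; this is a behavioral model, not Calyx semantics.

```python
class GoReadyDone:
    """Hypothetical cycle-level model: an input is accepted only when go and ready are both high."""

    def __init__(self, ii, latency):
        self.ii = ii              # cycles between accepted inputs
        self.latency = latency    # cycles from acceptance to a valid output (>= 1)
        self.cooldown = 0         # cycles until ready goes high again
        self.in_flight = []       # [cycles_remaining, result] pairs, oldest first

    @property
    def ready(self):
        return self.cooldown == 0

    def step(self, go, value=None):
        """Advance one cycle; return (done, out) as observed in this cycle."""
        # Age every in-flight computation by one cycle.
        for item in self.in_flight:
            item[0] -= 1
        done, out = False, None
        if self.in_flight and self.in_flight[0][0] == 0:
            done, out = True, self.in_flight.pop(0)[1]
        accepted = go and self.ready
        if self.cooldown > 0:
            self.cooldown -= 1
        if accepted:
            # Placeholder computation (doubling) stands in for the real datapath.
            self.in_flight.append([self.latency, value * 2])
            self.cooldown = self.ii - 1
        return done, out


comp = GoReadyDone(ii=2, latency=3)
for cycle in range(6):
    done, out = comp.step(go=True, value=cycle)
    if done:
        print(f"cycle {cycle}: done, out={out}")
```

Holding go high on every cycle is harmless in this model: inputs offered while ready is low are simply not accepted.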
@calebmkim
Contributor

calebmkim commented Sep 20, 2023

Thanks for writing this up, one small question on 2. (Edit: actually I think we should probably just abandon 2 as an idea, so my question isn't really relevant):

  2. have neither a go nor a done interface. The output ports hold the outputs relevant to the value on the input ports exactly n cycles ago. In this case the component will have to be 'ready' to accept inputs every cycle, since there's no way to distinguish between valid and invalid signals.

Is it necessary for the component to be ready to accept inputs every cycle? In other words, can't we just make it the Calyx programmer's responsibility to use the component correctly if the component's II is not 1? The only reason why I say this is that it seems like we wouldn't be able to represent a component that has an II of 2 or more in this case. Lmk if there's something I'm missing.

Another question:
Is there another option that is kind of a mix of 1. and 3.? More concretely, the n in static<n> component_name in this option represents the latency of the component as in 3., but there is also no done signal for the component like in 1. This is just my opinion so feel free to disagree, but in some ways I feel it's the most intuitive option since it matches Calyx groups most closely (e.g., static <n> group_name means that the group's latency is n cycles, and it does not have a done signal).

@rachitnigam
Contributor

I think (2) is not tenable because if your module has II > 1, then you cannot correctly use it because you have no way to know when it started its first execution.
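
A small Python sketch of the problem, assuming a hypothetical go-less component that samples its input once every II cycles at a phase the caller cannot observe. The function name `sampled_inputs` and the `phase` parameter are made up for illustration.

```python
def sampled_inputs(inputs, ii, phase):
    """Return the inputs a go-less component with initiation interval `ii` actually consumes.

    `phase` is the cycle (mod ii) on which the component happens to sample.
    Without a go signal the caller has no way to observe or control it.
    """
    return [value for cycle, value in enumerate(inputs) if cycle % ii == phase % ii]


stream = [10, 11, 12, 13, 14, 15]
print(sampled_inputs(stream, ii=1, phase=0))  # II = 1: every input is consumed
print(sampled_inputs(stream, ii=2, phase=0))  # II = 2: result depends on the hidden phase
print(sampled_inputs(stream, ii=2, phase=1))
```

With II = 1 the phase is irrelevant, which is why option 2 only works when the component accepts inputs every cycle.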

@calebmkim
Contributor

calebmkim commented Sep 25, 2023

Just to summarize our synchronous discussion from today:

What we want: a one cycle go assertion

static<n> component comp means we can assert comp.go for one cycle, and then get comp's outputs after n cycles. (no done signal)
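
A minimal Python sketch of this interface, assuming a delay line of depth n. The class `StaticComponent` and its `tick` method are hypothetical names; as a simplification, the model accepts a new go on every cycle.

```python
from collections import deque


class StaticComponent:
    """Hypothetical model of a static<n> component: pulse go for one cycle, read the result n cycles later."""

    def __init__(self, n, fn):
        self.fn = fn
        # One slot per cycle of latency; None marks "nothing started on that cycle".
        self.pipe = deque([None] * n, maxlen=n)
        self.out = None  # value visible on the output port this cycle

    def tick(self, go=False, value=None):
        """Advance one clock cycle. There is no done signal: the caller counts cycles."""
        self.out = self.pipe[0]  # whatever entered n cycles ago
        # Appending to a full deque with maxlen drops the oldest slot.
        self.pipe.append(self.fn(value) if go else None)
        return self.out


comp = StaticComponent(3, lambda x: x * x)
for cycle in range(5):
    out = comp.tick(go=(cycle == 0), value=5)  # go is high on cycle 0 only
    print(cycle, out)
```

The caller is responsible for reading the output exactly n cycles after the go pulse, since nothing marks the value as valid.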

Asymmetry with groups

A static<n> group g is a bit different: if you want to run the group to completion, you have to assert g[go] for n cycles. We are fairly certain that groups and components should have analogous interfaces, so if we want a "one cycle go assertion" for components, we should probably have it for groups as well. That means changing the way we compile static groups.

The main difficulty is probably going to be compiling repeat statements.
If we have:

static<n> group g {...}; 
static repeat 10 {
  g; 
} 

We currently compile it like this:

static<n * 10> group repeat_group {
  g[go] = 1'd1; 
} 

However, for the "one cycle go assertion" interface, we would have to compile it like this:

static<n * 10> group repeat_group {
  g[go] = %0 | %n | %(2*n) ... | %(9*n) ? 1'd1; // one-cycle pulse at the start of each iteration
} 
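
The difference between the two schemes can be sketched in Python as the set of cycles on which g[go] is high. The function names are hypothetical; the pulses land on multiples of the group's latency n (for n = 10 and 10 iterations: cycles 0, 10, ..., 90).

```python
def hold_go_cycles(n, reps):
    """Current scheme: g[go] is high on every cycle of the repeat group."""
    return list(range(n * reps))


def pulse_go_cycles(n, reps):
    """One-cycle scheme: g[go] is high only on the first cycle of each iteration."""
    return [i * n for i in range(reps)]


print(pulse_go_cycles(10, 10))
print(len(hold_go_cycles(10, 10)))
```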

There may also be other issues that I have forgotten, so please add them in responses here.

@sampsyo
Contributor

sampsyo commented Sep 27, 2023

Excellent summary. I think that's exactly where we stand. I guess the strawman plan here could be to just live with the asymmetry between components and groups... do folks know of bad problems with that? Maybe inlining?

@calebmkim
Contributor

Decision

Just going to update this with our synchronous discussion: we decided that the asymmetry between groups and components is (probably) acceptable, since there is an asymmetry in how components and groups are used: components can be pipelined in their execution, while groups cannot. Therefore, we can (at least for now) try to implement the interface and live with the asymmetry.

Question

One question that I just thought of: currently, we infer the latency of registers and multipliers by looking at the @static annotation on their go port, even though they have a done port. This @static annotation is just a way of saying "we know the output will be available after one cycle"; kind of like a hint for the compiler. A couple of questions on this:

  1. Should we keep this behavior, in light of our decision about the static interface components? If we do, there will be a distinction between two things: 1) the actual static<n> interface (which is to trigger go for one cycle, wait n cycles, no done signal) and 2) the @static(n) annotation, which is just a hint to promote to static.
  2. What should we do about dynamic components whose entire control program we can infer the latency of? We probably won't be able to upgrade the component to a static<n> component (since that would involve getting rid of the done signal), but we may be able to attach the @static(n) annotation "hint" to its go port. This will help in the static-promotion of groups.

@rachitnigam
Contributor

Thanks for following up on this @calebmkim! Thoughts on "definitely static" primitives like registers:

  1. On the elegance front, I think it would be nice to say something like: all registers are completely static, a fact which is reflected in their primitive definitions because they use the static keyword. They do not have the done signal and the only way to use them is in the context of static groups. The Calyx compiler knows how to instantiate the done signal for such groups and will do the right thing.
  2. On the pragmatic front: we can keep the done signal for the register and say, "well actually, because it is so common to use registers in either context, we have implemented a really efficient done signal for the register and the compiler is allowed to assume that the done interface works correctly and can be eliminated in static contexts".

My point in mentioning (1) & (2) is that, notionally, we should think of static primitives as truly static: they do not need or have a done signal. The fact that they have a done signal is merely an optimization that's available for the compiler to leverage. It should be totally okay to ignore the done signal and instantiate a static group FSM to expose the done signal as well because first and foremost, std_reg is a static primitive.

Does this philosophy resonate?

@rachitnigam added the C: Calyx (Extension or change to the Calyx IL) and Calyx 2.0 (Things that move us towards Calyx 2.0) labels Oct 3, 2023
@rachitnigam
Contributor

@andrewb1999 agrees that the new "assert go for one cycle" interface is a good idea and so we should commit to implementing it. Once it lands, Andrew will update AMC to use the new interface and hopefully things will continue working.

@sampsyo
Contributor

sampsyo commented Oct 4, 2023

My point in mentioning (1) & (2) is that, notionally, we should think of static primitives as truly static: they do not need or have a done signal.

Right, this seems like the way to me! To put it another way, we want to have two nonoverlapping categories of components:

  1. Dynamic components, which can only be used in a dynamic context (and have a done port).
  2. Static components, which can only be used in a static context (and do not have a done port). (But you can wrap them to use them in a dynamic world!)

Then we might want to create a small third category for special cases:

  3. Stuff like std_reg that can be used in either context, without wrapping. These have both a known static latency and a done port. When used in a static context, you just ignore the done port completely.

One could argue that we should eliminate category 3, and create two different primitives called static_reg and dyn_reg for example. But it doesn't seem too bad to have a small number of things in category 3, as long as we don't make that the common case!
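
The three categories can be sketched as a small Python model. The enum `Category` and the helpers `usable_in` and `has_done_port` are hypothetical names for illustration; nothing here reflects actual compiler data structures.

```python
from enum import Enum


class Category(Enum):
    DYNAMIC = 1  # has a done port, dynamic contexts only
    STATIC = 2   # no done port, static contexts only (unless wrapped)
    DUAL = 3     # known latency and a done port, e.g. std_reg


def usable_in(category, context):
    """Return True if a component of `category` can be used directly, without wrapping, in `context`."""
    if context not in ("static", "dynamic"):
        raise ValueError(f"unknown context: {context}")
    if category is Category.DUAL:
        return True
    return (category is Category.STATIC) == (context == "static")


def has_done_port(category):
    """Only static-only components lack a done port."""
    return category is not Category.STATIC


for cat in Category:
    print(cat.name, usable_in(cat, "static"), usable_in(cat, "dynamic"), has_done_port(cat))
```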

@rachitnigam
Contributor

Right! The argument for keeping category 3 is that name punning for very common things like registers is useful and we expect synthesis tools to remove the unconnected done register when used in static context anyways.

@calebmkim
Contributor

calebmkim commented Oct 4, 2023

One more note: for static-promotion, it seems like we would be promoting some components from (1) to (3). Is this fine? I know we should probably keep (3) small; on the other hand, this could help with static promotion. I think it's probably fine since this is just internal to the compiler?

@rachitnigam
Contributor

Hm, not sure I follow. If you promote something, it goes from 3 to 1 right? This is because we're saying that we definitely don't need to use the done signal

@paili0628
Contributor Author

paili0628 commented Oct 5, 2023

I think we have to rethink what the latency of a static component actually means, especially for pipelined components.

Writing a pipelined component would probably have a control program that looks something like this:

static par {
  static seq {
    // 1st stage of pipeline
  }
  static seq {
    // 2nd stage of pipeline
  }
  ... // etc. 
} 

But in this case, the latency of the control program would be the II, not the latency of the component.

Using a static seq, i.e., something like this:

static seq {
  static par {
    // 1st stage of pipeline
  }
  static par {
    // 2nd stage of pipeline
  }
  ... // etc. 
} 

won't work, since we only have one FSM, and its register holds a single value at a time. Intuitively, this means we cannot execute multiple stages of the pipeline at once: only the stage selected by the FSM register is running.

Proposal

It might be good to keep the current decision, except that for a static<n> component, n is the II and not the latency: these two numbers are the same for non-pipelined components, so this only makes a difference for pipelined components.

Doing this would also create somewhat of an analogy with static Calyx control programs: for example, the static systolic array has a while loop: the while loop's body has latency of 1, since that's the "II" of the loop, not the number of cycles to perform a multiply & accumulate.

This won't necessarily change the way we are going to compile static components, it will only affect how people should think about the latency of static components.
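
A minimal Python sketch of the II/latency distinction, assuming a hypothetical fully pipelined component whose stages all take the same number of cycles. The function name and parameters are illustrative only.

```python
def pipeline_timeline(num_stages, stage_latency, num_inputs):
    """Return (start_cycle, output_cycle) for each input of a hypothetical fully pipelined component.

    Each stage takes `stage_latency` cycles and all stages run in parallel on
    different inputs, so a new input can start every `stage_latency` cycles.
    """
    ii = stage_latency                    # initiation interval: what n would mean in static<n>
    latency = num_stages * stage_latency  # cycles from start to output
    return [(i * ii, i * ii + latency) for i in range(num_inputs)]


for start, out in pipeline_timeline(num_stages=3, stage_latency=1, num_inputs=3):
    print(f"start at cycle {start}, output at cycle {out}")
```

With a single stage the two numbers coincide, which matches the non-pipelined case.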

@rachitnigam
Contributor

@paili0628 My understanding of the annotation is exactly the same! That is, for sequential components, the annotation represents both the II and the latency, but for pipelined components, it tells you when you can re-invoke the module.

@calebmkim thoughts?

@calebmkim
Contributor

Yeah I agree with the above^. I think this all sounds good.

@sampsyo
Contributor

sampsyo commented Oct 6, 2023

I think this II/latency decision sounds fine too! Namely, it fulfills these desiderata that seem important to me:

  • For non-pipelined components, a simple invoke foo suffices to be sure that the outputs from foo are ready after the invocation. (Because II=latency, as @rachitnigam noted.)
  • For pipelined components, you have to worry about two things: (a) when you can run stuff again, and (b) when you can read outputs. Inevitably, invoke foo is not going to take care of both things at once. With this definition, invoke foo manages (a) for you but not (b). But that is no worse than the alternative, which would manage (b) for you but not (a).
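
The arithmetic behind (b) can be sketched in Python, assuming that invoke foo occupies exactly II cycles. The function name `cycles_until_outputs` is hypothetical.

```python
def cycles_until_outputs(ii, latency):
    """Extra cycles to wait after `invoke foo` completes before foo's outputs are valid.

    Assumes the invoke itself occupies `ii` cycles (n in static<n> read as the II).
    """
    return max(0, latency - ii)


print(cycles_until_outputs(ii=4, latency=4))  # non-pipelined: II equals latency
print(cycles_until_outputs(ii=1, latency=3))  # pipelined: the caller must count the rest
```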

@sampsyo
Contributor

sampsyo commented Oct 6, 2023

Following up on this earlier thread between @calebmkim and @rachitnigam:

Hm, not sure I follow. If you promote something, it goes from 3 to 1 right? This is because we're saying that we definitely don't need to use the done signal

What I believe @calebmkim is saying is:

  • Inferring the attribute goes from category 1 to category 3. That is, the component's calling convention remains dynamic, but we learn its latency as a hint.
  • Actual promotion takes the @static attribute and converts a component into an actual static component, moving it from category 3 to category 2 (a clean static-only component). This is sort of awkward because it may involve making two copies of the component: a promoted and an unpromoted one, in case it is used in both contexts.

So we could actually just not do the second step, and only ever automatically move components into category 3 (never 2), which would make them behave like std_reg and have "dual citizenship."
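
The two steps can be sketched in Python as transitions between categories. The `Component` dataclass and the functions `infer_latency` and `promote` are hypothetical names for illustration, not the compiler's actual passes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    has_done: bool
    latency: Optional[int] = None  # known static latency, if any

    def __post_init__(self):
        if not self.has_done and self.latency is None:
            raise ValueError("a component without a done port needs a known latency")

    @property
    def category(self):
        if self.latency is None:
            return 1  # dynamic only
        return 3 if self.has_done else 2


def infer_latency(comp, latency):
    """Step 1: record the latency as a hint; the calling convention stays dynamic (1 -> 3)."""
    return replace(comp, latency=latency)


def promote(comp):
    """Step 2 (optional): drop the done port to get a static-only copy (3 -> 2)."""
    if comp.latency is None:
        raise ValueError("cannot promote without a known latency")
    return replace(comp, has_done=False)


foo = Component("foo", has_done=True)
hinted = infer_latency(foo, 4)
print(foo.category, hinted.category, promote(hinted).category)
```

Both functions return new objects, mirroring the point that promotion may leave a promoted and an unpromoted copy side by side.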

Is this a good trade-off? Not entirely sure yet, but I wanted to confirm this is what @calebmkim is envisioning.

@calebmkim
Contributor

Yeah, that's exactly what I was thinking.

@rachitnigam
Contributor

Should we add a new @latency(n) attribute to indicate the latency of a particular port? I was thinking something like this could be checked in the future and be useful to ensure that the pipeline is correctly balanced.

@sampsyo
Contributor

sampsyo commented Oct 16, 2023

It's a pretty good idea. This would end up being, like, a super restrictive embedding of a subset of Filament types… as in, a Filament component with a simple signature something like this:

comp mycomp<'G: X>(
  go: interface['G],
  inport: ['G, 'G+X] 32,
) -> (
  outport: ['G+Y, 'G+Y+1] 32,
)

Could compile to a Calyx signature like this:

static<X> component mycomp(
  inport: 32
) -> (
  @latency(Y) outport: 32
)
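
A rough Python sketch of the kind of balance check such an attribute could enable, assuming each operand of a consumer comes from a port with a known latency. The function names and the `producers` mapping are hypothetical; no such check is described in this thread beyond the idea itself.

```python
def arrival_cycles(producers):
    """Map each operand name to the cycle on which its value becomes valid.

    `producers` maps an operand name to (go_cycle, port_latency), where
    port_latency plays the role of the proposed @latency(n) attribute.
    """
    return {name: go + lat for name, (go, lat) in producers.items()}


def is_balanced(producers):
    """True if all operands of a consumer become valid on the same cycle."""
    return len(set(arrival_cycles(producers).values())) <= 1


# Two producers started together but with different port latencies.
print(is_balanced({"a": (0, 2), "b": (0, 3)}))
# Starting the faster producer one cycle later lines the operands up.
print(is_balanced({"a": (1, 2), "b": (0, 3)}))
```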

@rachitnigam
Contributor

Subsumed by #1754

4 participants