Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-priority logic for guarded assignments #1423

Open
rachitnigam opened this issue Apr 28, 2023 · 1 comment
Open

Non-priority logic for guarded assignments #1423

rachitnigam opened this issue Apr 28, 2023 · 1 comment
Labels
C: Calyx Extension or change to the Calyx IL S: Available Can be worked upon

Comments

@rachitnigam
Copy link
Contributor

rachitnigam commented Apr 28, 2023

The current Verilog backend for Calyx generates priority logic which is known to inefficient. In a nutshell, because we generate code that looks like:

x = c1 ? a :
    c2 ? b :
    c3 ? c : d;

The generated hardware ends up being complex because it must first check whether c1 is false before moving onto the c2 branch (hence called "priority logic" because it gives priority to initial conditions).

Of course, the correctness condition of Calyx programs requires that only one of c1, c2, or c3 is ever true. This means that we should be able to avoid generating priority logic. One way to do this is generate code that looks like this:

x = (c1 & a) | (c2 & b) | (c3 & c) | (!c1 & !c2 & !c3 & d);

The idea here is that each guard is responsible for "zeroing-out" its corresponding guarded value. Note that the default case unfortunately still generates an extra long guard. However, we know (not from defining a precise semantics but from being the implementers of Calyx), that d is always going to be 0 which means we can instead generate code that looks like:

x = (c1 & a) | (c2 & b) | (c3 & c) | 0;

Furthermore, relying a bit more on the semantics, if we know something is a data path component (#1169), we can instead generate code that looks like:

x = (c1 & a) | (c2 & b) | (c3 & c);

This is another small change that we've put off for some time but could be pretty impactful. However, to measure its true utility, we have to have a CI for resource usage (#1416).

@rachitnigam rachitnigam added C: Calyx Extension or change to the Calyx IL S: Available Can be worked upon labels Apr 28, 2023
@rachitnigam rachitnigam added this to the Quality of Results milestone Apr 28, 2023
@rachitnigam rachitnigam mentioned this issue Apr 28, 2023
8 tasks
@sampsyo
Copy link
Contributor

sampsyo commented Apr 28, 2023

Thanks for getting this going in a proper thread; it has indeed been a long time coming.

I very much agree with your suggestion about measuring resource usage… I think we should be pretty careful about the potential for synthesis tools to outsmart us here.

That is, the job synthesis tools are supposed to do is to take a given function on bits (or bit vectors) and find the cheapest circuit that has that implements that truth table. So in that sense, a synthesis tool should be able to "collapse" different equivalent Boolean expressions into the same circuit. Of course, the degree to which they can actually do that is unclear; surely they do better with small expressions and simple identities and worse with large expressions and non-obvious transformations.

Within this spectrum, one thing I would expect them to do pretty good is to catch that expr | 0 is equivalent to expr. (Since it's a syntactic identity, it doesn't require exhaustively expanding the truth table, etc.)

It is much more dubious that synthesis could somehow generate the same circuit for these two expressions, as you indicated:

c1 ? a : c2 ? b : c3 ? c : d
(c1 & a) | (c2 & b) | (c3 & c) | (!c1 & !c2 & !c3 & d)

…because it's unlikely that the tool could somehow deduce that c1, c2, and c3 are mutually exclusive (and this property is required for the expressions to be equivalent).

But even so, because synthesis tools move in mysterious ways, it would be great to measure this empirically. Perhaps a good first step here would be to generate two completely synthetic Verilog files (not generated from Calyx programs or anything; just stress-tests for the two styles) with big expressions like this, run them through Vivado or whatever, and compare the delay/area. That could help us settle on which versions of this "cascade" are worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: Calyx Extension or change to the Calyx IL S: Available Can be worked upon
Projects
None yet
Development

No branches or pull requests

2 participants