The current Verilog backend for Calyx generates priority logic, which is known to be inefficient. In a nutshell, we generate code that looks like:
x = c1 ? a :
c2 ? b :
c3 ? c : d;
The generated hardware ends up being complex because it must first check whether c1 is false before moving onto the c2 branch (hence called "priority logic" because it gives priority to initial conditions).
Of course, the correctness condition of Calyx programs requires that only one of c1, c2, or c3 is ever true. This means that we should be able to avoid generating priority logic. One way to do this is to generate code that looks like this:
x = (c1 & a) | (c2 & b) | (c3 & c) | (!c1 & !c2 & !c3 & d);
The idea here is that each guard is responsible for "zeroing-out" its corresponding guarded value. Note that the default case unfortunately still generates an extra-long guard. However, we know (not from defining a precise semantics but from being the implementers of Calyx) that d is always going to be 0, which means we can instead generate code that looks like:
x = (c1 & a) | (c2 & b) | (c3 & c) | 0;
Furthermore, relying a bit more on the semantics, if we know something is a data path component (#1169), we can instead generate code that looks like:
x = (c1 & a) | (c2 & b) | (c3 & c);
This is another small change that we've put off for some time but could be pretty impactful. However, to measure its true utility, we have to have a CI for resource usage (#1416).
Thanks for getting this going in a proper thread; it has indeed been a long time coming.
I very much agree with your suggestion about measuring resource usage… I think we should be pretty careful about the potential for synthesis tools to outsmart us here.
That is, the job synthesis tools are supposed to do is to take a given function on bits (or bit vectors) and find the cheapest circuit that implements that truth table. So in that sense, a synthesis tool should be able to "collapse" different equivalent Boolean expressions into the same circuit. Of course, the degree to which they can actually do that is unclear; surely they do better with small expressions and simple identities and worse with large expressions and non-obvious transformations.
Within this spectrum, one thing I would expect them to do pretty well is to catch that expr | 0 is equivalent to expr. (Since it's a syntactic identity, it doesn't require exhaustively expanding the truth table, etc.)
It is much more dubious that synthesis could somehow generate the same circuit for these two expressions, as you indicated:
c1 ? a : c2 ? b : c3 ? c : d
(c1 & a) | (c2 & b) | (c3 & c) | (!c1 & !c2 & !c3 & d)
…because it's unlikely that the tool could somehow deduce that c1, c2, and c3 are mutually exclusive (and this property is required for the expressions to be equivalent).
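To make the mutual-exclusion requirement concrete, here is a small brute-force sketch in Python (not Calyx or Verilog code). It models the two expressions above on 2-bit values and reports which guard combinations produce different results. The function names are made up for illustration, and `gate(g, v)` is an assumed model of a guard masking every bit of its value, which the `c1 & a` shorthand above presumably intends.

```python
from itertools import product

WIDTH = 2  # small bit width so the search stays exhaustive and fast


def priority_mux(c1, c2, c3, a, b, c, d):
    # Models: c1 ? a : c2 ? b : c3 ? c : d
    return a if c1 else b if c2 else c if c3 else d


def gate(g, v):
    # Assumed model of a guard zeroing out its value: all bits of v or none.
    return v if g else 0


def and_or_mux(c1, c2, c3, a, b, c, d):
    # Models: (c1 & a) | (c2 & b) | (c3 & c) | (!c1 & !c2 & !c3 & d)
    no_guard = not (c1 or c2 or c3)
    return gate(c1, a) | gate(c2, b) | gate(c3, c) | gate(no_guard, d)


def mismatching_guards():
    # Collect every guard assignment for which some data values make
    # the two forms disagree.
    bad = set()
    values = range(1 << WIDTH)
    for guards in product((0, 1), repeat=3):
        for data in product(values, repeat=4):
            if priority_mux(*guards, *data) != and_or_mux(*guards, *data):
                bad.add(guards)
                break
    return bad


if __name__ == "__main__":
    print(sorted(mismatching_guards()))
```

In this model the two forms agree whenever at most one guard is set and can disagree as soon as two guards are set at once, so a tool that cannot prove exclusivity has no license to merge them.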
But even so, because synthesis tools move in mysterious ways, it would be great to measure this empirically. Perhaps a good first step here would be to generate two completely synthetic Verilog files (not generated from Calyx programs or anything; just stress-tests for the two styles) with big expressions like this, run them through Vivado or whatever, and compare the delay/area. That could help us settle on which versions of this "cascade" are worth it.
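As a starting point for that experiment, here is a rough Python sketch of a generator for the two synthetic styles. Everything about the module shape is an assumption made for illustration: the module names, the packed `c` and `data` ports, the `{W{c[i]}}` replication used to mask a multi-bit value, and a default of 0 in the priority version (following the observation that d is always 0). It is not what the Calyx backend emits.

```python
def data_slice(i, width):
    # Hypothetical layout: input i occupies bits [i*width + width-1 : i*width].
    lo = i * width
    return f"data[{lo + width - 1}:{lo}]"


def module_text(name, n, width, expr_lines):
    # Wrap the expression lines in a module with n guards and n data inputs.
    body = "\n             ".join(expr_lines)
    return (
        f"module {name}(\n"
        f"  input  wire [{n - 1}:0] c,\n"
        f"  input  wire [{n * width - 1}:0] data,\n"
        f"  output wire [{width - 1}:0] x\n"
        f");\n"
        f"  assign x = {body};\n"
        f"endmodule\n"
    )


def gen_priority(n, width, name="mux_priority"):
    # Nested ternaries: c[0] ? ... : c[1] ? ... : 0
    lines = [f"c[{i}] ? {data_slice(i, width)} :" for i in range(n)]
    lines.append(f"{width}'d0")
    return module_text(name, n, width, lines)


def gen_and_or(n, width, name="mux_and_or"):
    # Flat AND-OR form: each guard is replicated to mask its whole value.
    terms = [
        f"({{{width}{{c[{i}]}}}} & {data_slice(i, width)})" for i in range(n)
    ]
    lines = [t + " |" for t in terms[:-1]] + [terms[-1]]
    return module_text(name, n, width, lines)


if __name__ == "__main__":
    print(gen_priority(3, 8))
    print(gen_and_or(3, 8))
```

Writing the two strings to separate `.v` files for a large `n` and synthesizing each on its own would give one delay/area data point per style. Note that the two modules only compute the same function when the guards are mutually exclusive, which is exactly the information the synthesis tool does not have.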