_convert_fragment() has polynomial time complexity as signals are added to design #711

antonblanchard · 2022-07-18T00:49:15Z

While creating a large gate level multiplier in amaranth I noticed that it took a long time to write out the Verilog.

A simple test case:

#!/usr/bin/python3

import sys

from amaranth import Elaboratable, Module, Signal
from amaranth.back import verilog

class Test(Elaboratable):
    def __init__(self, nr_sigs=1000):
        self.i = Signal()
        self.o = Signal()
        self.nr_sigs=nr_sigs

    def elaborate(self, platform):
        m = Module()

        s = Signal()
        m.d.comb += s.eq(self.i)

        for i in range(self.nr_sigs):
            s_new = Signal(name="s_%i" % i)
            m.d.comb += s_new.eq(s)
            s = s_new

        m.d.comb += self.o.eq(s)

        return m

nr_sigs=int(sys.argv[1])
test = Test(nr_sigs=nr_sigs)
print(verilog.convert(test, ports=[test.i, test.o], strip_internal_attrs=True))

Things start going bad at 1000 signals.

Almost all the time is in the code that groups statements - for each signal we then iterate over each statement. I don't understand everything we are doing here (eg why we have to group signals vs just putting each one in their own process), but it seems we could make a single pass through the signals and statements to form them into the groups we need.

        lhs_grouper = xfrm.LHSGroupAnalyzer()
        lhs_grouper.on_statements(fragment.statements)

        for group, group_signals in lhs_grouper.groups().items():
            lhs_group_filter = xfrm.LHSGroupFilter(group_signals)
            group_stmts = lhs_group_filter(fragment.statements)

            with module.process(name="$group_{}".format(group)) as process:
                with process.case() as case:
                    # For every signal in comb domain, assign \sig$next to the reset value.
                    # For every signal in sync domains, assign \sig$next to the current
                    # value (\sig).
                    for domain, signal in fragment.iter_drivers():
                        if signal not in group_signals:
                            continue
                        if domain is None:
                            prev_value = ast.Const(signal.reset, signal.width)
                        else:
                            prev_value = signal
                        case.assign(lhs_compiler(signal), rhs_compiler(prev_value))

                    # Convert statements into decision trees.
                    stmt_compiler._case = case
                    stmt_compiler._has_rhs = False
                    stmt_compiler._wrap_assign = False
                    stmt_compiler(group_stmts)

The text was updated successfully, but these errors were encountered:

whitequark · 2022-07-18T04:11:52Z

I'll take a look, thanks!

whitequark · 2022-09-24T10:19:25Z

Could you prepare a PR with this change, please?

whitequark · 2023-02-16T21:01:33Z

Could you prepare a PR with this change, please?

Sorry, I misunderstood. For some reason I thought you posted a patch in the PR, not just a reference to the code.

whitequark · 2023-02-16T21:21:39Z

eg why we have to group signals vs just putting each one in their own process

We have to group signals because the target of an assignment can be a concatenation. Consider something like:

a = Signal(16)
b = Signal(16)
c = Signal(16)
d = Signal(16)
m.d.comb += Cat(a, b).eq(c * d)

Unless you stuff the two of them into the same process, you have to duplicate the right-hand side, which can get really unwieldy for certain kinds of code. (We already duplicate

The reason we can't just stuff everything into the same process is that you can have something like:

m.d.comb += a.eq(b)
m.d.comb += b.eq(c)

which confuses Verilog simulators if you translate it to:

always @(*) begin
  a = b;
  b = c;
end

I think it should work fine if you do it like:

always @(*) begin
  a$next = b;
  b$next = c;
end
always @(*) begin
  a = a$next;
  b = b$next;
end

but that's ugly.

The right approach here is to convert RTL to netlist on Amaranth side and then convert netlist back to readable-ish Verilog (this is feasible), but it'll take a bit of time to do that.

whitequark · 2024-02-13T07:42:29Z

This is fixed by the new IR in the main branch.

whitequark added improvement backend:rtlil labels Jul 18, 2022

whitequark added this to the 0.4 milestone Jan 31, 2023

whitequark removed this from the 0.4 milestone Feb 16, 2023

whitequark added this to the 0.5 milestone Feb 13, 2024

whitequark closed this as completed Feb 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

_convert_fragment() has polynomial time complexity as signals are added to design #711

_convert_fragment() has polynomial time complexity as signals are added to design #711

antonblanchard commented Jul 18, 2022

whitequark commented Jul 18, 2022

whitequark commented Sep 24, 2022

whitequark commented Feb 16, 2023

whitequark commented Feb 16, 2023

whitequark commented Feb 13, 2024

_convert_fragment() has polynomial time complexity as signals are added to design #711

_convert_fragment() has polynomial time complexity as signals are added to design #711

Comments

antonblanchard commented Jul 18, 2022

whitequark commented Jul 18, 2022

whitequark commented Sep 24, 2022

whitequark commented Feb 16, 2023

whitequark commented Feb 16, 2023

whitequark commented Feb 13, 2024