-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redo internal cell memory layout #4461
base: main
Are you sure you want to change the base?
Conversation
This was meant to be a draft PR and I hit the wrong button writing the description. Heck |
What would be the fundamental issues provided we switch to the buffer-normalized mode of #3967 (which this PR should be compatible with)?
I pushed for @widlarizer to open a draft PR no matter what state this is in. Benchmarks can come later. |
FWIW pmgen-generated C++ code is not checked into the repository. It's treated as a build artifact. |
That's an improvement, although other issues remain (like the unnecessary coarse/fine cell split, or the combined treatment of logic and inout wires in the same netlist). |
Benchmarks are coming. The work presented here right now is exclusively an incomplete proof of concept. Incomplete as in it immediately segfaults on anything in the current state. I want to find a case where some cells being fast tracked really shows improvements. If we find better fundamental approach, all this is free to get canned, in such a case it still will have been a helpful exercise in my eyes |
Can you elaborate on the adverse effects of this? From a performance point of view it doesn't seem too concerning, but I may be missing something.
I wanted to redesign the netlist first, but that was before I learned about bufnorm, after which it was suggested to me by povik that redoing how connections are handled would require excessive code changes and that bufnorm + smaller internal cells takes priority as a result. |
I don't think this affects performance but in general I don't care a whole lot about Yosys performance; as you may know I'm quite happy using the YoWASP version despite a ~2x perf hit, so clearly that much performance loss isn't an issue to me. But the IR is poorly defined and difficult to work with (I think the single most obnoxious issue from my perspective is the batshit behavior of |
I don't follow. How is |
|
I would be remiss if I didn't bring up that the road to performance I'm talking about is through major runtime memory size reduction which should matter for use cases of EDA in browsers.
As I understand, there are many open questions on design decisions in Yosys, as viewed from the perspective of correctness and developer experience and maintainability, but I'm not aware of any interaction this PR would have on those topics, other than this being an opportunity to have correct-by-construction, rather than correct-by-cell-checker cells. |
Huh, so e.g. the state writeback in
We should mention a bit about the background for the PR. It's motivated by company interests which allow us to invest significant effort into performance improvements. If we can fix some outstanding ills of the IR along the way, that's all the better, but it's not the case that we can pick any IR work instead. I guess we shouldn't be doubling down on a wrong design decision by writing specialized memory layouts around it, but that doesn't seem to be happening, at least not with the issues brought up so far. |
Interestingly, YoWASP already greatly benefits from such size reduction compared to native builds because it uses an ABI with 32-bit pointers. This is for example why nextpnr is ~on par with a native build, despite other inefficiencies. I do welcome performance improvements (who doesn't?) but I feel that there are many other things to address in Yosys that I'm a lot more excited about, since the IR structure is a major pain point and performance rarely is, at least from where I stand.
I'm uncertain on whether this design would further entrench the existing IR flaws or not. I think ultimately I'll choose to withhold any judgement I would have before I see benchmarks. That seems only right if this was intended to be a draft (which I didn't realize at first); I don't want to be negative on something that is clearly not the final work.
Yes. The problem with |
So on second thought and having read this, I think I'll just leave this PR to be driven by the company interests. Looks like it pinged me because of the CXXRTL code owner bit, I'll pre-emptively sign off on any changes there since they're quite minimal. |
This PR was initiated by my personal interest which I found to be well aligned with company interests of making specifically opt_expr faster |
Major road block identified: Yosys until now has been built with the assumption that at will one can modify cell types and erase and add params and ports. At coverage tag "opt.opt_expr.eqneq.isnot" that's exactly what's done. With my changes, it is possible to attempt it but it's union access UB. As a solution, the type field could become private, which will create large code change where read-only type access is replaced with a const getter (from |
Brief
What are the reasons/motivation for this change?
small is fast!
Explain how this is achieved.
make cell small.
If applicable, please suggest to reviewers how they can test the change.
right now, nothing works
Full
Core idea
Yosys is conceptually flexible in what a cell can be. This makes it easy to do odd custom things that match use cases. The way this is achieved is inefficient for commonly represented types. Each cell has a dict from interned parameter name strings to parameter value constants. It also has a dict from port name strings to signals represented by
SigSpec
s. However, most cells that Yosys uses in common flows have a fixed set of ports and parameters. So instead of dynamically growing space for a dict and hashing to access the value stored in it, we can have preallocated space and a compile-time known mapping if we just use a struct specific to a given cell type:We can then use a tagged union to construct a new, slimmer, faster to access cell type, with a fallback pointer to the old cell structure.
Wait, wasn't this supposed to be a Yosys rewrite in Rust?
This isn't next generation Yosys (project nyan). Rust or Zig would be better to do what I'm doing, but Yosys is a C++ codebase, so C++ it is. This is a surgical, gradual transition to more efficient internals with minimal impact on stuff outside
kernel/rtlil.h
andkernel/rtlil.cc
. The Yosys codebase (~200k LOC) and user plugins have a large number of code likecell->getParam(ID::A_WIDTH)
, we have to reproduce these interfaces rather than change them all tocell->not_.a_width
. There's a good number of such functions. The annoying part is mocking for example theparameters
dictionary. Instead, there'sstruct FakeParams
which provides iterators to hide this. Same goes for connections. Implementations ofgetParam
etc are simple switches over the cell type so they should inline well.However, there's a lot of "internal" cells we may want to be fast, and their struct layouts and methods are basically duplicating the same information. Manually writing these methods is also very error prone, accessing tagged unions incorrectly is common UB. This is why, to add support for the various cell types, the next stage will get a bit weird:
Writing a compiler
Modern C++ offers multiple ways to have code that thinks about types. However, what we need here is to construct functions and construct their return types based on struct layouts. This level of reflection if possible seems too difficult to achieve with templates and constexpr and such. We don't have Rust's proc_macro, we don't have Zig's comptime, so one possible solution is a light Python utility which from information-dense Python or plain text structures emits generates C++ which is committed into the repository and checked with CI. Yosys already depends on Python in a similar way, see
passes/pmgen/pmgen.py
.Beyond the cell
It's possible that the changes outlined above alone won't make a huge difference.
RTLIL::Const
is fairly inefficient as well. Size is virtuous in a positive feedback loop: the fewer bytes per cell, the more it helps to save a single byte per cell.Size goals
The current flexible
RTLIL::Cell
is 192 bytes. The proposed one is a bit bigger while it supports only unary elements, but doesn't require external allocations. With the current flexible cell, this simple verilog eats 6kB memory per inverter-and-FF-bit:This suggests that focused effort can bring it down by a significant factor.