compiler: unify the backend processing #550
It's legal for them to have one, so it's important to not modify it when further processing is not interested in it.
This fixes a severe layering violation, namely that procedure processing mutated module-related state. The code for initializing all procedure-level globals is now stored with a `Procedure`, which allows its consumer to subject it to further processing. In addition, MIR passes (currently only destructor injection) are now also applied to this fragment, fixing a pre-existing issue.
Access to an owner and an `IdGenerator` was not necessary.
...procedure-level globals
In the short term, they're not something that exists in the generated code; they're only meant to solve the problem around `globalDestructors`.
Their initialization logic is scanned for used routines, and they're also registered with the module structs now.
Using the module-struct information, the destruction logic can be generated without making use of `globalDestructors`, making the latter obsolete.
Calls to the procedures are injected as part of `finalCodegenActions`, at which point the dependencies can't be registered with the procedure stream anymore. The dependencies are now explicitly registered during early module processing.
The collected top-level statements are no longer needed once translated to MIR code, so the AST can already be released early on. A different solution is required eventually, but the current one is good enough for now.
This is a temporary workaround to globals not having a generated name at the right time. It falls apart when there are cyclic imports.
Procedure-level globals are now handled by the orchestrator, meaning that they're initialized as part of module pre-initialization (eventually, at least), mirroring how it works when using the C backend.
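As a rough illustration of the idea (not the compiler's actual code), the following Python sketch shows an orchestrator gathering the initialization fragments of procedure-level globals and grouping them by owning module, so that they could run during that module's pre-initialization. The `Procedure` fields, `collect_pre_init`, and the sample data are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Procedure:
    # Hypothetical stand-in for a processed procedure.
    name: str
    module: str
    body: list
    # Initialization code for the procedure-level globals lifted out of the body.
    globals_init: list = field(default_factory=list)


def collect_pre_init(procedures):
    """Group the globals-initialization fragments by owning module."""
    pre_init = {}
    for proc in procedures:
        if proc.globals_init:
            pre_init.setdefault(proc.module, []).extend(proc.globals_init)
    return pre_init


procs = [
    Procedure("counter", "a", ["inc(state)"], ["state = 0"]),
    Procedure("helper", "a", ["discard"]),
    Procedure("cache", "b", ["lookup(table)"], ["table = initTable()"]),
]
pre_init = collect_pre_init(procs)
```

Because the fragment travels with the procedure instead of being written into module state during processing, the consumer decides where and when the initialization code ends up.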
There were a few issues with the previous handling of procedure-level globals (in this PR):
The problem with 1. is that the callsite might want to know about dependencies (the C backend does) -- queuing procedures from inside a call to
Number 3 is a severe layering and single-responsibility violation that required
The solution
The fix is to use a much simpler and more local approach: store the extracted globals and the MIR code fragment containing their initialization logic in the
In the future, it might make sense to perform the lifting of procedure-level globals into a special section of a routine during semantic analysis.
Clean-up for globals
Another issue was with how globals requiring destruction (via a destructor call) were/are handled (both in this PR and previously). For top-level globals, the
There are multiple problems with this approach:
The first one would become a practical problem once the
To get rid of
Right now, the structs only exist at a logical level, but it might make sense to also emit them as real types in the generated code in the future.
It was used by an early attempt at `inline` procedure handling, but is now obsolete.
`hasNext` didn't check for the presence of methods.
The dispatchers are already generated as part of the unified procedure-stream processing.
The PR is starting to become too large, both in terms of code changes and impact. I'm going to split the changes here into multiple smaller, separate PRs.
Summary
Right now, each code generator / backend works differently. Both `cgen` and `jsgen` use a recursive approach, where the code for a procedure is generated when it's first used. While `cgen` emits the generated code into multiple C files (one corresponding to each NimSkull module), `jsgen` emits all of it into a single file, and also requires special handling for inner procedures (lambda lifting is disabled for the JS backend).

For the VM, code generation works significantly differently: the code generator (`vmgen`) is, for the most part, only responsible for generating code. Invoking the code generator to generate the bytecode for all alive procedures is left to the callsite. During compile-time execution, this is the responsibility of the JIT logic (`vmjit`), and for the VM backend it's that of `vmbackend`. The latter uses, supported by `vmgen`, an iterative approach for discovering all alive procedures and passing them to the code generator.

The problem with the recursive approach is that it's inflexible: how discovery of alive procedures happens is an intrinsic property of it and can't be easily changed. In addition, transforming a procedure's code and applying the MIR passes to it has to happen from inside the code generators, meaning that they have to carry the necessary state around, further complicating the whole implementation.
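To make the contrast concrete, here is a minimal Python sketch of iterative discovery with a worklist, assuming a toy call graph. `CALLS` and `discover_alive` are hypothetical names for this illustration and don't correspond to anything in the compiler.

```python
from collections import deque

# Hypothetical call graph: procedure name -> names of the procedures it uses.
CALLS = {
    "main": ["parse", "emit"],
    "parse": ["readToken"],
    "emit": ["readToken"],
    "readToken": [],
    "unused": ["parse"],
}


def discover_alive(roots, calls):
    """Return every procedure reachable from `roots`, in discovery order."""
    alive = []
    seen = set(roots)
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        # This is the point where the caller would hand the procedure to a
        # code generator; the generator itself never has to recurse.
        alive.append(name)
        for dep in calls[name]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return alive


alive = discover_alive(["main"], CALLS)
print(alive)
```

Since the loop lives outside the code generator, the discovery strategy (queue order, batching, filtering) can be changed without touching code generation.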
This PR implements a facility for collecting the AST of the whole program and applying the pre-processing (e.g. `transf`, applying eligible MIR passes) to all code passed to it, making the resulting procedures accessible as a stream.

The approach is an evolution of how the processing is implemented in #424 (which itself evolved from `vmbackend`), with the difference that it's more general and that the procedures are, except for inner `closure` procedures, only processed one-by-one instead of all at once. This change makes the layer more flexible, allowing it to be used with both interleaved and non-interleaved compilation (interleaved here meaning: alternating between semantic analysis and code generation).

The C, JS, and VM code generators are no longer responsible for discovering dependencies, and, in the case of `cgen`/`jsgen`, processing them. Both things are now implemented by a dedicated orchestrator (which uses the aforementioned processing facilities) for each backend, similar to `vmbackend`.

For the first revision, the orchestrators are only concerned with procedures, but will eventually also manage code generation for constants and globals.
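The overall shape can be sketched in Python as a lazy stream of pre-processed procedures that different orchestrators consume in their own way. This is only an illustration under assumed names: `Proc`, `process`, the two pass functions, and both orchestrator functions are hypothetical and greatly simplified.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    # Hypothetical stand-in for a procedure travelling through the stream.
    name: str
    module: str
    applied: list = field(default_factory=list)


def transform(proc):
    proc.applied.append("transf")


def inject_destructors(proc):
    proc.applied.append("destructors")


def process(procs, passes):
    """Lazily apply all passes to each procedure, one procedure at a time."""
    for proc in procs:
        for run_pass in passes:
            run_pass(proc)
        yield proc


def orchestrate_per_module(stream):
    """C-like consumer: one output unit per module."""
    units = {}
    for proc in stream:
        units.setdefault(proc.module, []).append(proc.name)
    return units


def orchestrate_single_file(stream):
    """JS-like consumer: everything goes into a single output unit."""
    return [proc.name for proc in stream]


passes = [transform, inject_destructors]
source = [Proc("main", "app"), Proc("parse", "lexer"), Proc("emit", "app")]

per_module = orchestrate_per_module(process(source, passes))
single = orchestrate_single_file(process([Proc("main", "app")], passes))
```

Both consumers receive procedures that have already gone through the same passes, so the pass-related state stays in one place instead of being carried around by each code generator.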
Notes for reviewers
cgen
transition easier