compiler: unify the backend processing #550

Closed (not merged): zerbina wanted to merge 19 commits from the `unified-code-gen-processing` branch.


zerbina (Collaborator) commented Feb 20, 2023

Summary

Right now, each code generator / backend works differently. Both cgen and jsgen use a recursive approach, where the code for a procedure is generated when it's first used. While cgen emits the generated code into multiple C files (one corresponding to each NimSkull module), jsgen emits all of it into a single file, and also requires special handling for inner procedures (lambda lifting is disabled for the JS backend).

For the VM, code generation works significantly differently: the code generator (vmgen) is, for the most part, only responsible for generating code. Invoking the code generator to generate the bytecode for all alive procedures is left to the callsite. During compile-time execution, this is the responsibility of the JIT logic (vmjit); for the VM backend, it is the responsibility of vmbackend. The latter, with support from vmgen, uses an iterative approach for discovering all alive procedures and passing them to the code generator.

The problem with the recursive approach is that it's inflexible: how alive procedures are discovered is an intrinsic property of the approach and can't easily be changed. In addition, transforming a procedure's code and applying the MIR passes to it has to happen from inside the code generators, meaning that they have to carry the necessary state around, which further complicates the whole implementation.
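To make the contrast concrete, here is a minimal Python sketch (not NimSkull code) of the two discovery styles over a toy call graph. Everything in it is hypothetical, including `CALLS`, `gen_code`, `recursive_codegen`, and `iterative_codegen`. `gen_code` stands in for a code generator that only emits code and reports the procedures it references.

```python
from collections import deque

# Hypothetical call graph: procedure -> procedures it references.
CALLS = {
    "main": ["foo", "bar"],
    "foo": ["baz"],
    "bar": ["baz", "foo"],
    "baz": [],
}

def gen_code(name):
    """Stand-in code generator: emits code and reports referenced procedures."""
    return f"code for {name}", CALLS[name]

def recursive_codegen(name, emitted):
    # Discovery is baked into the generator: it recurses into each callee
    # right after emitting the caller.
    if name in emitted:
        return
    code, callees = gen_code(name)
    emitted[name] = code
    for callee in callees:
        recursive_codegen(callee, emitted)

def iterative_codegen(roots):
    # Discovery lives in the driver: it owns the worklist and decides the
    # order in which reported dependencies are processed.
    emitted = {}
    worklist = deque(roots)
    while worklist:
        name = worklist.popleft()
        if name in emitted:
            continue
        code, callees = gen_code(name)
        emitted[name] = code
        worklist.extend(callees)
    return emitted
```

Both variants find the same set of alive procedures. Only in the iterative one, however, can the driver change the visiting order or intercept dependencies without touching the generator.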

This PR implements a facility for collecting the AST of the whole program and applying pre-processing (e.g., transf and eligible MIR passes) to all code passed to it, making the resulting procedures accessible as a stream.

The approach is an evolution of how the processing is implemented in #424 (which itself evolved from vmbackend). The differences are that it's more general and that procedures, except for inner closure procedures, are processed one by one instead of all at once. This change makes the layer more flexible, allowing it to be used with both interleaved and non-interleaved compilation (interleaved here meaning alternating between semantic analysis and code generation).
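The following is a rough Python sketch of how a one-by-one procedure stream can support interleaved compilation. It assumes a toy representation in which a procedure body is just the list of routines it uses. `ProcedureStream`, `queue`, and `drain` are hypothetical. `has_next`/`next` only mirror the `hasNext`/`next` names that come up later in this discussion and are not the actual API.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Procedure:
    name: str
    deps: list  # routines used by the processed body

@dataclass
class ProcedureStream:
    """Hypothetical stream that processes queued procedures lazily."""
    bodies: dict                                   # name -> used routines (toy "AST")
    pending: deque = field(default_factory=deque)
    seen: set = field(default_factory=set)

    def queue(self, name: str) -> None:
        # Each procedure is queued at most once.
        if name not in self.seen:
            self.seen.add(name)
            self.pending.append(name)

    def has_next(self) -> bool:
        return bool(self.pending)

    def next(self) -> Procedure:
        name = self.pending.popleft()
        # transf and the eligible MIR passes would run here, for this
        # single procedure only.
        return Procedure(name, list(self.bodies[name]))

def drain(stream: ProcedureStream, out: list) -> None:
    # The orchestrator pulls procedures and decides to queue their deps.
    while stream.has_next():
        proc = stream.next()
        out.append(proc.name)
        for dep in proc.deps:
            stream.queue(dep)
```

Because nothing is processed until `next` is called, the orchestrator can queue new roots after each round of semantic analysis. The stream then continues where it left off, without reprocessing procedures it has already seen.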

The C, JS, and VM code generators are no longer responsible for discovering dependencies and, in the case of cgen/jsgen, for processing them. Both tasks are now handled by a dedicated orchestrator for each backend (which uses the aforementioned processing facilities), similar to vmbackend.

For the first revision, the orchestrators are only concerned with procedures, but will eventually also manage code generation for constants and globals.


Notes for reviewers

  • the PR is an early proof-of-concept. The focus so far was only to make it work
  • once finalized, this PR will only include the new processing facilities; changing each code generator to make use of them will happen in separate PRs
  • removing the IC backend would make the cgen transition easier

zerbina added the refactor (Implementation refactor), compiler (General compiler tag), and compiler/backend (Related to backend system of the compiler) labels Feb 20, 2023
It's legal for them to have one, so it's important to not modify it when
further processing is not interested in it.
This fixes a severe layering violation, namely that procedure processing
mutated module-related state. The code for initializing all procedure-level
globals is now stored with a `Procedure`, which allows its consumer to
subject it to further processing.

In addition, MIR passes (currently only destructor injection) are now also
applied to this fragment, fixing a pre-existing issue.
Access to an owner and an `IdGenerator` was not necessary.
...procedure-level globals
In the short term, they're not something that exists in the generated
code and are only meant to solve the problem around
`globalDestructors`.
Their initialization logic is scanned for used routines, and they're
also registered with the module structs now.
Using the module-struct information, the destruction logic can be
generated without making use of `globalDestructors`, making the latter
obsolete.
Calls to the procedures are injected as part of `finalCodegenActions`,
at which point the dependencies can't be registered with the procedure
stream anymore. The dependencies are now explicitly registered during
early module processing.
The collected top-level statements are no longer needed once
translated to MIR code, so the AST can already be released early on.

A different solution is required eventually, but the current one is
good enough for now.
This is a temporary workaround for globals not having a generated name
at the right time. It falls apart when there are cyclic imports.
Procedure-level globals are now handled by the orchestrator, meaning
that they're initialized as part of module pre-initialization (eventually,
at least), mirroring how it works when using the C backend.
zerbina (Collaborator, Author) commented Feb 22, 2023

There were a few issues with the previous handling of procedure-level globals (in this PR):

  1. entities (currently only routines) used by their initialization logic were queued from inside routine processing, bypassing the callsite
  2. MIR passes were not applied to their initialization logic (a pre-existing issue)
  3. processing a single procedure has to mutate/track state unrelated to procedures (e.g. a module's pre-initialization code fragment)

The problem with 1. is that the callsite might want to know about dependencies (the C backend does). Queuing procedures from inside a call to `next` meant that it couldn't, which, for example, broke inline handling for the C backend.

Number 3 is a severe layering and single-responsibility violation. It required `next` to mutate module-related state, and `ProcedureIter` to know about destructors for globals.

The solution

The fix is to use a much simpler and more local approach: store the extracted globals and the MIR code fragment containing their initialization logic in the `Procedure` object returned from `next`. This way, the callsite is responsible for generating the destruction logic, building up a module's pre-initialization code, and queuing dependencies.
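The following is a minimal Python sketch of that division of responsibilities. It assumes the `Procedure` object carries the lifted globals, their initialization fragment, and the dependencies of both. The field names, `ModuleState`, and `consume` are all hypothetical. The point is only that module-level state is mutated by the consumer of the stream, not by the stream itself.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Procedure:
    """Hypothetical item returned from the stream's next()."""
    name: str
    body: list                                          # stand-in for the processed MIR body
    lifted_globals: list = field(default_factory=list)  # extracted procedure-level globals
    global_init: list = field(default_factory=list)     # MIR fragment initializing them
    deps: list = field(default_factory=list)            # routines used by body and init code

@dataclass
class ModuleState:
    """Hypothetical per-module state owned by the orchestrator."""
    pre_init: list = field(default_factory=list)  # module pre-initialization code
    struct: list = field(default_factory=list)    # globals owned by the module

def consume(proc: Procedure, module: ModuleState, queue: Callable[[str], None]) -> list:
    # The callsite, not the stream, builds up module-level state ...
    module.pre_init.extend(proc.global_init)
    module.struct.extend(proc.lifted_globals)
    # ... and sees every dependency, so a backend can special-case some
    # (e.g. inline procedures) before queuing them.
    for dep in proc.deps:
        queue(dep)
    return proc.body
```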

In the future, it might make sense to perform the lifting of procedure-level globals into a special section of a routine during semantic analysis.

Clean-up for globals

Another issue was with how globals requiring destruction (via a destructor call) were/are handled (both in this PR and previously). For top-level globals, the `injectdestructors` pass generates a statement for calling the respective destructor (if one is available) and appends it to the `globalDestructors` statement list stored as part of `ModuleGraph`. cgen and jsgen then query said statement list when closing the main module, generate code for it, and append the result to the end of the main module's init procedure.

There are multiple problems with this approach:

  1. the `injectdestructors` pass has side effects on pass-external state (i.e. it modifies the `ModuleGraph`)
  2. the order of destruction for globals is tied to the order in which the code they're defined in is transformed
  3. given that a significant number of routines in the compiler have access to the `ModuleGraph` instance, `globalDestructors` is essentially a global

The first one would become a practical problem once the `injectdestructors` pass is also run for code meant for compile-time execution, as destructor calls for globals that only exist in a compile-time context would then be added to the same statement list used for normal globals.

To get rid of `globalDestructors`, module structs are introduced. A module struct encapsulates all data owned by a module. Currently, these are globals (both top-level and procedure-level ones) and `.threadvar`s. All entities that make up a module's struct are collected, and, once all alive code is processed, the resulting information is used to generate the clean-up logic.
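The following Python sketch illustrates how clean-up logic can be derived from a module struct instead of from an append-only list like `globalDestructors`. `Global`, `ModuleStruct`, and `cleanup_code` are hypothetical. Destroying globals in reverse declaration order is an assumption made for the example, since the discussion doesn't specify an order. What matters is that the order comes from the struct, not from when the defining code happened to be transformed.

```python
from dataclasses import dataclass, field

@dataclass
class Global:
    name: str
    decl_index: int        # position of the declaration within the module
    has_destructor: bool

@dataclass
class ModuleStruct:
    """Hypothetical logical struct: all globals owned by one module."""
    module: str
    entities: list = field(default_factory=list)

    def register(self, g: Global) -> None:
        # Registration may happen in transformation order; that no longer
        # influences the destruction order.
        self.entities.append(g)

    def cleanup_code(self) -> list:
        # Assumption: destroy in reverse declaration order, computed from
        # the collected struct once all alive code has been processed.
        ordered = sorted(self.entities, key=lambda g: g.decl_index, reverse=True)
        return [f"=destroy({g.name})" for g in ordered if g.has_destructor]
```

For example, if `b` is registered before `a` but declared after it, `cleanup_code()` still yields `["=destroy(b)", "=destroy(a)"]`.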

Right now, the structs only exist at a logical level, but it might make sense to also emit them as real types in the generated code in the future.

It was used by an early attempt at `inline` procedure handling, but is now
obsolete.
`hasNext` didn't check for the presence of methods.
zerbina added 2 commits March 12, 2023 22:43
The dispatchers are already generated as part of the unified procedure-
stream processing.
zerbina (Collaborator, Author) commented May 19, 2023

The PR is starting to become too large, both in terms of code changes and impact. I'm going to split the changes here into multiple smaller, separate PRs.

zerbina (Collaborator, Author) commented Jun 27, 2023

Superseded by #712, #714, and, most importantly, #777. Splitting this PR up turned out to be the right choice; the implementation I ended up with is much cleaner and simpler.

zerbina closed this Jun 27, 2023
zerbina deleted the unified-code-gen-processing branch September 16, 2023 17:47