compiler: unify the backend processing #550

zerbina · 2023-02-20T20:42:54Z

Summary

Right now, each code generator / backend works differently. Both cgen and jsgen use a recursive approach, where the code for a procedure is generated when it's first used. While cgen emits the generated code into multiple C files (one corresponding to each NimSkull module), jsgen emits all of it into a single file, and also requires special handling for inner procedures (lambda lifting is disabled for the JS backend).

For the VM, code generation works significantly differently: the code generator (vmgen) is, for the most part, only responsible for generating code. Invoking the code generator to generate the bytecode for all alive procedures is left to the callsite. During compile-time execution, this is the responsibility of the JIT logic (vmjit), and for the VM backend it's that of vmbackend. The latter, supported by vmgen, uses an iterative approach for discovering all alive procedures and passing them to the code generator.
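The iterative discovery can be pictured as a worklist loop. The following is a minimal Python sketch, not the actual vmbackend code. It assumes the call graph is available up front as a dictionary mapping each procedure to the procedures its body references. `discover_alive` is a hypothetical helper name.

```python
from collections import deque


def discover_alive(roots, callees):
    """Return the alive procedures reachable from `roots`, in discovery order.

    `callees` maps a procedure name to the procedures its body references.
    A worklist replaces recursion, so the loop (not the code generator)
    decides what is visited next.
    """
    seen = set()
    order = []
    worklist = deque(roots)
    while worklist:
        proc = worklist.popleft()
        if proc in seen:
            continue
        seen.add(proc)
        order.append(proc)  # a real backend would hand `proc` to the code generator here
        for dep in callees.get(proc, []):
            if dep not in seen:
                worklist.append(dep)
    return order


if __name__ == "__main__":
    graph = {
        "main": ["parse", "run"],
        "parse": ["lex"],
        "run": ["parse", "eval"],
        "unused": ["lex"],
    }
    print(discover_alive(["main"], graph))  # ['main', 'parse', 'run', 'lex', 'eval']
```

`unused` is never reached. Because the loop drives discovery, the strategy (breadth-first here) can be changed without touching code generation, which the recursive approach makes hard.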

The problem with the recursive approach is that it's inflexible: how discovery of alive procedures happens is an intrinsic property of it and can't be easily changed. In addition, transforming a procedure's code and applying the MIR passes to it has to happen from inside the code generators, meaning that they have to carry the necessary state around, further complicating the whole implementation.

This PR implements a facility for collecting the AST of the whole program and applying the pre-processing (e.g. transf, applying eligible MIR passes) to all code passed to it, making the resulting procedures accessible as a stream.

The approach is an evolution of how the processing is implemented in #424 (which itself evolved from vmbackend), with the difference that it's more general and that the procedures are, except for inner closure procedures, only processed one-by-one instead of all at once. This change makes the layer more flexible, allowing it to be used with both interleaved and non-interleaved compilation (interleaved here meaning: alternating between semantic analysis and code generation).

The C, JS, and VM code generators are no longer responsible for discovering dependencies and, in the case of cgen/jsgen, processing them. Both are now handled by a dedicated orchestrator for each backend (which uses the aforementioned processing facilities), similar to vmbackend.
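The following rough Python sketch shows how an orchestrator might drive such a procedure stream. It is for illustration only. `ProcedureStream`, `Procedure`, `orchestrate`, and the `lowered(...)` placeholder standing in for transf and the MIR passes are hypothetical names, not the compiler's API. The sketch shows three properties described above:

- Procedures are processed one-by-one on each `next` call.
- Inner closure procedures are processed together with their owner.
- The orchestrator, not the code generator, queues dependencies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Procedure:
    """A pre-processed procedure as handed to a code generator (hypothetical)."""
    name: str
    body: str
    deps: list = field(default_factory=list)   # procedures referenced by the body
    inner: list = field(default_factory=list)  # inner closure procedures


class ProcedureStream:
    """Hypothetical stand-in for the unified pre-processing layer."""

    def __init__(self, sources, inner_of=None):
        self.sources = sources          # name -> (raw body, referenced names)
        self.inner_of = inner_of or {}  # name -> names of its inner closure procs
        self.queue = deque()
        self.queued = set()

    def enqueue(self, name):
        if name not in self.queued:
            self.queued.add(name)
            self.queue.append(name)

    def has_next(self):
        return bool(self.queue)

    def next(self):
        name = self.queue.popleft()
        body, deps = self.sources[name]
        # Placeholder for transf + MIR passes: only tags the body.
        proc = Procedure(name, f"lowered({body})", list(deps))
        # Inner closures are only reachable through their owner, so they are
        # processed here, at once, and marked as queued to avoid a second pass.
        for inner in self.inner_of.get(name, []):
            ibody, ideps = self.sources[inner]
            self.queued.add(inner)
            proc.inner.append(Procedure(inner, f"lowered({ibody})", list(ideps)))
        return proc


def orchestrate(stream, codegen):
    """Drain the stream; dependencies are queued by the orchestrator."""
    output = []
    while stream.has_next():
        proc = stream.next()
        for p in [proc] + proc.inner:
            output.append(codegen(p))
            for dep in p.deps:
                stream.enqueue(dep)
    return output


if __name__ == "__main__":
    sources = {
        "main": ("call helper; create closure", ["helper", "main:inner"]),
        "main:inner": ("captures a local", ["log"]),
        "helper": ("...", ["log"]),
        "log": ("...", []),
    }
    stream = ProcedureStream(sources, inner_of={"main": ["main:inner"]})
    stream.enqueue("main")
    print(orchestrate(stream, lambda p: p.name))  # ['main', 'main:inner', 'helper', 'log']
    # Interleaved compilation: more code arrives after the first batch.
    sources["late"] = ("added later", ["log"])
    stream.enqueue("late")
    print(orchestrate(stream, lambda p: p.name))  # ['late']
```

Procedures are only pre-processed when `next` is called, so new roots can be enqueued after the queue has been drained (e.g. after semantic analysis of another module). This is what makes such a layer usable for both interleaved and non-interleaved compilation.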

For the first revision, the orchestrators are only concerned with procedures, but will eventually also manage code generation for constants and globals.

Notes for reviewers

- The PR is an early proof-of-concept; so far, the focus was only on making it work.
- Once finalized, this PR will only include the new processing facilities; changing each code generator to make use of them will happen in separate PRs.
- Removing the IC backend would make the cgen transition easier.

This fixes a severe layering violation, namely that procedure processing mutated module-related state. The code for initializing all procedure-level globals is now stored with a `Procedure`, which allows its consumer to subject it to further processing. In addition, MIR passes (currently only destructor injection) are now also applied to this fragment, fixing a pre-existing issue.

Calls to the procedures are injected as part of `finalCodegenActions`, at which point the dependencies can't be registered with the procedure stream anymore. The dependencies are now explicitly registered during early module processing.

zerbina · 2023-02-22T18:54:12Z

There were a few issues with the previous handling of procedure-level globals (in this PR):

1. Entities (currently only routines) used by their initialization logic were queued from inside routine processing, bypassing the callsite.
2. MIR passes were not applied to their initialization logic (a pre-existing issue).
3. Processing a single procedure has to mutate/track state unrelated to procedures (e.g. a module's pre-initialization code fragment).

The problem with 1 is that the callsite might want to know about dependencies (the C backend does). Queuing procedures from inside a call to `next` meant that it couldn't, which, for example, broke `inline` handling for the C backend.
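To make point 1 concrete, here is a hypothetical Python sketch of why the C backend wants to see dependencies. It assumes, for illustration, that an `inline` procedure is emitted into the C file of every module that uses it, while other procedures live in their own module's C file. Because the loop receives each procedure's dependencies, it can decide per use which C file a dependency goes into. If `next` queued them internally, that information would be lost. `Proc` and `plan_c_files` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    module: str                # module whose C file normally holds the procedure
    inline: bool = False       # assumed: inline procs are copied into each user
    deps: list = field(default_factory=list)


def plan_c_files(roots, procs):
    """Map each module's C file to the set of procedures emitted into it."""
    files = {}
    done = set()
    # Worklist items are (procedure, module whose C file receives it).
    queue = [(r, procs[r].module) for r in roots]
    while queue:
        name, module = queue.pop()
        if (name, module) in done:
            continue
        done.add((name, module))
        files.setdefault(module, set()).add(name)
        # The dependencies are visible here, so the caller can route them.
        for dep in procs[name].deps:
            target = procs[dep]
            # An inline dependency follows its user; anything else stays in
            # the C file of the module that defines it.
            queue.append((dep, module if target.inline else target.module))
    return files


if __name__ == "__main__":
    procs = {
        "main": Proc("main", "app", deps=["fastAdd", "lib"]),
        "lib": Proc("lib", "mylib", deps=["fastAdd"]),
        "fastAdd": Proc("fastAdd", "math", inline=True, deps=["clamp"]),
        "clamp": Proc("clamp", "math", inline=True),
    }
    for module, names in sorted(plan_c_files(["main"], procs).items()):
        print(module, sorted(names))
```

Here `fastAdd` and `clamp` end up in both the `app` and `mylib` files, and nothing is emitted for `math` itself, since none of its non-inline procedures are used.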

Number 3 is a severe layering and single-responsibility violation: it required `next` to mutate module-related state, and `ProcedureIter` to know about destructors for globals.

The solution

The fix is to use a much simpler and more local approach: store the extracted globals and the MIR code fragment containing their initialization logic in the `Procedure` object returned from `next`. This way, the callsite is responsible for generating the destruction logic, building up a module's pre-initialization code, and queuing dependencies.
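A minimal Python sketch of the callsite side of this design follows. It assumes the returned procedure carries its lifted globals, their initialization fragment, and the routines that fragment uses. The names (`Procedure`, `Global`, `consume`, and the `init_deps` field) are hypothetical, and the strings stand in for MIR code. The point is that `consume` owns all module-level state; processing the procedure itself never touches it.

```python
from dataclasses import dataclass, field


@dataclass
class Global:
    name: str
    needs_destroy: bool = False


@dataclass
class Procedure:
    name: str
    module: str
    deps: list = field(default_factory=list)         # routines used by the body
    globals: list = field(default_factory=list)      # lifted procedure-level globals
    global_init: list = field(default_factory=list)  # their initialization fragment
    init_deps: list = field(default_factory=list)    # routines used by that fragment


def consume(proc, preinit, destructors, enqueue):
    """Callsite handling of one procedure returned by the stream."""
    # The caller, not the stream, builds up the module's pre-initialization code.
    preinit.setdefault(proc.module, []).extend(proc.global_init)
    # The caller also generates the destruction logic for the lifted globals.
    for g in proc.globals:
        if g.needs_destroy:
            destructors.setdefault(proc.module, []).append(f"=destroy({g.name})")
    # Dependencies of both the body and the init fragment are queued here,
    # so the caller sees every one of them.
    for dep in proc.deps + proc.init_deps:
        enqueue(dep)


if __name__ == "__main__":
    queued, preinit, destructors = [], {}, {}
    p = Procedure(
        "counter", "m", deps=["log"],
        globals=[Global("hits", needs_destroy=True), Global("n")],
        global_init=["hits = newSeq()", "n = 0"], init_deps=["newSeq"],
    )
    consume(p, preinit, destructors, queued.append)
    print(preinit, destructors, queued)
```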

In the future, it might make sense to perform the lifting of procedure-level globals into a special section of a routine during semantic analysis.

Clean-up for globals

Another issue was with how globals requiring destruction (via a destructor call) were/are handled (both in this PR and previously). For top-level globals, the injectdestructors pass generates a statement for calling the respective destructor (if one is available) and appends it to the globalDestructors statement list stored as part of ModuleGraph. cgen and jsgen then query said statement list when closing the main module, generate code for it, and append the result to the end of the main module's init procedure.

There are multiple problems with this approach:

- the `injectdestructors` pass has side-effects on pass-external state (i.e. it modifies the `ModuleGraph`)
- the order of destruction for globals is tied to the order in which the code they're defined in is transformed
- given that a significant number of routines in the compiler have access to the `ModuleGraph` instance, `globalDestructors` is essentially a global

The first one would become a practical problem once the injectdestructors pass is also run for code meant for compile-time execution, as destructor calls for globals only existing in a compile-time context are then added to the same statement list used for normal globals.

To get rid of globalDestructors, module structs are introduced. A module struct encapsulates all data owned by a module. Currently, these are globals (both top-level and procedure-level ones) and .threadvars. All entities that make up a module's struct are collected, and, once all alive code is processed, the resulting information used to generate the clean-up logic.
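The module-struct idea can be sketched in Python with hypothetical names (`Entity`, `ModuleStruct`, `gen_cleanup`). The orchestrator registers each global or `.threadvar` with its module's struct while processing alive code. Clean-up is generated from the structs once processing is done, so no pass has to append to a shared `globalDestructors` list. The destruction order used here (reverse registration order, later modules first) is an assumption of the sketch, not something this PR specifies. Thread-local clean-up is also left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str            # "global" or "threadvar"
    needs_destroy: bool


@dataclass
class ModuleStruct:
    """All data owned by one module (a purely logical grouping)."""
    module: str
    entities: list = field(default_factory=list)

    def register(self, entity):
        self.entities.append(entity)


def gen_cleanup(structs):
    """Generate destructor calls once all alive code has been processed.

    Assumed order: modules in reverse of the given order, entities in reverse
    registration order. Only globals are handled; .threadvars are tracked
    but their per-thread clean-up is out of scope for this sketch.
    """
    calls = []
    for s in reversed(structs):
        for e in reversed(s.entities):
            if e.kind == "global" and e.needs_destroy:
                calls.append(f"=destroy({s.module}.{e.name})")
    return calls


if __name__ == "__main__":
    a = ModuleStruct("a")
    a.register(Entity("x", "global", True))
    a.register(Entity("tls", "threadvar", True))
    a.register(Entity("y", "global", True))
    b = ModuleStruct("b")
    b.register(Entity("z", "global", True))
    print(gen_cleanup([a, b]))  # ['=destroy(b.z)', '=destroy(a.y)', '=destroy(a.x)']
```

Since only entities registered by the orchestrator end up in a struct, globals that exist solely in a compile-time context would never be mixed into the run-time clean-up.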

Right now, the structs only exist at a logical level, but it might make sense to also emit them as real types in the generated code in the future.

zerbina · 2023-05-19T17:43:38Z

The PR is starting to become too large, both in terms of code changes and impact. I'm going to split the changes here into multiple smaller, separate PRs.

zerbina · 2023-06-27T21:36:35Z

Superseded by #712, #714, and, most importantly, #777. Splitting this PR up turned out to be the right choice; the implementation I ended up with is much cleaner and simpler.

WIP: unify the backend processing

110522e

zerbina added the refactor (Implementation refactor), compiler (General compiler tag), and compiler/backend (Related to backend system of the compiler) labels Feb 20, 2023

zerbina added 13 commits February 22, 2023 17:58

backends: don't preprocess the body of imported procedures

1f8f9a7

It's legal for them to have one, so it's important to not modify it when further processing is not interested in it.

injectdestructors: simplify genDestroy

8a9b255

Access to an owner and an `IdGenerator` was not necessary.

backends: also collect the symbols of procedure-level globals

a075f2d

backend2: introduce module structs

a107959

In the short-term, they're not something that exists in the generated code and are only meant to solve the problem around `globalDestructors`.

backend2: handle procedure-level globals

21bd16e

Their initialization logic is scanned for used routines, and they're also registered with the module structs now.

backend2: implement clean-up for module structs

d9b835e

Using the module-struct information, the destruction logic can be generated without making use of `globalDestructors`, making the latter obsolete.

backend2: slightly reduce memory usage

71b7281

The collected top-level statements are no longer needed once translated to MIR code, so the AST can already be released early on. A different solution is required eventually, but the current one is good enough for now.

jsbackend: split-off module processing into a procedure

32d2887

jsbackend: process modules in "closed" order

0895ee3

This is a temporary workaround to globals not having a generated name at the right time. It falls apart when there are cyclic imports.

jsbackend: make .globals work the same as they do with C

e67b083

Procedure-level globals are now handled by the orchestrator, meaning that they're initialized as part of module pre-initialization (eventually, at least), mirroring how it works when using the C backend.

jsbackend: implement clean-up for globals

4e04c58

zerbina added 3 commits February 22, 2023 20:28

cgendata: remove usedTypes

2d8f7c0

It was used by an early attempt at `inline` procedure handling, but is now obsolete.

cgen: fix linker error when using arc/orc

02bcd2a

backend: fix methods not being code-gen'ed

e2304ed

`hasNext` didn't check for the presence of methods.

haxscramper modified the milestones: C backend refactoring, MIR phase Feb 25, 2023

zerbina added 2 commits March 12, 2023 22:43

cgen: disable method dispatcher generation

fa13f17

The dispatchers are already generated as part of the unified procedure-stream processing.

remove unused imports

0fce750

zerbina mentioned this pull request Mar 14, 2023

enable lambda-lifting for the JavaScript target #586

Merged

zerbina mentioned this pull request Mar 23, 2023

vmgen: separate dependency collection #600

Merged

zerbina mentioned this pull request May 19, 2023

use whole-program code-generation for all backends #712

Merged

zerbina mentioned this pull request May 21, 2023

backend: lower modules into procedures #714

Merged

This was referenced Jun 23, 2023

disallow using .compileTime locations at run-time #773

Merged

support methods with the VM backend #775

Merged

compiler: unify the pre-code-gen processing #777

Merged

zerbina closed this Jun 27, 2023

zerbina deleted the unified-code-gen-processing branch September 16, 2023 17:47
