compiler: make injectdestructors a MIR pass

## Summary The previous `injectdestructors` pass operated on `PNode` AST, and while it should have been able to work with the AST as generated by `astgen`, it didn't, which required the canonicalization step to be disabled when the `injecdestructors` pass was needed. Since the goal is to have all code-generators operate on either the MIR or another IR derived from the former, it is necessary that the MIR -> AST step happens immediately before code-generation, with no further transformation in-between. The pass is rewritten from the ground up, fixing many issues / bugs the previous implementation had and also making the the pass significantly more time and memory efficient than it was previously. In addition, the pass generates more efficient code due to the smarter omission of `wasMoved` and `=sink` calls. An additional change is that calling a procedure that only raises `Defect`s is now also considered as having control-flow related side- effects. They previously weren't, which meant that locations sometimes weren't cleaned up (via invoking the destructor) when raising a `Defect`. ## Fixes Assignments to location with destructors where the right-hand side is a complex expression now respect the strict left-to-right evaluation order: ```nim var i = 0 s = "str" # works now: (i = 1; s) = if i == 1: "str2" else: doAssert false; "" # also works now: proc p(s: var string): var string = i = 2 s p(s) = block: doAssert i == 2 "str2" ``` A mutation of the argument is now considered to happen when control-flow reaches the call, not when the argument is passed to a `var` parameter. The difference is subtle but important: ```nim proc f_sink(x: var seq[int], y: sink seq[int]) = y[0] = 2 doAssert x == [1, 2] # passes proc main() = var s = @[1, 2] f_sink(s, s) # a full copy is now introduced for the second argument main() ``` Globals are now only moved if it can proven that doing so does not affect program semantics. Previously, assigning a location derived from a global would perform a destructive move in certain circumstances, such as: ```nim # at module scope var a = "str" var b = a # moved proc p() = discard echo a # echoes an empty string ``` Due to the way the new implementation works, it is now possible for locations derived from globals to be consumed (if safe) when passing them to `sink` parameters. As a side-effect of operating on the MIR, the alias analysis now handles tuple access correctly: ```nim proc prc(x: var string) = var tup = (x: 0, y: "str") # `tup[1]` and `tup.y` were treated as two disjoint locations, # causing the assignment below to be erroneously optimized into a # move x = tup[1] doAssert tup.y == "str" var o: string prc(o) ``` ## Details In contrast to the previous implementation, the new one splits the pass up into multiple sub-passes. This makes the code easier to reason about, more modular, and reduces the amount of state each sub-pass has access to. The core part, the lifetime-tracking hook injection, is split into a *setup*, *analysis*, and *application* step (executed in that order). During the *setup* step, a control-flow graph of the code is computed and a database about the entities (i.e. locations) relevant to the pass built. Using both of them, the *analysis* step (which is itself split up into multiple sub-steps) computes for all values appearing in a *consume* context (e.g. argument to a `sink` parameter, right-hand-side of an assignment) whether they're owned and what locations still store a value at the end of their scope (and thus needing destruction). Finally, the *application* step uses all previously gather information to rewrite assignments into copy or sink hook calls, inject copies for unowned values passed to `sink` parameters, and to, where necessary, inject calls to the destructors. For both simplicity and efficiency, the assignment rewriting and copy injection both perform additional data-flow analysis in order to omit unnecessary calls to `wasMoved` and to turn sink assignments into shallow assignments (i.e. `FastAsgn`). ### Data-flow analysis Compared to the data-flow analysis implementation (`dfa.nim`) used previously, definitions (`def`) and uses (`use`) are not encoded in the control-flow graph but rather stored separately. The new forward analysis also doesn't have quadratic time and memory complexity in the case of loops. How data-flow analysis is employed is also different: instead of computing the last-reads and first-writes of *all* locations during a single traversal of the full control-flow graph, they're now computed *on-demand*. For example, to figure whether the value coming from an lvalue expression is owned (in a consume context) forward data-flow analysis is perfomed to check whether a further read is reachable. ### Additional changes - fix `mirchangesets` not compiling - fix multiple of the sets defined in `mirtrees` having the wrong elements - fix the `indexLike` tag routine for `NodeInstance` not being exported from `mirtrees` - fix `parentEnd` yielding the wrong node in some cases - add a procedure for querying the enum of a types equality operator magic without the symbol - add the `OrdinalSeq` type to the `utils/containers.nim` module - add the `Cursor` type; meant as a temporary placeholder for immutable views - add multiple new destructor-related test cases - remove the `nfLastRead` and `nfFirstWrite` node flags - remove the `sfSingleUsedTemp` symbol flag - remove the `rsemUncollectableRefCycle` warning. It had questionable utility and removing it was simpler than to make it work again. Should there exist the desire to bring it back, the problematic assignment should be detected in `sempass2` - remove the `optimizer` module - flag the `arc/tstrformat.nim` test as a known issue ### Timings Timings and peakmem are taken from compiling the compiler itself with the options `-d:release --compileOnly` and no cached compilation artifacts. The original compiler is build with `-d:release --exceptions:goto`. |compiler from|bootstrapped with|using|time taken|peakmem| |:------------|:----------------|:----|:---------|:------| |#d7a70238d0|`--gc:refc`|`--gc:refc`|24.426s|1015.844MiB| |this commit|`--gc:refc`|`--gc:refc`|24.687s|1014.254MiB| |#d7a70238d0 |`--gc:refc`|`--gc:orc`|45.206s|1.99GiB| |this commit |`--gc:refc`|`--gc:orc`|29.531s|1.222GiB| |#d7a70238d0 |`--gc:orc`|`--gc:refc`|15.492s|978.938MiB| |this commit|`--gc:orc`|`--gc:refc`|15.714s|979.023MiB| |#d7a70238d0 |`--gc:orc`|`--gc:orc`|22.218s|1.933GiB| |this commit|`--gc:orc`|`--gc:orc`|18.266s|1.188GiB|
nim-works · Feb 17, 2023 · da47c7b · da47c7b
1 parent d7a7023
commit da47c7b
Show file tree

Hide file tree

Showing 32 changed files with 4,242 additions and 1,638 deletions.
diff --git a/compiler/ast/ast_query.nim b/compiler/ast/ast_query.nim
@@ -67,8 +67,7 @@ const
   PersistentNodeFlags*: TNodeFlags = {nfBase2, nfBase8, nfBase16,
                                       nfDotSetter, nfDotField,
                                       nfIsRef, nfIsPtr, nfPreventCg, nfLL,
-                                      nfFromTemplate, nfDefaultRefsParam,
-                                      nfLastRead, nfFirstWrite}
+                                      nfFromTemplate, nfDefaultRefsParam}
 
   namePos*          = 0 ## Name of the type/proc-like node
   patternPos*       = 1 ## empty except for term rewriting macros

diff --git a/compiler/ast/ast_types.nim b/compiler/ast/ast_types.nim
@@ -306,7 +306,6 @@ type
     sfNeverRaises     ## proc can never raise an exception, not even
                       ## OverflowDefect or out-of-memory
     sfUsedInFinallyOrExcept  ## symbol is used inside an 'except' or 'finally'
-    sfSingleUsedTemp  ## For temporaries that we know will only be used once
     sfNoalias         ## 'noalias' annotation, means C's 'restrict'
     sfEffectsDelayed  ## an 'effectsDelayed' parameter
 
@@ -500,8 +499,6 @@ type
     nfDefaultParam ## an automatically inserter default parameter
     nfDefaultRefsParam ## a default param value references another parameter
                        ## the flag is applied to proc default values and to calls
-    nfLastRead  ## this node is a last read
-    nfFirstWrite## this node is a first write
     nfHasComment ## node has a comment
     nfImplicitPragma ## node is a "singlePragma" this is a transition flag
                   ## created as part of nkError refactoring for the pragmas

diff --git a/compiler/ast/report_enums.nim b/compiler/ast/report_enums.nim
@@ -826,7 +826,6 @@ type
     rsemUnsafeSetLen           = "UnsafeSetLen"
     rsemUnsafeDefault          = "UnsafeDefault"
     rsemBindDeprecated
-    rsemUncollectableRefCycle  = "CycleCreated"
     rsemObservableStores       = "ObservableStores"
     rsemCaseTransition         = "CaseTransition"
     rsemUseOfGc                = "GcMem" # last !

diff --git a/compiler/ast/reports_sem.nim b/compiler/ast/reports_sem.nim
@@ -67,9 +67,6 @@ type
       of rsemXCannotRaiseY:
         raisesList*: PNode
 
-      of rsemUncollectableRefCycle:
-        cycleField*: PNode
-
       of rsemStrictNotNilExpr, rsemStrictNotNilResult:
         nilIssue*: Nilability
         nilHistory*: seq[SemNilHistory]

diff --git a/compiler/front/cli_reporter.nim b/compiler/front/cli_reporter.nim
@@ -1909,13 +1909,6 @@ proc reportBody*(conf: ConfigRef, r: SemReport): string =
     of rsemObservableStores:
       result = "observable stores to '$1'" % r.ast.render
 
-    of rsemUncollectableRefCycle:
-      if r.cycleField == nil:
-        result = "'$#' creates an uncollectable ref cycle" % [r.ast.render]
-      else:
-        result = "'$#' creates an uncollectable ref cycle; annotate '$#' with .cursor" % [
-          r.ast.render, r.cycleField.render]
-
     of rsemResultUsed:
       result = "used 'result' variable"