compiler: analyze type and value of global declarations separately #22303

mlugg · 2024-12-23T21:46:07Z

This PR performs 2 significant refactors, and then implements #131.

See commit messages for details.

The performance impact of this PR is variable, but here are two important tests:

Build Compiler with Self-Hosted

Benchmark 1 (8 runs): /home/mlugg/zig/master/stage4-fast/bin/zig build-exe --stack 33554432 -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=src/main.zig -Maro=lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=lib/compiler/aro_translate_c.zig -Mbuild_options=.zig-cache/c/c5119ba8c2bb22ea36039c62cc1b2325/options.zig -fno-llvm -fno-lld --cache-dir .zig-cache --global-cache-dir /home/mlugg/.cache/zig --name zig --zig-lib-dir lib
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          12.9s  ± 40.0ms    12.8s  … 13.0s           0 ( 0%)        0%
  peak_rss           1.02GB ± 30.8MB     996MB … 1.07GB          0 ( 0%)        0%
  cpu_cycles         78.6G  ±  125M     78.4G  … 78.8G           0 ( 0%)        0%
  instructions        167G  ± 2.40M      167G  …  167G           0 ( 0%)        0%
  cache_references   4.85G  ± 8.44M     4.84G  … 4.86G           1 (13%)        0%
  cache_misses        279M  ± 4.20M      274M  …  287M           0 ( 0%)        0%
  branch_misses       368M  ±  708K      366M  …  368M           1 (13%)        0%
Benchmark 2 (8 runs): /home/mlugg/zig/131/stage4-fast/bin/zig build-exe --stack 33554432 -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=src/main.zig -Maro=lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=lib/compiler/aro_translate_c.zig -Mbuild_options=.zig-cache/c/c5119ba8c2bb22ea36039c62cc1b2325/options.zig -fno-llvm -fno-lld --cache-dir .zig-cache --global-cache-dir /home/mlugg/.cache/zig --name zig --zig-lib-dir lib
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          12.7s  ± 31.5ms    12.6s  … 12.7s           0 ( 0%)        ⚡-  1.8% ±  0.3%
  peak_rss           1.01GB ± 24.0MB     989MB … 1.06GB          1 (13%)          -  1.0% ±  2.9%
  cpu_cycles         78.0G  ± 74.7M     77.8G  … 78.1G           0 ( 0%)          -  0.8% ±  0.1%
  instructions        169G  ± 2.39M      169G  …  169G           0 ( 0%)        💩+  1.3% ±  0.0%
  cache_references   4.82G  ± 8.18M     4.81G  … 4.84G           0 ( 0%)          -  0.5% ±  0.2%
  cache_misses        273M  ± 4.01M      266M  …  279M           0 ( 0%)          -  2.0% ±  1.6%
  branch_misses       358M  ±  760K      357M  …  359M           0 ( 0%)        ⚡-  2.7% ±  0.2%

Build `std` Tests with Self-Hosted

Benchmark 1 (3 runs): /home/mlugg/zig/master/stage4-fast/bin/zig test lib/std/std.zig --zig-lib-dir lib -fno-llvm -fno-lld --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.5s  ± 34.2ms    10.4s  … 10.5s           0 ( 0%)        0%
  peak_rss            899MB ± 8.21MB     891MB …  907MB          0 ( 0%)        0%
  cpu_cycles         64.2G  ±  133M     64.1G  … 64.3G           0 ( 0%)        0%
  instructions        139G  ± 10.0M      139G  …  139G           0 ( 0%)        0%
  cache_references   4.22G  ± 19.6M     4.21G  … 4.24G           0 ( 0%)        0%
  cache_misses        147M  ± 1.63M      145M  …  148M           0 ( 0%)        0%
  branch_misses       182M  ±  320K      182M  …  182M           0 ( 0%)        0%
Benchmark 2 (3 runs): /home/mlugg/zig/131/stage4-fast/bin/zig test lib/std/std.zig --zig-lib-dir lib -fno-llvm -fno-lld --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.7s  ± 32.3ms    10.7s  … 10.8s           0 ( 0%)        💩+  2.6% ±  0.7%
  peak_rss            896MB ± 5.18MB     891MB …  901MB          0 ( 0%)          -  0.3% ±  1.7%
  cpu_cycles         65.4G  ±  148M     65.3G  … 65.6G           0 ( 0%)        💩+  1.9% ±  0.5%
  instructions        144G  ± 1.05M      144G  …  144G           0 ( 0%)        💩+  3.5% ±  0.0%
  cache_references   4.16G  ± 12.8M     4.14G  … 4.17G           0 ( 0%)          -  1.4% ±  0.9%
  cache_misses        146M  ± 1.57M      145M  …  148M           0 ( 0%)          -  0.5% ±  2.5%
  branch_misses       179M  ±  245K      179M  …  179M           0 ( 0%)        ⚡-  2.0% ±  0.4%

The new representation is often more compact. It is also more straightforward to understand: for instance, `extern` is represented on the `declaration` instruction itself rather than using a special instruction. The same applies to `var`, making both of these far more compact. This commit also separates the type and value bodies of a `declaration` instruction. This is a prerequisite for ziglang#131. In general, `declaration` now directly encodes details of the syntax form used, and the embedded ZIR bodies are for actual expressions. The only exception to this is functions, where ZIR is effectively designed as if we had ziglang#1717. `extern fn` declarations are modeled as `extern const` with a function type, and normal `fn` definitions are modeled as `const` with a `func{,_fancy,_inferred}` instruction. This may change in the future, but improving on this was out of scope for this commit.

test/cases/compile_errors/address_of_threadlocal_not_comptime_known.zig

test/cases/compile_errors/type_variables_must_be_constant.zig

test/cases/compile_errors/use_invalid_number_literal_as_array_index.zig

The `Cau` abstraction originated from noting that one of the two primary roles of the legacy `Decl` type was to be the subject of comptime semantic analysis. However, the data stored in `Cau` has always had some level of redundancy. While preparing for ziglang#131, I went to remove that redundany, and realised that `Cau` now had exactly one field: `owner`. This led me to conclude that `Cau` is, in fact, an unnecessary level of abstraction over what are in reality *fundamentally different* kinds of analysis unit (`AnalUnit`). Types, `Nav` vals, and `comptime` declarations are all analyzed in different ways, and trying to treat them as the same thing is counterproductive! So, these 3 cases are now different alternatives in `AnalUnit`. To avoid stealing bits from `InternPool`-based IDs, which are already a little starved for bits due to the sharding datastructures, `AnalUnit` is expanded to 64 bits (30 of which are currently unused). This doesn't impact memory usage too much by default, because we don't store `AnalUnit`s all too often; however, we do store them a lot under `-fincremental`, so a non-trivial bump to peak RSS can be observed there. This will be improved in the future when I made `InternPool.DepEntry` less memory-inefficient. `Zcu.PerThread.ensureCauAnalyzed` is split into 3 functions, for each of the 3 new types of `AnalUnit`. The new logic is much easier to understand, because it avoids conflating the logic of these fundamentally different cases.

This commit separates semantic analysis of the annotated type vs value of a global declaration, therefore allowing recursive and mutually recursive values to be declared. Every `Nav` which undergoes analysis now has *two* corresponding `AnalUnit`s: `.{ .nav_val = n }` and `.{ .nav_ty = n }`. The `nav_val` unit is responsible for *fully resolving* the `Nav`: determining its value, linksection, addrspace, etc. The `nav_ty` unit, on the other hand, resolves only the information necessary to construct a *pointer* to the `Nav`: its type, addrspace, etc. (It does also analyze its linksection, but that could be moved to `nav_val` I think; it doesn't make any difference). Analyzing a `nav_ty` for a declaration with no type annotation will just mark a dependency on the `nav_val`, analyze it, and finish. Conversely, analyzing a `nav_val` for a declaration *with* a type annotation will first mark a dependency on the `nav_ty` and analyze it, using this as the result type when evaluating the value body. The `nav_val` and `nav_ty` units always have references to one another: so, if a `Nav`'s type is referenced, its value implicitly is too, and vice versa. However, these dependencies are trivial, so, to save memory, are only known implicitly by logic in `resolveReferences`. In general, analyzing ZIR `decl_val` will only analyze `nav_ty` of the corresponding `Nav`. There are two exceptions to this. If the declaration is an `extern` declaration, then we immediately ensure the `Nav` value is resolved (which doesn't actually require any more analysis, since such a declaration has no value body anyway). Additionally, if the resolved type has type tag `.@"fn"`, we again immediately resolve the `Nav` value. The latter restriction is in place for two reasons: * Functions are special, in that their externs are allowed to trivially alias; i.e. with a declaration `extern fn foo(...)`, you can write `const bar = foo;`. This is not allowed for non-function externs, and it means that function types are the only place where it is possible for a declaration `Nav` to have a `.@"extern"` value without actually being declared `extern`. We need to identify this situation immediately so that the `decl_ref` can create a pointer to the *real* extern `Nav`, not this alias. * In certain situations, such as taking a pointer to a `Nav`, Sema needs to queue analysis of a runtime function if the value is a function. To do this, the function value needs to be known, so we need to resolve the value immediately upon `&foo` where `foo` is a function. This restriction is simple to codify into the eventual language specification, and doesn't limit the utility of this feature in practice. A consequence of this commit is that codegen and linking logic needs to be more careful when looking at `Nav`s. In general: * When `updateNav` or `updateFunc` is called, it is safe to assume that the `Nav` being updated (the owner `Nav` for `updateFunc`) is fully resolved. * Any `Nav` whose value is/will be an `@"extern"` or a function is fully resolved; see `Nav.getExtern` for a helper for a common case here. * Any other `Nav` may only have its type resolved. This didn't seem to be too tricky to satisfy in any of the existing codegen/linker backends. Resolves: ziglang#131

andrewrk

Great work!

I looked through all of it. It's clearly a nice overall improvement. The incremental update logic is a bit opaque to me both before and after so I didn't check it too carefully.

Feel free to land it whenever. The conflicts with my wasm-linker branch look not too hard to resolve.

andrewrk · 2024-12-24T21:15:01Z

src/Zcu/PerThread.zig

+            // TODO: it's unclear how to gracefully handle this.
+            // To report the error cleanly, we need to add a message to `failed_analysis` and a
+            // corresponding entry to `retryable_failures`; but either of these things is quite
+            // likely to OOM at this point.
+            // If that happens, what do we do? Perhaps we could have a special field on `Zcu`
+            // for reporting OOM errors without allocating.
+            return error.OutOfMemory;


OOM should be handled in one of four ways:

Revert any state changes made in the current function and propagate error.OutOfMemory

Register a retryable error so that the next update() will try again

If the alloc failure is only a missing error message, call Compilation.setAllocFailure (while holding the compilation mutex if necessary) to indicate a missing error message due to OOM

@panic("out of memory handling not implemented for this code path (compiler bug)")

lib/std/zig/AstGen.zig

andrewrk · 2024-12-27T02:27:10Z

src/InternPool.zig

+    pub fn getExtern(nav: Nav, ip: *const InternPool) ?Key.Extern {
+        return switch (nav.status) {
+            .unresolved => unreachable,
+            .type_resolved => null,


this should be

.type_resolved => unreachable,

because otherwise a bug in the compiler will, instead of crashing here, treat an extern as if it were not an extern.

right?

Nope -- this is intentional, but a little subtle. The background is that in general, when lowering a pointer to a Nav, backends don't care about the Nav's value -- that's set up when you updateNav -- but they might care about the type, and about whether it's an extern. So, Sema is set up in such a way that it will always immediately fully resolve any Nav which might be an extern (that is, whose value might have key .@"extern"). Here's an excerpt from a commit message which explains it (although the original message had an important typo at the start which is fixed here):

In general, analyzing ZIR decl_ref will only analyze nav_ty of the
corresponding Nav. There are two exceptions to this. If the
declaration is an extern declaration, then we immediately ensure the
Nav value is resolved (which doesn't actually require any more
analysis, since such a declaration has no value body anyway).
Additionally, if the resolved type has type tag .@"fn", we again
immediately resolve the Nav value. The latter restriction is in place
for two reasons:

Functions are special, in that their externs are allowed to trivially
alias; i.e. with a declaration extern fn foo(...), you can write
const bar = foo;. This is not allowed for non-function externs, and
it means that function types are the only place where it is possible
for a declaration Nav to have a .@"extern" value without actually
being declared extern. We need to identify this situation
immediately so that the decl_ref can create a pointer to the real
extern Nav, not this alias.

In certain situations, such as taking a pointer to a Nav, Sema needs
to queue analysis of a runtime function if the value is a function. To
do this, the function value needs to be known, so we need to resolve
the value immediately upon &foo where foo is a function.

This restriction is simple to codify into the eventual language
specification, and doesn't limit the utility of this feature in
practice.

This getExtern function should generally only be called from codegen/link, since during semantic analysis, there are certain times where the situation you point out could indeed come up. The idea is that when lowering e.g. a pointer to some_nav, the linker code can do if (some_nav.getExtern(ip)) |e| ... in the event that it needs special handling to emit a pointer to an extern.

I'm confused because you said "nope" but then everything you said after that seems to agree with me-

If you call it from the backend, it's supposed to be fully resolved. If you call it from the backend and it's not fully resolved, then it's a bug. So it should be .type_resolved => unreachable.

What's an example of when it would be correct code to branch on the optional, and the control flow takes the .type_resolved => null branch?

If you call it from the backend, it's supposed to be fully resolved.

No, this is not accurate. If you call this from the backend, we have the guarantee that if it would have value .@"extern", then it is resolved. It may not be resolved if we know for certain that it isn't an extern -- which, in almost all cases, we do know.

What's an example of when it would be correct code to branch on the optional, and the control flow takes the .type_resolved => null branch?

This happens all the time: there's an opportunity for it any time codegen lowers a function which includes a decl_ref of something that's not extern. For instance, this code probably does it:

test { // When this test is codegenned, `foo` has its type resolved, but not its // value resolved (this specific fact is an implementation detail). // Because `foo` is not declared `extern`, and `u32` is not a function type, // we know it can't be an extern, so Sema knows that it is safe to queue // resolution of the *value* for later -- because it knows that `getExtern` // *would* return `null` if the value were resolved. // The linker wants to call `nav_of_foo.getExtern(ip)` to construct the value `&foo`. f(&foo); } fn f(ptr: *const anyopaque) void { ... } // Once we `updateNav` on `foo` itself, *then* the backend is allowed to assume `foo` is fully resolved. const foo: u32 = 123;

OK I think I get it. Then we need a different function for getting an extern, not from the nav ref path, but from the update nav path, and that one should assert fully resolved.

I don't think that's necessary, since in that case you also want to catch things like .variable, so you probably ought to just be writing:

switch (ip.indexToKey(nav.status.fully_resolved.val)) { .@"extern" => |e| { // extern }. .variable => |v| { // global mutable variable }, .func => { // comptime function alias }, else => { // global constant }, }

Let's see... I have 4 callsites and 100% of them do .? on the return value, because they are already known to be externs.

Hm, okay. If you'd find it useful, feel free to make a separate function in your wasm linker PR or similar.

Understood and thanks for the discussion- this prevented me from causing a regression during a merge conflict resolution.

nektro reviewed Dec 23, 2024

View reviewed changes

test/cases/compile_errors/address_of_threadlocal_not_comptime_known.zig Show resolved Hide resolved

test/cases/compile_errors/type_variables_must_be_constant.zig Show resolved Hide resolved

test/cases/compile_errors/use_invalid_number_literal_as_array_index.zig Show resolved Hide resolved

mlugg force-pushed the 131-new branch from b7a6ba3 to 859c991 Compare December 24, 2024 01:58

mlugg added 2 commits December 24, 2024 02:18

mlugg force-pushed the 131-new branch from 859c991 to 3afda43 Compare December 24, 2024 02:18

andrewrk approved these changes Dec 24, 2024

View reviewed changes

mlugg merged commit 497592c into ziglang:master Dec 25, 2024
10 checks passed

mlugg added the release notes This PR should be mentioned in the release notes. label Dec 25, 2024

Fri3dNstuff mentioned this pull request Dec 25, 2024

erroneous dependency loop for globals self/mutually referencing the addresses of fields #22314

Closed

Techatrix reviewed Dec 26, 2024

View reviewed changes

lib/std/zig/AstGen.zig Show resolved Hide resolved

andrewrk reviewed Dec 27, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compiler: analyze type and value of global declarations separately #22303

compiler: analyze type and value of global declarations separately #22303

mlugg commented Dec 23, 2024

andrewrk left a comment

andrewrk Dec 24, 2024

andrewrk Dec 27, 2024

mlugg Dec 27, 2024 •

edited

Loading

andrewrk Dec 27, 2024

mlugg Dec 27, 2024

andrewrk Dec 27, 2024

mlugg Dec 27, 2024

andrewrk Dec 27, 2024

mlugg Dec 27, 2024

andrewrk Dec 27, 2024

compiler: analyze type and value of global declarations separately #22303

compiler: analyze type and value of global declarations separately #22303

Conversation

mlugg commented Dec 23, 2024

Build Compiler with Self-Hosted

Build std Tests with Self-Hosted

andrewrk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mlugg Dec 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Build `std` Tests with Self-Hosted

mlugg Dec 27, 2024 •

edited

Loading