Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: analyze type and value of global declarations separately #22303

Merged
merged 3 commits into from
Dec 25, 2024

Conversation

mlugg
Copy link
Member

@mlugg mlugg commented Dec 23, 2024

This PR performs 2 significant refactors, and then implements #131.

See commit messages for details.

The performance impact of this PR is variable, but here are two important tests:

Build Compiler with Self-Hosted

Benchmark 1 (8 runs): /home/mlugg/zig/master/stage4-fast/bin/zig build-exe --stack 33554432 -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=src/main.zig -Maro=lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=lib/compiler/aro_translate_c.zig -Mbuild_options=.zig-cache/c/c5119ba8c2bb22ea36039c62cc1b2325/options.zig -fno-llvm -fno-lld --cache-dir .zig-cache --global-cache-dir /home/mlugg/.cache/zig --name zig --zig-lib-dir lib
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          12.9s  ± 40.0ms    12.8s  … 13.0s           0 ( 0%)        0%
  peak_rss           1.02GB ± 30.8MB     996MB … 1.07GB          0 ( 0%)        0%
  cpu_cycles         78.6G  ±  125M     78.4G  … 78.8G           0 ( 0%)        0%
  instructions        167G  ± 2.40M      167G  …  167G           0 ( 0%)        0%
  cache_references   4.85G  ± 8.44M     4.84G  … 4.86G           1 (13%)        0%
  cache_misses        279M  ± 4.20M      274M  …  287M           0 ( 0%)        0%
  branch_misses       368M  ±  708K      366M  …  368M           1 (13%)        0%
Benchmark 2 (8 runs): /home/mlugg/zig/131/stage4-fast/bin/zig build-exe --stack 33554432 -fno-sanitize-thread -ODebug --dep aro --dep aro_translate_c --dep build_options -Mroot=src/main.zig -Maro=lib/compiler/aro/aro.zig --dep aro -Maro_translate_c=lib/compiler/aro_translate_c.zig -Mbuild_options=.zig-cache/c/c5119ba8c2bb22ea36039c62cc1b2325/options.zig -fno-llvm -fno-lld --cache-dir .zig-cache --global-cache-dir /home/mlugg/.cache/zig --name zig --zig-lib-dir lib
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          12.7s  ± 31.5ms    12.6s  … 12.7s           0 ( 0%)        ⚡-  1.8% ±  0.3%
  peak_rss           1.01GB ± 24.0MB     989MB … 1.06GB          1 (13%)          -  1.0% ±  2.9%
  cpu_cycles         78.0G  ± 74.7M     77.8G  … 78.1G           0 ( 0%)          -  0.8% ±  0.1%
  instructions        169G  ± 2.39M      169G  …  169G           0 ( 0%)        💩+  1.3% ±  0.0%
  cache_references   4.82G  ± 8.18M     4.81G  … 4.84G           0 ( 0%)          -  0.5% ±  0.2%
  cache_misses        273M  ± 4.01M      266M  …  279M           0 ( 0%)          -  2.0% ±  1.6%
  branch_misses       358M  ±  760K      357M  …  359M           0 ( 0%)        ⚡-  2.7% ±  0.2%

Build std Tests with Self-Hosted

Benchmark 1 (3 runs): /home/mlugg/zig/master/stage4-fast/bin/zig test lib/std/std.zig --zig-lib-dir lib -fno-llvm -fno-lld --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.5s  ± 34.2ms    10.4s  … 10.5s           0 ( 0%)        0%
  peak_rss            899MB ± 8.21MB     891MB …  907MB          0 ( 0%)        0%
  cpu_cycles         64.2G  ±  133M     64.1G  … 64.3G           0 ( 0%)        0%
  instructions        139G  ± 10.0M      139G  …  139G           0 ( 0%)        0%
  cache_references   4.22G  ± 19.6M     4.21G  … 4.24G           0 ( 0%)        0%
  cache_misses        147M  ± 1.63M      145M  …  148M           0 ( 0%)        0%
  branch_misses       182M  ±  320K      182M  …  182M           0 ( 0%)        0%
Benchmark 2 (3 runs): /home/mlugg/zig/131/stage4-fast/bin/zig test lib/std/std.zig --zig-lib-dir lib -fno-llvm -fno-lld --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          10.7s  ± 32.3ms    10.7s  … 10.8s           0 ( 0%)        💩+  2.6% ±  0.7%
  peak_rss            896MB ± 5.18MB     891MB …  901MB          0 ( 0%)          -  0.3% ±  1.7%
  cpu_cycles         65.4G  ±  148M     65.3G  … 65.6G           0 ( 0%)        💩+  1.9% ±  0.5%
  instructions        144G  ± 1.05M      144G  …  144G           0 ( 0%)        💩+  3.5% ±  0.0%
  cache_references   4.16G  ± 12.8M     4.14G  … 4.17G           0 ( 0%)          -  1.4% ±  0.9%
  cache_misses        146M  ± 1.57M      145M  …  148M           0 ( 0%)          -  0.5% ±  2.5%
  branch_misses       179M  ±  245K      179M  …  179M           0 ( 0%)        ⚡-  2.0% ±  0.4%

The new representation is often more compact. It is also more
straightforward to understand: for instance, `extern` is represented on
the `declaration` instruction itself rather than using a special
instruction. The same applies to `var`, making both of these far more
compact.

This commit also separates the type and value bodies of a `declaration`
instruction. This is a prerequisite for ziglang#131.

In general, `declaration` now directly encodes details of the syntax
form used, and the embedded ZIR bodies are for actual expressions. The
only exception to this is functions, where ZIR is effectively designed
as if we had ziglang#1717. `extern fn` declarations are modeled as
`extern const` with a function type, and normal `fn` definitions are
modeled as `const` with a `func{,_fancy,_inferred}` instruction. This
may change in the future, but improving on this was out of scope for
this commit.
The `Cau` abstraction originated from noting that one of the two primary
roles of the legacy `Decl` type was to be the subject of comptime
semantic analysis. However, the data stored in `Cau` has always had some
level of redundancy. While preparing for ziglang#131, I went to remove that
redundany, and realised that `Cau` now had exactly one field: `owner`.

This led me to conclude that `Cau` is, in fact, an unnecessary level of
abstraction over what are in reality *fundamentally different* kinds of
analysis unit (`AnalUnit`). Types, `Nav` vals, and `comptime`
declarations are all analyzed in different ways, and trying to treat
them as the same thing is counterproductive!

So, these 3 cases are now different alternatives in `AnalUnit`. To avoid
stealing bits from `InternPool`-based IDs, which are already a little
starved for bits due to the sharding datastructures, `AnalUnit` is
expanded to 64 bits (30 of which are currently unused). This doesn't
impact memory usage too much by default, because we don't store
`AnalUnit`s all too often; however, we do store them a lot under
`-fincremental`, so a non-trivial bump to peak RSS can be observed
there. This will be improved in the future when I made
`InternPool.DepEntry` less memory-inefficient.

`Zcu.PerThread.ensureCauAnalyzed` is split into 3 functions, for each of
the 3 new types of `AnalUnit`. The new logic is much easier to
understand, because it avoids conflating the logic of these
fundamentally different cases.
This commit separates semantic analysis of the annotated type vs value
of a global declaration, therefore allowing recursive and mutually
recursive values to be declared.

Every `Nav` which undergoes analysis now has *two* corresponding
`AnalUnit`s: `.{ .nav_val = n }` and `.{ .nav_ty = n }`. The `nav_val`
unit is responsible for *fully resolving* the `Nav`: determining its
value, linksection, addrspace, etc. The `nav_ty` unit, on the other
hand, resolves only the information necessary to construct a *pointer*
to the `Nav`: its type, addrspace, etc. (It does also analyze its
linksection, but that could be moved to `nav_val` I think; it doesn't
make any difference).

Analyzing a `nav_ty` for a declaration with no type annotation will just
mark a dependency on the `nav_val`, analyze it, and finish. Conversely,
analyzing a `nav_val` for a declaration *with* a type annotation will
first mark a dependency on the `nav_ty` and analyze it, using this as
the result type when evaluating the value body.

The `nav_val` and `nav_ty` units always have references to one another:
so, if a `Nav`'s type is referenced, its value implicitly is too, and
vice versa. However, these dependencies are trivial, so, to save memory,
are only known implicitly by logic in `resolveReferences`.

In general, analyzing ZIR `decl_val` will only analyze `nav_ty` of the
corresponding `Nav`. There are two exceptions to this. If the
declaration is an `extern` declaration, then we immediately ensure the
`Nav` value is resolved (which doesn't actually require any more
analysis, since such a declaration has no value body anyway).
Additionally, if the resolved type has type tag `.@"fn"`, we again
immediately resolve the `Nav` value. The latter restriction is in place
for two reasons:

* Functions are special, in that their externs are allowed to trivially
  alias; i.e. with a declaration `extern fn foo(...)`, you can write
  `const bar = foo;`. This is not allowed for non-function externs, and
  it means that function types are the only place where it is possible
  for a declaration `Nav` to have a `.@"extern"` value without actually
  being declared `extern`. We need to identify this situation
  immediately so that the `decl_ref` can create a pointer to the *real*
  extern `Nav`, not this alias.
* In certain situations, such as taking a pointer to a `Nav`, Sema needs
  to queue analysis of a runtime function if the value is a function. To
  do this, the function value needs to be known, so we need to resolve
  the value immediately upon `&foo` where `foo` is a function.

This restriction is simple to codify into the eventual language
specification, and doesn't limit the utility of this feature in
practice.

A consequence of this commit is that codegen and linking logic needs to
be more careful when looking at `Nav`s. In general:

* When `updateNav` or `updateFunc` is called, it is safe to assume that
  the `Nav` being updated (the owner `Nav` for `updateFunc`) is fully
  resolved.
* Any `Nav` whose value is/will be an `@"extern"` or a function is fully
  resolved; see `Nav.getExtern` for a helper for a common case here.
* Any other `Nav` may only have its type resolved.

This didn't seem to be too tricky to satisfy in any of the existing
codegen/linker backends.

Resolves: ziglang#131
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

I looked through all of it. It's clearly a nice overall improvement. The incremental update logic is a bit opaque to me both before and after so I didn't check it too carefully.

Feel free to land it whenever. The conflicts with my wasm-linker branch look not too hard to resolve.

Comment on lines +614 to +620
// TODO: it's unclear how to gracefully handle this.
// To report the error cleanly, we need to add a message to `failed_analysis` and a
// corresponding entry to `retryable_failures`; but either of these things is quite
// likely to OOM at this point.
// If that happens, what do we do? Perhaps we could have a special field on `Zcu`
// for reporting OOM errors without allocating.
return error.OutOfMemory;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOM should be handled in one of four ways:

  1. Revert any state changes made in the current function and propagate error.OutOfMemory
  2. Register a retryable error so that the next update() will try again
  3. If the alloc failure is only a missing error message, call Compilation.setAllocFailure (while holding the compilation mutex if necessary) to indicate a missing error message due to OOM
  4. @panic("out of memory handling not implemented for this code path (compiler bug)")

@mlugg mlugg merged commit 497592c into ziglang:master Dec 25, 2024
10 checks passed
@mlugg mlugg added the release notes This PR should be mentioned in the release notes. label Dec 25, 2024
pub fn getExtern(nav: Nav, ip: *const InternPool) ?Key.Extern {
return switch (nav.status) {
.unresolved => unreachable,
.type_resolved => null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be

.type_resolved => unreachable,

because otherwise a bug in the compiler will, instead of crashing here, treat an extern as if it were not an extern.

right?

Copy link
Member Author

@mlugg mlugg Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope -- this is intentional, but a little subtle. The background is that in general, when lowering a pointer to a Nav, backends don't care about the Nav's value -- that's set up when you updateNav -- but they might care about the type, and about whether it's an extern. So, Sema is set up in such a way that it will always immediately fully resolve any Nav which might be an extern (that is, whose value might have key .@"extern"). Here's an excerpt from a commit message which explains it (although the original message had an important typo at the start which is fixed here):

In general, analyzing ZIR decl_ref will only analyze nav_ty of the
corresponding Nav. There are two exceptions to this. If the
declaration is an extern declaration, then we immediately ensure the
Nav value is resolved (which doesn't actually require any more
analysis, since such a declaration has no value body anyway).
Additionally, if the resolved type has type tag .@"fn", we again
immediately resolve the Nav value. The latter restriction is in place
for two reasons:

  • Functions are special, in that their externs are allowed to trivially
    alias; i.e. with a declaration extern fn foo(...), you can write
    const bar = foo;. This is not allowed for non-function externs, and
    it means that function types are the only place where it is possible
    for a declaration Nav to have a .@"extern" value without actually
    being declared extern. We need to identify this situation
    immediately so that the decl_ref can create a pointer to the real
    extern Nav, not this alias.
  • In certain situations, such as taking a pointer to a Nav, Sema needs
    to queue analysis of a runtime function if the value is a function. To
    do this, the function value needs to be known, so we need to resolve
    the value immediately upon &foo where foo is a function.

This restriction is simple to codify into the eventual language
specification, and doesn't limit the utility of this feature in
practice.

This getExtern function should generally only be called from codegen/link, since during semantic analysis, there are certain times where the situation you point out could indeed come up. The idea is that when lowering e.g. a pointer to some_nav, the linker code can do if (some_nav.getExtern(ip)) |e| ... in the event that it needs special handling to emit a pointer to an extern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused because you said "nope" but then everything you said after that seems to agree with me-

If you call it from the backend, it's supposed to be fully resolved. If you call it from the backend and it's not fully resolved, then it's a bug. So it should be .type_resolved => unreachable.

What's an example of when it would be correct code to branch on the optional, and the control flow takes the .type_resolved => null branch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call it from the backend, it's supposed to be fully resolved.

No, this is not accurate. If you call this from the backend, we have the guarantee that if it would have value .@"extern", then it is resolved. It may not be resolved if we know for certain that it isn't an extern -- which, in almost all cases, we do know.

What's an example of when it would be correct code to branch on the optional, and the control flow takes the .type_resolved => null branch?

This happens all the time: there's an opportunity for it any time codegen lowers a function which includes a decl_ref of something that's not extern. For instance, this code probably does it:

test {
    // When this test is codegenned, `foo` has its type resolved, but not its
    // value resolved (this specific fact is an implementation detail).
    // Because `foo` is not declared `extern`, and `u32` is not a function type,
    // we know it can't be an extern, so Sema knows that it is safe to queue
    // resolution of the *value* for later -- because it knows that `getExtern`
    // *would* return `null` if the value were resolved.
    // The linker wants to call `nav_of_foo.getExtern(ip)` to construct the value `&foo`.
    f(&foo);
}
fn f(ptr: *const anyopaque) void { ... }
// Once we `updateNav` on `foo` itself, *then* the backend is allowed to assume `foo` is fully resolved.
const foo: u32 = 123;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I think I get it. Then we need a different function for getting an extern, not from the nav ref path, but from the update nav path, and that one should assert fully resolved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's necessary, since in that case you also want to catch things like .variable, so you probably ought to just be writing:

switch (ip.indexToKey(nav.status.fully_resolved.val)) {
    .@"extern" => |e| {
        // extern
    }.
    .variable => |v| {
        // global mutable variable
    },
    .func => {
        // comptime function alias
    },
    else => {
        // global constant
    },
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see... I have 4 callsites and 100% of them do .? on the return value, because they are already known to be externs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, okay. If you'd find it useful, feel free to make a separate function in your wasm linker PR or similar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood and thanks for the discussion- this prevented me from causing a regression during a merge conflict resolution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes This PR should be mentioned in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants