compiler: allow semantic analysis of files with AstGen errors #22157

mlugg · 2024-12-05T20:23:14Z

Zig's compilation model is inherently multi-pass: AstGen emits errors which do not depend on semantic analysis (such errors have the property of being emitted even for unreferenced code), while Sema performs semantic analysis and triggers the corresponding errors. Today, the Zig compiler has the property that the existence of any AstGen errors in a file cause that file to be considered "failed", such that semantic analysis can never occur on any code within that file. This has two unfortunate consequences:

The existence of AstGen errors, even "trivial" ones such as "unused local variable", can cause analysis errors which are related to this file to not be shown, even if they clearly could be. This makes for a worse UX.
If an incremental update introduces AstGen errors, all prior semantic analysis work for that file needs to be thrown away, again, even if the AstGen error is "trivial". This can result in a huge amount of redundant work in practice.

Therefore, to solve both of this issues, we introduce a mechanism by which a file can undergo semantic analysis even if it contains AstGen errors. As well as the existing Zir.hasCompileErrors function, there is a new function, Zir.loweringFailed. As long as this returns false, AstGen has succeeded in lowering to ZIR, even if it encountered errors along the way.

Some AstGen errors, such as "unused local variable", are "non-fatal"; these do not affect ZIR lowering at all, although the errors are still reported. Others, like "undeclared identifier", are "fatal", since there is no sane way to continue ZIR lowering past them. If a container-level declaration (const/var/fn/test/comptime) encounters a fatal error, the declaration's entire body in ZIR is replaced with an extended(astgen_error()) instruction, which, when analyzed, simply returns error.AnalysisFail, marking a transitive semantic analysis error.

There are some improvements in this PR to ensure that incremental compilation is handled correctly with these changes; some of these improvements actually fix existing bugs, so this PR likely makes incremental compilation more effective.

This commit enhances AstGen to introduce a form of error resilience which allows valid ZIR to be emitted even when AstGen errors occur. When a non-fatal AstGen error (e.g. `appendErrorNode`) occurs, ZIR generation is not affected; the error is added to `astgen.errors` and ultimately to the errors stored in `extra`, but that doesn't stop us getting valid ZIR. Fatal AstGen errors (e.g. `failNode`) are a bit trickier. These errors return `error.AnalysisFail`, which is propagated up the stack. In theory, any parent expression can catch this error and handle it, continuing ZIR generation whilst throwing away whatever was lost. For now, we only do this in one place: when creating declarations. If a call to `fnDecl`, `comptimeDecl`, `globalVarDecl`, etc, returns `error.AnalysisFail`, the `declaration` instruction is still created, but its body simply contains the new `extended(astgen_error())` instruction, which instructs Sema to terminate semantic analysis with a transitive error. This means that a fatal AstGen error causes the innermost declaration containing the error to fail, but the rest of the file remains intact. If a source file contains parse errors, or an `error.AnalysisFail` happens when lowering the top-level struct (e.g. there is an error in one of its fields, or a name has multiple declarations), then lowering for the entire file fails. Alongside the existing `Zir.hasCompileErrors` query, this commit introduces `Zir.loweringFailed`, which returns `true` only in this case. The end result here is that files with AstGen failures will almost always still emit valid ZIR, and hence can undergo semantic analysis on the parts of the file which are (from AstGen's perspective) valid. This is a noteworthy improvement to UX, but the main motivation here is actually incremental compilation. Previously, AstGen failures caused lots of semantic analysis work to be thrown out, because all `AnalUnit`s in the file required re-analysis so as to trigger necessary transitive failures and remove stored compile errors which would no longer make sense (because a fresh compilation of this code would not emit those errors, as the units those errors applied to would fail sooner due to referencing a failed file). Now, this case only applies when a file has severe top-level errors, which is far less common than something like having an unused variable. Lastly, this commit changes a few errors in `AstGen` to become fatal when they were previously non-fatal and vice versa. If there is still a reasonable way to continue AstGen and lower to ZIR after an error, it is non-fatal; otherwise, it is fatal. For instance, `comptime const`, while redundant syntax, has a clear meaning we can lower; on the other hand, using an undeclared identifer has no sane lowering, so must trigger a fatal error.

The previous commit exposed some bugs in incremental compilation. This commit fixes those, and adds a little more logging for debugging incremental compilation. Also, allow `ast-check -t` to dump ZIR when there are non-fatal AstGen errors.

The main change here is to partition tracked instructions found within a declaration. It's very unlikely that, for instance, a `struct { ... }` type declaration was intentionally turned into a reification or an anonymous initialization, so it makes sense to track things in a few different arrays. In particular, this fixes an issue where a `func` instruction could wrongly be mapped to something else if the types of function parameters changed. This would cause huge problems further down the pipeline; we expect that if a `declaration` is tracked, and it previously contained a `func`/`func_inferred`/`func_fancy`, then this instruction is either tracked to another `func`/`func_inferred`/`func_fancy` instruction, or is lost. Also, this commit takes the opportunity to rename the functions actually doing this logic. `Zir.findDecls` was a name that might have made sense at some point, but nowadays, it's definitely not finding declarations, and it's not *exclusively* finding type declarations. Instead, the point is to find instructions which we want to track; hence the new name, `Zir.findTrackable`. Lastly, a nice side effect of partitioning the output of `findTrackable` is that `Zir.declIterator` no longer needs to accept input instructions which aren't type declarations (e.g. `reify`, `func`).

The introduction of the `extended(astgen_error())` instruction allows a `test` declaration to be unresolved, i.e. the declaration doesn't even contain a `func`. I could modify AstGen to not do this, but it makes more sense to just handle this case when collecting test functions. Note that tests under incremental compilation are currently broken if you ever remove all references to a test; this is tracked as a subtask of ziglang#21165.

mlugg added 4 commits December 5, 2024 19:58

compiler: incremental compilation fixes

7f3211a

The previous commit exposed some bugs in incremental compilation. This commit fixes those, and adds a little more logging for debugging incremental compilation. Also, allow `ast-check -t` to dump ZIR when there are non-fatal AstGen errors.

andrewrk approved these changes Dec 8, 2024

View reviewed changes

andrewrk merged commit 7575f21 into ziglang:master Dec 9, 2024
10 checks passed

Techatrix mentioned this pull request Dec 11, 2024

AstGen: improve analysis when encountering discard of error capture #22209

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compiler: allow semantic analysis of files with AstGen errors #22157

compiler: allow semantic analysis of files with AstGen errors #22157

mlugg commented Dec 5, 2024

compiler: allow semantic analysis of files with AstGen errors #22157

compiler: allow semantic analysis of files with AstGen errors #22157

Conversation

mlugg commented Dec 5, 2024