[mono][interp] Defer compilation in bblocks with uninitialized stack #108731
Conversation
Tagging subscribers to this area: @BrzVlad, @kotlarmilos
Each basic block will have an emit state: not emitted, emitting, or emitted. When we reach a new basic block, we will emit code into it only if its stack state is initialized (the stack state of a bblock can be initialized either from the state of the previous bblock, if it is reached by fallthrough, or from a branch out of another bblock with an initialized state). If we encounter a bblock whose state is not initialized, we set a flag so that codegen is retried in an attempt to emit new bblocks. Once we finish emitting code, we remove all bblocks still in the not emitted state.
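A minimal sketch of this scheme, assuming illustrative names; BBlock, emit_bblock and remove_not_emitted_bblocks are placeholders, not the actual code in the interpreter's transform pass:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
	BB_NOT_EMITTED,   /* no code generated for this bblock yet */
	BB_EMITTING,      /* currently generating code for this bblock */
	BB_EMITTED        /* code generation finished */
} BBEmitState;

typedef struct BBlock {
	struct BBlock *next;              /* next bblock in IL order */
	BBEmitState emit_state;
	bool stack_state_initialized;     /* set by fallthrough, or by a branch from an emitted bblock */
	/* ... stack contents, IL range, successor links ... */
} BBlock;

/* Placeholder: emits IR for one bblock and propagates its exit stack state
 * to all of its branch targets, marking their stack state as initialized. */
void emit_bblock (BBlock *bb);

/* Placeholder: unlinks and frees every bblock still in BB_NOT_EMITTED state. */
void remove_not_emitted_bblocks (BBlock *entry);

void
generate_code (BBlock *entry)
{
	bool need_retry, made_progress;
	do {
		need_retry = false;
		made_progress = false;
		for (BBlock *bb = entry; bb != NULL; bb = bb->next) {
			if (bb->emit_state == BB_EMITTED)
				continue;
			if (!bb->stack_state_initialized) {
				/* Stack state unknown yet: defer this bblock and retry the pass,
				 * since a later bblock may initialize it by branching here. */
				need_retry = true;
				continue;
			}
			bb->emit_state = BB_EMITTING;
			emit_bblock (bb);
			bb->emit_state = BB_EMITTED;
			made_progress = true;
		}
		/* Stop once a full pass emits nothing new: what remains is dead code. */
	} while (need_retry && made_progress);

	/* Bblocks never reached with a valid stack state are removed. */
	remove_not_emitted_bblocks (entry);
}
```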
…ed ranges
Following the change to only emit code in bblocks once we reach them with an initialized stack state, we have the side effect of no longer processing IL code in dead bblocks. This means that offset_to_bb might actually be null for some IL offsets, so we need to iterate over the following IL offsets until we find a mapped bblock.
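As a rough illustration of that lookup (names are placeholders, reusing the BBlock type from the sketch above; the real code lives in the interpreter's transform pass):

```c
/* offset_to_bb maps each IL offset to the bblock starting there, or NULL when the
 * offset belongs to dead IL that was never imported. Scan forward until a mapped
 * bblock is found, or give up at the end of the IL code. */
static BBlock *
find_bblock_at_or_after (BBlock **offset_to_bb, unsigned int il_offset, unsigned int code_size)
{
	while (il_offset < code_size && offset_to_bb [il_offset] == NULL)
		il_offset++;
	return il_offset < code_size ? offset_to_bb [il_offset] : NULL;
}
```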
g_assert (bb);
// If the bblock is detected as dead while traversing the IL code, the mapping for
// it is cleared. We can skip it.
if (!bb)
I’m concerned about relaxing the condition here. If the bblock is null due to a bug, it might be incorrectly processed as dead. Could we explicitly annotate a bblock as dead instead?
We do have a dead field for basic blocks, but the existing pattern is that dead bblocks are no longer linked to live bblocks; they are not reachable. In addition to being a different pattern, keeping them in the td->offset_to_bb mapping turned out to complicate code in other places. This condition is used only for exception clause ranges, so I would say the scope is limited enough that we don't risk serious bugs.
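A small sketch of the pattern being described, with illustrative names (the real field and helper names may differ):

```c
/* When a bblock is detected as dead during IL traversal, it is unlinked from the
 * live bblocks and its entries in the offset-to-bblock mapping are cleared;
 * consumers of the mapping then treat a NULL entry as "dead IL, skip it". */
static void
clear_dead_bblock_mapping (BBlock **offset_to_bb, BBlock *bb, unsigned int il_start, unsigned int il_end)
{
	for (unsigned int i = il_start; i < il_end; i++) {
		if (offset_to_bb [i] == bb)
			offset_to_bb [i] = NULL;
	}
	/* ... the bblock is also removed from predecessor/successor lists ... */
}
```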
@BrzVlad looks like a big regression on wasm dotnet/perf-autofiling-issues#45939
Many methods in the BCL, especially hwintrins-related ones, contain a lot of code that is detected as dead during compilation. On mono, inlining happens during IL import and many optimizations run as later passes. This exposed the issue of heavy dead code bloat from inlining, with optimizations later running on it. A simple solution to this problem was tracking jump counts for each bblock (dotnet#97514), initialized when bblocks are first created, before the IL import stage. A small set of IL-import-level optimizations was then added in order to reduce the jump targets of each bblock. As we further imported IL, if we reached a bblock with 0 jump targets, we would disable inlining into it, in order to reduce code bloat. Disabling code emit altogether was too challenging. Another limitation of this approach was that we would fail to detect dead code if it was part of a loop. The results were nevertheless good, reducing memory usage in `System.Numerics.Tensor.Tests` from 6GB to 600MB. For an unrelated issue, the order in which we generate bblocks was redesigned to account for bblock stack state initialization in weird control flow scenarios (dotnet#108731). This was achieved by deferring IL import into bblocks that were not yet reached from other live bblocks. A side effect of this is that we no longer generate code at all in unreachable bblocks, completely superseding the previous approach while addressing both of its problems: inlining into dead loops and generating IR for dead IL. In the previously mentioned test suite, this further reduced the memory usage to 300MB. Remnants of the now unnecessary `no_inlining` approach still lingered in the code, disabling inline optimization in some reachable code. This triggered a significant performance regression, which this PR addresses.
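A hedged sketch of the earlier jump-count heuristic described above (the field and function names are illustrative, not the actual interpreter code):

```c
#include <stdbool.h>

typedef struct {
	int jump_targets;   /* live branches still targeting this bblock; set up before IL import */
	bool no_inlining;   /* when set, calls imported into this bblock are never inlined */
} ImportBBlock;

/* Called when IL import reaches the start of a bblock: if no live jump targets
 * remain, the bblock is likely dead code (often produced by inlining), so
 * inlining into it is disabled to limit code bloat. Skipping code emit entirely
 * was considered too intrusive at the time. */
static void
begin_importing_bblock (ImportBBlock *bb)
{
	if (bb->jump_targets == 0)
		bb->no_inlining = true;
}
```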
…otnet#108731)
* [mono][interp] Add bblock start verbose logging
* [mono][interp] Minor fixes around IL offsets with inlining
* [mono][interp] Defer compilation in bblocks with uninitialized stack
* [mono][interp] Fix obtaining of native offsets when computing protected ranges
---------
Co-authored-by: Larry Ewing <lewing@microsoft.com>
Before this change, when encountering a bblock with an uninitialized stack, we simply assumed that it had an empty stack, which is incorrect according to the spec. Also, in some cases we could simply crash, even if the bblock did indeed have an empty stack.