JIT: Improve local assertion prop throughput #94597

AndyAyersMS · 2023-11-10T07:27:59Z

Leverage the "dep vectors" to avoid searching the whole assertion table during local assertion prop. This helps the current (small table) behavior some, and helps the future cross-block (larger table) behavior more.

Similar tricks may be possible for global AP, though the set of assertions there is more varied.

Contributes to #93246.

ghost · 2023-11-10T07:28:11Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2023-11-10T16:05:30Z

Locally this looks really good for TP once cross-block is enabled. I picked up the fix from #94608 so let's see what the lab thinks.

AndyAyersMS · 2023-11-10T19:13:19Z

TP Impact on cross-block assertion prop

Baseline cross-block TP

With this PR at ~~034dbda~~ 548be19

After 6a07b54

After 631a42f

AndyAyersMS · 2023-11-10T19:29:26Z

With 6a07b54 I am now seeing 0.13% (with cross-block local AP enabled) on benchmarks.run; baseline is more like 1.57%, so this may get us to a happy TP place.

It would be nice to encapsulate this pattern a bit better, perhaps. Also, maybe in global AP we can think about ways to do similar indexing, as there are still various whole-table searches going on there.

AndyAyersMS · 2023-11-11T01:48:51Z

One odd thing is that after the force push / rebase I lost some of the improvements and gained some sizeable regressions. E.g., win x64 went from -1.1M to -800K.

I've verified this change is no diff vs baseline (either with or without cross-block enabled) so not sure what happened just yet. Will need to diff against some older build I guess.

AndyAyersMS · 2023-11-11T01:59:33Z

@jakobbotsch PTAL
cc @dotnet/jit-contrib

AndyAyersMS · 2023-11-11T05:44:20Z

Diffs

No codegen diffs; some TP a bit faster, some slower -- the more dramatic TP impact here is with cross-block enabled (see above). It may be that our bit vector iterator has some unnecessary overhead, as it seems like walking through an N entry table (with at most 64 entries) should generally be more costly than iterating through a bit vector with at most N (and likely just a few) bits set.
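To make the cost comparison above concrete, here is a rough sketch of the two ways to visit the live entries of a table with at most 64 slots. `WalkTable`, `WalkBits`, and `LowestSetBitIndex` are hypothetical helpers for illustration only; they are not the JIT's `BitVecOps` iterator, and the step counters only count loop iterations, not real cycles.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the lowest set bit; `bits` must be nonzero.
// Uses a compiler builtin where one is known to exist, else a portable loop.
inline unsigned LowestSetBitIndex(uint64_t bits)
{
#if defined(__GNUC__) || defined(__clang__)
    return static_cast<unsigned>(__builtin_ctzll(bits));
#else
    unsigned index = 0;
    while ((bits & 1) == 0)
    {
        bits >>= 1;
        index++;
    }
    return index;
#endif
}

// Strategy 1: visit every table slot and test membership.
// Always takes `tableSize` iterations (tableSize must be <= 64).
inline std::vector<unsigned> WalkTable(uint64_t live, unsigned tableSize, unsigned* steps)
{
    std::vector<unsigned> found;
    *steps = 0;
    for (unsigned i = 0; i < tableSize; i++)
    {
        (*steps)++;
        if (((live >> i) & 1) != 0)
        {
            found.push_back(i);
        }
    }
    return found;
}

// Strategy 2: visit only the set bits. Takes one iteration per set bit.
inline std::vector<unsigned> WalkBits(uint64_t live, unsigned* steps)
{
    std::vector<unsigned> found;
    *steps = 0;
    while (live != 0)
    {
        (*steps)++;
        found.push_back(LowestSetBitIndex(live));
        live &= live - 1; // clear the lowest set bit
    }
    return found;
}
```

With two bits set in a 64-entry table, `WalkTable` takes 64 iterations and `WalkBits` takes 2, and both return the same indices in the same order. Whether that translates into a TP win depends on the per-iteration overhead of the real iterator, which is the open question raised above.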

jakobbotsch · 2023-11-11T12:31:12Z

src/coreclr/jit/assertionprop.cpp

+            AssertionIndex const index        = GetAssertionIndex(bvIndex);
+            AssertionDsc* const  curAssertion = optGetAssertion(index);
+
+            if (curAssertion->Equals(newAssertion, !optLocalAssertionProp))


Suggested change

-    if (curAssertion->Equals(newAssertion, !optLocalAssertionProp))
+    if (curAssertion->Equals(newAssertion, /* vnBased */ false))

(Also in the other cases)

jakobbotsch · 2023-11-11T12:35:53Z

src/coreclr/jit/assertionprop.cpp

+        if (newAssertion->op2.kind == O2K_LCLVAR_COPY)
+        {
+            lclNum = newAssertion->op2.lcl.lclNum;
+            BitVecOps::Iter iter(apTraits, GetAssertionDep(lclNum));
+            unsigned        bvIndex = 0;
+            while (iter.NextElem(&bvIndex))
+            {
+                AssertionIndex const index        = GetAssertionIndex(bvIndex);
+                AssertionDsc* const  curAssertion = optGetAssertion(index);
+
+                if (curAssertion->Equals(newAssertion, !optLocalAssertionProp))
+                {
+                    return index;
+                }
+            }
+        }


Is this necessary? Shouldn't the previous case have found it if there is an equal assertion? Or will we only keep an assertion like v1 = v2 in one of the bit vectors?

No, it's kept for both (see just below) -- so yeah that second loop is not needed.
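The point settled in this thread can be sketched in isolation: if a copy assertion like v1 = v2 is recorded in the dep vectors of both locals, then searching only the op1 local's dep vector is enough to find an existing equal assertion. The sketch below is a simplified stand-in, assuming a 64-entry table and one 64-bit dep vector per local; `Assertion`, `AssertionTable`, `AddOrFind`, and `DepsOf` are hypothetical names and not the JIT's actual types or methods.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified assertion: "op1Lcl == op2", where op2 is either
// another local (isCopy) or a constant value.
struct Assertion
{
    unsigned op1Lcl;
    bool     isCopy;
    int64_t  op2;

    bool operator==(const Assertion& other) const
    {
        return op1Lcl == other.op1Lcl && isCopy == other.isCopy && op2 == other.op2;
    }
};

class AssertionTable
{
public:
    static constexpr unsigned MaxAssertions = 64;

    explicit AssertionTable(unsigned numLocals) : m_deps(numLocals, 0)
    {
    }

    // Returns the index of an equal existing assertion, adds `a` if there is
    // none, or returns -1 if the table is full.
    int AddOrFind(const Assertion& a)
    {
        // Any assertion equal to `a` mentions a.op1Lcl, so only the entries
        // in that local's dep vector need to be compared.
        uint64_t candidates = m_deps[a.op1Lcl];
        while (candidates != 0)
        {
            unsigned index = LowestSetBitIndex(candidates);
            if (m_table[index] == a)
            {
                return static_cast<int>(index);
            }
            candidates &= candidates - 1; // clear the bit just examined
        }

        if (m_table.size() == MaxAssertions)
        {
            return -1;
        }

        unsigned newIndex = static_cast<unsigned>(m_table.size());
        m_table.push_back(a);

        // Record the new assertion under op1, and under op2 as well when it
        // is a local copy. This is what makes a second search redundant.
        m_deps[a.op1Lcl] |= uint64_t(1) << newIndex;
        if (a.isCopy)
        {
            m_deps[static_cast<unsigned>(a.op2)] |= uint64_t(1) << newIndex;
        }
        return static_cast<int>(newIndex);
    }

    uint64_t DepsOf(unsigned lcl) const
    {
        return m_deps[lcl];
    }

private:
    // Portable lowest-set-bit search; `bits` must be nonzero.
    static unsigned LowestSetBitIndex(uint64_t bits)
    {
        unsigned index = 0;
        while ((bits & 1) == 0)
        {
            bits >>= 1;
            index++;
        }
        return index;
    }

    std::vector<Assertion> m_table;
    std::vector<uint64_t>  m_deps; // one dep vector per local
};
```

Adding v1 = v2 twice returns the same index both times, and that index shows up in the dep vectors of both v1 and v2, so a lookup keyed on either local would see it.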

AndyAyersMS · 2023-11-11T22:05:28Z

> One odd thing is that after the force push / rebase I lost some of the improvements and gained some sizeable regressions. E.g., win x64 went from -1.1M to -800K.
>
> I've verified this change is no diff vs baseline (either with or without cross-block enabled) so not sure what happened just yet. Will need to diff against some older build I guess.

Aha, some OSR cases got pessimized with the DFS reachability check, as the original method entry and such are considered unreachable. Will push a fix.

AndyAyersMS · 2023-11-11T22:27:13Z

There are around 100 or so methods that see diffs now with cross-block enabled (because of the now fixed reachability check); seems like a lot of them are compiled regex methods:

[14:24:05] Top method improvements (percentages):
[14:24:05]          -65 (-4.90% of base) : 304049.dasm - System.Text.RegularExpressions.CompiledRegexRunner:Regex32447_TryMatchAtCurrentPosition(System.Text.RegularExpressions.RegexRunner,System.ReadOnlySpan`1[ushort]):ubyte (FullOpts)
[14:24:05]          -73 (-3.87% of base) : 298116.dasm - System.Text.RegularExpressions.CompiledRegexRunner:Regex5110_TryMatchAtCurrentPosition(System.Text.RegularExpressions.RegexRunner,System.ReadOnlySpan`1[ushort]):ubyte (FullOpts)
[14:24:05]          -34 (-3.84% of base) : 298413.dasm - System.Text.RegularExpressions.CompiledRegexRunner:Regex5473_TryMatchAtCurrentPosition(System.Text.RegularExpressions.RegexRunner,System.ReadOnlySpan`1[ushort]):ubyte (FullOpts)
[14:24:05]          -10 (-1.35% of base) : 304046.dasm - System.Text.RegularExpressions.CompiledRegexRunner:Regex32443_TryMatchAtCurrentPosition(System.Text.RegularExpressions.RegexRunner,System.ReadOnlySpan`1[ushort]):ubyte (FullOpts)
[14:24:05]           -9 (-1.18% of base) : 130597.dasm - System.IO.Compression.ZipArchive:WriteFile():this (FullOpts)
...etc...

AndyAyersMS · 2023-11-11T23:24:26Z

Local TP diff with cross-block enabled (based on 7c8fff9). Seems almost too good to be true.

I'll prep another PR for enabling once this bit is merged, and we'll see what the lab says.

AndyAyersMS · 2023-11-12T15:43:10Z

@jakobbotsch take another look when you can...

jakobbotsch · 2023-11-13T09:57:25Z

src/coreclr/jit/fgopt.cpp

+    {
+        if (!BlockSetOps::IsMember(this, visited, fgEntryBB->bbNum))
+        {
+            fgDfsReversePostorderHelper(fgEntryBB, visited, preorderIndex, postorderIndex);


These aren't really reachable, right? (Except for maybe tailcall-to-loop opt). But we want them in the order to be able to propagate facts from them.

I wonder if SSA/VN could benefit similarly from seeing these.

It seems like a sign that CQ for OSR methods would benefit from inserting the jump to the start block very late, e.g. right before lowering (probably with some additional DCE for the unreachable blocks). Of course at some cost of TP, so maybe it would need to be restricted to loop-based patchpoints while the current behavior is ideal for partial compilation.

Added a note to #33658 to try this someday. One consideration for deferral is that we might need to alter or block some optimizations that might remove computations from the OSR side (hoisting, CSE).

Also note currently fgEntryBB gets nulled out after morph.
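The idea in the snippet under review, starting the DFS from the real entry and then also from the original method entry if the walk did not reach it, can be sketched on a toy flow graph. This is only an illustration under assumed names: `FlowGraph`, `DfsPostorder`, and `ReversePostorder` are hypothetical and do not mirror the signature of `fgDfsReversePostorderHelper`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flow graph: succs[b] lists the successors of block b.
using FlowGraph = std::vector<std::vector<int>>;

// Recursive DFS that appends each block after all of its unvisited successors.
static void DfsPostorder(const FlowGraph&   succs,
                         int                block,
                         std::vector<bool>& visited,
                         std::vector<int>&  postorder)
{
    visited[block] = true;
    for (int succ : succs[block])
    {
        if (!visited[succ])
        {
            DfsPostorder(succs, succ, visited, postorder);
        }
    }
    postorder.push_back(block);
}

// Reverse postorder starting at `entry`. If `extraRoot` is a valid block that
// the first walk did not reach, walk from it too so its blocks get a place in
// the order. Pass a negative `extraRoot` to skip the second walk.
std::vector<int> ReversePostorder(const FlowGraph& succs, int entry, int extraRoot)
{
    std::vector<bool> visited(succs.size(), false);
    std::vector<int>  postorder;

    DfsPostorder(succs, entry, visited, postorder);

    if (extraRoot >= 0 && !visited[extraRoot])
    {
        DfsPostorder(succs, extraRoot, visited, postorder);
    }

    return std::vector<int>(postorder.rbegin(), postorder.rend());
}
```

For a graph where block 0 stands in for the OSR entry, block 1 for the original method entry, and both flow into a loop headed by block 2, the extra walk places block 1 ahead of block 2 in the order. A forward pass over that order would then visit block 1 before the blocks it feeds, which is the property needed to propagate facts from it.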

dotnet-issue-labeler bot added the `area-CodeGen-coreclr` label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Nov 10, 2023

ghost assigned AndyAyersMS Nov 10, 2023

build-analysis bot mentioned this pull request Nov 10, 2023

CI error: System.Net.Quic.QuicException: The connection timed out from inactivity #91757

Closed

AndyAyersMS added 6 commits November 10, 2023 11:21

avoid searching entire assertion table where possible

b80e73d

enable

d4fb339

Revert "enable"

fdfc1ee

This reverts commit 062552152a1d209f6d9dbd89da8dd7f14b1a56fd.

re-enable

548be19

handle local prop add via dep vectors

6a07b54

don't use pred assertions from unreachables

e98dca2

AndyAyersMS force-pushed the SearchLessAssertionProp branch from 034dbda to e98dca2 Compare November 10, 2023 19:22

AndyAyersMS mentioned this pull request Nov 10, 2023

JIT: use reverse post-order (RPO) traversal for morph #93246

Closed

12 tasks

one more place we can trim

631a42f

This was referenced Nov 10, 2023

Test_EventSource_EtwManifestGeneration* tests failing in CI #48798

Closed

System.Security.Cryptography.Tests timing out #93840

Closed

AndyAyersMS changed the title ~~Search less assertion prop~~ JIT: Improve local assertion prop throughput Nov 11, 2023

disable

6d4960a

AndyAyersMS marked this pull request as ready for review November 11, 2023 01:59

jakobbotsch reviewed Nov 11, 2023

AndyAyersMS added 3 commits November 11, 2023 08:19

review feedback

b1981f1

fix test setting

9ccd6ac

fix typo

eb544c1

fix reachability check for OSR

17eb3b6

build-analysis bot mentioned this pull request Nov 12, 2023

[wasm] Runtime tests' build gets terminated with exited with code 137 #94077

Closed

jakobbotsch reviewed Nov 13, 2023

jakobbotsch approved these changes Nov 13, 2023

AndyAyersMS mentioned this pull request Nov 13, 2023

On Stack Replacement Next Steps #33658

Open

72 tasks

AndyAyersMS merged commit 8a84b33 into dotnet:main Nov 13, 2023
136 of 139 checks passed

amanasifkhalid mentioned this pull request Nov 15, 2023

JIT: Remove BBJ_NONE #94239

Merged

cincuranet mentioned this pull request Nov 21, 2023

[Perf] Windows/x64: 1 Regression on 11/13/2023 10:58:18 PM dotnet/perf-autofiling-issues#24792

Open

github-actions bot locked and limited conversation to collaborators Dec 14, 2023
