Refactor subrange assertion generation and propagation #55186

SingleAccretion · 2021-07-05T19:47:22Z

This PR adds a new abstraction for dealing with casts, IntegralRange, a tuple of two symbolic values that represent the lower and upper bounds for an integral value. The symbolic representation has been chosen so that the range can be efficiently used on 32 hosts, since the alternative (using 64 bit integers) would have been, I feel, unacceptably inefficient. Notably, the range always interprets values produces by IR nodes as signed (since we're already using the signed semantic in a number of places, and it is easier to reason about for the purposes of casts as using unsigned representation would have meant a discontinuous range).

Because assertion propagation was using integral ranges already, that's where IntegralRange is used in this PR. This enables reasoning about long checked casts on 32 bit hosts (rare scenario) and ensures the correctness of how the assertions are interpreted, since they're created for both the cast's node output (in local assertion propagation) and input (in global assertion propagation). It also allows for, in the future, shrinking the size of the AssertionDsc structure. I note that since we're allocating a pretty sizeable array of them upfront in global assertion propagation (3K+ bytes), any savings in the size here translate into large percentages in memory allocation reduction (e. g., removing 8 bytes from the AssertionDsc, I've observed -2.2% reduction, and I think we can reduce the size by at least 20 bytes), though this is somewhat misleading as the memory is only actually used for active assertions. Note that this PR does not alter the size of AssertionDsc, the added if !defined(HOST_64BIT) is for clarity only.

Naturally, code propagating the assertions for casts has also been refactored, mostly to avoid retyping a long local to int if it happened to have a range of int. Such retyping is redundant with what optNarrowTree does already, and is in fact incorrect in global assertion propagation as it did not update the value number properly (leaving a node that has TYP_INT with a VN of TYP_LONG). Also, the retyping has been restricted to the known case where it is needed (and a comment added on why it is needed).

An interesting property of the old code was how it treated TYP_BOOL. TYP_BOOL in the compiler is pretty much a TYP_UBYTE, but with a hint that it comes from CORINFO_TYPE_BOOL. Assertion generation treated assignments with a TYP_BOOL source as those restricting the destination to a [0..1] range, however, that is incorrect as per the IL semantics of booleans being treated as equivalent to uint8s. This did not have ill effects except for perhaps dropping a checked cast once in a while, but it was an internal inconsistency in the IR. This PR just makes TYP_BOOLs equivalent to TYP_UBYTEs for the purposes of assertion generation.

(Note: the text below is an explanation for the quirk)

Another interesting property of the old code was that it did not propagate the assertion for casts to TYP_BOOL (perhaps hinting at the reason for why it was "safe" to give TYP_BOOL [0..1] ranges in the first place...). The new code does, by virtue of not special casing it, and that is where the bulk of the diffs (win-x64, win-x86) come from.

Unfortunately, there are, though a small number of, regressions with this handling of TYP_BOOL. There are two sources of them.

First, there is the VN problem, it accounts for the vast majority of the regressions (missing CSEs + missed redundant branch elimination). Say we have the following situation:

int Function(short a)
{
    a = (sbyte)a;
    
    if (a is 1)
        return 1;

    return a;
}

Today, local assertion propagation removes the normalizing cast for a is 1. Then in VN we value number the local without the cast and the normalizing cast for the same local differently:

***** BB01, STMT00002(after)
N004 (  6,  6) [000009] ------------              *  JTRUE     void  
N003 (  4,  4) [000008] N------N-U--              \--*  NE        int    $102
N001 (  2,  2) [000006] ------------                 +--*  LCL_VAR   short  V00 arg0         u:2 $101
N002 (  1,  1) [000007] ------------                 \--*  CNS_INT   int    1 $44


***** BB03, STMT00003(after)
N003 (  4,  5) [000011] ------------              *  RETURN    int    $140
N002 (  3,  4) [000016] ------------              \--*  CAST      int <- short <- int $103
N001 (  2,  2) [000010] ------------                 \--*  LCL_VAR   int    V00 arg0         u:2 (last use) $101

The core issue here is that the VNs for "normalize-on-load" locals are fundamentally "wrong". In the example above, both [000008] and [000010] have the same VN, but say V00's home was on the stack. The first tree would have produced 2 bytes of the value it was assigned, sign-extended, while the second would produce a full int, with the upper 2 bytes set to whatever happened to be next to V00 on the stack (this is how it actually works - the width of the local node signifies to the codegen how wide the load should be). Because of this representation problem, fixing this appears to be not so straightforward.

One possible fix is to recognize, in VN, when a cast a redundant. That runs into regressions with "false CSEs", since we leave the cast tree intact, while in actuality it is redundant and CSEing such a tree has negative profitability (but CSE cannot know that because we leave the costs intact as well).

Another possible fix is to turn these casts into no-ops while renaming the uses (ideally we'd be doing that in early prop, but that's much more expensive TP-wise). That runs into regressions because, sometimes, removing a cast inhibits redundant branch optimization. Consider the following:

// Say p0 is a `short` parameter.
if (p0 != 0) // here, the normalizing cast cannot be removed.
{
    p1 = p0;
   
    if (p1 != 0) // here, the normalizing cast can be removed.
    {
    }
}

// In this example, the two conditions will have different VNs if we remove the cast above p1.

I have spent a considerable amount of time trying out different approaches to fixing the above issues, but they all ran into various regressions, so I tabled fixing it in this PR.

The second source of regressions is the fact that in lowering we can recognize EQ(CAST(bool/ubyte <- int), SMALL_CONST) and retype the tree to UBYTE, leading to, effectively, containment for the cast. Expanding this to consider small LCL_VARs is not so trivial as well, because it can regress when we have those locals assigned RSI, RDI, RBP or RSP, which take up more space to encode in their byte form on Amd64, not to mention we can end up with byte-able register constraints on x86.

Due to the above issues and the fact that I was not able to mitigate them, I will be quirking the handling of TYP_BOOL in this PR to get to a near zero substantial regressions state.

Fixes #54842, but properly.
Fixes #12936

kunalspathak · 2021-07-13T17:58:47Z

@SingleAccretion - Is it possible to narrow this PR to just the fix for #54842 vs. improvement? We will take the fix for .NET6 and improvement can be merged in future milestone.

SingleAccretion · 2021-07-13T18:23:22Z

I will see what can be done.

SingleAccretion · 2021-08-08T21:34:50Z

With the quirk in place, this PR is ready for review.

Updated diffs: win-x64, win-x86, win-arm64, linux-arm.

Now almost all are improvements, with the TYP_BOOL handling accounting for the vast majority of them, and the fact that we do not generate useless subrange assertions for just a couple overly large methods in coreclr_tests.

The only regressions (including the rather large NormalizeProtoMember one on x86...) are register allocation + frame zeroing differences, so, while the "actual" diff may be this:

-       movzx    rdx, sil
-       mov      r8, gword ptr [rbp-18H]
+       mov      r8, gword ptr [rbp-20H]

LSRA now decides to use a callee-saved register, or the prolog zeroing code gets a bit larger while the frame is smaller, etc. This can also happen due to new copy propagations.

EgorBo · 2021-09-17T17:14:07Z

cc @dotnet/jit-contrib

AndyAyersMS

I like this refactoring, but I'm confused on a few points.

AndyAyersMS · 2021-09-17T18:35:21Z

src/coreclr/jit/assertionprop.cpp

+ {
+ switch (toType)
+ {
+ // CAST_OVF(uint <- uint) - [INT_MIN..INT_MAX]


Can you explain this case a bit more?

AndyAyersMS · 2021-09-17T18:42:10Z

src/coreclr/jit/assertionprop.cpp

 {
 return nullptr;
 }

 // Skip over a GT_COMMA node(s), if necessary to get to the lcl.
 GenTree* lcl = op1;
- while (lcl->gtOper == GT_COMMA)
+ while (lcl->OperIs(GT_COMMA))


Might as well use gtEffectiveVal here, since you're changing this bit.

AndyAyersMS · 2021-09-17T18:49:26Z

src/coreclr/jit/compiler.h

@@ -1064,6 +1064,90 @@ class LclVarDsc

 }; // class LclVarDsc

+enum class SymbolicIntegerValue : int32_t


Seems like we might want ULongMax in here too, for completeness?

AndyAyersMS · 2021-09-17T18:56:21Z

src/coreclr/jit/assertionprop.cpp

+ upperBound = SymbolicIntegerValue::IntMax;
+ break;
+
+ // CAST_OVF(ulong <- uint) - [INT_MIN..INT_MAX]


Ditto here, why is this not [0..UINT_MAX] ?

SingleAccretion · 2021-09-17T19:07:40Z

Can you explain this case a bit more?

Seems like we might want ULongMax in here too, for completeness?

Ditto here, why is this not [0..UINT_MAX] ?

A very intentional part of the design here is that the IntegralRange is always interpreted as signed, it makes it easier to reason about consistently (in the time I've spent tweaking the cast handling, I've come to believe this is the best approach).

This makes ULongMax an impossible value, since we do not have Int128 integers.

This makes a cast like CAST_OVF(ulong <- uint) - [INT_MIN..INT_MAX] have the range it has:

For which values does the cast overflow?
No such values exist -> the full range of an integer is the "input" range.
The range is signed -> it is [INT_MIN..INT_MAX].

AndyAyersMS · 2021-09-17T19:30:02Z

A very intentional part of the design here is that the IntegralRange is always interpreted as signed

Ok, good. It is probably worth mentioning this in comments somewhere -- maybe in the header comment for IntegralRange.

Let me test my understanding. In ForCastInput we have just one case that uses a UINT symbolic bound

// CAST_OVF(uint <- ulong/long) - [0..UINT_MAX]

So we'd say -- yes this can overflow. Valid input range is indeed [0...UINT_MAX]. Interpreting this as signed still leaves [0...UINT_MAX].

AndyAyersMS · 2021-09-17T20:13:12Z

#58776 changed how address exposure is handled and is causing your builds to break.

SingleAccretion · 2021-09-17T20:17:50Z

changed how address exposure is handled and is causing your builds to break.

Yes, will be rebasing...

Ok, good. It is probably worth mentioning this in comments somewhere -- maybe in the header comment for IntegralRange.

Will write something more substantial than what is in the header of ForCastInput.

Move it into its own function. No diffs.

They are not useful: they do not tell us anything about the value. They can be harmful: code operating on the assertions can wrongly remove a checked cast because of an assertion for a non-checked one.

This change completely refactors the way subrange assertions are generated and consumed. It rationalizes and generalizes the model for subrange assertions which are generated and propagated for casts, makes AssertionDsc::Range into an actual abstraction, eliminates bugs related to wrong assertions generated for TYP_BOOL (without CQ loss), enables propagation for checked casts to non-small types both on 64 and 32 bit, all the while making the AssertionDsc structure smaller. There are quite a few moving parts, but at the heart of this change are following items: 1. Make ranges symbolic, i. e. made up of symbolic upper and lower bounds, represented by the new SymbolicIntegerValue enum class. This is necessary to not regress memory consumption and throughput of operations on 32 bit while handling long subranges. 2. Split the concept of "input" and "output" for a cast node. To properly handle checked cast and make the code "obviously correct", it is necessary to split the two values, one the cast node produces and one it consumes, apart. The old code only handles the "easy" case of the "input" and "output" ranges matching (such is the case for small types), the new code must be, and is, robust in reasoning about, e. g. CAST_OVF(uint <- ulong) vs CAST_OVF(int <- long). The ranges that represent these values are consistently in the "signed" format, making handling them easy. 3. Tweak the propagation code to not remove representation-changing casts, make it explicit and obvious in what cases (and why) it retypes the local under the cast. As a side-effect of the above changes, TYP_BOOL is now treated as TYP_UBYTE in propagation, as it should be. Actually, the old code didn't remove redundant casts to TYP_BOOL, the new code does, which is also where the bulk of the positive diffs come from. The old code also generated "useless" ranges like [INT_MIN..INT_MAX], the new code only generates ranges that could be used in subsequent propagation. Also, some miscellaneous cleanup, such as adding function headers, was performed.

In global assertion propagation, tentative assertions are created for casts, with the expectation that proving them via implication will allow for removing the cast. The old code did this through the common path with local assertions, however, this cannot be done the same way anymore because the two types of assertions are fundamnetally different. In local assertion propagation, subranges are derived from ASGs, where the cast is on the RHS and the local - LHS, so the "output" range for the cast must be used - local is assigned value the cast **produces**. In contrast, global subrange assertions say that the value the cast **consumes** is within the correct bounds for the cast to become a no-op or non-overflowing. Because of this fundamnetal distinction, I chose to make global assertions for casts in a new method instead of plumbing it through "optCreateAssertion". Note that it would be possible to do it via "optCreateAssertion" as well - global propagation always passes OAK_SUBRANGE to it, while local - OAK_EQUAL, but I decided against that option for the reason that it seems unintuitive to me that it would work the way it would. These tentative assertions are a very special case, so I think it is better to treat them that way. This leads to some minor code duplication, but I think that's ok. No diffs for this commit, though that's expected as "proving" the generated assertion is quite rare.

Make IntegralRange the owner of the logic for various methods relating to it and decouple it from AssertionDsc. IntegralRange will be useful in the future for other code that deals with cast removal, e. g. in fgMorphCast. Also some miscellaneous formatting fixups.

Using a static array results in smaller and more efficient code on all compilers.

This change is for completeness, since "fgMorphCast" decomposes casts from FP types to small types today and as such assertions are generated for them anyway. No diffs for this commit as expected.

To avoid regressions.

They did not handle "long" cases and as such were too conservative.

AndyAyersMS

Looks good.

Thanks again for your high-quality contributions.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 5, 2021

SingleAccretion marked this pull request as draft July 5, 2021 19:53

SingleAccretion marked this pull request as ready for review July 5, 2021 20:19

SingleAccretion force-pushed the Improve-Subrange-Assertion-Generation branch 4 times, most recently from d466b18 to 8e90830 Compare July 8, 2021 21:27

kunalspathak added this to the Future milestone Jul 13, 2021

SingleAccretion mentioned this pull request Jul 14, 2021

Surgical fix for bad assertion generation #55626

Merged

SingleAccretion force-pushed the Improve-Subrange-Assertion-Generation branch 2 times, most recently from 6d5746d to eb68aa8 Compare July 19, 2021 15:15

terrajobst added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2021

SingleAccretion force-pushed the Improve-Subrange-Assertion-Generation branch from eb68aa8 to 77a6951 Compare July 22, 2021 19:42

SingleAccretion marked this pull request as draft July 25, 2021 21:54

SingleAccretion changed the title ~~Improve subrange assertion generation and fix a bug~~ Refactor subrange assertion generation and propagation Jul 25, 2021

SingleAccretion force-pushed the Improve-Subrange-Assertion-Generation branch 7 times, most recently from 24a6b7c to e766cad Compare July 31, 2021 14:07

SingleAccretion force-pushed the Improve-Subrange-Assertion-Generation branch 4 times, most recently from 68fa9bd to 8f63b0a Compare August 4, 2021 18:13

SingleAccretion marked this pull request as ready for review August 8, 2021 21:36

SingleAccretion force-pushed the Improve-Subrange-Assertion-Generation branch from 73234b1 to 471d02a Compare August 26, 2021 21:54

AndyAyersMS reviewed Sep 17, 2021

View reviewed changes

SingleAccretion added 15 commits September 17, 2021 23:18

Refactor subrange generation

2ddaefa

Move it into its own function. No diffs.

Stop generating assertions for TYP_U/INT ranges

d6b7073

They are not useful: they do not tell us anything about the value. They can be harmful: code operating on the assertions can wrongly remove a checked cast because of an assertion for a non-checked one.

Add the header for optTryExtractSubrangeAssertion

0f24634

Change the implementation of SymbolicToRealValue

452a4a2

Using a static array results in smaller and more efficient code on all compilers.

FP & GC castees in IntegralRange::ForCastOutput

9b35f21

This change is for completeness, since "fgMorphCast" decomposes casts from FP types to small types today and as such assertions are generated for them anyway. No diffs for this commit as expected.

Add the TYP_BOOL quirk

9412c7e

To avoid regressions.

Fix the input ranges for non-overflowing casts

4b658dc

They did not handle "long" cases and as such were too conservative.

Add missing function headers

723da4d

Remove a TODO now that BOX import has been fixed

b43f07b

Use gtEffectiveVal & ChangeType to simplify code

a697d02

Fix build break

763a95f

Add a more detailed comment

c513b9a

SingleAccretion force-pushed the Improve-Subrange-Assertion-Generation branch from c7d8459 to c513b9a Compare September 17, 2021 20:51

Tweak the comment

995f0f8

AndyAyersMS approved these changes Sep 20, 2021

View reviewed changes

AndyAyersMS merged commit 422924a into dotnet:main Sep 20, 2021

SingleAccretion deleted the Improve-Subrange-Assertion-Generation branch September 21, 2021 23:16

EgorBo mentioned this pull request Sep 23, 2021

[Perf] Changes at 9/20/2021 7:51:11 PM dotnet/perf-autofiling-issues#1539

Closed

SingleAccretion mentioned this pull request Sep 28, 2021

Refactor fgMorphCast #59713

Merged

ghost locked as resolved and limited conversation to collaborators Nov 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor subrange assertion generation and propagation #55186

Refactor subrange assertion generation and propagation #55186

SingleAccretion commented Jul 5, 2021 •

edited

Loading

kunalspathak commented Jul 13, 2021

SingleAccretion commented Jul 13, 2021

SingleAccretion commented Aug 8, 2021 •

edited

Loading

EgorBo commented Sep 17, 2021

AndyAyersMS left a comment

AndyAyersMS Sep 17, 2021

AndyAyersMS Sep 17, 2021

AndyAyersMS Sep 17, 2021

AndyAyersMS Sep 17, 2021

SingleAccretion commented Sep 17, 2021

AndyAyersMS commented Sep 17, 2021

AndyAyersMS commented Sep 17, 2021

SingleAccretion commented Sep 17, 2021

AndyAyersMS left a comment

		@@ -1064,6 +1064,90 @@ class LclVarDsc

		}; // class LclVarDsc

		enum class SymbolicIntegerValue : int32_t

Refactor subrange assertion generation and propagation #55186

Refactor subrange assertion generation and propagation #55186

Conversation

SingleAccretion commented Jul 5, 2021 • edited Loading

kunalspathak commented Jul 13, 2021

SingleAccretion commented Jul 13, 2021

SingleAccretion commented Aug 8, 2021 • edited Loading

EgorBo commented Sep 17, 2021

AndyAyersMS left a comment

Choose a reason for hiding this comment

AndyAyersMS Sep 17, 2021

Choose a reason for hiding this comment

AndyAyersMS Sep 17, 2021

Choose a reason for hiding this comment

AndyAyersMS Sep 17, 2021

Choose a reason for hiding this comment

AndyAyersMS Sep 17, 2021

Choose a reason for hiding this comment

SingleAccretion commented Sep 17, 2021

AndyAyersMS commented Sep 17, 2021

AndyAyersMS commented Sep 17, 2021

SingleAccretion commented Sep 17, 2021

AndyAyersMS left a comment

Choose a reason for hiding this comment

SingleAccretion commented Jul 5, 2021 •

edited

Loading

SingleAccretion commented Aug 8, 2021 •

edited

Loading