-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't ignore the case when dataflow doesn't know what's going on #101031
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas |
I assume it also fixes #101010. |
Can we unit test this? |
@@ -656,6 +656,12 @@ public override bool HandleCall(MethodIL callingMethodBody, MethodDesc calledMet | |||
if (Intrinsics.GetIntrinsicIdForMethod(callingMethodDefinition) == IntrinsicId.RuntimeReflectionExtensions_GetMethodInfo) | |||
break; | |||
|
|||
if (param.IsEmpty()) | |||
{ | |||
// The static value is unknown and the below `foreach` won't execute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the value unknown in my repro?
I would expect that the dataflow should be able to trace through new Action(Test<string>).Method
without problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Known limitation of the dataflow analysis (#93720 is the tracking issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought #93720 was about not tracking variable types for GetType
- how does that explain this issue? If we don't trace through new Action(Test<string>)
, I would have expected it to show up as an unknown value, not empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought #93720 was about not tracking variable types for
GetType
- how does that explain this issue? If we don't trace throughnew Action(Test<string>)
, I would have expected it to show up as an unknown value, not empty.
I thought unknown/empty are interchangeable but now I understand the difference. We eventually want to know the type allocated through the new Action
so that GetType
(or in this case Delegate.get_Method
) know the type it's operating on.
In this case, we end up with empty instead of unknown because constructors take this path (they "return void"):
runtime/src/tools/illink/src/ILLink.Shared/TrimAnalysis/HandleCallAction.cs
Lines 1168 to 1170 in 3d0da2c
// For now, if the intrinsic doesn't set a return value, fall back on the annotations. | |
// Note that this will be DynamicallyAccessedMembers.None for the intrinsics which don't return types. | |
returnValue ??= calledMethod.ReturnsVoid () ? MultiValueLattice.Top : annotatedMethodReturnValue; |
And we can no longer bash it to unknown here because the HandleCallAction returned true (i.e. handledFunction
):
runtime/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/Dataflow/MethodBodyScanner.cs
Lines 1302 to 1319 in 3d0da2c
// Handle the return value or newobj result | |
if (!handledFunction) | |
{ | |
if (isNewObj) | |
{ | |
if (newObjValue == null) | |
methodReturnValue = UnknownValue.Instance; | |
else | |
methodReturnValue = newObjValue; | |
} | |
else | |
{ | |
if (!calledMethod.Signature.ReturnType.IsVoid) | |
{ | |
methodReturnValue = UnknownValue.Instance; | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, that looks like something we should fix (not that it'll help with the issue this PR is addressing). Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We eventually want to know the type allocated through the new Action so that GetType (or in this case Delegate.get_Method) know the type it's operating on.
To fix tracking of new Action(Test<string>)
, I think we'd need to introduce a new dataflow value for Action
that tracks the ldftn
result, and handle that case in the get_Method
intrinsic. It seems like just knowing that the input has type Action
doesn't solve the problem, so I'm not sure #93720 is the right tracking issue. I think that one is more specific to GetType
handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to know what method the Action points to, just the concrete type the Delegate.get_Method method is called on (System.Action in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing something - why don't we need to know the method it points to? Is it because we keep metadata for all delegate targets if there's any call to get_Method? I was assuming we'd ideally only keep metadata for those delegate targets that had a get_Method call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler keeps track of delegate type + method. If we know delegate type, we know all the methods it can be used with.
We don't track this at method granularity because doing new Action(Foo).Method
is very rare. People only do this if they want to work around lack of methodof
in C#. What happens 95% of time is that we construct a delegate somewhere and a different part of the program will call Delegate.get_Method
on it. That's why we only care about delegate type.
Dataflow losing track of the type in the trivial new Action(Foo).Method
pattern is throwing a wrench in this because if someone uses this to work around methodof
, we only see Delegate.get_Method
was called on some unknown thing and we need to disable the optimization globally because this could be any delegate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thank you!
The test would be testing a scenario that only exists due to a dataflow issue we keep running into. Sven just ran into it last week in #100786. Once we fix the underlying issue, we'd have to delete the test because this will no longer be a problem. We also don't have facilities to unit test this. This would have to be an E2E test and because it's testing "specific pattern inhibits a global optimization in the whole program", it would have to be a brand new test project because this can't be folded into any of the existing ones. All in all, felt like too much hassle to write a test with no longer term benefit. We do have tests for "global optimization is active" and "global optimization is disabled", just not for this specific pattern that disables it. I'm also a bit confused why this pattern exists in the first place and why dataflow has a distinction between "no known value ( |
Empty is supposed to mean "we know that this abstract value cannot represent any real values", for example to model the return value of a method that is known to throw at runtime, whereas unknown means "it could be anything". |
Can we add a test for this? Looks like it broke in dotnet/aspnetcore. dotnet/aspnetcore#55010 broke because of this. It might be good to have a "Native AOT" test that uses DI like this test in dotnet/aspnetcore: |
I already wrote my opinion on testing this in #101031 (comment). We're running into two issues in dataflow analysis (modeling something that should be Unknown as Empty, and not being able to see through Given this is now blocking codeflow we either need to approve and merge this ASAP, or do a revert of the previous PR. @sbomer does this look good? I'm off to bed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with this change for now to unblock dependency flow. I'd like to fix the issue where this is not tracked as "unknown", but I can look into that as a follow-up.
My intention on testing was more of the end-to-end scenario that broke. So we wouldn't "have to delete the test because this will no longer be a problem.".
We do have the facilities to test the scenario though. See Lines 5 to 6 in 69062fd
Lines 45 to 49 in 69062fd
We can write a Native AOT app just like the above for the ActivatorUtilities class that was broken in #101010. We can basically just use the repro provided in the bug as the test app - all of the referenced code is built in dotnet/runtime. That way we can get a little more "end-to-end" tests here. |
Merging to unblock codeflow |
/ba-g ignoring unrelated timeout |
As general policy, I think every regression should have a regression test. Obviously if it's difficult or impossible to construct such a test, never mind. But if it's straightforward, there's by definition at least one incidence of the bug actually making into the product, which is enough in my opinion to merit the test. We run many thousands of tests on every PR and many of those have never, and will never, catch a bug in production. I'd rather invest in infrastructure improvements in test execution rather than avoid writing tests. We also accidentally revert PRs on occasion and even though we sometimes revert the tests as well, any extra coverage to avoid that is very useful at scale. |
TL;DR: Any test I can think of writing for this will stop exercising the codepath added here by the end of 9.0, but will forever cost us seconds in CI time because it will have to be the most expensive test kind we have. You'll have a hard time finding a PR of mine that wouldn't have a test. I do write test for pretty much any code I write. Writing a good and long term useful test exercising the code path I'm adding here is hard, if not impossible. I actually don't know if this codepath is going to be hittable at all once Sven fixes the Empty/Unknown confusion in dataflow; we might want to just revert this PR then.
This test would not be long term useful. The line that is throwing the exception initializes a field in the cctor that is not even used in native AOT - we set it but never read it. Ironically, the only observable effect of the line is that it needs to disable whole program analysis of delegates due to dataflow analysis deficiencies. It is not the only reason why ASP.NET gets delegate whole program analysis disabled (the other reason is #96528) but once/if it becomes the only reason to disable the optimization in ASP.NET, I'm going to submit another PR that takes it out of statically reachable code on native AOT (I considered doing that in #100916 already and wrote it as an option in PR description). One way or the other, the problematic line is not going to trigger the
We spend massive amount of time as a team over many years limiting the number of tests we run. We audited and deleted redundant tests between src/tests and src/libraries. We spent (and are spending) tons of time limiting the number of standalone tests in src/tests. This can only be tested by adding another fully standalone test (the most expensive test kind we have) that takes seconds to build and will become redundant the moment we fix any of these:
1 sounds like a 9.0 bug. 2 would be very nice to fix in 9.0 because we (and our customers - this is customer-reported) keep running into it. |
End-to-end tests don't ever really become redundant unless you delete the public APIs they depend on, because they test a whole bunch of things for user scenarios without depending on implementation details. A highly-targeted "unit" test? Sure, that can become redundant as refactoring happens. The repro is an example of user code (disregard that the delegate did nothing, that was to create a minimal repro to make the specific issue easy to diagnose and fix) that should work regardless of what implementation details it hits or what internals become redundant over time. If it were made a bit less minimal (say, the delegate set the base address for the requests, as it did in the original app it's derived from) and checked a few extra cases then you'd get more bang-for-your-buck out of it for the time spent compiling it. In the extreme case you could even compile (most of) the whole app I found #101010 in originally and cover even more user scenarios with that one compilation. |
I meant overall stats before the previous PR and after this PR; now we are keeping more symbols Sounds like still a win, was just curious if we re-measured "This makes ASP.NET 50 kB smaller with native AOT".
There are end-to-end tests for AOT which run in PRs of SDK / installer repos. Should we lower the bar of adding tests in those repos which were deemed expensive-for-runtime? It slipped though the cracks and made it to a preview release due to a test gap. Murphy's law says it can happen again. |
The test needs to test exactly this code path and nothing else. To hit this bug, the program needs to have a call to On the other hand if we keep the test small and razor sharp testing this, it will stop testing the bug by the end of 9.0 (it's not a question of "if", only "when") and it's questionable what other residual value it will have due to it being focused on one thing only. The other E2E tests we have in this repo are testing trimming warning suppressions and they will be useful as long as the suppressions exist.
ASP.NET doesn't benefit from this optimization yet due to lack of #96528 so I expect the impact of this PR to be zero. (ASP.NET sets EventSourceSupport=true that will paper over this bug because it brings more reasons to disable the optimization.)
Did it? It was hit as ASP.NET core repo was ingesting the runtime build with this. |
It certainly made its way all the way to be merged into dotnet/sdk, otherwise I wouldn't have found it. |
Note that the test in the ASP.NET Core repo that caught this bug wasn't for this exact codepath, but still caught the bug anyway. The test that caught it is named Adding a new end-to-end functional test for the scenario that broke here will be long term useful because it will ensure the scenario keeps working as we make changes in both in the nativeAOT runtime and in the ActivatorUtilities class. Not all tests need to be unit tests. Not all tests need to be end-to-end functional tests. We can use both to get the best coverage. Yes, these end-to-end functional tests are expensive - so let's not write hundreds of them. But a handful of them, especially when the scenario they cover is critical to the .NET stack (and has been broken in the past), is important to ensure our changes don't introduce breaks. Even if the codepath being fixed here will be deleted by the end of 9.0, the scenario that broke isn't going anywhere. Knowing that we will change the codepath makes it even more valuable to test the scenario. The test will ensure we don't break this scenario again when we do make changes. |
We have tiered validation system. We do accept that there are going to be bugs that won't get caught by dotnet/runtime pre-checkin tests and that will be only caught by dependent repos or outer loop validation. We do have https://github.com/dotnet/runtime/tree/main/src/libraries/Microsoft.Extensions.DependencyInjection/tests/TrimmingTests that is basic end-to-end test that exercises DI and ActivatorUtilities in a simple app. Does this test run in native AOT outer loop runs? If not, it is something to fix. The question to ask before adding new end-to-end tests: How many breaks would have been caught by the new end-to-end tests in last year? There is dotnet/runtime bug caught by dependent repos every other week. It is a different end-to-end scenario and different corner case each time. We cannot afford to add a new end-to-end test for each of these. |
No, those tests only run with Lines 399 to 400 in 6f3b1a6
|
I do agree with this statement. I'm not super concerned with where the test that covers regressions lives, I'm more concerned with the fact that it exists. If it's too expensive to run in PR, I'm fine with it running with outerloop. I'm even fine with it living in another repo (as with the SDK tests) as long as we can be confident that test actually exercises the core scenario that regressed. |
#101026 (comment) repros in 9.0 p3 and does not repro with 8.0. So no test in any repo has caught this regression in a while. |
For the purposes of delegate optimization that I'm changing here, this: class Program
{
static void Main() => Console.WriteLine(new Action(Test<string>).Method);
static void Test<T>() { }
} Is no different from this: class Program
{
static Action GetTest() => new Action(Test<string>);
static void Main() => Console.WriteLine(GetTest().Method);
static void Test<T>() { }
} Or this: class Program
{
static Action TheTest => new Action(Test<string>);
static void Main() => Console.WriteLine(TheTest.Method);
static void Test<T>() { }
} The latter two do not run into this bug. I could come up with many more that do not run into the bug. The delegate optimization does not concern itself with these shapes. Dataflow analysis does. We have delegate optimization tests that tests one of these shapes. We're not going to add E2E standalone tests for all of these variations that take seconds to build to test all possible variations. It is the job of the dataflow analysis to test these variations in it's unit tests that take milliseconds to run and are the place where dataflow analysis is tested. I'm not fixing the dataflow analysis bug, merely working around it. The fix in the data flow analysis is tracked in #101102. Once that's fixed, my workaround in this PR won't even be needed, it will just cover some corner case I don't even know how to hit (I'll leave it to Sven whether we should just roll back this PR then). We cannot unit test the delegate optimization. We can unit test dataflow. We do not test dataflow with delegate optimization tests. We do not test dataflow analysis with DI tests. We test it with dataflow tests. |
Instead of tracking the return value as "TopValue" or "unknown", this models the constructor as returning a value with a static type when called with newobj, letting us undo the workaround from #101031.
Instead of tracking the return value as "TopValue" or "unknown", this models the constructor as returning a value with a static type when called with newobj, letting us undo the workaround from dotnet#101031.
Fixes #101026.
TIL that when dataflow doesn't know something, it will model it as no value at all.