Fix TaskRegistry CPUWatson issues due to false optimistic concurrency #8861
It looked good but didn't pass tests :)
Aaah, there is a hidden trap: we're relying on the implicit ordering of the data in the Dictionary fields (e.g. here https://github.com/dotnet/msbuild/pull/8861/files#diff-9233ff64c785ef48b792931b3c5f0d709e741671c2c0bb2b7afb6c09aaaaeb81R626 the ordering influences which task will be returned). That by itself feels very wrong, and it can (and does) break when the backing field is replaced with a ConcurrentDictionary, which has a different internal ordering. The proper solution is to order the enumerations explicitly (probably by order of registration). However, since the previous implementation had no explicit ordering, this can alter behavior in ways perceived as regressions. @rainersigwald, thoughts on this? Can we introduce explicit ordering by order of registration?
Explicitly addressed here: https://github.com/dotnet/msbuild/pull/8861/files#diff-9233ff64c785ef48b792931b3c5f0d709e741671c2c0bb2b7afb6c09aaaaeb81R593 and here: https://github.com/dotnet/msbuild/pull/8861/files#diff-9233ff64c785ef48b792931b3c5f0d709e741671c2c0bb2b7afb6c09aaaaeb81R604. While this is very likely the intended behavior (LIFO among multiple matching tasks), it may cause unexpected results in some cases, e.g. this change https://github.com/dotnet/msbuild/pull/8861/files#diff-0c7ff4eddab39e683e61e6f11011eac73dae76d2574999184b3c0e74f9c2fa10R44 was needed to fix the failing test (as wrong

I'm not sure about this comment, though: https://github.com/dotnet/msbuild/blob/main/src/Build.UnitTests/BackEnd/TaskRegistry_Tests.cs#LL836C33-L836C52. Do we want conflicting UsingTasks to be resolved LIFO or FIFO? I believe it's the former.
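A minimal sketch of what explicit ordering by order of registration could look like, assuming a hypothetical OrderedTaskRegistry (this is not MSBuild's actual TaskRegistry, and TaskRegistration is an invented stand-in for its record type). Each registration gets a monotonically increasing sequence number and lookups sort by it, so picking the most recent match (LIFO) no longer depends on Dictionary or ConcurrentDictionary enumeration order, which is unspecified for both. Keying by task name plus runtime and comparing names case-sensitively are simplifications for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

// Hypothetical record: where a task name was registered from, for which runtime,
// and the order in which the registration happened.
public sealed record TaskRegistration(string TaskName, string Runtime, string AssemblyPath, long Sequence);

public sealed class OrderedTaskRegistry
{
    // Key: (task name, runtime). Enumeration order of ConcurrentDictionary is unspecified,
    // so nothing below relies on it.
    private readonly ConcurrentDictionary<(string Name, string Runtime), TaskRegistration> _registrations = new();
    private long _nextSequence;

    public void Register(string taskName, string runtime, string assemblyPath)
    {
        long sequence = Interlocked.Increment(ref _nextSequence);
        // In this sketch a later registration for the same (name, runtime) replaces the earlier one.
        _registrations[(taskName, runtime)] = new TaskRegistration(taskName, runtime, assemblyPath, sequence);
    }

    // LIFO: among all registrations with this name (any runtime), return the most recent one,
    // or null if there is none. The ordering is explicit, via the sequence number.
    public TaskRegistration? FindLatest(string taskName) =>
        _registrations.Values
            .Where(r => r.TaskName == taskName)
            .OrderByDescending(r => r.Sequence)
            .FirstOrDefault();
}
```

Swapping OrderByDescending for OrderBy would turn this into FIFO; keeping the policy in one explicit place is what makes the LIFO/FIFO question reviewable.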
c2b1734 to 3d36b54
I was just writing a comment in a file and it disappeared from the diff. I'll paste it here because it may still be relevant. I was specifically looking at the way

So multiple tasks are registered under the same name, but with different runtimes. You're saying that previously we relied on implicit ordering and it worked "by accident". How does one select the task to run in this scenario? Is there a way to say "I want to invoke the CLR2 version of this task"?

I'm still flabbergasted by this; apologies for the stupid questions.
@JanKrivanek and I have synced offline and here's what we found:
d442f94 to e12bec2
Changes LGTM.
However, we are fixing the symptoms, not the root cause.
TaskRegistry was never meant to be thread-safe, and neither is the object graph that references it.
Something is violating our requirement that these non-thread-safe objects are not used concurrently. IMO this should not be fixed by making a subset of the data structures thread-safe, which has performance and complexity costs, but by fixing the concurrent usage of our non-thread-safe data structures. That said, it is hard to guess whether the guilty code is in MSBuild or in VS.
I made experimental code changes to detect incorrect concurrent usage, but I was not able to reproduce the issue locally in a patched VS INT build with the Orchard solution open-and-build scenario.
We can still decide to fix it this way, but it will hide other possible concurrency bugs caused by the same root cause.
I am blocking this PR so we, core msbuild team @rainersigwald @ladipro @JanKrivanek @AR-May, can decide how to approach it.
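The experimental detection code mentioned above is not part of this thread. As a hypothetical illustration only, a debugging guard along these lines can flag overlapping access to a structure documented as not thread-safe instead of making it thread-safe. ConcurrencyGuard and GuardedCache are invented names; the guard allows re-entry from the owning thread and throws when a second thread enters while the first is still inside.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical debugging aid: reports overlapping use of an object that is documented
// as not thread-safe. It does NOT make the object thread-safe.
public sealed class ConcurrencyGuard
{
    // 0 = nobody inside; otherwise the managed thread id of the thread currently inside.
    private int _ownerThreadId;

    public Scope Enter()
    {
        int me = Environment.CurrentManagedThreadId;
        int previous = Interlocked.CompareExchange(ref _ownerThreadId, me, 0);
        if (previous != 0 && previous != me)
        {
            throw new InvalidOperationException(
                $"Concurrent access detected: thread {me} entered while thread {previous} was inside.");
        }

        // Only the outermost Enter on the owning thread releases ownership (re-entry is allowed).
        return new Scope(this, releaseOnDispose: previous == 0);
    }

    public readonly struct Scope : IDisposable
    {
        private readonly ConcurrencyGuard _guard;
        private readonly bool _release;

        internal Scope(ConcurrencyGuard guard, bool releaseOnDispose)
        {
            _guard = guard;
            _release = releaseOnDispose;
        }

        public void Dispose()
        {
            if (_release)
            {
                Volatile.Write(ref _guard._ownerThreadId, 0);
            }
        }
    }
}

// Hypothetical non-thread-safe cache instrumented with the guard.
public sealed class GuardedCache
{
    private readonly Dictionary<string, string> _cache = new();
    private readonly ConcurrencyGuard _guard = new();

    public string GetOrAdd(string key, Func<string, string> factory)
    {
        using (_guard.Enter())
        {
            if (!_cache.TryGetValue(key, out var value))
            {
                value = factory(key);
                _cache[key] = value;
            }
            return value;
        }
    }
}
```

Because the guard throws at the point of misuse, the stack trace of the second thread points at the caller that violates the single-threaded contract, which is the information needed to tell whether the offending code is in MSBuild or in VS.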
/// <summary>
/// Cache of tasks already found using exact matching,
/// keyed by the task identity requested.
/// </summary>
private Dictionary<RegisteredTaskIdentity, RegisteredTaskRecord> _cachedTaskRecordsWithExactMatch;
private readonly Lazy<ConcurrentDictionary<RegisteredTaskIdentity, RegisteredTaskRecord>> _cachedTaskRecordsWithExactMatch = new(() => new(RegisteredTaskIdentity.RegisteredTaskIdentityComparer.Exact));
Based on a discussion with @rokonec, these fields can probably just be initialized eagerly (there is little chance of them staying uninitialized once the TaskRegistry is used).
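A sketch contrasting the two initialization styles, with generic TKey/TValue standing in for MSBuild's internal RegisteredTaskIdentity/RegisteredTaskRecord and hypothetical holder class names. Lazy&lt;T&gt; constructed from a factory defers allocation until first use and is thread-safe by default (LazyThreadSafetyMode.ExecutionAndPublication); if the cache is almost always used anyway, eager initialization in the constructor behaves the same with one less indirection per access.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Deferred: the dictionary is allocated on the first access to _cache.Value.
public sealed class LazyCacheHolder<TKey, TValue> where TKey : notnull
{
    private readonly Lazy<ConcurrentDictionary<TKey, TValue>> _cache;

    public LazyCacheHolder(IEqualityComparer<TKey> comparer) =>
        _cache = new Lazy<ConcurrentDictionary<TKey, TValue>>(
            () => new ConcurrentDictionary<TKey, TValue>(comparer));

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory) => _cache.Value.GetOrAdd(key, factory);
}

// Eager: allocated together with the owner; no Lazy<T> wrapper on every access.
public sealed class EagerCacheHolder<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, TValue> _cache;

    public EagerCacheHolder(IEqualityComparer<TKey> comparer) =>
        _cache = new ConcurrentDictionary<TKey, TValue>(comparer);

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory) => _cache.GetOrAdd(key, factory);
}
```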
tl;dr: There is a possible race between public API build calls and evaluation calls, and I agree there might be other ways to fix this.

More details: The TaskRegistry is shared via the

Build is more-or-less guarded against running concurrently
Each evaluation creates new

I have found two suspicious places that are indeed wrong and might be the root cause of the observed issues:
msbuild/src/Build/Instance/ProjectInstance.cs Line 564 in 402af3b
msbuild/src/Build/Instance/ProjectInstance.cs Lines 644 to 645 in 402af3b

This needs to be fixed. We have to somehow create a deep clone of
Discussed this offline with @rokonec: a new PR will be created (we need better isolation guarantees further up the stack, in ProjectInstance).
Superseded by: #8973
Fixes ADO#1801351 and ADO#1801341
Context
Various TaskRegistry methods were reported to have infinite loops caused by corrupted dictionary state.

Changes Made
Replaced the flagged Dictionaries, as well as others on similar execution paths, with ConcurrentDictionaries. Where appropriate, access to inner structures is guarded as well: where reads are expected only after all writes, a simple lock was added.
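A minimal sketch of the two patterns described above, assuming string keys and a hypothetical RegistrySketch class rather than the real TaskRegistry fields: a ConcurrentDictionary for a lookup cache, and a plain List&lt;T&gt; guarded by a lock for data that is written first and read later. Note that ConcurrentDictionary.GetOrAdd may invoke the value factory more than once under contention (only one result is stored), so the factory should be free of side effects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch; names are illustrative only.
public sealed class RegistrySketch
{
    // Pattern 1: a lookup cache backed by ConcurrentDictionary (safe for concurrent readers and writers).
    private readonly ConcurrentDictionary<string, string> _exactMatchCache = new(StringComparer.OrdinalIgnoreCase);

    // Pattern 2: an inner collection that is filled first and read afterwards, guarded by a simple lock.
    private readonly List<string> _registrations = new();
    private readonly object _registrationsLock = new();

    public string Resolve(string taskName, Func<string, string> resolve) =>
        _exactMatchCache.GetOrAdd(taskName, resolve);

    public void AddRegistration(string taskName)
    {
        lock (_registrationsLock)
        {
            _registrations.Add(taskName);
        }
    }

    public IReadOnlyList<string> SnapshotRegistrations()
    {
        lock (_registrationsLock)
        {
            // Copy under the lock so callers never enumerate the live list.
            return _registrations.ToArray();
        }
    }
}
```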