Concurrency bug fix - BuildManager instances acquire its own BuildTelemetry instance (#8444) #8561

Merged
merged 3 commits into from
Mar 14, 2023

@rokonec rokonec commented Mar 14, 2023

Summary
The original implementation did not expect multiple BuildManager instances to be called concurrently, but in VS, design-time builds (DTB) and normal builds run concurrently.

This is a backport of PR #8444 from main.

Customer Impact
In rare cases a Dictionary data structure becomes corrupted, which can cause an infinite loop. This affects only VS scenarios.
It is currently the #7 ranked VS hang issue.

Regression?
Yes, introduced in VS 17.4.

Testing
Manual validation by @rokonec and automated tests. Additionally, the fix has been in bleeding-edge VS for about three weeks.

Risk
Low

Note
Already backported to 17.5.
Contains an infrastructure fix that disables NuGet static graph restore (otherwise the CI pipeline fails).

…t#8444)

Fixes https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1708215

Context
In VS, multiple instances of BuildManager are called asynchronously: for DTB, for normal builds, and possibly for other scenarios I have not identified yet.

Changes Made
Each BuildManager instance acquires its own BuildTelemetry instance, as opposed to sharing a single BuildTelemetry instance in a non-thread-safe manner.
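A minimal sketch of the ownership change described here, not the actual MSBuild code. `TelemetrySketch`, `BuildManagerSketch`, and `PerInstanceTelemetryDemo` are hypothetical stand-ins for illustration. The point is that `Dictionary<TKey, TValue>` does not support concurrent writers, so giving each manager its own telemetry object means no two threads ever write to the same dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for BuildTelemetry: plain, non-thread-safe state.
public sealed class TelemetrySketch
{
    public Dictionary<string, string> Properties { get; } = new Dictionary<string, string>();
}

// Hypothetical stand-in for BuildManager. Each instance owns its telemetry
// object instead of reaching for one shared, process-wide instance.
public sealed class BuildManagerSketch
{
    private readonly TelemetrySketch _telemetry = new TelemetrySketch();

    public TelemetrySketch Telemetry => _telemetry;

    public void Build(string kind)
    {
        // Only the thread running this manager writes to this dictionary.
        for (int i = 0; i < 1000; i++)
        {
            _telemetry.Properties[kind + ":" + i] = "done";
        }
    }
}

public static class PerInstanceTelemetryDemo
{
    // Runs two managers concurrently, as VS does for DTB and normal builds,
    // and returns how many entries each one recorded.
    public static (int DesignTime, int Normal) Run()
    {
        var designTime = new BuildManagerSketch();
        var normal = new BuildManagerSketch();

        Task.WaitAll(
            Task.Run(() => designTime.Build("DesignTime")),
            Task.Run(() => normal.Build("Normal")));

        return (designTime.Telemetry.Properties.Count, normal.Telemetry.Properties.Count);
    }
}
```

With a single shared `TelemetrySketch`, the same two tasks would write to one dictionary at the same time, which is the unsupported pattern the fix removes.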

Testing
Locally
# Conflicts:
#	src/Build/BackEnd/Client/MSBuildClient.cs - resolved with minimal and safe approach
@rokonec rokonec requested a review from rainersigwald March 14, 2023 15:03
@rokonec rokonec changed the base branch from main to vs17.4 March 14, 2023 15:06
rokonec and others added 2 commits March 14, 2023 16:11
Our CI builds fail because of bug NuGet/Home#12373.
It is fixed in NuGet/NuGet.Client#5010.
We are waiting for the fix to flow to the CI machines. Meanwhile, this PR applies a workaround.

Note: This PR needs to be reverted once that happens.
@rokonec rokonec merged commit f08e881 into dotnet:vs17.4 Mar 14, 2023
@rokonec rokonec deleted the vs17.4 branch March 14, 2023 19:31
rokonec added a commit that referenced this pull request Mar 27, 2023
* Concurrency bug fix - BuildManager instances acquire its own BuildTelemetry instance (#8561)

* BuildManager instances acquire its own BuildTelemetry instance (#8444)

Fixes https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1708215

Context
In VS, multiple instances of BuildManager are called asynchronously: for DTB, for normal builds, and possibly for other scenarios I have not identified yet.

Changes Made
Each BuildManager instance acquires its own BuildTelemetry instance, as opposed to sharing a single BuildTelemetry instance in a non-thread-safe manner.

Testing
Locally
# Conflicts:
#	src/Build/BackEnd/Client/MSBuildClient.cs - resolved with minimal and safe approach

* Bumping version

* Turn off static graph restore. (#8498)

Our CI builds fail because of bug NuGet/Home#12373.
It is fixed in NuGet/NuGet.Client#5010.
We are waiting for the fix to flow to the CI machines. Meanwhile, this PR applies a workaround.

Note: This PR needs to be reverted once that happens.

---------

Co-authored-by: AR-May <67507805+AR-May@users.noreply.github.com>

* Use AutoResetEvent as oppose to ManualResetEventSlim (#8575)

Summary
Customers with huge repos (mainly internal ones such as XStore) who run msbuild /graph /bl on powerful development and build machines might experience a regression of 15x or more in evaluation time.

The cause has been identified as a performance bug in our logging event pub/sub mechanism. When the ingest queue reaches its bound, ManualResetEventSlim on .NET Framework 4.7.2 causes far too many Thread.Yield calls, flooding the system with thread context switches.
This hypothesis has been verified with the PerfMon performance counter System.ContextSwitches.

Although counterintuitive, AutoResetEvent, ManualResetEvent, or even spin locking produced better behavior, and with those the issue no longer reproduces.
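For illustration, here is a minimal sketch of a bounded queue whose producer waits on an AutoResetEvent when the queue is full. It is not the MSBuild logging implementation. `BoundedQueueSketch` and `BoundedQueueDemo` are hypothetical names, and the sketch assumes exactly one producer and one consumer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical bounded queue for a single producer and a single consumer.
public sealed class BoundedQueueSketch
{
    private readonly ConcurrentQueue<int> _items = new ConcurrentQueue<int>();
    private readonly AutoResetEvent _itemDequeued = new AutoResetEvent(false);
    private readonly AutoResetEvent _itemEnqueued = new AutoResetEvent(false);
    private readonly int _bound;

    public BoundedQueueSketch(int bound)
    {
        _bound = bound;
    }

    public void Enqueue(int item)
    {
        // When the queue is full, wait on the event until the consumer
        // signals that it has removed an item, then re-check the bound.
        while (_items.Count >= _bound)
        {
            _itemDequeued.WaitOne();
        }

        _items.Enqueue(item);
        _itemEnqueued.Set();
    }

    public int Dequeue()
    {
        int item;
        // When the queue is empty, wait until the producer signals an enqueue.
        while (!_items.TryDequeue(out item))
        {
            _itemEnqueued.WaitOne();
        }

        _itemDequeued.Set();
        return item;
    }
}

public static class BoundedQueueDemo
{
    // Pushes 1..1000 through a queue bounded at 4 items and returns the sum
    // seen by the consumer.
    public static long Run()
    {
        const int count = 1000;
        var queue = new BoundedQueueSketch(4);
        long sum = 0;

        Task producer = Task.Run(() =>
        {
            for (int i = 1; i <= count; i++) queue.Enqueue(i);
        });
        Task consumer = Task.Run(() =>
        {
            for (int i = 0; i < count; i++) sum += queue.Dequeue();
        });

        Task.WaitAll(producer, consumer);
        return sum;
    }
}
```

An AutoResetEvent stays signaled until one waiter consumes the signal, so a `Set` that lands between the bound check and `WaitOne` is not lost. With several producers the check-then-enqueue step would need a lock, which this sketch leaves out.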

Customer Impact
In the case of XStore, the impact was about 7 minutes of build time.

Regression?
Yes, introduced in VS 17.4.

Testing
Manual validation by @rokonec and automated tests. A local repro was used to verify that the change fixes the issue.

Risk
Low

Note
This affects only VS MSBuild.exe. In dotnet build, ManualResetEventSlim works better.

---------

Co-authored-by: Roman Konecny <rokonecn@microsoft.com>
Co-authored-by: AR-May <67507805+AR-May@users.noreply.github.com>