-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add workaround for AsyncSemaphore disposal problems #19043
Conversation
|
||
protected override void Dispose(bool disposing) | ||
{ | ||
// Do not call base.Dispose. We do not want the AsyncSemaphore instances to be disposed due to a race |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you have link here to either gitub or VSO bug that documents the issue, which we can give to the platform guys to look into fixing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📝 The link is a few lines above, in the comment for the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it okay to not dispose the async semaphore? tagging @AArnott
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@heejaechang Disposing AsyncSemaphore
is a NOP. It just transitions the object to either an unusable or an invalid state, but doesn't actually release any unmanaged resources of any kind.
What's up with the compile errors? Overall i'm fine with this change. I'd like like the code to point to the relevant bug that is tracking getting the underlying issue fixed. |
My overall concern is this: Somethign seems broken to me with how cancellation is being provided (either on our end, or in the underlying StreamRPC side). It seems fundamentally racey. For example, it sounds like the following set of steps are possible:
The part that seems wonky to me is '6'. If we have outstanding work that we're awaiting, then we shouldn't immediatley transition to the canceled state since the OOP work is still executing and may still call into us. This approach seems to violate the cooperative cancellation model. We're cancelling to tell everything "hey you should stop executing", but if execution hasn't actually stopped, we should not be tearing down things. Can someone verify if my set of steps above is possible? If so, then the design here seems suspect to me as it seems fundamentally 'racey'. If, however, the above is not possible, then it's unclear to me what the set of steps are that are causing the issue today. If it's another set of steps, please let me know what's happening so i can have an accurate mental model of how rpc+oop is actually working. Thanks! |
looked your change in StreamJson in vso. now I understand. change looks good. thank you for fixing this issue! |
looks like you need to use external alias or something to fix the test utility compiler errors due to type defined in 2 places. |
our cancellation is the disconnecting the stream. there is no tell the other side we are cancelling. we are prepared to deal with object disposed exception thrown due to stream being disposed. problem here is semaphoreSlim used in StreamJson throws a null exception rather than object disposed exception when it is used after disposed. our desire to use cancellation token at the session level rather than each invocation makes the cancellation model you described not possible. we could harden our side that whatever exception we get from StreamJson, if our cancellation is raised, we ignore the exception. I believe right now we only translate object disposed exception to cancellation exception. |
Oh ick. That's hugely unfortunate. That's not how we did things for teh TypeScript out of proc work. Cancellation was intended to remote over to the OOP side to tell the OOP side to stop what it was doin.
I don't see how that's the case. We could certainly have cancellation be session level rather than invocation level. The important part is that cancelling that token isn't a "disconnect the stream" operation, it's a "let me let the remote side know it should stop working" operation. -- Ok. So the way i see it so far is this: It is Roslyn that is doing some funky stuff. (Prematurely disposing). We can either:
Given what part of the release cycle we're in, it sounds like '2' is the better choice for now. Though i would prefer a long term plan of not overly aggressively just disconnecting, and praying that doesn't lead to problems further down the line (this sort of bug being a perfect example of that :)). |
remote side cancel things as soon as connection is closed. so, cancellation works as expected. sure, we could make some kind of our own mechanism workarouding what StreamJson provide. but I would rather use what StreamJson uses or keep do what we are doing. both of them is simpler. |
right now, I only translate object disposed exception to cancellation assumed that will be only exception that can be thrown when object is used after disposed. but we can change that to any exception. it is not we swallowing all exception since we only translate it to cancellation exception if the cancellation is in raised state. if remote call failed without us cancelled the call, it will still propagate and handled (in roslyn case, failfast) |
This is a workaround for dotnet#18563. See dotnet#19042 for a follow-up task to remove this workaround.
b78792c
to
d4a6431
Compare
Got called into a meeting forgot to push the fix. I included the file in one project not realizing it could see another one via IVT. |
The exception thrown is |
In that stack trace yes. A |
@MattGertz for approval. This is a workaround for a crasher @CyrusNajmabadi has hit a few times just this week. We're not sure where the final bug fix will live, but I've filed an issue to remove this change sometime in the future after we adopt the final approach. |
This is a workaround for #18563. See #19042 for a follow-up task to remove
this workaround.
Ask Mode
Customer scenario
Rare but definitely observable crash in OOP scenarios (Navigate To, etc.).
Bugs this fixes:
This works around #18563 prior to the proper fix being incorporated from microsoft/vs-threading#105.
Workarounds, if any
None.
Risk
Low. This change is intended to be functionally equivalent to microsoft/vs-threading#105, both before and after that change is incorporated.
Performance impact
Aside from a small additional type existing, none.
Is this a regression from a previous update?
No.
Root cause analysis:
Failure to strictly adhere to MSDN documentation for a concurrency primitive (
SemaphoreSlim
). This is an exceptionally hard to reproduce race condition bug in a dependency.How was the bug found?
Internal customer reported.