Do not call into MsQuic inside a lock #67037

rzikm · 2022-03-23T13:25:42Z

Also added Debug.Assert calls to ensure that any calls to MsQuic APIs inside a lock are caught in time.

Fixes #59345

ghost · 2022-03-23T13:25:51Z

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Also added Debug.Assert calls to ensure that any calls to MsQuic APIs inside a lock are caught in time.

Fixes #59345

Author:	rzikm
Assignees:	-
Labels:	`area-System.Net.Quic`
Milestone:	-

danmoseley · 2022-03-23T15:21:40Z

src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs

@@ -845,6 +863,7 @@ private void Dispose(bool disposing)

 private void EnableReceive()
 {
+ Debug.Assert(!Monitor.IsEntered(_state), "!Monitor.IsEntered(_state)");


Instead of repeating all this, it might be a little more readable to just call a method?

[Conditional("DEBUG")] internal void AssertMonitorNotEntered(object obj) { Debug.Assert(!Monitor.IsEntered(obj), "Monitor was unexpectedly held"); }

We're using asserts like this all over the place, including S.N.Http, so I'm fine with this as-is; it's still a single line. What I do think is that the message is unnecessary here. We rarely include something like this.

(If we do #65965, it'll also be redundant in this case, as with or without the message you'd get the same assert.)

stephentoub · 2022-03-23T15:24:08Z

src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs

- return new ValueTask<int>(taken);
+ if (reenableReceive)
+ {
+ EnableReceive();


If we're calling this outside of the lock, is there something else that ensures we're not racing with another thread to disable receives?

Stephen's right. Just a hint here (I haven't thought this fully through): we're using the ReadState as a guard. We always change it only within the lock, and then we use it to determine other actions that need to take place outside of the lock.
Possible races: ReadAsync may race with the msquic callback HandleEventRecv. And I don't remember if we have any guards to prevent parallel reads on the stream; I think we have an open issue for this somewhere...

cc @CarnaViire

TL;DR: RECV event is the only thing disabling receives. It seems to me that it should be ok due to additional guard from msquic side (no new RECV event will come until ReceiveComplete+EnableReceive is called). But I don't like how fragile it looks 😢

Note that the change is only in a branch for "IndividualReadComplete" state. It means it happens AFTER some data already arrived in RECV event.

We do have a guard for parallel reads which looks exactly at ReadState. So it will throw if ReadAsync is called while state is PendingRead (no data available and there's already a waiting read).

By moving ReceiveComplete+EnableReceive out of the lock, we allow a time where state is already changed from "IndividualReadComplete" to "None", but ongoing "first" ReceiveAsync function is not exited yet and ReceiveComplete+EnableReceive are not called yet (but all data is already copied). So a second ReceiveAsync might enter. It would see state "None" which would mean data is not available. If it grabs the lock before ReceiveComplete+EnableReceive are called (i.e. RECV event is still not possible), it would change the state to PendingRead (waiting for data), store the destination buffer reference, register cancellation and return a task to wait. All the things "None" branch touches are not touched in the remainder of the first ReceiveAsync (between exiting from the lock and returning new value task).

The bottom line is I don't think there's any problem in ReadAsync vs HandleEventRecv race. But we might want to rethink guard against parallel reads.
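To make the window described above concrete, here is a minimal sketch of the pattern, not the actual MsQuicStream code. The type `SketchStream`, the string results, and `NativeReceiveCompleteAndEnable` are hypothetical stand-ins; only the `ReadState` values and the idea of deciding under the lock and calling into the native layer outside of it come from the discussion.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical, simplified model of the read path discussed in this thread.
enum ReadState { None, IndividualReadComplete, PendingRead }

sealed class SketchStream
{
    private readonly object _lock = new object();
    private ReadState _readState = ReadState.None;

    // Counts calls to the stand-in for ReceiveComplete + EnableReceive.
    public int NativeCalls { get; private set; }

    // Simulates the RECV event: data has arrived and receives are disabled.
    public void OnReceive()
    {
        lock (_lock) { _readState = ReadState.IndividualReadComplete; }
    }

    public string Read()
    {
        bool reenableReceive = false;
        string result;

        lock (_lock)
        {
            switch (_readState)
            {
                case ReadState.PendingRead:
                    // Guard against parallel reads: a read is already waiting.
                    throw new InvalidOperationException("Concurrent read");
                case ReadState.IndividualReadComplete:
                    _readState = ReadState.None; // state changes only under the lock
                    reenableReceive = true;      // remember what to do after the lock
                    result = "data";
                    break;
                default:
                    _readState = ReadState.PendingRead; // no data yet, wait for RECV
                    result = "pending";
                    break;
            }
        }

        // Window: the state is already None here, but receives are not yet
        // re-enabled. A second reader entering now takes the default branch.
        if (reenableReceive)
        {
            Debug.Assert(!Monitor.IsEntered(_lock));
            NativeReceiveCompleteAndEnable();
        }

        return result;
    }

    // Stand-in for the native calls; assumed to be safe only outside the lock.
    private void NativeReceiveCompleteAndEnable() => NativeCalls++;
}
```

In this model a second `Read` that slips into the window only touches the `PendingRead` bookkeeping, which matches the argument that the remainder of the first read does not conflict with it.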

So if I understand correctly, the only thing that can go wrong with this change is parallel ReadAsync operations, which user code is not supposed to do anyway...

Should I add the guards against parallel operations in this PR, or should we do that separately? @ManickaP @stephentoub , thoughts?

I would say leave it as is for now. We have a separate issue for proper guards against parallel reads and writes #52627

CarnaViire · 2022-03-25T13:30:07Z

/azp run runtime-libraries stress-http

azure-pipelines · 2022-03-25T13:30:16Z

Azure Pipelines successfully started running 1 pipeline(s).

ManickaP · 2022-03-28T09:10:17Z

There's a segmentation fault from the H/3 stress run, and there is a dump in the artifacts. Could you please investigate the cause? If it's not related to this change, we can merge, but we need to file an issue for it at least.

ManickaP · 2022-03-28T20:31:37Z

@rzikm #67230 seems to be getting closer to stable, so if you're not successful with a local run, you might opt to wait for it to get into main.

rzikm · 2022-03-29T14:54:29Z

/azp run stress-http

azure-pipelines · 2022-03-29T14:54:34Z

No pipelines are associated with this pull request.

rzikm · 2022-03-29T14:54:46Z

/azp run runtime-libraries stress-http

azure-pipelines · 2022-03-29T14:54:56Z

Azure Pipelines successfully started running 1 pipeline(s).

rzikm · 2022-03-29T17:40:42Z

The second run of the http-stress did not crash. Are we good to move forward, @ManickaP?

ManickaP · 2022-03-29T18:03:35Z

We can. We should observe the H/3 stress for a while after this change. We still must have some nefarious bug in our code 😢

ManickaP · 2022-03-29T18:06:47Z

src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicConnection.cs

@@ -149,7 +149,7 @@ public MsQuicConnection(IPEndPoint localEndPoint, IPEndPoint remoteEndPoint, MsQ

 try
 {
- Debug.Assert(!Monitor.IsEntered(_state));
+ Debug.Assert(!Monitor.IsEntered(_state), "!Monitor.IsEntered(_state)");


What led you to expand the assert messages even further after Stephen's comment?
They'll become redundant.

I thought it would be more useful to have a message handy when something crashes. I thought we generally tend to include the message with the assert in this repo.

It's exactly the other way around, just grep for Debug.Assert in the repo. And if it contains the message, I mostly see some additional info and not just the copy of the condition.

ManickaP · 2022-03-29T18:15:11Z

src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs

+ }
+
+ // methods below need to be called outside of the lock
+ if (bytesRead > -1)


Can this ever be <= -1? We didn't have a check like this before, if I understand it correctly.

It will be -1 if initialReadState != ReadState.IndividualReadComplete. I added the check so that EnableReceive and ReceiveComplete are called (outside of the lock) if and only if they would have been called in the previous version (inside of the lock).

I see now, it's its initial value set before the lock.
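The `bytesRead > -1` check above can be read as a sentinel that carries a decision out of the lock. The following is a rough sketch of that idea under simplified assumptions. `ReceiveBuffer`, `Deliver`, and the `ReceiveCompleteCalls` counter are hypothetical names invented for illustration, and the sketch drops any bytes that do not fit into the destination.

```csharp
using System;

// Hypothetical sketch of the sentinel pattern; not the MsQuicStream code.
sealed class ReceiveBuffer
{
    private readonly object _lock = new object();
    private byte[] _pending; // data delivered by a simulated RECV event, or null

    // Counts calls to the stand-in for ReceiveComplete + EnableReceive.
    public int ReceiveCompleteCalls { get; private set; }

    public void Deliver(byte[] data)
    {
        lock (_lock) { _pending = data; }
    }

    public int Read(Span<byte> destination)
    {
        // Sentinel set before the lock: -1 means the "data available" branch
        // was not taken, so no native calls are owed afterwards.
        int bytesRead = -1;

        lock (_lock)
        {
            if (_pending != null)
            {
                bytesRead = Math.Min(destination.Length, _pending.Length);
                _pending.AsSpan(0, bytesRead).CopyTo(destination);
                _pending = null;
            }
        }

        // Methods below run outside of the lock, and only when the branch
        // inside the lock ran. A zero-byte copy still counts as "ran".
        if (bytesRead > -1)
        {
            ReceiveCompleteCalls++;
        }

        return bytesRead;
    }
}
```

Using -1 rather than 0 as the sentinel keeps a legitimate zero-length copy distinguishable from the case where no data was available, at least in this simplified model.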

ManickaP

Just nits, otherwise good to merge, thanks!

nibanks · 2022-04-01T14:31:41Z

I know this is already merged, but you're totally allowed to call async MsQuic functions under lock. For instance, most (all?) datapath functions would be fine. Every function annotated with _IRQL_requires_max_(DISPATCH_LEVEL) should be Ok to call, because it's already designed to be called at DISPATCH level in kernel, which means it cannot be blocked by anything else inside MsQuic that would cause a deadlock.

wfurt · 2022-04-01T17:00:25Z

The problem we ran into a while back (and which triggered this issue) was not inside MsQuic. There are operations where we hold a lock and could try to take it again when an event shows up on a different thread.
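As an illustration of that failure mode, here is a small self-contained sketch. It assumes, purely for the demo, that the call made under the lock waits for a callback on another thread to finish. `LockCallbackSketch` and its method are hypothetical names, and `Monitor.TryEnter` with a timeout is used so the example terminates instead of deadlocking.

```csharp
using System;
using System.Threading;

static class LockCallbackSketch
{
    // Returns true if the simulated callback thread managed to take the lock
    // while the calling thread was still holding it.
    public static bool CallbackCanEnterWhileHeld(object state)
    {
        bool entered = false;

        lock (state)
        {
            // Stand-in for an event callback arriving on a different thread.
            var callback = new Thread(() =>
            {
                // A plain lock (state) here would block until the outer lock
                // is released; TryEnter with a timeout keeps the demo finite.
                if (Monitor.TryEnter(state, TimeSpan.FromMilliseconds(100)))
                {
                    entered = true;
                    Monitor.Exit(state);
                }
            });
            callback.Start();

            // Stand-in for a call that does not return until the callback
            // has completed. With a plain lock in the callback, this would
            // never return.
            callback.Join();
        }

        return entered;
    }
}
```

In this sketch the callback cannot enter while the lock is held, which is why the PR moves such calls out of the lock and asserts `!Monitor.IsEntered(_state)` before making them.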

Do not call into MsQuic inside a lock

67724bb

Fixes dotnet#59345

dotnet-issue-labeler bot added the area-System.Net.Quic label Mar 23, 2022

ghost assigned rzikm Mar 23, 2022

danmoseley reviewed Mar 23, 2022

stephentoub reviewed Mar 23, 2022

rzikm requested a review from ManickaP March 28, 2022 08:51

ManickaP reviewed Mar 29, 2022

ManickaP approved these changes Mar 29, 2022

rzikm merged commit 4bd27a6 into dotnet:main Mar 29, 2022

ManickaP mentioned this pull request Apr 1, 2022

[QUIC] Update to msquic 2 #67383

Merged

CarnaViire mentioned this pull request Apr 4, 2022

Handle concurrent reads and concurrent writes on MsQuicStream #67329

Merged

karelz added this to the 7.0.0 milestone Apr 8, 2022

ghost locked as resolved and limited conversation to collaborators May 8, 2022
