-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port CoreFx PR 38271: Fix Statement Command Cancellation (Managed SNI) #248
Conversation
@Wraith2 this PR and the PR dotnet/corefx#38271 have same changes but the newly added tests are failing on Linux from this PR. Could you take a look? Manual Tests Log: log_15_16802.zip On Windows also the range tests are failing, with actual value 0.99. Logs: log_31_16801.zip |
The range failing sort of makes sense. There's clearly some difference in this repo to corefx to do with cancellation or #234 wouldn't have been needed. I think that PR blocks this PR from being effective because you can't get to this fix if the other blocks it and this is higher in the stack. So I'd advise queuing this one after the other and retesting once it's merged. For netfx the test was never intended for that build. A value of .99 means it doesn't exhibit the problem that this PR is attempting to fix though which is probably good. I think it also means the timing is slightly wonky since it's supposed to cancel after 1 second and less than a second has elapsed, we could probably just relax the lower bound because it's the upper bound that signals the error condition. Oh, and PlainCancelTestAsync isn't one that's changed in this PR. It looks from the log like it was coded to expect a specific piece of text in the error message not the type. |
There's now a lock that is optionally entered. I'm curious how that can possibly be safe? If entering the lock fails then execution simply continues with someone else holding the lock, or with no one holding the lock even. |
It's safe because there are only two ways to aquire it and one of them is the attention packet setup and send which doesn't mutate local state. If you're attempting to send the attention packet and the lock is already held you need to ignore the lock because even though something else is holding it you know you're not going to mutate and your very purpose is to cancel the thing that's holding the lock. If you're attempting to run and the lock is held then you need to wait for it. |
OK, so it is a requirement, that the lock is held but it does not matter by whom. Could it not happen that the lock is held, so taking the lock is skipped, and then immediately the lock is released by the code? That would allow execution without any lock held. |
Yes, The only order that this can happen in is that an execute can release the lock while an attention is also in flight and then a second execute can start, that shouldnt' cause a problem because the attention doesn't really mutate, though it will probably set the attention sent flag. Do you have a suggestion on a better way to approach this? using locks in code shared between sync and async executions is a bit dangerous in my opinion anyway. What would meet the requirements in this case is a counted mutex object that you can try enter based on the current value, execute could only enter if it is <1 and attention can only enter if it's <2, i don't think the exeisting mutexes support this behaviour. |
@Wraith2 I am totally unqualified to give advice on this 😄. I don't know how the codebase works. I have learned over the years that holding a lock for an unbounded amount of time (such as during IO) is usually an impossible design. Doing this locks out other access to the system/data (such as cancellation). My experience is that this creates functional problems that cannot be fully fixed under such a design. Often, a way out of this is to use a state machine approach. For example, track ongoing and queued operations in a data structure. Each access of those structures happens under a lock within mere microseconds (no blocking, extremely rare contention). When an IO/operation completes it must check those structures and possible update them or initiate other work. I hope this does not sound too vague. My understanding is that ADO.NET locks while doing IO. I have seen a number of problems on this issue tracker around cancellation, concurrency and locking. I got the impression that the concurrency design is not fundamentally correct. But this might be naive given that I am not familiar with the codebase. |
in general I agree but I'm not a position to totally rewrite the internals with lock free mechanisms at the moment. I'll put that on the list of things to consider. |
…t38271 # Conflicts: # src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNIPacket.cs # src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObjectManaged.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are separate code paths for TCP vs named pipes connections, we should adjust the new tests to run for both scenarios.
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlCommand/SqlCommandCancelTest.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlCommand/SqlCommandCancelTest.cs
Outdated
Show resolved
Hide resolved
@cheenamalhotra Afaiu this issue exists also on 1.1.x version and that why on EF Core 3.1. Any plans to port this fix to there too? |
@olljanat You can just update to use 2.1.0 with EF Core 3.x: https://erikej.github.io/efcore/sqlclient/2020/03/22/update_mds.html |
Ports dotnet/corefx#38271 by @Wraith2 to fix #109
(The PR is copy of the original PR, so we can review these changes first here)