Port CoreFx PR 38271: Fix Statement Command Cancellation (Managed SNI) #248

Merged
merged 8 commits into from
Jan 8, 2020

Conversation

cheenamalhotra
Member

@cheenamalhotra cheenamalhotra commented Oct 8, 2019

Ports dotnet/corefx#38271 by @Wraith2 to fix #109

(The PR is copy of the original PR, so we can review these changes first here)

@cheenamalhotra
Member Author

cheenamalhotra commented Oct 8, 2019

@Wraith2 this PR and dotnet/corefx#38271 contain the same changes, but the newly added tests are failing on Linux in this PR. Could you take a look?

Manual Tests Log: log_15_16802.zip

The range tests are also failing on Windows, with an actual value of 0.99. Logs: log_31_16801.zip

@cheenamalhotra cheenamalhotra changed the title Port CoreFx PR 38271: Fix Statement Command Cancellation (Managed SNI) WIP Port CoreFx PR 38271: Fix Statement Command Cancellation (Managed SNI) Oct 8, 2019
@Wraith2
Contributor

Wraith2 commented Oct 8, 2019

The range failing sort of makes sense. There's clearly some difference between this repo and corefx to do with cancellation, or #234 wouldn't have been needed. I think that PR blocks this one from being effective: this fix sits higher in the stack, so you can't reach it while the other problem is in the way. So I'd advise queuing this one after the other and retesting once it's merged.

For netfx, the test was never intended for that build. A value of .99 means it doesn't exhibit the problem that this PR is attempting to fix, though, which is probably good. I think it also means the timing is slightly wonky, since it's supposed to cancel after 1 second and less than a second has elapsed. We could probably just relax the lower bound, because it's the upper bound that signals the error condition.
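A relaxed range check along those lines might look like the Python sketch below. The helper name `check_cancel_elapsed` and the slack values are assumptions made up for illustration; they are not taken from the actual test code.

```python
def check_cancel_elapsed(elapsed_s, target_s=1.0, lower_slack_s=0.1, upper_slack_s=0.5):
    """Return True if a cancel took roughly target_s seconds.

    The lower bound is deliberately loose because timers can fire a
    little early (e.g. 0.99 s). The upper bound is the one that signals
    a real failure: the cancel did not take effect in time.
    All default values here are illustrative assumptions.
    """
    lower = target_s - lower_slack_s
    upper = target_s + upper_slack_s
    return lower <= elapsed_s <= upper
```

With these assumed defaults, a measured 0.99 seconds passes while a command that keeps running for several seconds still fails.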

Oh, and PlainCancelTestAsync isn't one that's changed in this PR. It looks from the log like it was coded to expect a specific piece of text in the error message, not the exception type.

@GSPP

GSPP commented Dec 17, 2019

There's now a lock that is optionally entered. I'm curious how that can possibly be safe? If entering the lock fails then execution simply continues with someone else holding the lock, or with no one holding the lock even.

@Wraith2
Contributor

Wraith2 commented Dec 17, 2019

It's safe because there are only two ways to acquire it, and one of them is the attention packet setup and send, which doesn't mutate local state. If you're attempting to send the attention packet and the lock is already held, you need to ignore the lock: even though something else is holding it, you know you're not going to mutate, and your very purpose is to cancel the thing that's holding the lock. If you're attempting to run and the lock is held, then you need to wait for it.
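As a rough illustration of that rule, here is a toy Python model. It is not the SqlClient code: `StateObject`, `execute` and `send_attention` are hypothetical names, and a plain `threading.Lock` stands in for whatever lock the driver actually uses.

```python
import threading

class StateObject:
    """Toy model: one lock, two callers with different entry rules."""

    def __init__(self):
        self._lock = threading.Lock()
        self.attention_sent = False

    def execute(self, work):
        # The execute path mutates state, so it always waits for the lock.
        with self._lock:
            return work()

    def send_attention(self):
        # The attention path takes the lock only if it is free. If an
        # execute holds it, proceed anyway: the goal is to cancel that
        # execute, and nothing here touches its state beyond the flag.
        took_lock = self._lock.acquire(blocking=False)
        try:
            self.attention_sent = True
            return took_lock
        finally:
            # Release only what this call actually acquired.
            if took_lock:
                self._lock.release()
```

The return value of `send_attention` only reports whether the lock happened to be free; the attention is sent either way.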

@GSPP

GSPP commented Dec 18, 2019

OK, so it is a requirement that the lock is held, but it does not matter by whom. Could it not happen that the lock is held, so taking the lock is skipped, and then the lock is immediately released by the code that held it? That would allow execution without any lock held.

@Wraith2
Contributor

Wraith2 commented Dec 18, 2019

Yes. The only order in which this can happen is that an execute releases the lock while an attention is also in flight, and then a second execute starts. That shouldn't cause a problem because the attention doesn't really mutate, though it will probably set the attention sent flag.

Do you have a suggestion for a better way to approach this? Using locks in code shared between sync and async executions is a bit dangerous in my opinion anyway. What would meet the requirements in this case is a counted mutex object that you can try to enter based on the current value: execute could only enter if it is <1 and attention could only enter if it's <2. I don't think the existing mutexes support this behaviour.
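To make the idea concrete, a counted gate of that kind could be sketched in Python as follows. The class and method names are hypothetical, and this is only one possible reading of the "<1 for execute, <2 for attention" rule, not an existing primitive.

```python
import threading

class CountedGate:
    """Execute may enter only when count < 1; attention when count < 2."""

    EXECUTE_LIMIT = 1
    ATTENTION_LIMIT = 2

    def __init__(self):
        self._guard = threading.Lock()  # held only to read/update the count
        self._count = 0

    def _try_enter(self, limit):
        # Check and increment atomically so two callers cannot both pass.
        with self._guard:
            if self._count < limit:
                self._count += 1
                return True
            return False

    def try_enter_execute(self):
        return self._try_enter(self.EXECUTE_LIMIT)

    def try_enter_attention(self):
        return self._try_enter(self.ATTENTION_LIMIT)

    def leave(self):
        with self._guard:
            if self._count == 0:
                raise RuntimeError("leave() without a matching enter")
            self._count -= 1
```

Under this sketch an attention can join a running execute, but a second execute cannot start until both have left. The methods never block, so a caller on the execute path would still need its own wait or retry strategy.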

@GSPP

GSPP commented Dec 21, 2019

@Wraith2 I am totally unqualified to give advice on this 😄. I don't know how the codebase works.

I have learned over the years that holding a lock for an unbounded amount of time (such as during IO) is usually an impossible design. Doing this locks out other access to the system/data (such as cancellation). My experience is that this creates functional problems that cannot be fully fixed under such a design.

Often, a way out of this is to use a state machine approach. For example, track ongoing and queued operations in a data structure. Each access to those structures happens under a lock held for mere microseconds (no blocking, extremely rare contention). When an IO operation completes, it must check those structures and possibly update them or initiate other work. I hope this does not sound too vague.
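A minimal Python sketch of that kind of bookkeeping, purely as an illustration and with every name (`OperationTracker`, `submit`, `request_cancel`, `complete`) invented for the example:

```python
import threading
from collections import deque

class OperationTracker:
    """Tracks one running operation and a queue of pending ones.

    The lock guards only the bookkeeping; no I/O happens while it is held.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._running = None
        self._queued = deque()
        self._cancel_requested = False

    def submit(self, op):
        """Return True if op may start now, False if it was queued."""
        with self._lock:
            if self._running is None:
                self._running = op
                return True
            self._queued.append(op)
            return False

    def request_cancel(self):
        """Record a cancel for the running operation, if there is one."""
        with self._lock:
            if self._running is None:
                return False
            self._cancel_requested = True
            return True

    def complete(self):
        """Called when I/O finishes: returns (was_cancelled, next_op)."""
        with self._lock:
            was_cancelled = self._cancel_requested
            self._cancel_requested = False
            # Promote the next queued operation, if any.
            self._running = self._queued.popleft() if self._queued else None
            return was_cancelled, self._running
```

Because the lock is never held across I/O, a cancel request can always get in while an operation is in flight; the completing operation is the one that notices the request and starts the next piece of work.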

My understanding is that ADO.NET locks while doing IO. I have seen a number of problems on this issue tracker around cancellation, concurrency and locking. I got the impression that the concurrency design is not fundamentally correct. But this might be naive given that I am not familiar with the codebase.

@Wraith2
Contributor

Wraith2 commented Dec 21, 2019

In general I agree, but I'm not in a position to totally rewrite the internals with lock-free mechanisms at the moment. I'll put that on the list of things to consider.

…t38271

# Conflicts:
#	src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNIPacket.cs
#	src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObjectManaged.cs
@cheenamalhotra cheenamalhotra changed the title WIP Port CoreFx PR 38271: Fix Statement Command Cancellation (Managed SNI) Port CoreFx PR 38271: Fix Statement Command Cancellation (Managed SNI) Jan 7, 2020
Contributor

@David-Engel David-Engel left a comment

Since there are separate code paths for TCP vs named pipes connections, we should adjust the new tests to run for both scenarios.

@cheenamalhotra cheenamalhotra added this to the 2.0.0-preview1 milestone Jan 8, 2020
@cheenamalhotra cheenamalhotra merged commit a700c9b into dotnet:master Jan 8, 2020
@olljanat

@cheenamalhotra AFAIU this issue also exists in version 1.1.x, and that's why it also shows up with EF Core 3.1. Any plans to port this fix there too?

@ErikEJ
Contributor

ErikEJ commented Nov 24, 2020

@olljanat You can just update to use 2.1.0 with EF Core 3.x: https://erikej.github.io/efcore/sqlclient/2020/03/22/update_mds.html

Development

Successfully merging this pull request may close these issues.

SqlCommand.Cancel() does not work on Linux
6 participants