-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 688 AsyncHelper.WaitForCompletion leaks unobserved exceptions #692
Conversation
src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlUtil.cs
Show resolved
Hide resolved
private void TimeOutATask() | ||
{ | ||
TaskCompletionSource<bool> tcs = new TaskCompletionSource<bool>(); | ||
AsyncHelper.WaitForCompletion(tcs.Task, 1); //Will time out as task uncompleted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love that this line causes the unit test to take a second to run - but that's the minimum timeout on WaitForCompletion
EventHandler<UnobservedTaskExceptionEventArgs> handleUnobservedException = | ||
(o, a) => { unobservedExceptionHappenedEvent.Set(); }; | ||
|
||
TaskScheduler.UnobservedTaskException += handleUnobservedException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't much like messing with global state from a unit test, but I'm not sure of what other way to test this. I had a look at TaskExceptionHolder (which is what is responsible for raising the event when finalized) but it looks anything along those lines would be messing with the internals of the Task system too much.
So here's something exciting
This is not the unobserved exception that I'm creating, so some other unit test is leaking unobserved exceptions which my unit test is picking up on - would welcome advice on how to proceed. |
@@ -204,6 +206,7 @@ internal static void WaitForCompletion(Task task, int timeout, Action onTimeout | |||
} | |||
if (!task.IsCompleted) | |||
{ | |||
task.ContinueWith(_ => { }); //Ensure the task does not leave an unobserved exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you verify that this doesn't allocate a delegate and closure/state object on each invocation? I've done a lot of work to try and reduce allocations for async calls and I'd like to make sure things aren't regressed without good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't do - as there is nothing to capture in the closure - did this as an experiment to confirm:
private static Action<Task> GetLambdaNoClosure(int context)
{
return _ => { };
}
private static Action<Task> GetLambdaWithClosure(int context)
{
return _ => { context++; };
}
public static void Main()
{
var a = GetLambdaNoClosure(1);
var b = GetLambdaNoClosure(2);
Console.WriteLine(object.ReferenceEquals(a, b));
var c = GetLambdaWithClosure(1);
var d = GetLambdaWithClosure(2);
Console.WriteLine(object.ReferenceEquals(c, d));
}
Output:
True False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Correctness has priority before performance anyway so probably not is good enough for me.
@@ -90,14 +95,14 @@ public static void TestDbCommand() | |||
{ | |||
Fail = true | |||
}; | |||
commandFail.ExecuteNonQueryAsync().ContinueWith((t) => { }, TaskContinuationOptions.OnlyOnFaulted).Wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting TaskContinuationOptions.OnlyOnFaulted and not checking the exception leads to the exception being unobserved:
"If you do not access the Exception property, the exception is unhandled."
I also was not a fan of the fact that if the command did not fail, the test would hand indefinitely, so I swapped to the current approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I have found the source of these exceptions, and should be fixed in latest commit
Trying to go through the test failures now - there's a bunch of manual test failures which it's hard to see how I'd have caused, some failures relating to encryption which I have no idea about, and then the one failure that is definitely me is the test I've added failing on build configuration Ubuntu 16 AnyCPU,Release,netcoreapp,true - the same test does not fail on build configuration Ubuntu 16 AnyCPU,Release,netcoreapp,false - but I don't have any idea as to what that bool parameter is referring to - I don't have permission to view the pipeline, and couldn't glean any difference from viewing the log files - can someone help me out? |
Most the AE test runs fail for configuration reasons. I ignore them, i really hope the SQL team aren't relying on them. The only one i can find that's relevant to your PR is this: https://dev.azure.com/sqlclientdrivers-ci/sqlclient/_build/results?buildId=15188&view=logs&j=9d906afe-3eb5-5d79-316d-86f3f3fa360c&t=52375b9b-c5db-514a-5e98-413583db2186
|
@@ -116,17 +121,17 @@ public static void TestDbCommand() | |||
source = new CancellationTokenSource(); | |||
Task.Factory.StartNew(() => { command.WaitForWaitingForCancel(); source.Cancel(); }); | |||
Task result = command.ExecuteNonQueryAsync(source.Token); | |||
Assert.True(result.IsFaulted, "Task result should be faulted"); | |||
Assert.True(result.Exception != null, "Task result should be faulted"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking IsFaulted does not count as observing an exception, accessing the Exception property does
The test suite got updated recently to make all the AE runs function. Can you merge or rebase to master and see if you get clean test runs? |
@@ -13,6 +13,11 @@ namespace Microsoft.Data.SqlClient.ManualTesting.Tests | |||
{ | |||
public static class BaseProviderAsyncTest | |||
{ | |||
private static void AssertTaskFaults(Task t) | |||
{ | |||
Assert.ThrowsAny<Exception>(() => t.Wait(TimeSpan.FromMilliseconds(1))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could you confirm if if providing timeout here is intentional?
Although it's a test, but it changes existing behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's intentional - note that t.Wait returns false in the case of timeout (doesn't throw a TimeoutException) so if the task under test times out the test will fail, as no exception will be thrown. I thought this was preferable to the previous behaviour where if the Task hung the test would hang indefinitely (or until killed by the framework)
In this particular case all the Tasks under test are already completed ones generated by Task.FromResult, so we can get away with a 1ms timeout and know nothing will timeout.
However this is somewhat a matter of taste, and I don't know the details of whether your test suite is set up to handle timing out tests well. So if you'd rather have a longer timeout or no timeout I'm happy to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original design for this is to verify that the unhandled exception can be propagated through tasks. We don't want the tests to fail due to this 1 ms timeout although it should not happen according to your explanation. I personally would prefer not setting timeout here.
src/Microsoft.Data.SqlClient/tests/FunctionalTests/SqlHelperTest.cs
Outdated
Show resolved
Hide resolved
} | ||
|
||
[Fact] | ||
public void WaitForCompletion_DoesNotCreateUnobservedException() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test does need a bit of rework to handle randomness. It fails randomly in pipelines.
You'll notice if you run this test in loop, the second round does not pass ever. Any thoughts why can't we run this test in second round? It maybe a hint towards random errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just looking into this - it seems the fix I had written to WaitForCompletion wasn't actually a fix. I needed to make the continuation actually observe the exception, the behaviour I had lead to a race condition, where if the continuation hadn't been executed at the point we called GC.Collect, then there was still a reference to the original task and it wasn't Collected. Creating a false positive for me having fixed it.
I proved this by modifying WaitForCompletion to return the continuation, and waited for it to complete, at which point the error happened deterministically. Making the continuation observe the exception fixed the error. My bad, I misinterpreted the text here: https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskcontinuationoptions?view=netcore-3.1 saying "If you do not access the Exception property, the exception is unhandled." To only apply to OnlyOnFaulted, when it applies to all TaskContinuationOptions.
I can't see a way to make the failure deterministic without passing some sort of signal back from the continuation, which doesn't fit the signature and isn't desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is just a test so make it as messy as it needs to be to get a deterministic answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does pass deterministically now with the fix in place - what I couldn't get deterministic was the failure without the fix in place.
@@ -204,6 +206,7 @@ internal static void WaitForCompletion(Task task, int timeout, Action onTimeout | |||
} | |||
if (!task.IsCompleted) | |||
{ | |||
task.ContinueWith(t => { var ignored = t.Exception; }); //Ensure the task does not leave an unobserved exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N.B. I repeated the experiment for the new lambda to confirm it's still not creating lots of instances of the closure
Co-authored-by: Cheena Malhotra <v-chmalh@microsoft.com>
Hi @jm771 , Could you please merge master into your branch again so we can trigger the failed AzureSQLDB pipelines? The failure indicates some missing changes from the new test framework. Just to make sure your fixes working well on these pipelines before we can merge it. Sorry for the inconvenience. Thanks, |
On holiday at the moment, I'll take a look when I get back next week. |
Hi @jm771 , do you have any chance to update your branch so we can merge your changes soon? Thanks, |
Sorry about the delay, I've merged just now. I'll have a look over all the results of the tests tomorrow. |
@jm771 Thank you for the updates. I have checked the failed tests on the pipelines and they are known issues that are not introduced by your PR. |
Excellent, thanks Karina! |
Fix for #688
I have some comments which I'll leave inline on the code review