
Issue 688 AsyncHelper.WaitForCompletion leaks unobserved exceptions #692

Merged
merged 13 commits on Oct 22, 2020

Conversation

jm771
Contributor

@jm771 jm771 commented Aug 14, 2020

Fix for #688

I have some comments which I'll leave inline on the code review

@dnfadmin

dnfadmin commented Aug 14, 2020

CLA assistant check
All CLA requirements met.

private void TimeOutATask()
{
TaskCompletionSource<bool> tcs = new TaskCompletionSource<bool>();
AsyncHelper.WaitForCompletion(tcs.Task, 1); //Will time out as task uncompleted
Contributor Author

I don't love that this line causes the unit test to take a second to run - but that's the minimum timeout on WaitForCompletion

EventHandler<UnobservedTaskExceptionEventArgs> handleUnobservedException =
(o, a) => { unobservedExceptionHappenedEvent.Set(); };

TaskScheduler.UnobservedTaskException += handleUnobservedException;
Contributor Author

I don't much like messing with global state from a unit test, but I'm not sure what other way there is to test this. I had a look at TaskExceptionHolder (which is what is responsible for raising the event when finalized), but it looks like anything along those lines would be messing with the internals of the Task system too much.
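
For anyone following along, here is a minimal sketch of the pattern under discussion (the helper name and the 100 ms wait are illustrative, not the exact test code):

using System;
using System.Threading;
using System.Threading.Tasks;

static class UnobservedExceptionProbe
{
    // Sketch only: returns true if running the scenario left an unobserved task exception behind.
    // The scenario must not keep a reference to the faulted task, or its finalizer will never run.
    public static bool LeavesUnobservedException(Action scenario)
    {
        using (var unobservedExceptionHappenedEvent = new ManualResetEventSlim(false))
        {
            EventHandler<UnobservedTaskExceptionEventArgs> handler =
                (o, a) => unobservedExceptionHappenedEvent.Set();

            TaskScheduler.UnobservedTaskException += handler;
            try
            {
                scenario();

                // The event is raised when the faulted task's exception holder is finalized,
                // so force a collection and wait for finalizers before checking.
                GC.Collect();
                GC.WaitForPendingFinalizers();

                return unobservedExceptionHappenedEvent.Wait(TimeSpan.FromMilliseconds(100));
            }
            finally
            {
                // Undo the global-state change so other tests are not affected.
                TaskScheduler.UnobservedTaskException -= handler;
            }
        }
    }
}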

@jm771
Contributor Author

jm771 commented Aug 14, 2020

So here's something exciting

2020-08-14T15:43:06.0285145Z   X Microsoft.Data.SqlClient.Tests.SqlHelperTest.WaitForCompletion_DoesNotCreateUnobservedException [1s 35ms]
2020-08-14T15:43:06.0285944Z   Error Message:
2020-08-14T15:43:06.0286678Z    Did not expect an unobserved exception, but found a System.InvalidOperationException with message "Fail requested"

This is not the unobserved exception that I'm creating, so some other unit test is leaking unobserved exceptions which my unit test is picking up on. I'd welcome advice on how to proceed.

@@ -204,6 +206,7 @@ internal static void WaitForCompletion(Task task, int timeout, Action onTimeout
}
if (!task.IsCompleted)
{
task.ContinueWith(_ => { }); //Ensure the task does not leave an unobserved exception
Contributor

Can you verify that this doesn't allocate a delegate and closure/state object on each invocation? I've done a lot of work to try and reduce allocations for async calls and I'd like to make sure things aren't regressed without good reason.

Contributor Author

@jm771 jm771 Aug 15, 2020

It shouldn't, as there is nothing to capture in the closure. I did this as an experiment to confirm:

using System;
using System.Threading.Tasks;

// Wrapper class and usings added so the snippet compiles standalone.
public static class ClosureExperiment
{
    private static Action<Task> GetLambdaNoClosure(int context)
    {
        return _ => { };
    }

    private static Action<Task> GetLambdaWithClosure(int context)
    {
        return _ => { context++; };
    }

    public static void Main()
    {
        var a = GetLambdaNoClosure(1);
        var b = GetLambdaNoClosure(2);
        Console.WriteLine(object.ReferenceEquals(a, b));

        var c = GetLambdaWithClosure(1);
        var d = GetLambdaWithClosure(2);
        Console.WriteLine(object.ReferenceEquals(c, d));
    }
}

Output:
True
False

Contributor

Ok. Correctness has priority over performance anyway, so "probably not" is good enough for me.

@@ -90,14 +95,14 @@ public static void TestDbCommand()
{
Fail = true
};
commandFail.ExecuteNonQueryAsync().ContinueWith((t) => { }, TaskContinuationOptions.OnlyOnFaulted).Wait();
Contributor Author

Setting TaskContinuationOptions.OnlyOnFaulted and not checking the exception leads to the exception being unobserved:

https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskcontinuationoptions?view=netcore-3.1

"If you do not access the Exception property, the exception is unhandled."

I also was not a fan of the fact that if the command did not fail, the test would hang indefinitely, so I swapped to the current approach.
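
Roughly the difference, as a sketch (the new-pattern shape here is assumed from the AssertTaskFaults helper that appears further down this diff):

using System;
using System.Threading.Tasks;
using Xunit;

static class FaultAssertionSketch
{
    // Old pattern: the continuation only runs when commandTask faults, but it never reads
    // t.Exception, so per the linked docs the fault is left unobserved.
    static void OldPattern(Task commandTask)
    {
        commandTask.ContinueWith(t => { }, TaskContinuationOptions.OnlyOnFaulted).Wait();
    }

    // New pattern: Wait rethrows the task's fault (wrapped in an AggregateException), which
    // observes it, and the bounded wait keeps a hung task from blocking the test forever.
    static void NewPattern(Task commandTask)
    {
        Assert.ThrowsAny<Exception>(() => commandTask.Wait(TimeSpan.FromMilliseconds(1)));
    }
}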

Contributor Author

I think I have found the source of these exceptions; it should be fixed in the latest commit.

@jm771
Contributor Author

jm771 commented Aug 15, 2020

Trying to go through the test failures now. There are a bunch of manual test failures that it's hard to see how I could have caused, and some failures relating to encryption which I know nothing about. The one failure that is definitely mine is the test I've added failing on build configuration Ubuntu 16 AnyCPU,Release,netcoreapp,true - the same test does not fail on build configuration Ubuntu 16 AnyCPU,Release,netcoreapp,false - but I don't have any idea what that bool parameter refers to. I don't have permission to view the pipeline, and couldn't glean any difference from the log files - can someone help me out?

@Wraith2
Contributor

Wraith2 commented Aug 16, 2020

Most of the AE test runs fail for configuration reasons. I ignore them; I really hope the SQL team aren't relying on them.

The only one i can find that's relevant to your PR is this: https://dev.azure.com/sqlclientdrivers-ci/sqlclient/_build/results?buildId=15188&view=logs&j=9d906afe-3eb5-5d79-316d-86f3f3fa360c&t=52375b9b-c5db-514a-5e98-413583db2186

2020-08-15T11:22:49.7596685Z [xUnit.net 00:00:17.23]     Microsoft.Data.SqlClient.Tests.SqlHelperTest.WaitForCompletion_DoesNotCreateUnobservedException [FAIL]
2020-08-15T11:22:49.7602056Z [xUnit.net 00:00:17.23]       Did not expect an unobserved exception, but found a System.TimeoutException with message "Dummy timeout exception"
2020-08-15T11:22:49.7611531Z [xUnit.net 00:00:17.23]       Expected: False
2020-08-15T11:22:49.7612839Z [xUnit.net 00:00:17.23]       Actual:   True
2020-08-15T11:22:49.7613874Z [xUnit.net 00:00:17.24]       Stack Trace:
2020-08-15T11:22:49.7614567Z [xUnit.net 00:00:17.24]         SqlHelperTest.cs(46,0): at Microsoft.Data.SqlClient.Tests.SqlHelperTest.WaitForCompletion_DoesNotCreateUnobservedException()
2020-08-15T11:22:49.7615169Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.RetrieveWorkstationId(workstation: null, withDispose: False, shouldMatchSetWorkstationId: False) [1ms]
2020-08-15T11:22:49.7615733Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.RetrieveWorkstationId(workstation: "RandomStringForTargetServer", withDispose: False, shouldMatchSetWorkstationId: True) [1ms]
2020-08-15T11:22:49.7616815Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.RetrieveWorkstationId(workstation: "RandomStringForTargetServer", withDispose: True, shouldMatchSetWorkstationId: False) [1ms]
2020-08-15T11:22:49.7617389Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.RetrieveWorkstationId(workstation: "", withDispose: False, shouldMatchSetWorkstationId: False) [1ms]
2020-08-15T11:22:49.7617837Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.ConnectionTest [652ms]
2020-08-15T11:22:49.7618230Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.ConnectionTestValidCredentialCombination [1ms]
2020-08-15T11:22:49.7618637Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.SqlConnectionEmptyParameters [1ms]
2020-08-15T11:22:49.7619256Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.ClosedConnectionSchemaRetrieval [1ms]
2020-08-15T11:22:49.7620040Z   √ Microsoft.Data.SqlClient.Tests.SqlConnectionBasicTests.IntegratedAuthConnectionTest [161ms]
2020-08-15T11:22:49.7620562Z   X Microsoft.Data.SqlClient.Tests.SqlHelperTest.WaitForCompletion_DoesNotCreateUnobservedException [1s 30ms]
2020-08-15T11:22:49.7621135Z   Error Message:
2020-08-15T11:22:49.7621966Z    Did not expect an unobserved exception, but found a System.TimeoutException with message "Dummy timeout exception"
2020-08-15T11:22:49.7622793Z Expected: False
2020-08-15T11:22:49.7623560Z Actual:   True
2020-08-15T11:22:49.7624319Z   Stack Trace:
2020-08-15T11:22:49.7625178Z      at Microsoft.Data.SqlClient.Tests.SqlHelperTest.WaitForCompletion_DoesNotCreateUnobservedException() in E:\ci-agent1\_work\1\s\src\Microsoft.Data.SqlClient\tests\FunctionalTests\SqlHelperTest.cs:line 46

@@ -116,17 +121,17 @@ public static void TestDbCommand()
source = new CancellationTokenSource();
Task.Factory.StartNew(() => { command.WaitForWaitingForCancel(); source.Cancel(); });
Task result = command.ExecuteNonQueryAsync(source.Token);
Assert.True(result.IsFaulted, "Task result should be faulted");
Assert.True(result.Exception != null, "Task result should be faulted");
Contributor Author

Checking IsFaulted does not count as observing an exception; accessing the Exception property does.
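
A tiny illustration of that distinction (a sketch, not the test code itself):

using System;
using System.Threading.Tasks;

static class ObservationSketch
{
    static void Demo()
    {
        Task faulted = Task.FromException(new InvalidOperationException("boom"));

        bool isFaulted = faulted.IsFaulted;         // status check only - does not observe the exception
        AggregateException ex = faulted.Exception;  // reading Exception marks the fault as observed
        Console.WriteLine($"{isFaulted}: {ex.InnerException.Message}");
    }
}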

@Wraith2
Contributor

Wraith2 commented Sep 1, 2020

The test suite got updated recently to make all the AE runs function. Can you merge or rebase onto master and see if you get clean test runs?

@@ -13,6 +13,11 @@ namespace Microsoft.Data.SqlClient.ManualTesting.Tests
{
public static class BaseProviderAsyncTest
{
private static void AssertTaskFaults(Task t)
{
Assert.ThrowsAny<Exception>(() => t.Wait(TimeSpan.FromMilliseconds(1)));
Member

@cheenamalhotra cheenamalhotra Sep 1, 2020

Nit: Could you confirm whether providing a timeout here is intentional? Although it's a test, it changes existing behavior.

Contributor Author

Yes, it's intentional - note that t.Wait returns false in the case of a timeout (it doesn't throw a TimeoutException), so if the task under test times out the test will fail, as no exception will be thrown. I thought this was preferable to the previous behaviour, where if the Task hung the test would hang indefinitely (or until killed by the framework).

In this particular case all the Tasks under test are already completed ones generated by Task.FromResult, so we can get away with a 1ms timeout and know nothing will timeout.

However this is somewhat a matter of taste, and I don't know the details of whether your test suite is set up to handle timing out tests well. So if you'd rather have a longer timeout or no timeout I'm happy to change it.
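
To make the Wait semantics concrete, a small sketch (illustrative only):

using System;
using System.Threading.Tasks;
using Xunit;

static class WaitSemanticsSketch
{
    static void Demo()
    {
        // A task that never completes: Wait(1 ms) returns false rather than throwing,
        // so Assert.ThrowsAny reports a failure instead of the test hanging.
        Task never = new TaskCompletionSource<bool>().Task;
        bool completed = never.Wait(TimeSpan.FromMilliseconds(1));
        Console.WriteLine(completed); // False

        // An already-faulted task: Wait rethrows the fault wrapped in an AggregateException,
        // which is what AssertTaskFaults relies on.
        Task faulted = Task.FromException(new InvalidOperationException("boom"));
        Assert.ThrowsAny<Exception>(() => faulted.Wait(TimeSpan.FromMilliseconds(1)));
    }
}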

Member

The original design for this is to verify that the unhandled exception can be propagated through tasks. We don't want the tests to fail due to this 1 ms timeout, although that should not happen according to your explanation. I would personally prefer not setting a timeout here.

}

[Fact]
public void WaitForCompletion_DoesNotCreateUnobservedException()
Member

This test does need a bit of rework to handle randomness. It fails randomly in pipelines.

You'll notice that if you run this test in a loop, the second round never passes. Any thoughts on why we can't run this test a second time? It may be a hint towards the random errors.

Contributor Author

I was just looking into this - it seems the fix I had written in WaitForCompletion wasn't actually a fix. I needed to make the continuation actually observe the exception; the behaviour I had led to a race condition where, if the continuation hadn't executed by the point we called GC.Collect, there was still a reference to the original task and it wasn't collected, creating a false positive for me having fixed it.

I proved this by modifying WaitForCompletion to return the continuation and waiting for it to complete, at which point the error happened deterministically. Making the continuation observe the exception fixed the error. My bad - I misinterpreted the text here: https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskcontinuationoptions?view=netcore-3.1 ("If you do not access the Exception property, the exception is unhandled.") as only applying to OnlyOnFaulted, when it applies to all TaskContinuationOptions.

I can't see a way to make the failure deterministic without passing some sort of signal back from the continuation, which doesn't fit the signature and isn't desirable.
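
For context, roughly what that experiment could have looked like (a sketch - the method shape is illustrative, not the shipped code):

using System;
using System.Threading.Tasks;

static class RaceDiagnosticSketch
{
    // Temporarily modified for the experiment: hand the continuation back to the caller.
    // With the broken "fix", the continuation never reads t.Exception, so even after it
    // has run the fault remains unobserved.
    static Task WaitForCompletionReturningContinuation(Task task)
    {
        return task.ContinueWith(_ => { });
    }

    static void Demo()
    {
        var tcs = new TaskCompletionSource<bool>();
        Task continuation = WaitForCompletionReturningContinuation(tcs.Task);

        tcs.SetException(new TimeoutException("Dummy timeout exception"));

        // Waiting here closes the race window: once the continuation has definitely run,
        // dropping the task and forcing a GC surfaces the unobserved exception every time.
        continuation.Wait();
    }
}

The actual fix, visible in the diff below, makes the continuation read t.Exception, which marks the fault as observed.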

Contributor

This part is just a test so make it as messy as it needs to be to get a deterministic answer.

Contributor Author

It does pass deterministically now with the fix in place - what I couldn't get deterministic was the failure without the fix in place.

@@ -204,6 +206,7 @@ internal static void WaitForCompletion(Task task, int timeout, Action onTimeout
}
if (!task.IsCompleted)
{
task.ContinueWith(t => { var ignored = t.Exception; }); //Ensure the task does not leave an unobserved exception
Contributor Author

N.B. I repeated the experiment for the new lambda to confirm it still isn't creating a new closure instance per call.

Co-authored-by: Cheena Malhotra <v-chmalh@microsoft.com>
@karinazhou
Member

Hi @jm771 ,

Could you please merge master into your branch again so we can trigger the failed AzureSQLDB pipelines? The failure indicates some missing changes from the new test framework. We just want to make sure your fixes work well on these pipelines before we merge. Sorry for the inconvenience.

Thanks,

@jm771
Contributor Author

jm771 commented Sep 25, 2020

On holiday at the moment, I'll take a look when I get back next week.

@karinazhou karinazhou added the "⏳ Waiting for Customer" (Issues/PRs waiting for user response/action) label Oct 5, 2020
@karinazhou
Member

Hi @jm771 , have you had a chance to update your branch so we can merge your changes soon?

Thanks,

@jm771
Contributor Author

jm771 commented Oct 21, 2020

Hi @jm771 , have you had a chance to update your branch so we can merge your changes soon?

Thanks,

Sorry about the delay, I've merged just now. I'll have a look over all the results of the tests tomorrow.

@karinazhou
Member

@jm771 Thank you for the updates. I have checked the failed tests on the pipelines and they are known issues that are not introduced by your PR.

@karinazhou karinazhou removed the "⏳ Waiting for Customer" (Issues/PRs waiting for user response/action) label Oct 22, 2020
@cheenamalhotra cheenamalhotra added this to the 2.1.0-preview2 milestone Oct 22, 2020
@jm771
Contributor Author

jm771 commented Oct 22, 2020

@jm771 Thank you for the updates. I have checked the failed tests on the pipelines and they are known issues that are not introduced by your PR.

Excellent, thanks Karina!
