Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock when calling ActorSelection.ResolveOne from GUI context #2550

Merged
merged 8 commits into from
Mar 16, 2017

Conversation

0x53A
Copy link
Contributor

@0x53A 0x53A commented Mar 7, 2017

The first test passes, the second test blocks indefinitely (even though I passed a timeout!).

This is probably an issue with a missing .ConfigureAwait(false).

SynchronizationContextHelper was copied from here: http://stackoverflow.com/questions/14087257/how-to-add-synchronization-context-to-async-test-method

@0x53A 0x53A changed the title Demonstrate deadlock when calling ActorSelection.ResolveOne from GUI context Fix deadlock when calling ActorSelection.ResolveOne from GUI context Mar 7, 2017
@0x53A
Copy link
Contributor Author

0x53A commented Mar 7, 2017

Just adding this one ConfigureAwait fixes this specific case, but I went through and added it to the whole Akka lib (excluding a few contrib and tests).

IMO this should be everywhere, and adding an await without ConfigureAwait should fail the build ...

@0x53A
Copy link
Contributor Author

0x53A commented Mar 7, 2017

The test failed in CI (after my fix) with
image

So i guess I would need to reimplement the scheduler instead of reusing the WPF bits ...

@0x53A
Copy link
Contributor Author

0x53A commented Mar 7, 2017

Nice ...
image

@0x53A
Copy link
Contributor Author

0x53A commented Mar 7, 2017

Eh, I have set the test to skipped ... should hopefully become green now

@0x53A 0x53A force-pushed the demonstrate-resolveone-deadlock branch from 13db376 to 47b4164 Compare March 8, 2017 08:36
@0x53A
Copy link
Contributor Author

0x53A commented Mar 8, 2017

@Aaronontheweb I rebased it on dev, and the current failure is
image

which also fails in my other PR.

I assume the CI is just slightly flaky?

@Danthar
Copy link
Member

Danthar commented Mar 8, 2017

Not in this case. The Akka.Tests.Performance.Actor.Pattern.AskSpec+AskThroughput test times out. Maybe a deadlock due to your latest changes ?

@alexvaluyskiy
Copy link
Contributor

@Danthar No, we have problems with Perf tests in all pull requests

@Danthar
Copy link
Member

Danthar commented Mar 8, 2017

@alexvaluyskiy But nbench tests should be reported normally in TC right? In this case it times out on invoking the nbench runner itself.
Considering other tests run fine, this leads me to believe there is something going on with the spec its running. ?

@0x53A
Copy link
Contributor Author

0x53A commented Mar 8, 2017

The failing part seems to be Akka.Tests.Performance.Actor.Pattern.AskSpec+AskThroughput (http://petabridge-ci.cloudapp.net/viewLog.html?buildId=22768&buildTypeId=AkkaNet_AkkaNetWindowsPerformanceTests&tab=buildLog#_state=15933&focus=15933), which does sound suspicious. I will try to test it locally, but can only do that in the evening.

@0x53A
Copy link
Contributor Author

0x53A commented Mar 8, 2017

so the good news is I was able to reproduce the failure locally:
image

the bad news is I was able to reproduce the failure locally :\

I have executed the failing command

C:\Users\lr\Source\Repos\akka.net\src\packages\NBench.Runner\lib\net45\NBench.Runner.exe "C:\Users\lr\Source\Repos\akka.net\src\core\Akka.Tests.Performance\bin\Release\Akka.Tests.Performance.dll" "output-directory="C:\Users\lr\Source\Repos\akka.net\PerfResults"" "concurrent="true"" "trace="true"" "teamcity="false""

twice, once for 3ae56ba (base) and once for 47b4164 (modified).

The first, aborted run (47b4164 ) took 25 minutes, exactly the timeout from build.fsx. It got until Akka.Tests.Performance.Actor.Pattern.AskSpec+AskThroughput before it was canceled.

The second run for (47b4164) finished between 9:11:42 PM and 9:37:36 PM, so it took about 26 minutes.

The third run (3ae56ba) finished between 9:43:19 PM and 10:09:06 PM, so took about 26 minutes and would also have been canceled due to the timeout.

So if I didn't mess up anything related to the build, I can't see any difference between dev and my changes.

PerfResults.zip

@0x53A
Copy link
Contributor Author

0x53A commented Mar 8, 2017

The passing test run from here: http://petabridge-ci.cloudapp.net/viewLog.html?buildId=22829&buildTypeId=AkkaNet_AkkaNetWindowsPerformanceTests&tab=buildLog#_state=7528&focus=14984
image

It ran between [21:48:53] and [22:13:20] , so took 24:27 minutes. So the timeout of 25 minutes is maybe slightly optimistic.


Edit: after increasing the timeout, mono failed again
image

but Akka.Tests.Performance.dll succeded in almost exactly 25 minutes.

@0x53A 0x53A force-pushed the demonstrate-resolveone-deadlock branch from 250366a to 6678f41 Compare March 9, 2017 00:36
@@ -3796,8 +3796,8 @@ namespace Akka.Pattern
public Akka.Pattern.CircuitBreaker OnClose(System.Action callback) { }
public Akka.Pattern.CircuitBreaker OnHalfOpen(System.Action callback) { }
public Akka.Pattern.CircuitBreaker OnOpen(System.Action callback) { }
public async System.Threading.Tasks.Task<T> WithCircuitBreaker<T>(System.Func<System.Threading.Tasks.Task<T>> body) { }
public async System.Threading.Tasks.Task WithCircuitBreaker(System.Func<System.Threading.Tasks.Task> body) { }
public System.Threading.Tasks.Task<T> WithCircuitBreaker<T>(System.Func<System.Threading.Tasks.Task<T>> body) { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @alexvaluyskiy I get that this is an API change, but would removing the async keyword here result in a break for something that was await-ing on this task? Or is this just goofery from the API printer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async is used as a function modifier to avoid breaking existing code that use await as a variable (and some other reasons). So basically backwards compat reason on a c# feature level.
Removing or adding it does not influence upstream dependencies at all as far as i know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Danthar

@0x53A
Copy link
Contributor Author

0x53A commented Mar 15, 2017

@Aaronontheweb is it possible to compare two runs of nbench? This does touch a lot of places, so in theory i would need to check every performance spec.
Otherwise i could revert the last commit so that only the one await is changed, reducing the diff.

Thank you.

@Aaronontheweb
Copy link
Member

@0x53A this looks good to me, but the do we need to add a license header / third party notice file for the Nito source code? cc @StephenCleary

@0x53A
Copy link
Contributor Author

0x53A commented Mar 16, 2017

Just to be clear, the Nito source is only used in the unit test project.
I would have used the nuget, if possible.

Regardless, credit should be given where credit is due.
So I have copied both license files to the directory, and the about.txt in the directory already links to the github repositories.

@StephenCleary
Copy link

It's all MIT-licensed. No problems.

@Aaronontheweb
Copy link
Member

thanks @0x53A - I'm satisfied and it sounds like @StephenCleary is too. Thanks for all of your hard work. I'll merge this in once CI runs. I checked the perf numbers and everything looked fine there.

@Aaronontheweb Aaronontheweb added this to the 1.2.0 milestone Mar 16, 2017
@Aaronontheweb Aaronontheweb merged commit dd39812 into akkadotnet:dev Mar 16, 2017
@0x53A 0x53A deleted the demonstrate-resolveone-deadlock branch April 16, 2017 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants