Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actor task scheduler refactoring #1482

Merged
merged 1 commit into from
Nov 29, 2015

Conversation

JeffCyr
Copy link
Contributor

@JeffCyr JeffCyr commented Nov 29, 2015

This pull request is a refactoring to fix the issues discussed in issue #1481. The major change with the old design is that each actor now have its own (lazy loaded) ActorTaskScheduler. CallContext is no longer required to flow the state so it should achieve better performance.

@rogeralsing
Copy link
Contributor

Looks good! 👍

@rogeralsing
Copy link
Contributor

Can you squash the commits into one?

@JeffCyr JeffCyr force-pushed the ActorTaskScheduler-refactoring branch from 362c87b to 7ace1db Compare November 29, 2015 20:23
@JeffCyr
Copy link
Contributor Author

JeffCyr commented Nov 29, 2015

@rogeralsing Done!

rogeralsing added a commit that referenced this pull request Nov 29, 2015
@rogeralsing rogeralsing merged commit 26daeac into akkadotnet:dev Nov 29, 2015
@rogeralsing
Copy link
Contributor

Thanks! 👍

@Aaronontheweb
Copy link
Member

I'm sorry, but this should be performance tested with a benchmark before it's merged.

CallContext is no longer required to flow the state so it should achieve better performance.

Probably, but I'll never take a person's word for it.

Reverting this until we have benchmark data to back this up.

@JeffCyr
Copy link
Contributor Author

JeffCyr commented Nov 29, 2015

Sure I'll create a benchmark

@Aaronontheweb
Copy link
Member

Thanks @JeffCyr !

@rogeralsing
Copy link
Contributor

I do like the design used here.
There is a lot of headache that disappears by removing the CallContext.
As far as I can tell, @JeffCyr is correct, executing tasks and flowing async state w/o call context is faster.
That being said, there is a tradeoff by using more memory instead, as each async actor also holds a ref to its own TaskScheduler

This will allocate 8 bytes extra for all actors due to the task scheduler reference.
And an unknown amount of memory for actors that use the task scheduler.

I'd be more interested in seeing the memory footprint of async actors than performance in this case as there is nothing affecting the default message pipeline.

But all in all, I'm in favor for this design, it would probably solve the current Mono async problem we have also where the actor Context is null after async operations.

@JeffCyr
Copy link
Contributor Author

JeffCyr commented Nov 30, 2015

I updated my branch to revert the revert and include a benchmark, can we reopen this PR or do I need to create another one?

I added an optional AsyncActor test in the PingPong benchmark PingPong --async.

Results shows that the refactored ActorTaskScheduler is 2-3x faster than the original.

Before:
benchmark-without-refactoring

After:
benchmark-with-refactoring

@JeffCyr
Copy link
Contributor Author

JeffCyr commented Nov 30, 2015

@rogeralsing Regarding the memory footprint, there's not much we can do for the 8 extra bytes, but since ActorCell already have a lot of fields, it should not make a big difference proportionally.

For actors that are initializing their TaskScheduler, if you check http://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/TaskScheduler.cs,b76a4a6f77962f28

You can see that TaskScheduler is very lightweight (it only have an int field), it's just sad that they add a weakref of every TaskScheduler in a static dictionary (custom hash table) which is only for debugging purpose. They should have used a debugging flag like Task.s_asyncDebuggingEnabled does. I will probably create an issue in coreclr about this but we won't see the fix anytime soon.

We could benchmark the memory pressure when millions of actors initialize their TaskScheduler, but I guess it would be that same as keeping a hash table referencing all actors, which would probably be acceptable.

@@ -30,6 +30,7 @@ public partial class ActorCell : IUntypedActorContext, ICell
private bool _actorHasBeenCleared;
private Mailbox _mailbox;
private readonly ActorSystemImpl _systemImpl;
private ActorTaskScheduler m_taskScheduler;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for the m_ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old habits :) I'll fix that

@JeffCyr
Copy link
Contributor Author

JeffCyr commented Nov 30, 2015

I created the issue 2189 about TaskScheduler on coreclr.

Here's the benchmark result for the creation of 10 millions TaskScheduler:

Real TaskScheduler test:
Memory usage: 420.48 MB
Total time: 00:00:13.9793632

Fake TaskScheduler test:
Memory usage: 148.89 MB
Total time: 00:00:01.0347952

The memory usage is not that bad and the cpu usage is bad but would probably be amortized by the application without notice.

There is a workaround by creating the ActorTaskScheduler with FormatterServices.GetUninitializedObject to bypass the constructor, but that would be a very nasty hack!

@Aaronontheweb
Copy link
Member

@JeffCyr you'll need to open a new PR

@Horusiath
Copy link
Contributor

@JeffCyr concerning your screens from the benchmark, did you noticed serious perf drop in standard actors after the changes?

@JeffCyr
Copy link
Contributor Author

JeffCyr commented Dec 2, 2015

@Horusiath The code path for standard actor is not affected because the ActorTaskScheduler is only created if an actor uses it. My screenshots may have small variations because the payload on my machine was not the same when I did both runs, I ran the benchmark again and got similar results with and without the changes.

@rogeralsing
Copy link
Contributor

The reason I brought up the 8 extra bytes earlier is that once the actor reaches a certain size, there will be effects on the CPU mem cache pipeline.
JVM Akka have worked hard to bring the size down to prevent this.

I think that the .NET actors are still too big to benefit from any low level optimizations, but there could be potential side effects when altering the size of a raw actor cell.
(I dont think there is in this case, but in theory)

@JeffCyr
Copy link
Contributor Author

JeffCyr commented Dec 2, 2015

If this becomes an issue, we could group some infrequently accessed fields in another class and access them with a level of indirection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants