
Running the cycle detector in relation to --ponycdinterval #3501

Closed

sblessing opened this issue Apr 16, 2020 · 7 comments

sblessing (Contributor) commented Apr 16, 2020

Hey there,

I have been wondering for quite some time about Pony's behavior with respect to garbage collecting actors. It seems to work correctly in simple cases where --ponynoblock is given and no one holds references to an actor (see actor.c:452-463). However, this situation rarely arises in more complex actor applications and will probably only ever apply to a small subset of all actors during the lifetime of a Pony application.

After some more detailed analysis, I have stumbled across this piece of code in the scheduler (which is awesome btw, compared to the first version back when we had written ponyrt).

scheduler.c:890-905

if(sched->index == 0)
{
  // if cycle detection is enabled
  if(!ponyint_actor_getnoblock())
  {
    // trigger cycle detector by sending it a message if it is time
    uint64_t current_tsc = ponyint_cpu_tick();
    if(ponyint_cycle_check_blocked(&sched->ctx, last_cd_tsc, current_tsc))
    {
      last_cd_tsc = current_tsc;

      // cycle detector should now be on the queue
      if(actor == NULL)
        actor = pop_global(sched);
    }
  }

I do understand that these days (after the great work of @dipinhora) lots of things have changed with regard to cycle detector scheduling. Moreover, the cycle detector is no longer woken up indirectly by block messages as it used to be, but is instead activated after a time limit, once --ponycdinterval ticks have passed.

A call to ponyint_cycle_check_blocked will send a single message to the cycle detector. The important piece of information here is the context given to the call of ponyint_cycle_check_blocked, namely the context of the scheduler thread at index 0.
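For reference, a simplified sketch of what that call does; this is illustrative rather than a verbatim copy of cycle.c (in particular, the name detect_interval and the exact body are assumptions):

// Sketch of ponyint_cycle_check_blocked (cycle.c, simplified).
bool ponyint_cycle_check_blocked(pony_ctx_t* ctx, uint64_t last_tsc,
  uint64_t current_tsc)
{
  // Only poke the cycle detector once --ponycdinterval ticks have elapsed.
  if((current_tsc - last_tsc) > detect_interval)
  {
    // The message is sent with the *caller's* context -- here scheduler 0's
    // -- which determines onto whose queue the woken cycle detector lands.
    pony_send(ctx, cycle_detector, ACTORMSG_CHECKBLOCKED);
    return true;
  }

  return false;
}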

The if condition right before the pop_global (scheduler.c:902-903) causes the cycle detector to be scheduled only if scheduler 0 was not successful in picking an actor (or a next actor) in the directly preceding loop iteration. Obviously, work stealing could kick in, but only if scheduler 0's queue is relatively empty.

Now I am afraid that in a system with lots of unblocked application actors, --ponycdinterval does not really have meaning: it merely sets a lower bound on the CD interval, while the upper bound is theoretically unlimited, because the cycle detector is appended to the end of the queue after the very first check_blocked message is sent to it. I am sure this could end up in a pathological case where, if --ponynoblock is not used, garbage collection of the large majority of actors would only happen right before the system becomes quiescent.

So here is my suggestion/idea that we can discuss: would it make sense, provided --ponynoblock is not used, for the cycle detector not to be tied to any scheduler thread, but instead to be maintained on the inject queue by associating it with a non-runtime thread context when invoking ponyint_cycle_check_blocked (see the sketch of ponyint_sched_add after the list)? This could have the following effects:

a) get the cycle detector scheduled closer to --ponycdinterval;
b) have the cycle detector scheduled with higher priority, as it can be pop_global'd by any scheduler thread from the inject queue;
c) not defer garbage collection of large actor graphs in long-running programs until (or close to) the point where the system becomes quiescent;
d) locality is probably not as important for the cycle detector as for application actors (where we attempt to schedule them on the sender's scheduler queue), so I would not expect a performance impact from running the cycle detector from anywhere, rather than forcing it to be executed on scheduler 0 or to be stolen by some thread that is requesting work.
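For context, the mechanism this suggestion relies on is the dispatch inside ponyint_sched_add (scheduler.c); the following is a simplified sketch, not the verbatim source:

// Sketch of ponyint_sched_add (scheduler.c, simplified). An actor scheduled
// from a context with no scheduler thread lands on the global inject queue.
void ponyint_sched_add(pony_ctx_t* ctx, pony_actor_t* actor)
{
  if(ctx->scheduler != NULL)
  {
    // Add to the sending scheduler thread's local queue.
    push(ctx->scheduler, actor);
  } else {
    // Non-runtime context: put on the shared inject queue, from which any
    // scheduler thread can pop_global() it.
    ponyint_mpmcq_push(&inject, actor);
  }
}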

Also, I am unsure whether this behavior was actually intended, as the cycle detector is created with a non-scheduler-thread context (start.c:254-257). Theoretically, this context could be re-used to achieve the above relatively simply, provided that the cycle detector remembers its original context.

Obviously, I might be totally misunderstanding what is going on here, in which case this issue can be immediately closed again :). This is not unlikely, as I have not been able to participate in most discussions. However, I do see that actor collection behaves suboptimally.

Happy to hear some feedback!

dipinhora (Contributor) commented

You're right. That's a current limitation that doesn't need to exist, and it was not explicitly intended behavior on my part when I was fiddling with things. Adding it onto the inject queue for any scheduler thread to pick up is the better option, as you've explained.

sblessing (Contributor, Author) commented Apr 17, 2020

Would this (sblessing@f24a23c) be a potential fix to achieve the above?

Obviously, we can make this prettier. Moreover, I still see some weird behavior when the cycle detector starts working. Are we positive, in the context of backpressure (@SeanTAllen), that actors are never muted when talking to the cycle detector?

The code in actor.c suggests so.

dipinhora (Contributor) commented

@sblessing I'm 99.9% sure (it is possible I'm missing or misunderstanding something) that actors will not get muted due to sending messages to the cycle detector: the messages are not considered application messages and so will not trigger the maybe_mute code path in pony_sendv.
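For reference, the relevant shape of pony_sendv (actor.c), heavily simplified (DTRACE hooks, assertions, and some flag checks omitted); the point is that the mute check is gated on has_app_msg, which is false for cycle detector messages:

// Sketch of pony_sendv (actor.c, simplified). Runtime messages to the cycle
// detector are sent with has_app_msg == false, so ponyint_maybe_mute is
// never reached for them.
void pony_sendv(pony_ctx_t* ctx, pony_actor_t* to, pony_msg_t* first,
  pony_msg_t* last, bool has_app_msg)
{
  // Backpressure: only application-level messages can mute the sender.
  if(has_app_msg)
    ponyint_maybe_mute(ctx, to);

  // Schedule the receiver if its mailbox was empty and it isn't unscheduled.
  if(ponyint_actor_messageq_push(&to->q, first, last))
  {
    if(!has_flag(to, FLAG_UNSCHEDULED))
      ponyint_sched_add(ctx, to);
  }
}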

re: your suggested fix:

  • Can you move this assertion, pony_assert(ctx->current == actor); (sblessing@f24a23c#diff-10904e1425be3af9f8b87528569d2e6bR1116), to here in actor.c, before ponyint_cycle_block(actor, &actor->gc); (sblessing@f24a23c#diff-b166f8416958ecd20981bbb8d58692b4R201), so it doesn't get lost?
  • I don't believe ponyint_cycle_get_context or the changes in static void run(scheduler_t* sched) are required. Putting the cycle detector on the inject queue with your other changes means the next thread to pop_global will pick it up and run it (as opposed to before your other changes, when the cycle detector would be at the end of scheduler 0's queue and might not get run for a while unless work stealing kicked in).

Your changes will ensure that the cycle detector gets picked up as soon as possible (by whichever thread next does a pop_global) whenever any message is sent to it at all (not only when scheduler 0 sends it a ponyint_cycle_check_blocked message), because it will be put on the inject queue for all of them. I hadn't considered that previously, but I think it might be the best option: otherwise the cycle detector could be at the end of some scheduler's queue due to a ponyint_cycle_block (or any other message type) when scheduler 0 sends it a ponyint_cycle_check_blocked message, and the only way to ensure that doesn't happen is to ensure all messages to the cycle detector put it on the inject queue.

Effectively, what we're saying is that the cycle detector is a special actor and should get processed as soon as possible if it has any work to do at all (accomplished by putting it on the inject queue). If we expect other similar special actors, we can instead save that context you're saving in the cycle detector as a static variable in scheduler.c and add an accessor function, making it easier to re-use across all such actors that need to go immediately onto the inject queue.

The downside to this is that we don't give many messages time to queue up for the cycle detector to work through. It might be worth figuring out a way to ensure the cycle detector only gets scheduled when it receives a ponyint_cycle_check_blocked message. This boils down to the question of whether the cycle detector should batch-process a bunch of messages at a time or process messages as soon as it receives them; I think it might be better to let it batch the messages, since we've already ensured there isn't a huge number of messages sent to it anymore. This would ensure the cycle detector has a chunk of work to do each time it runs (many messages in its queue) and does not get run only to process a single message at a time. (Note: there may be some memory-related concerns to think through if this change is made, due to the possibility of a large number of messages queuing up for the cycle detector between runs, but on the surface I think it may be worth it.)

Can you explain what you mean by "I still see some weird behavior when the cycle detector starts working"? It's hard to help without an understanding of the behavior, why you think it is weird, and what you would expect the behavior to be instead.

sblessing (Contributor, Author) commented Apr 19, 2020

@dipinhora thank you very much for the feedback and for taking the time to tackle this issue. I have changed and rebased the commit based on your concerns, which are correct and make perfect sense.

I have also changed the implementation to be slightly more re-usable, such that the global inject queue context is stored in a static scheduler variable inject_context and is accessible via ponyint_sched_get_inject_context. Moreover, that context is set in scheduler.c::ponyint_sched_init, right when the main pony_ctx is allocated.
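Roughly, the shape of that change looks like the following sketch. The names inject_context and ponyint_sched_get_inject_context come from the description above; the bodies are illustrative, not the actual patch:

// scheduler.c -- sketch of the re-usable inject context described above.

// Global non-runtime context; assigned once in ponyint_sched_init(), right
// after the main pony_ctx_t is allocated.
static pony_ctx_t* inject_context;

// Accessor used when sending to the cycle detector. Because this context has
// no scheduler thread attached, ponyint_sched_add() routes the cycle
// detector to the global inject queue instead of a local scheduler queue.
pony_ctx_t* ponyint_sched_get_inject_context()
{
  return inject_context;
}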

I agree with your concerns regarding batch processing of cycle detector messages. I'll do some thinking on this topic!

Also, re "some weird behavior": I have to do a deeper analysis. After changing and tidying up the code per your suggestions (i.e. without the change in scheduler.c), I see different behavior. Prior to that rebase, the program stalled and tried indefinitely to terminate once the cycle detector had decided that actors could be collected...

sblessing (Contributor, Author) commented Apr 19, 2020

Also, in a benchmark application we were facing issues where collection of accumulator actors was deferred until basically the entire benchmark had become quiescent. This prevented us from running larger instances because of memory limits.

Imho, with the change in commit sblessing@b447d1a, the behavior is vastly improved.

sblessing added a commit to sblessing/ponyc that referenced this issue Apr 20, 2020
…the inject queue (ponylang#3501)

This commit treats the cycle detector as a special system actor that is scheduled
via the global inject queue. Consequently, the cycle detector is subject to work
stealing, with higher priority, across all scheduler threads. Prior to this change,
the cycle detector was treated like any other application actor, and scheduler
thread 0 decided when the cycle detector should be activated to potentially do
some work. In some cases this may have caused pathological behavior, where the time
interval --ponycdinterval for 'poking' the cycle detector was theoretically unbounded.

With this commit, the cycle detector always uses the global non-runtime context (inject_context)
whenever messages are sent to it, independent of the sending actor's context. This guarantees
that it will always be pushed to the global inject queue and will then be pop_global'd by the
first runtime thread to become available.

This has the following effects:

a) get the cycle detector scheduled closer to --ponycdinterval;
b) have the cycle detector scheduled with higher priority, as it can be pop_global'd by any scheduler thread from the inject queue;
c) not defer garbage collection of large actor graphs in long-running programs until (or close to) the point where the system becomes quiescent;
d) locality is not as important for the cycle detector as for application actors, so scheduling a previously blocked cycle detector
   using a runtime thread on a first-come-first-served basis should have no negative effect on performance.
dipinhora pushed a commit that referenced this issue Apr 21, 2020

…the inject queue (#3501)
dipinhora (Contributor) commented

> @dipinhora please loop me in on the batch processing thoughts. I have some ideas as well and I am also happy to implement those!

@sblessing re: batching, my idea is a simple hack (which others may or may not agree is a good idea). I was planning on modifying ponyint_cycle_check_blocked, ponyint_cycle_actor_created, ponyint_cycle_actor_destroyed, ponyint_cycle_block, ponyint_cycle_unblock and ponyint_cycle_ack to call a custom copy of pony_sendv, modified to only call ponyint_sched_add when an ACTORMSG_CHECKBLOCKED message is sent, to ensure that the cycle detector does not get scheduled for any other type of message. This seems the cleanest approach that is guaranteed not to have an impact on any other actor message sends (i.e. no extra branches or anything added for normal actors). Please feel free to implement this (or a better idea) if you have the time, as I may not get to it for a few days at least.
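A sketch of how that custom send could look; the helper name send_cycle_message is hypothetical, and the messageq/scheduling calls are simplified stand-ins for the real pony_sendv internals:

// Hypothetical helper replacing pony_sendv for cycle detector messages.
// It always enqueues the message but only wakes (schedules) the cycle
// detector for ACTORMSG_CHECKBLOCKED, letting block/unblock/created/
// destroyed/ack messages accumulate so the detector batch-processes them.
static void send_cycle_message(pony_ctx_t* ctx, pony_actor_t* cycle_detector,
  pony_msg_t* msg)
{
  bool was_empty = ponyint_actor_messageq_push(&cycle_detector->q, msg, msg);

  // Unlike pony_sendv, do not call ponyint_sched_add for other message
  // types even when the mailbox was empty; the next CHECKBLOCKED message
  // (sent by scheduler 0 every --ponycdinterval ticks) will schedule the
  // detector, which then drains everything queued in the meantime.
  if(was_empty && (msg->id == ACTORMSG_CHECKBLOCKED))
    ponyint_sched_add(ctx, cycle_detector);
}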

Also, maybe discussion on this (if there needs to be any) should move to its own issue (or a PR or RFC or something), since your other PR resolved the problem identified in this issue?

sblessing (Contributor, Author) commented

@dipinhora Thanks for bringing me into the loop. I have some thoughts on this and will open another issue for it!

This one can be closed!
