-
-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Running the cycle detector in relation to --ponycdinterval #3501
Comments
You're right. That's a current limitation that doesn't need to exist and was not an explicitly intended behavior on my part when i was fiddling with things. Adding it onto the |
Would this be a potential fix to achieve the above? Obviously, we can do this prettier. Moreover, I still see some weird behavior when the cycle detector starts working. Are we positive, in the context of backpressure (@SeanTAllen), that actors are never muted when talking to the cycle detector? The code in |
@sblessing i'm 99.9% sure (it is possible i'm missing or misunderstanding something) that actors will not get muted due to sending messages to the cycle detector (the messages are not considered application messages and so will not trigger the re: your suggested fix:
Your changes will ensure that the The down side to this is that we don't give any time for many messages to queue up for the Can you explain what you mean by |
@dipinhora thank you very much for the feedback and taking the time to tackle this issue. I have changed & rebased the commit based on your concerns - which are correct and make perfect sense. I have also changed the implementation to be slightly more re-useable, such that the global inject queue context will be stored in a schedulers static variable I agree with your concerns regarding batch processing of cycle detector messages. I'll do some thinking on this topic! Also, re "some weird behavior", I have to do a deeper analysis. After I have changed and tidied up the code to your suggestions (i.e. without the change in scheduler.c) I see different behavior. Prior to that rebase, the program had stalled and infinitely tried to terminate once the cycle detector had decided that actors can be collected... |
Also, in a benchmark application, we were facing issues were collection of Imho, with the change of commit sblessing@b447d1a, the behavior is vastly improved. |
…the inject queue (ponylang#3501) This commit treats the cycle detector as special system actor that is scheduled using the global inject queue. Consequently, the cycle detector is subject to work stealing with a higher priority between all scheduler threads. Prior to this change, the cycle detector was treated like any other application actor. Moreover, scheduler thread 0 decided when the cycle detector should be activated for it to potentially do some work. In some cases, this may have caused pathological behavior, where the time interval --ponycdinterval for 'poking' the cycle detector was theoretically unbounded. With this commit, the cycle detector always uses the global non-runtime context (inject_context), whenever messages are sent to it, independent of the sending actor's context. This guarantees that it will always be pushed to the global inject queue and will then be pop_global'd by the first runtime thread to become available. This has the following effect: a) get the cycle detector to be scheduled closer to --ponycdinterval b) have the cycle detector to be scheduled with higher priority as it can be 'pop_global'd' by any scheduler thread from the inject queue. c) not defer garbage collection of large actor graphs for long running programs right to the end (or close to) a quiescent system. d) locality is not that important for the cycle detector as for application actors, thus scheduling a previously blocked cycle detector using a runtime thread on a first-come-first serve basis should have no negative effect on performance.
…the inject queue (#3501) This commit treats the cycle detector as special system actor that is scheduled using the global inject queue. Consequently, the cycle detector is subject to work stealing with a higher priority between all scheduler threads. Prior to this change, the cycle detector was treated like any other application actor. Moreover, scheduler thread 0 decided when the cycle detector should be activated for it to potentially do some work. In some cases, this may have caused pathological behavior, where the time interval --ponycdinterval for 'poking' the cycle detector was theoretically unbounded. With this commit, the cycle detector always uses the global non-runtime context (inject_context), whenever messages are sent to it, independent of the sending actor's context. This guarantees that it will always be pushed to the global inject queue and will then be pop_global'd by the first runtime thread to become available. This has the following effect: a) get the cycle detector to be scheduled closer to --ponycdinterval b) have the cycle detector to be scheduled with higher priority as it can be 'pop_global'd' by any scheduler thread from the inject queue. c) not defer garbage collection of large actor graphs for long running programs right to the end (or close to) a quiescent system. d) locality is not that important for the cycle detector as for application actors, thus scheduling a previously blocked cycle detector using a runtime thread on a first-come-first serve basis should have no negative effect on performance.
@sblessing re: batching, my idea is a simple hack (which others may or may not agree is a good idea). i was planning on modifying also, maybe discussion on this (if there needs to be any), should be in its own issue (or on a PR or RFC or something) since your other PR resolved the problem identified in this issue? |
@dipinhora Thanks for bringing me into the loop. I have some thoughs on this, and will open another issue for this! This one can be closed! |
Hey there,
I have been wondering for quite some time about pony's behavior with respect to garbage collecting actors. It seems to work correctly in simple cases, where
--ponynoblock
is given and no one holds references to actors (see actor.c:452-463). However, this case almost never exists in slightly more complex cases of actor applications, and will probably only ever apply to a small subset of all actors during the lifetime of a pony application.After some more detailed analysis, I have stumbeled across this piece of code in the scheduler (which is awesome btw, compared to the first version back when we had written ponyrt).
scheduler.c:890-905
I do understand, that these days (after the great work of @dipinhora), lots of things have changed with regard to cycle detector scheduling. Moreover the cycle detector is not woken up indirectly by block messages as it used to be but rather activated initially after some time limit, when
--ponycdinterval
ticks have passed.A call to
ponyint_cycle_check_blocked
will send a single message to the cycle detector. The important piece of information here is the context that is given to the call ofponyint_cycle_check_blocked
, namely the context of the scheduler thread at index 0.The IF condition right before the
pop_global
(scheduler.c:902-903) causes the cycle detector only to be scheduled, if scheduler 0 was not successful in picking an actor (or a next actor) in the directly preceding while loop iteration. Obviously, work stealing could kick in, but only if scheduler-0s queue is relatively empty.Now I am afraid that in a system with lots of unblocked application actors,
--ponycdinterval
does not really have meaning, as its merely sets a lower bound to the CD interval, but its upper bound is theoretically unlimited, as the cycle detector is appended to the end of the queue after the very first check_blocked message was sent to it. I am sure that this could end up in a pathological case, that if--ponynoblock
is not used, garbage collection of a large majority of actors would only happen right before the system becomes quiescent.So here my suggestion/idea that we can discuss: Would it make sense to have the cycle detector, provided
--ponynoblock
is not used, to not be related to any scheduler thread, but to have it be maintained on the inject queue by relating it to a non-runtime thread when invokingponyint_cycle_check_blocked
? This could have the following effect:a) get the cycle detector to be scheduled closer to
--ponycdinterval
b) have the cycle detector to be scheduled with higher priority as it can be 'pop_global'd' by any scheduler thread from the inject queue.
c) not defer garbage collection of large actor graphs for long running programs right to the end (or close to) a quiescent system.
d) locality is probably not that important for the cycle detector as compared to application actors (where we attempt to schedule them on the senders scheduler queue), so I would not expect a performance impact when running the cycle detector from somewhere, rather than forcing it to be executed on scheduler-0 or to be stolen by some thread that is requesting work.
Also, I am unsure as to whether this behavior was actually intended, as the cycle detector is being created with a non-scheduler thread context (start.c:254-257). Theoretically, this context could be re-used to achieve the above relatively simply, provided that the cycle detector remembers its original context.
Obviously, I might be totally misunderstanding what is going on here, in which case this issue can be immediately closed again :). This is not unlikely as I have not been able to participate in most discussions. However, I do see that actor collection behaves suboptimal.
Happy to hear some feedback!
The text was updated successfully, but these errors were encountered: