[WIP] Schedule actor creation like a regular task. #1351
|
Merged build finished. Test FAILed. |
|
Test FAILed. |
|
How does this sound? The monitor does nothing for now. When a local scheduler fails to schedule an actor task on another local scheduler (in The one thing I am not sure about is if somehow all reconstruction attempts fail. Then, a bunch of tasks will be cached in |
|
cc @atumanov in case you have thoughts about the design here :) |
|
@stephanie-wang All of the reconstruction attempts could probably fail (it'd be more robust to have some retry mechanism). Also, we really only need to rerun the actor creation task in We could do the following:
|
Hmm yeah, that sounds good. One thing I would add in addition to that is a check in Also, be careful with reconstruction suppression. We only want to submit the actor creation task once. Ideally, we would reconstruct the result of the initial creation task, instead of directly resubmitting it, so we can reuse the current suppression mechanism. |
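To make the suppression point concrete, here is a minimal sketch of the "submit the actor creation task only once" idea. It is not Ray's implementation: `TaskTable`, `test_and_update`, and `reconstruct_actor` are hypothetical names invented for illustration, and the table is an in-memory dict standing in for whatever shared state the real suppression mechanism uses. The assumption is that every scheduler that notices the failure tries to claim the resubmission, and only the first claim succeeds.

```python
class TaskTable:
    """Toy stand-in for a shared task table (hypothetical, not Ray's API)."""

    def __init__(self):
        # task_id -> number of times the task has been resubmitted so far
        self._attempts = {}

    def test_and_update(self, task_id, expected_attempts):
        """Claim the right to resubmit, but only if nobody else has yet."""
        current = self._attempts.get(task_id, 0)
        if current != expected_attempts:
            # Another scheduler already resubmitted this task.
            return False
        self._attempts[task_id] = current + 1
        return True


def reconstruct_actor(task_table, creation_task_id, seen_attempts, submit):
    """Resubmit the actor creation task at most once per observed failure."""
    if task_table.test_and_update(creation_task_id, seen_attempts):
        submit(creation_task_id)
        return True
    return False
```

If two local schedulers both observe the same failure (both have seen zero resubmissions), only the first call submits the creation task; the second is suppressed. A later, separate failure (one resubmission already seen) can claim again.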
|
Closing for now. I'll do a fresh implementation of this soon. |
Creating this PR in order to get feedback.
This PR does the following:
The main question is how we should handle fault tolerance, e.g., what should trigger the re-execution of an actor creation task? Should it be triggered by the usual fault tolerance mechanism? Or should it be triggered by the monitor when it detects that a local scheduler has died (as is currently done)? The latter is a bit trickier to support, since we may need to trigger reconstruction of the actor task before it is known which local scheduler was going to create the actor. The former will require some changes as well. Basically, the local schedulers need to be notified that an actor no longer exists or that a local scheduler has died, so that they can stop submitting actor tasks directly to that local scheduler.
cc @stephanie-wang
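As a rough illustration of the notification requirement described above, the sketch below models a local scheduler that forwards actor tasks directly once it knows where the actor lives, caches them otherwise, and forgets the location when told that the hosting local scheduler has died. This is an assumption-laden toy, not code from this PR: the class `LocalScheduler`, its methods, and the string IDs are all hypothetical, and "sending" a task is modelled as appending to a list.

```python
from collections import defaultdict


class LocalScheduler:
    """Toy model of one local scheduler's view of actors (hypothetical)."""

    def __init__(self):
        self.actor_locations = {}              # actor_id -> hosting scheduler id
        self.cached_tasks = defaultdict(list)  # actor_id -> tasks awaiting a location
        self.sent = []                         # (scheduler_id, task) pairs forwarded

    def submit_actor_task(self, actor_id, task):
        location = self.actor_locations.get(actor_id)
        if location is None:
            # Actor not created yet, or its host died: hold the task.
            self.cached_tasks[actor_id].append(task)
        else:
            self.sent.append((location, task))

    def on_actor_created(self, actor_id, scheduler_id):
        # Record the new location and flush any tasks cached in the meantime.
        self.actor_locations[actor_id] = scheduler_id
        for task in self.cached_tasks.pop(actor_id, []):
            self.sent.append((scheduler_id, task))

    def on_scheduler_died(self, scheduler_id):
        # Stop submitting directly to the dead scheduler; return affected actors.
        dead = [a for a, s in self.actor_locations.items() if s == scheduler_id]
        for actor_id in dead:
            del self.actor_locations[actor_id]
        return dead
```

In this model, tasks submitted after `on_scheduler_died` are cached rather than sent to the dead scheduler, and they are flushed once the re-executed creation task reports a new location via `on_actor_created`. What happens to cached tasks if every reconstruction attempt fails is left open here, as it is in the discussion.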