Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proof of concept: shutdown supervisor via mailbox command #26

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

srijs
Copy link
Collaborator

@srijs srijs commented Mar 28, 2016

Not intending to merge this yet, purely to illustrate a possible solution to the problem outlined in #5.

@srijs srijs mentioned this pull request Mar 28, 2016
@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 86.25% when pulling ec58921 on proof-of-concept/shutdown-messages into 79e3e77 on master.

@adinapoli
Copy link
Owner

Ops! This PR seems to have triggered a regression in the test suite (or simply our test suite is not very good and this is a transient failure). In any case, these are my sparse comments (typed at 8:26AM, where I'm still half-asleep, so ignore the nonsense, if any):

  • It looks good, but I'm struggling to see the benefit of this. What have we gained here? Probably to reap the benefit we should also start catching exceptions properly and call shutdownSupervisor accordingly or run this cleanup code in the Left branch of the forkFinally call (which acts as a bracket, essentially).
  • I think I miscommunicated on Exception safety #5 (and that's my bad entirely!). The issue was labeled by me very generically "Exception safety", but it reported that one single example (which was a comment by Yuras on Reddit). I still think that, for that particular example only, that interleave is not prevented by Epoch, but I'm 100% with you in saying that the general problem of exception safety it's a totally different kettle of fish that Epochs only cannot solve (and therefore I agree with your proposed solution about promoting a mailbox to a basket of Messages).

So, in a nutshell:

  1. Can you walk me through the benefits of the current PR? (I'm totally accepting "it's still a poc, so no added benefit is presented" as an aswer)
  2. Any clue on why the test suite fails?
  3. Do you agree with my analysis on Exception safety #5? Are we on the same page or am I struggling to see some subtle detail?

Thanks!

@srijs
Copy link
Collaborator Author

srijs commented Mar 30, 2016

Hi @adinapoli!

  1. Let me attempt a proof by construction to show that the concrete Exception safety #5 case is still an issue with epochs. You obviously know the internals better than me, so you might be able to disprove me here:

    1. Epoch 0: spawn a supervised child, with epoch=0;
    2. Epoch 1: child thread throws exception;
    3. Epoch 2: supervised sends DeadLetter, with epoch=2;
    4. Epoch 3: on other thread shutdownSupervisor kills the child (note: the child is already dead, so nothing happens);
    5. Epoch 4: now monitoring thread (in handleEvents) receives the DeadLetter with epoch=2. Since the child has not been respawned (but only killed) the child's epoch hasn't been updated, and is still epoch=0. Since the childs epoch is lower than the dead letter's, the monitor respawns the child, with epoch=4;
    6. Epoch 5: shutdownSupervisor kills the monitoring thread (doesn't consider epochs at all for this task)
    7. Epoch 6: we have an orphan child

    It may be possible to fix the issue by making the shutdownSupervisor aware of epochs as well, but since we already defined a more general solution to data race conditions, I figured we might just as well solve it directly like that.

    How I imagine it working with the queued Shutdown instruction:

    1. Epoch 0: spawn a supervised child, with epoch=0;
    2. Epoch 1: child thread throws exception;
    3. Epoch 2: supervised sends DeadLetter, with epoch=2; queue looks like: [DeadLetter]
    4. Epoch 3: on other thread shutdownSupervisor queues a Shutdown instruction; queue looks like: [DeadLetter, Shutdown]
    5. Epoch 4: now monitoring thread (in handleEvents) receives the DeadLetter with epoch=2. Since the childs epoch is lower than the dead letter's, the monitor respawns the child, with epoch=4; queue looks like: [Shutdown]
    6. Epoch 5: the monitoring thread receives the Shutdown instruction and kills all children and the monitoring thread; queue looks like: []
    7. Epoch 6: all children and the supervisor are dead

    What would happen with a queued Shutdown instruction, if the order of steps 3 and 4 is reversed:

    1. Epoch 0: spawn a supervised child, with epoch=0;
    2. Epoch 1: child thread throws exception;
    3. Epoch 2: on other thread shutdownSupervisor queues a Shutdown instruction; queue looks like: [ Shutdown]
    4. Epoch 3: supervised sends DeadLetter, with epoch=3; queue looks like: [Shutdown, DeadLetter]
    5. Epoch 4: the monitoring thread receives the Shutdown instruction and kills all children and the monitoring thread; queue looks like: [DeadLetter]
    6. Epoch 5: the monitoring thread stops the handleEvents loop
    7. Epoch 6: all children and the supervisor are dead
  2. I suspect we now might need to route more things through the channel, not just the shutdown, since it is no longer instantaneous. I'm hoping to have some time on the weekend to play with that.

@adinapoli
Copy link
Owner

Hey @srijs , thank you very much for this! It was exactly the kind of proof of construction I needed to see things clearly. Sorry if I gave the impression of not play it constructively here, but I think such "rubber duck debugging" are super useful and I feel we are really on the same page. I now see the value of your proposal!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants