proof of concept: shutdown supervisor via mailbox command #26

srijs · 2016-03-28T04:13:47Z

Not intending to merge this yet, purely to illustrate a possible solution to the problem outlined in #5.

coveralls · 2016-03-28T04:16:38Z

Coverage increased (+0.6%) to 86.25% when pulling ec58921 on proof-of-concept/shutdown-messages into 79e3e77 on master.

adinapoli · 2016-03-30T06:32:15Z

Ops! This PR seems to have triggered a regression in the test suite (or simply our test suite is not very good and this is a transient failure). In any case, these are my sparse comments (typed at 8:26AM, where I'm still half-asleep, so ignore the nonsense, if any):

It looks good, but I'm struggling to see the benefit of this. What have we gained here? Probably to reap the benefit we should also start catching exceptions properly and call shutdownSupervisor accordingly or run this cleanup code in the Left branch of the forkFinally call (which acts as a bracket, essentially).
I think I miscommunicated on Exception safety #5 (and that's my bad entirely!). The issue was labeled by me very generically "Exception safety", but it reported that one single example (which was a comment by Yuras on Reddit). I still think that, for that particular example only, that interleave is not prevented by Epoch, but I'm 100% with you in saying that the general problem of exception safety it's a totally different kettle of fish that Epochs only cannot solve (and therefore I agree with your proposed solution about promoting a mailbox to a basket of Messages).

So, in a nutshell:

Can you walk me through the benefits of the current PR? (I'm totally accepting "it's still a poc, so no added benefit is presented" as an aswer)
Any clue on why the test suite fails?
Do you agree with my analysis on Exception safety #5? Are we on the same page or am I struggling to see some subtle detail?

Thanks!

srijs · 2016-03-30T06:52:00Z

Hi @adinapoli!

Let me attempt a proof by construction to show that the concrete Exception safety #5 case is still an issue with epochs. You obviously know the internals better than me, so you might be able to disprove me here:
1. Epoch 0: spawn a supervised child, with epoch=0;
2. Epoch 1: child thread throws exception;
3. Epoch 2: supervised sends DeadLetter, with epoch=2;
4. Epoch 3: on other thread shutdownSupervisor kills the child (note: the child is already dead, so nothing happens);
5. Epoch 4: now monitoring thread (in handleEvents) receives the DeadLetter with epoch=2. Since the child has not been respawned (but only killed) the child's epoch hasn't been updated, and is still epoch=0. Since the childs epoch is lower than the dead letter's, the monitor respawns the child, with epoch=4;
6. Epoch 5: shutdownSupervisor kills the monitoring thread (doesn't consider epochs at all for this task)
7. Epoch 6: we have an orphan child
It may be possible to fix the issue by making the shutdownSupervisor aware of epochs as well, but since we already defined a more general solution to data race conditions, I figured we might just as well solve it directly like that.

How I imagine it working with the queued Shutdown instruction:
1. Epoch 0: spawn a supervised child, with epoch=0;
2. Epoch 1: child thread throws exception;
3. Epoch 2: supervised sends DeadLetter, with epoch=2; queue looks like: [DeadLetter]
4. Epoch 3: on other thread shutdownSupervisor queues a Shutdown instruction; queue looks like: [DeadLetter, Shutdown]
5. Epoch 4: now monitoring thread (in handleEvents) receives the DeadLetter with epoch=2. Since the childs epoch is lower than the dead letter's, the monitor respawns the child, with epoch=4; queue looks like: [Shutdown]
6. Epoch 5: the monitoring thread receives the Shutdown instruction and kills all children and the monitoring thread; queue looks like: []
7. Epoch 6: all children and the supervisor are dead
What would happen with a queued Shutdown instruction, if the order of steps 3 and 4 is reversed:
1. Epoch 0: spawn a supervised child, with epoch=0;
2. Epoch 1: child thread throws exception;
3. Epoch 2: on other thread shutdownSupervisor queues a Shutdown instruction; queue looks like: [ Shutdown]
4. Epoch 3: supervised sends DeadLetter, with epoch=3; queue looks like: [Shutdown, DeadLetter]
5. Epoch 4: the monitoring thread receives the Shutdown instruction and kills all children and the monitoring thread; queue looks like: [DeadLetter]
6. Epoch 5: the monitoring thread stops the handleEvents loop
7. Epoch 6: all children and the supervisor are dead
I suspect we now might need to route more things through the channel, not just the shutdown, since it is no longer instantaneous. I'm hoping to have some time on the weekend to play with that.

adinapoli · 2016-03-30T18:25:09Z

Hey @srijs , thank you very much for this! It was exactly the kind of proof of construction I needed to see things clearly. Sorry if I gave the impression of not play it constructively here, but I think such "rubber duck debugging" are super useful and I feel we are really on the same page. I now see the value of your proposal!

shutdown supervisor via mailbox command

ec58921

srijs mentioned this pull request Mar 28, 2016

Exception safety #5

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proof of concept: shutdown supervisor via mailbox command #26

proof of concept: shutdown supervisor via mailbox command #26

srijs commented Mar 28, 2016

coveralls commented Mar 28, 2016

adinapoli commented Mar 30, 2016

srijs commented Mar 30, 2016

adinapoli commented Mar 30, 2016

proof of concept: shutdown supervisor via mailbox command #26

Are you sure you want to change the base?

proof of concept: shutdown supervisor via mailbox command #26

Conversation

srijs commented Mar 28, 2016

coveralls commented Mar 28, 2016

adinapoli commented Mar 30, 2016

srijs commented Mar 30, 2016

adinapoli commented Mar 30, 2016