-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(overlord): reversed order of manager stop #447
Conversation
@Ardelean-Calin Just a note here that the state engine is cross team maintained between pebble, snapd and some other teams (workspace engineering @dmitry-lyfar), so we should make sure the changes we make is double checked with them as well, otherwise a future cross-repo sync of the state code could slip in some changes that break existing critical functioning systems. Note that the TaskRunner state-engine entry is explicitly done last for startup, and stop today. The task runner is special in that stopping it will kill and wait for all task tombs in the system to exit (block). Inverting this order will place task kill & wait first, which means state engine managers interacting with tasks will possibly observe a slightly different pattern, where a task can die not by the manager's own intent, but by the Task runner shutting down. This could be no issue, but I am pointing this out just to say that this very trivial change could expose a different execution path for snapd / pebble on shutdown, for interaction between managers and the task runner. Probably just needs some thorough testing on existing code bases. |
@Ardelean-Calin could you give an example of such a M1->M2 dependency that justifies the order change? My understanding is that managers interact solely via the state but do not depend on each other. Except for the mentioned case of the taskrunner that must be added last. |
This does seem like a reasonable idea to me. FWIW, I've tried making this change on the snapd codebase, and the |
The linked "solely via the state" quote seems to relate to pebble/internals/overlord/stateengine.go Lines 58 to 61 in 35e2b7c
Of course, there could be some hidden assumptions and code relying on the current |
@thp-canonical I'm not opposing the order's correctness but rather changing it and introducing such coupling that you have to depend on order in addition to already existing dependency on the taskrunner's order. |
I'm not against this. snapd is quite deliberate in the order in which it initializes managers, I don't think we do anything that requires precise ordering of Stops, but as I said it shouldn't be a problem either. You'll have to write precise tests though to make sure your managers are initialized in the order you need, as otherwise it seems this can get fragile. As people have mentioned the original idea was that managers would be orthogonal to each other, in practice some managers depend on others and that is setup by passing one to the other at initialization but then as you have seen the StateEngine is not aware of these except indirectly by seeing the AddManager order. |
Okay, with Samuele being "not against this", I'd like to merge this. Managers shouldn't generally be dependent on each other, but stopping them in reverse order definitely seems least surprising (I've added a comment to that effect). I've also modified the code to directly iterate through the slice in reverse, rather than using defer in a loop to do the reversal (that is cute but seems less obvious to me). As a drive-by I tweaked the (existing) debug log message to use our usual formatting: start with a capital letter, full phrases/sentences. Ideally after we merge we can put up a similar PR against snapd. Any final thoughts before I merge, @Ardelean-Calin or others? |
Sorry for missing this thread! I don't have any more comments, will take care of the snapd tests. 😀 Thanks @benhoyt for the additional changes and feedback! |
Currently managers are started and stopped in a FIFO (first in, first out) order, but I would expect managers to behave more like a stack, in that they are started and stopped in a LIFO (last in first out) order. So the stopping of the managers should be done in reverse.
See this image:
Why? Because what if manager M2 started after manager M1 depends on some functionality from M1? Stopping M1 before stopping M2 might lead to unexpected and undefined behavior.
I chose to implement this behavior with the
defer
keyword as it behaves in a LIFO manner, but this could also be done by reverse iterating over the slice. I found thedefer
approach to be more concise, but feel free to correct me.@benhoyt could you please take a look at this and advise me if I'm missing something?