feat(overlord): reversed order of manager stop #447

Ardelean-Calin · 2024-07-11T09:01:33Z

Currently managers are started and stopped in a FIFO (first in, first out) order, but I would expect managers to behave more like a stack, in that they are started and stopped in a LIFO (last in first out) order. So the stopping of the managers should be done in reverse.

See this image:

Why? Because what if manager M2 started after manager M1 depends on some functionality from M1? Stopping M1 before stopping M2 might lead to unexpected and undefined behavior.

I chose to implement this behavior with the defer keyword as it behaves in a LIFO manner, but this could also be done by reverse iterating over the slice. I found the defer approach to be more concise, but feel free to correct me.

@benhoyt could you please take a look at this and advise me if I'm missing something?

flotter · 2024-07-11T10:16:17Z

@Ardelean-Calin Just a note here that the state engine is cross team maintained between pebble, snapd and some other teams (workspace engineering @dmitry-lyfar), so we should make sure the changes we make is double checked with them as well, otherwise a future cross-repo sync of the state code could slip in some changes that break existing critical functioning systems.

Note that the TaskRunner state-engine entry is explicitly done last for startup, and stop today. The task runner is special in that stopping it will kill and wait for all task tombs in the system to exit (block). Inverting this order will place task kill & wait first, which means state engine managers interacting with tasks will possibly observe a slightly different pattern, where a task can die not by the manager's own intent, but by the Task runner shutting down. This could be no issue, but I am pointing this out just to say that this very trivial change could expose a different execution path for snapd / pebble on shutdown, for interaction between managers and the task runner. Probably just needs some thorough testing on existing code bases.

dmitry-lyfar · 2024-07-12T01:46:18Z

@Ardelean-Calin could you give an example of such a M1->M2 dependency that justifies the order change? My understanding is that managers interact solely via the state but do not depend on each other. Except for the mentioned case of the taskrunner that must be added last.

benhoyt · 2024-07-12T02:54:57Z

This does seem like a reasonable idea to me. FWIW, I've tried making this change on the snapd codebase, and the go test tests still seem to pass (though there were a lot of failing tests due to me not having installed a bunch of deps, but I don't think that's related). I've asked the snapd team what they think and I'll report back.

thp-canonical · 2024-07-12T05:43:42Z

My understanding is that managers interact solely via the state but do not depend on each other.

The linked "solely via the state" quote seems to relate to .Ensure() calls at runtime, not necessarily for shutdown time (this quote is from Pebble to get nice inline source snippets in this comment, but snapd has the same text, as the two codebases are related):

pebble/internals/overlord/stateengine.go

Lines 58 to 61 in 35e2b7c

 // Most of the actual work performed by the state engine is in fact done 

 // by the individual managers registered. These managers must be able to 

 // cope with Ensure calls in any order, coordinating among themselves 

 // solely via the state.

Of course, there could be some hidden assumptions and code relying on the current Stop() call order, so we have to be careful here, but in general, teardown is usually in reverse order of setup calls (compare also: C++ inheritance and member destruction; defer in Go; RAII and destruction order of local (stack) variables in C++; getting onto a bus and sitting down, followed by standing up first before getting off the bus).

dmitry-lyfar · 2024-07-12T06:02:58Z

@thp-canonical I'm not opposing the order's correctness but rather changing it and introducing such coupling that you have to depend on order in addition to already existing dependency on the taskrunner's order.

pedronis · 2024-07-23T14:30:05Z

I'm not against this. snapd is quite deliberate in the order in which it initializes managers, I don't think we do anything that requires precise ordering of Stops, but as I said it shouldn't be a problem either. You'll have to write precise tests though to make sure your managers are initialized in the order you need, as otherwise it seems this can get fragile. As people have mentioned the original idea was that managers would be orthogonal to each other, in practice some managers depend on others and that is setup by passing one to the other at initialization but then as you have seen the StateEngine is not aware of these except indirectly by seeing the AddManager order.

benhoyt · 2024-07-23T21:40:31Z

Okay, with Samuele being "not against this", I'd like to merge this. Managers shouldn't generally be dependent on each other, but stopping them in reverse order definitely seems least surprising (I've added a comment to that effect).

I've also modified the code to directly iterate through the slice in reverse, rather than using defer in a loop to do the reversal (that is cute but seems less obvious to me).

As a drive-by I tweaked the (existing) debug log message to use our usual formatting: start with a capital letter, full phrases/sentences.

Ideally after we merge we can put up a similar PR against snapd.

Any final thoughts before I merge, @Ardelean-Calin or others?

Ardelean-Calin · 2024-07-30T04:41:44Z

Sorry for missing this thread! I don't have any more comments, will take care of the snapd tests. 😀 Thanks @benhoyt for the additional changes and feedback!

feat(state): reversed order of manager stop

983539c

More direct reverse iteration; add comment; tweak log message

64e1f7a

benhoyt changed the title ~~feat(state): reversed order of manager stop~~ feat(overlord): reversed order of manager stop Jul 23, 2024

benhoyt approved these changes Jul 30, 2024

View reviewed changes

benhoyt merged commit 2aca1ff into canonical:master Jul 30, 2024
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(overlord): reversed order of manager stop #447

feat(overlord): reversed order of manager stop #447

Ardelean-Calin commented Jul 11, 2024

flotter commented Jul 11, 2024

dmitry-lyfar commented Jul 12, 2024

benhoyt commented Jul 12, 2024

thp-canonical commented Jul 12, 2024

dmitry-lyfar commented Jul 12, 2024

pedronis commented Jul 23, 2024

benhoyt commented Jul 23, 2024

Ardelean-Calin commented Jul 30, 2024 •

edited

Loading

feat(overlord): reversed order of manager stop #447

feat(overlord): reversed order of manager stop #447

Conversation

Ardelean-Calin commented Jul 11, 2024

flotter commented Jul 11, 2024

dmitry-lyfar commented Jul 12, 2024

benhoyt commented Jul 12, 2024

thp-canonical commented Jul 12, 2024

dmitry-lyfar commented Jul 12, 2024

pedronis commented Jul 23, 2024

benhoyt commented Jul 23, 2024

Ardelean-Calin commented Jul 30, 2024 • edited Loading

Ardelean-Calin commented Jul 30, 2024 •

edited

Loading