[resource-limits] Add `last_error` public attribute, to enable charm to use collect-status #63

sed-i · 2023-10-13T18:13:46Z

Issue

Status updates are communicated via custom events.

Solution

Add a last_error public attribute, so that the charm can use collect-status.

In tandem with:

Use collect-status alertmanager-k8s-operator#202

sed-i · 2023-10-13T18:14:30Z

Is this how you'd expect collect-status to be used in components such as libs, @benhoyt @tonyandrewmeyer?

benhoyt · 2023-10-15T23:11:15Z

@sed-i Sort of ... though it looks like in this case the charm lib is firing a custom patch_failed event and has is_ready() and last_error methods/attributes. So you probably wouldn't actually do this in the charm lib at all. Keep that nice and stand-alone, and leave the status handling to the charm itself. So the charm would do something like this:

class SomeCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # ...
        self.resources_patch = KubernetesComputeResourcesPatch(...)
        self.framework.observe(self.on.collect_unit_status, self._on_collect_unit_status)

    def _on_collect_unit_status(self, event):
        if self.resources_patch.is_ready():
            event.add_status(ops.ActiveStatus())
        elif self.resources_patch.last_error:
            event.add_status(ops.BlockedStatus(self.resources_patch.last_error))
        else:
            event.add_status(ops.MaintenanceStatus('waiting for patch'))

Note that because KubernetesComputeResourcesPatch has a last_error, you could just use this directly, instead of listening to patch_failed.

That would be my suggestion, at any rate.

sed-i · 2023-10-16T13:33:56Z

Thanks @benhoyt!
The custom event is how we currently propagate status updates. Certainly that part won't be needed with collect-status.
I tend to agree that it's probably best to have only the charm use add_status.

benhoyt · 2023-10-16T19:31:49Z

One slightly unfortunate thing to note about collect-status is that your charm has to be "all in" with it. You can't really switch half of your charm to use collect-status / add_status and leave the rest using unit.status = x, because the unit.status assignment will be overwritten with the (partially-implemented) collect-status status.
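
To make the "all in" point concrete, here is a minimal sketch that uses a simplified stand-in rather than the real ops framework. `FakeUnit`, `run_hook`, and the tuple-based statuses are hypothetical names invented for illustration, and the priority table is an assumption about how the framework chooses among collected statuses.

```python
# Simplified stand-in for the framework's behaviour; NOT the real ops API.
# Assumption: after every hook, the highest-priority collected status
# replaces whatever was assigned to unit.status during the hook.
PRIORITY = {"blocked": 4, "maintenance": 3, "waiting": 2, "active": 1}


class FakeUnit:
    def __init__(self):
        self.status = ("unknown", "")


def run_hook(unit, hook, collectors):
    """Run a hook, then evaluate collect-status as the framework would."""
    hook(unit)
    collected = [status for collect in collectors for status in collect()]
    if collected:
        # The collected result wins, silently discarding direct assignments.
        unit.status = max(collected, key=lambda s: PRIORITY[s[0]])


def legacy_hook(unit):
    # Part of the charm not yet migrated: assigns the status directly.
    unit.status = ("blocked", "database relation missing")


def migrated_collector():
    # Part of the charm already migrated: only knows about its own concerns.
    return [("active", "")]


unit = FakeUnit()
run_hook(unit, legacy_hook, [migrated_collector])
print(unit.status)  # ('active', ''): the 'blocked' assignment was lost
```

In this sketch the only way to keep the "blocked" status is to move the legacy check into a collector as well, which is the sense in which the migration has to be complete.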

PietroPasotti

I don't mind this approach, but it feels a bit awkward in that is_ready would return False iff a last error has been set, which means that if you get a False, then you have to go search for a last error.

I realize it's a breaking change, but wouldn't it be better to have an API like:

    def get_result() -> Success | Failure:

Where failure.message: str?

Maybe consider for the next major version bump?

sed-i · 2023-10-19T21:26:27Z

> I don't mind this approach, but it feels a bit awkward in that is_ready would return False iff a last error has been set, which means that if you get a False, then you have to go search for a last error.

Not sure what you mean. Could you point to a line in code?

PietroPasotti · 2023-10-23T15:08:34Z

> > I don't mind this approach, but it feels a bit awkward in that is_ready would return False iff a last error has been set, which means that if you get a False, then you have to go search for a last error.
>
> Not sure what you mean. Could you point to a line in code?

Can't point to a line in the code because it's about the API it exposes.
IIUC the current API implies that, as a user of this lib, the charm would:

lib = Lib()

if lib.is_ready():
    ...  # go about your happy path
else:
    # retrieve the error msg
    error = lib.last_error

Which I think is a bit awkward.
An alternative I was proposing is:

lib = Lib()

status: Success | Failure = lib.get_status()
if isinstance(status, Success):
    ...  # go about your happy path
else:  # Failure
    # retrieve the error msg
    error = status.message

I realize this deviates from the nice is_ready pattern we've been gliding along with so far, but the fact is that we're now introducing a new piece of data: it's no longer just 'are you ready? --> YES | NO', it's 'are you ready, and if not, why? --> YES | NO (because)'.

And if we plan to build this out to all other is_ready methods we have in our charm libs, it might be worth it to figure out a generalization-worthy pattern.
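
A minimal sketch of what such a pattern could look like, assuming two small dataclasses. The `Success`, `Failure`, and `get_status` names come from the snippet above, while the `Lib` internals and the `describe` helper are hypothetical and only there to make the example runnable.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Success:
    """The lib is ready; nothing more to report."""


@dataclass(frozen=True)
class Failure:
    """The lib is not ready; `message` says why."""

    message: str


Result = Union[Success, Failure]


class Lib:
    """Hypothetical charm lib, used only to illustrate the API shape."""

    def __init__(self, error: Optional[str] = None):
        self._last_error = error

    def get_status(self) -> Result:
        # Readiness and the reason for non-readiness travel together.
        if self._last_error is None:
            return Success()
        return Failure(self._last_error)


def describe(lib: Lib) -> str:
    status = lib.get_status()
    if isinstance(status, Success):
        return "ready"
    return f"not ready: {status.message}"


print(describe(Lib()))
print(describe(Lib("Failed applying patch")))
```

With this shape the caller cannot obtain a "not ready" answer without also receiving the reason, which is the coupling being proposed here.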

sed-i · 2023-10-23T15:46:17Z

That is a good point.
What immediately comes to mind is the result package.

But I am not sure we want to couple readiness with errors:

Could there be a circumstance where we want is_ready to return True, while at the same time having an error (Blocked)?
last_error is set during operations, not necessarily during "get status", so we would need that variable anyway (perhaps prepend with an underscore, _last_error). Which means later on we could hide this behind a "get status" if we choose to.
Could "last error" and "is ready" go out of sync? last_error gets reset per charm reinit so if an issue is resolved during custom hooks, is_ready could be True, but last_error still holds the past.

PietroPasotti · 2023-10-24T06:36:50Z

> That is a good point. What immediately comes to mind is the result package.

Ha, never heard of that. Interesting!

> But I am not sure we want to couple readiness with errors:
>
> Could there be a circumstance where we want is_ready to return True, while at the same time having an error (Blocked)?

At charm level, sure, something like the 'degraded' status @jnsgruk was brainstorming about a few Pragues ago. But in this case we're talking about a pattern for charm libraries, which are (for the most part, at least our libs) simple enough to be either 'ready' or 'not ready', without intermediate 'ready-but-kind-of-borky' states.

> last_error is set during operations, not necessarily during "get status", so we would need that variable anyway (perhaps prepend with an underscore, _last_error). Which means later on we could hide this behind a "get status" if we choose to.

Fair enough, I'm game.

> Could "last error" and "is ready" go out of sync? last_error gets reset per charm reinit so if an issue is resolved during custom hooks, is_ready could be True, but last_error still holds the past.

Which makes me think, last_error may also quickly become stale as it currently stands (independently of is_ready).

if not lib.is_ready():
    err = lib.last_error  # set because not ready
    lib.do_something_to_fix_error()
    err2 = lib.last_error  # still set because it's only cleared on __init__
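
The staleness can be reproduced with a small self-contained sketch. The `Lib` class below is a hypothetical stand-in, not the real KubernetesComputeResourcesPatch; it only assumes that last_error is set on failure and cleared nowhere except `__init__`.

```python
from typing import Optional


class Lib:
    """Hypothetical lib: last_error is set on failure, cleared only in __init__."""

    def __init__(self):
        self.last_error: Optional[str] = None
        self._patched = False

    def patch(self, limit: str) -> None:
        if not limit:
            self.last_error = "Resource spec is invalid"
            return
        self._patched = True  # success, but last_error is left untouched

    def is_ready(self) -> bool:
        return self._patched


lib = Lib()
lib.patch("")       # first attempt fails
lib.patch("cpu=1")  # second attempt, within the same hook, succeeds
print(lib.is_ready(), lib.last_error)  # True Resource spec is invalid
```

Here is_ready and last_error disagree until the object is re-created, which matches the out-of-sync concern quoted above.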

sed-i · 2023-10-24T21:04:09Z

> Which makes me think, last_error may also quickly become stale as it currently stands (independently of is_ready).

I'm not sure we can do anything about it when the lib has multiple entry points (is_ready, on_config_changed -> patch).

Even if I give each error individual attention,

from dataclasses import dataclass
from typing import Optional

@dataclass
class LibErrors:
    obtain_limit: Optional[str] = None  # Error obtaining resource limit from user function
    validity: Optional[str] = None  # Resource spec is invalid
    config: Optional[str] = None  # ConfigError
    api: Optional[str] = None  # ApiError
    patch: Optional[str] = None  # Failed applying patch

# ... and then, inside a lib method, set or clear only the relevant field:
        try:
            do_something()
            self.errors.obtain_limit = None
        except ValueError as e:
            msg = f"Failed obtaining resource limit spec: {e}"
            self.errors.obtain_limit = msg

They could still be stale for the same reason.
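
A rough sketch of why per-error tracking does not help on its own, assuming each entry point only ever touches its own field. The `first` helper is hypothetical and the field list is shortened for brevity.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LibErrors:
    obtain_limit: Optional[str] = None
    validity: Optional[str] = None
    patch: Optional[str] = None

    def first(self) -> Optional[str]:
        """Return the first recorded error, in field order, or None."""
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:
                return value
        return None


errors = LibErrors()
errors.validity = "Resource spec is invalid"  # set by one entry point
errors.obtain_limit = None                    # cleared by another, on success
print(errors.first())  # the validity error is still reported
```

The validity error stays in place until the code path that set it runs again and succeeds, so a reader of `errors` may still see a problem that has already been fixed.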

If we stop using custom events, then...

sed-i · 2023-10-24T22:06:22Z

I closed the tandem alertmanager PR for now.
Let's try to come up with a solid collect status pattern first.
Closing this too, for now.

sed-i changed the title ~~Use collect-status~~ [resource-limits] Use collect-status Oct 13, 2023

github-actions bot added the Libraries: Out of sync label Oct 13, 2023

sed-i mentioned this pull request Oct 13, 2023

Use collect-status canonical/alertmanager-k8s-operator#202

Closed

sed-i changed the title ~~[resource-limits] Use collect-status~~ [resource-limits] Add last_error public attribute, to enable charm to use collect-status Oct 17, 2023

sed-i marked this pull request as ready for review October 17, 2023 00:22

sed-i requested review from Abuelodelanada, lucabello, PietroPasotti, dstathis and simskij as code owners October 17, 2023 00:22

sed-i mentioned this pull request Oct 17, 2023

Charm should be reinitialized at every hook execution in Harness canonical/operator#736

Closed

PietroPasotti approved these changes Oct 18, 2023

sed-i added 4 commits October 19, 2023 17:25

Use collect-status

21ae3fd

Add WaitingStatus

8f6a0e3

Do not use add_status in the lib

0624ee0

Set last_error

b44b502

sed-i force-pushed the feature/collect-status branch from 18ed03c to b44b502 Compare October 19, 2023 21:25

sed-i requested a review from PietroPasotti October 23, 2023 15:00

sed-i closed this Oct 24, 2023
