Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[resource-limits] Add last_error public attribute, to enable charm to use collect-status #63

Closed
wants to merge 4 commits into from

Conversation

sed-i
Copy link
Contributor

@sed-i sed-i commented Oct 13, 2023

Issue

Status updates are communicated via custom events.

Solution

Add last_error public attribute, to enable charm to collect-status.

In tandem with:

@sed-i
Copy link
Contributor Author

sed-i commented Oct 13, 2023

Is this how you'd expect collect-status to be used in components such as libs, @benhoyt @tonyandrewmeyer?

@sed-i sed-i changed the title Use collect-status [resource-limits] Use collect-status Oct 13, 2023
@benhoyt
Copy link

benhoyt commented Oct 15, 2023

@sed-i Sort of ... though it looks like in this case the charm lib is firing a custom patch_failed event and has is_ready() and last_error methods/attributes. So you probably wouldn't actually do this in the charm lib at all. Keep that nice and stand-alone, and leave the status handling to the charm itself. So the charm would do something like this:

class SomeCharm(ops.CharmBase):
    def __init__(self, *args):
        # ...
        self.resources_patch = KubernetesComputeResourcesPatch(...)
        self.framework.observe(charm.on.collect_unit_status, self._on_collect_unit_status)

    def _on_collect_unit_status(self, event):
        if self.resources_patch.is_ready():
            event.add_status(ops.ActiveStatus())
        elif self.resources_patch.last_error:
            event.add_status(ops.BlockedStatus(self.resources_patch.last_error))
        else:
            event.add_status(ops.MaintenanceStatus('waiting for patch'))

Note that because KubernetesComputeResourcesPatch has a last_error, you could just use this directly, instead of listening to patch_failed.

That would be my suggestion, at any rate.

@sed-i
Copy link
Contributor Author

sed-i commented Oct 16, 2023

Thanks @benhoyt!
The custom event is how we currently propagate status updates. Certainly that part won't be needed with collect-status.
I tend to agree that it's probably best to have only the charm use add_status.

@benhoyt
Copy link

benhoyt commented Oct 16, 2023

One slightly unfortunate thing to note about collect-status is that your charm has to be "all in" with it. You can't really switch half of your charm to use collect-status / add_status and leave the rest using unit.status = x, because the unit.status assignment will be overwritten with the (partially-implemented) collect-status status.

@sed-i sed-i changed the title [resource-limits] Use collect-status [resource-limits] Add last_error public attribute, to enable charm to use collect-status Oct 17, 2023
@sed-i sed-i marked this pull request as ready for review October 17, 2023 00:22
Copy link
Contributor

@PietroPasotti PietroPasotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind this approach, but it feels a bit awkward in that is_ready would return False iff a last error has been set, which means that if you get a False, then you have to go search for a last error.

I realize it's a breaking change, but wouldn't it be better to have an API like:

    def get_result() -> Success | Failure:

Where failure.message: str?

Maybe consider for the next major version bump?

@sed-i
Copy link
Contributor Author

sed-i commented Oct 19, 2023

I don't mind this approach, but it feels a bit awkward in that is_ready would return False iff a last error has been set, which means that if you get a False, then you have to go search for a last error.

Not sure what you mean. Could you point to a line in code?

@PietroPasotti
Copy link
Contributor

PietroPasotti commented Oct 23, 2023

I don't mind this approach, but it feels a bit awkward in that is_ready would return False iff a last error has been set, which means that if you get a False, then you have to go search for a last error.

Not sure what you mean. Could you point to a line in code?

Can't point to a line in the code because it's about the API it exposes.
IIUC the current API implies that, as a user of this lib, the charm would:

lib = Lib()

if lib.is_ready():
    # go about your happy path
    
else:
    # retrieve the error msg
    error = lib.last_error

Which I think is a bit awkward.
An alternative I was proposing is:

lib = Lib()

status: Success | Failure = lib.get_status()
if isinstance(status, Success):
    # go about your happy path
    
else:  # Failure
    # retrieve the error msg
    error = status.message

I realize this deviates from the nice is_ready pattern we've been gliding so far, but fact is now we're introducing a new piece of data: it's not just 'are you ready? --> YES | NO', now it's: 'are you ready, and if not, why? --> YES | NO (because)'

And if we plan to build this out to all other is_ready methods we have in our charm libs, it might be worth it to figure out a generalization-worthy pattern.

@sed-i
Copy link
Contributor Author

sed-i commented Oct 23, 2023

That is a good point.
What immediately comes to mind is the result package.

But I am not sure we want to couple readiness with errors:

  • Could there be a circumstance where we want is_ready to return True, while at the same time having an error (Blocked)?
  • last_error is set during operations, not necessarily during "get status", so we would need that variable anyway (perhaps prepend with an underscore, _last_error). Which means later on we could hide this behind a "get status" if we choose to.
  • Could "last error" and "is ready" go out of sync? last_error gets reset per charm reinit so if an issue is resolved during custom hooks, is_ready could be True, but last_error still holds the past.

@PietroPasotti
Copy link
Contributor

That is a good point. What immediately comes to mind is the result package.

Ha, never heard of that. Interesting!

But I am not sure we want to couple readiness with errors:

  • Could there be a circumstance where we want is_ready to return True, while at the same time having an error (Blocked)?

At charm level, sure, something like the 'degraded' status @jnsgruk was brainstorming about a few Pragues ago, but in this case we're talking about a pattern charm libraries, which are (for the most part, at least our libs) simple enough to either be 'ready' or 'not ready', without intermediate 'ready-but-kind-of-borky' states.

  • last_error is set during operations, not necessarily during "get status", so we would need that variable anyway (perhaps prepend with an underscore, _last_error). Which means later on we could hide this behind a "get status" if we choose to.

Fair enough, I'm game.

  • Could "last error" and "is ready" go out of sync? last_error gets reset per charm reinit so if an issue is resolved during custom hooks, is_ready could be True, but last_error still holds the past.

Which makes me think, last_error may also quickly become stale as it currently stands (independently of is_ready).

if not lib.is_ready():
    err = lib.last_error  # set because not ready
    lib.do_something_to_fix_error()
    err2 = err.lib.last_error  # still set because it's only cleared on __init__

@sed-i
Copy link
Contributor Author

sed-i commented Oct 24, 2023

Which makes me think, last_error may also quickly become stale as it currently stands (independently of is_ready).

I'm not sure we can do anything about it when the lib has multiple entry points (is_ready, on_config_changed -> patch).

Even if I give each error individual attention,

@dataclass
class LibErrors:
    obtain_limit: Optional[str]  # Error obtaining resource limit from user function
    validity: Optional[str]  # Resource spec is invalid
    config: Optional[str]  # ConfigError
    api: Optional[str]  # ApiError
    patch: Optional[str]  # Failed applying patch
        try:
            do_something()
            self.errors.obtain_limit = None
        except ValueError as e:
            msg = f"Failed obtaining resource limit spec: {e}"
            self.errors.obtain_limit = msg

They could still be stale for the same reason.

If we stop using custom events, then...

@sed-i
Copy link
Contributor Author

sed-i commented Oct 24, 2023

I closed the tandem alertmanager PR for now.
Let's try to come up with a solid collect status pattern first.
Closing this too, for now.

@sed-i sed-i closed this Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants