-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use collect-status #202
Use collect-status #202
Conversation
# Check open ports | ||
if self.unit.opened_ports() != set(PORTS.values()): | ||
event.add_status(WaitingStatus(f'Opening {", ".join(PORTS.keys())} ports')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be maintenance status. Waiting is for waiting on related charms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Although I wonder whether this particular check should probably be removed entirely. open-port
goes through right away. Juju itself may take time to perform it, but the charm doesn't, and this status won't reflect when the K8s operation actually goes through anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on both comments :)
# Make sure the external url is valid | ||
if not _is_external_url_valid(self._external_url): | ||
# This shouldn't happen | ||
event.add_status( | ||
BlockedStatus( | ||
f"Invalid external url: '{self._external_url}'; must include scheme and hostname." | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be an error status. The external url is not set manually right? So there is no admin intervention that can fix it. The fact that you say "This shouldn't happen" is likely a hint that it should be error as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can't happen, yeah, you're right. However, I believe charms can't set an error status, that's from Juju itself (if the hook can't be run, for example). So maybe if this "can't happen" you should check it further back, when self._external_url
is being set, and just raise an exception so the charm goes into error state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a leftover from when external url was a config option. Should be removed.
if self.last_error: | ||
event.add_status(BlockedStatus(self.last_error)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also be error status I think because these, as far as I can tell, are miscellaneous un-handled errors.
if not _is_external_url_valid(self._external_url): | ||
# This shouldn't happen | ||
logger.error( | ||
"Invalid external url: '%s'; must include scheme and hostname.", | ||
self._external_url, | ||
) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set self.last_error
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a pretty good use of collect-status overall! Added a few comments to think about.
self.assertIsInstance(self.harness.charm.unit.status, BlockedStatus) | ||
|
||
# Manual cleanup, due to https://github.com/canonical/operator/issues/736 | ||
self.harness.charm.unit._collected_statuses.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this part of it (and similar lines below) will be fixed by canonical/operator#1048?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
Relation, | ||
WaitingStatus, | ||
) | ||
from ops.pebble import PathError, ProtocolError # type: ignore | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
PORTS = {"api": Port("tcp", 9093), "ha": Port("tcp", 9094)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per other discussion, I think you'd be better off with:
API_PORT = 9093
HA_PORT = 9094
PORTS = [API_PORT, HA_PORT]
...
self.unit.set_ports(*PORTS)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
# Resource limit | ||
if not self.resources_patch.is_ready(): | ||
event.add_status( | ||
MaintenanceStatus("[resource patch] Waiting for resource limit patch to apply") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When designing collect-status, we thought about including a prefix/component name automatically, but decided against it as status messages should be nicely-designed UI for the user. This kind of shows why: the fact that it's resource patch-related is already clear from the message part, so the [resource patch]
prefix just adds noise to the UI. Something to consider...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but in the general case we could have multiple instances of the same lib.
- Prometheus: prometheus_scrape is a requirer and provider (two different classes though)
- Kratos uses two ingress requirers
MaintenanceStatus("[resource patch] Waiting for resource limit patch to apply") | ||
) | ||
if self.resources_patch.last_error: | ||
event.add_status(BlockedStatus(f"[resource patch] {self.last_error}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be:
event.add_status(BlockedStatus(f"[resource patch] {self.last_error}")) | |
event.add_status(BlockedStatus(f"[resource patch] {self.resources_patch.last_error}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I can see why you used the prefix here, however, will the error messages from the resource patch object already include enough context to avoid the need for it, and give a nicer status message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but this would also mean we'd need to chase potential changes of the 3rd party package.
# Check open ports | ||
if self.unit.opened_ports() != set(PORTS.values()): | ||
event.add_status(WaitingStatus(f'Opening {", ".join(PORTS.keys())} ports')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Although I wonder whether this particular check should probably be removed entirely. open-port
goes through right away. Juju itself may take time to perform it, but the charm doesn't, and this status won't reflect when the K8s operation actually goes through anyway.
# "pebble ready" | ||
if not self.container.can_connect(): | ||
event.add_status( | ||
MaintenanceStatus(f"Waiting for '{self.container.name}' container to become ready") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following. "to become ready" is kind of redundant.
MaintenanceStatus(f"Waiting for '{self.container.name}' container to become ready") | |
MaintenanceStatus(f"Waiting for container {self.container.name!r}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
# Make sure the external url is valid | ||
if not _is_external_url_valid(self._external_url): | ||
# This shouldn't happen | ||
event.add_status( | ||
BlockedStatus( | ||
f"Invalid external url: '{self._external_url}'; must include scheme and hostname." | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can't happen, yeah, you're right. However, I believe charms can't set an error status, that's from Juju itself (if the hook can't be run, for example). So maybe if this "can't happen" you should check it further back, when self._external_url
is being set, and just raise an exception so the charm goes into error state.
@@ -428,7 +447,7 @@ def _common_exit_hook(self, update_ca_certs: bool = False) -> None: | |||
try: | |||
self.alertmanager_workload.update_config(self._render_manifest()) | |||
except (ConfigUpdateFailure, ConfigError) as e: | |||
self.unit.status = BlockedStatus(str(e)) | |||
self.last_error = str(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. I can see why you're storing state in this attribute here (and below) in the common exit hook. However, normally you'd query in collect-status the current state of things, rather than look at previously-set state attributes. Might be tricky for this though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing @benhoyt and @dstathis!
I don't think this is ready to merge in light of the comments in the lib PR since:
- we have custom events
- multiple entry points into a lib
- we shouldn't use collect_status outside of charm.py
My impression is that those constraints prevent us from a solid collect-status pattern.
I'm going to close this for now.
Relation, | ||
WaitingStatus, | ||
) | ||
from ops.pebble import PathError, ProtocolError # type: ignore | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
PORTS = {"api": Port("tcp", 9093), "ha": Port("tcp", 9094)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
# Resource limit | ||
if not self.resources_patch.is_ready(): | ||
event.add_status( | ||
MaintenanceStatus("[resource patch] Waiting for resource limit patch to apply") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but in the general case we could have multiple instances of the same lib.
- Prometheus: prometheus_scrape is a requirer and provider (two different classes though)
- Kratos uses two ingress requirers
MaintenanceStatus("[resource patch] Waiting for resource limit patch to apply") | ||
) | ||
if self.resources_patch.last_error: | ||
event.add_status(BlockedStatus(f"[resource patch] {self.last_error}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but this would also mean we'd need to chase potential changes of the 3rd party package.
# Check open ports | ||
if self.unit.opened_ports() != set(PORTS.values()): | ||
event.add_status(WaitingStatus(f'Opening {", ".join(PORTS.keys())} ports')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on both comments :)
# "pebble ready" | ||
if not self.container.can_connect(): | ||
event.add_status( | ||
MaintenanceStatus(f"Waiting for '{self.container.name}' container to become ready") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
# Make sure the external url is valid | ||
if not _is_external_url_valid(self._external_url): | ||
# This shouldn't happen | ||
event.add_status( | ||
BlockedStatus( | ||
f"Invalid external url: '{self._external_url}'; must include scheme and hostname." | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a leftover from when external url was a config option. Should be removed.
self.assertIsInstance(self.harness.charm.unit.status, BlockedStatus) | ||
|
||
# Manual cleanup, due to https://github.com/canonical/operator/issues/736 | ||
self.harness.charm.unit._collected_statuses.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
Issue
Status updates are communicated via exceptions.
Solution
Use collect-status.
In tandem with:
last_error
public attribute, to enable charm to use collect-status observability-libs#63