Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use collect-status #202

Closed
wants to merge 4 commits into from
Closed

Use collect-status #202

wants to merge 4 commits into from

Conversation

sed-i
Copy link
Contributor

@sed-i sed-i commented Oct 13, 2023

Issue

Status updates are communicated via exceptions.

Solution

Use collect-status.

In tandem with:

@sed-i sed-i marked this pull request as ready for review October 17, 2023 00:52
Comment on lines +221 to +223
# Check open ports
if self.unit.opened_ports() != set(PORTS.values()):
event.add_status(WaitingStatus(f'Opening {", ".join(PORTS.keys())} ports'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be maintenance status. Waiting is for waiting on related charms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Although I wonder whether this particular check should probably be removed entirely. open-port goes through right away. Juju itself may take time to perform it, but the charm doesn't, and this status won't reflect when the K8s operation actually goes through anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on both comments :)

Comment on lines +231 to +238
# Make sure the external url is valid
if not _is_external_url_valid(self._external_url):
# This shouldn't happen
event.add_status(
BlockedStatus(
f"Invalid external url: '{self._external_url}'; must include scheme and hostname."
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be an error status. The external url is not set manually right? So there is no admin intervention that can fix it. The fact that you say "This shouldn't happen" is likely a hint that it should be error as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it can't happen, yeah, you're right. However, I believe charms can't set an error status, that's from Juju itself (if the hook can't be run, for example). So maybe if this "can't happen" you should check it further back, when self._external_url is being set, and just raise an exception so the charm goes into error state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a leftover from when external url was a config option. Should be removed.

Comment on lines +244 to +245
if self.last_error:
event.add_status(BlockedStatus(self.last_error))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also be error status I think because these, as far as I can tell, are miscellaneous un-handled errors.

Comment on lines +418 to +424
if not _is_external_url_valid(self._external_url):
# This shouldn't happen
logger.error(
"Invalid external url: '%s'; must include scheme and hostname.",
self._external_url,
)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set self.last_error here?

Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a pretty good use of collect-status overall! Added a few comments to think about.

self.assertIsInstance(self.harness.charm.unit.status, BlockedStatus)

# Manual cleanup, due to https://github.com/canonical/operator/issues/736
self.harness.charm.unit._collected_statuses.clear()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this part of it (and similar lines below) will be fixed by canonical/operator#1048?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Relation,
WaitingStatus,
)
from ops.pebble import PathError, ProtocolError # type: ignore

logger = logging.getLogger(__name__)

PORTS = {"api": Port("tcp", 9093), "ha": Port("tcp", 9094)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per other discussion, I think you'd be better off with:

API_PORT = 9093
HA_PORT = 9094
PORTS = [API_PORT, HA_PORT]
...
self.unit.set_ports(*PORTS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

# Resource limit
if not self.resources_patch.is_ready():
event.add_status(
MaintenanceStatus("[resource patch] Waiting for resource limit patch to apply")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When designing collect-status, we thought about including a prefix/component name automatically, but decided against it as status messages should be nicely-designed UI for the user. This kind of shows why: the fact that it's resource patch-related is already clear from the message part, so the [resource patch] prefix just adds noise to the UI. Something to consider...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but in the general case we could have multiple instances of the same lib.

  • Prometheus: prometheus_scrape is a requirer and provider (two different classes though)
  • Kratos uses two ingress requirers

MaintenanceStatus("[resource patch] Waiting for resource limit patch to apply")
)
if self.resources_patch.last_error:
event.add_status(BlockedStatus(f"[resource patch] {self.last_error}"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be:

Suggested change
event.add_status(BlockedStatus(f"[resource patch] {self.last_error}"))
event.add_status(BlockedStatus(f"[resource patch] {self.resources_patch.last_error}"))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I can see why you used the prefix here, however, will the error messages from the resource patch object already include enough context to avoid the need for it, and give a nicer status message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but this would also mean we'd need to chase potential changes of the 3rd party package.

Comment on lines +221 to +223
# Check open ports
if self.unit.opened_ports() != set(PORTS.values()):
event.add_status(WaitingStatus(f'Opening {", ".join(PORTS.keys())} ports'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Although I wonder whether this particular check should probably be removed entirely. open-port goes through right away. Juju itself may take time to perform it, but the charm doesn't, and this status won't reflect when the K8s operation actually goes through anyway.

# "pebble ready"
if not self.container.can_connect():
event.add_status(
MaintenanceStatus(f"Waiting for '{self.container.name}' container to become ready")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following. "to become ready" is kind of redundant.

Suggested change
MaintenanceStatus(f"Waiting for '{self.container.name}' container to become ready")
MaintenanceStatus(f"Waiting for container {self.container.name!r}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Comment on lines +231 to +238
# Make sure the external url is valid
if not _is_external_url_valid(self._external_url):
# This shouldn't happen
event.add_status(
BlockedStatus(
f"Invalid external url: '{self._external_url}'; must include scheme and hostname."
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it can't happen, yeah, you're right. However, I believe charms can't set an error status, that's from Juju itself (if the hook can't be run, for example). So maybe if this "can't happen" you should check it further back, when self._external_url is being set, and just raise an exception so the charm goes into error state.

@@ -428,7 +447,7 @@ def _common_exit_hook(self, update_ca_certs: bool = False) -> None:
try:
self.alertmanager_workload.update_config(self._render_manifest())
except (ConfigUpdateFailure, ConfigError) as e:
self.unit.status = BlockedStatus(str(e))
self.last_error = str(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. I can see why you're storing state in this attribute here (and below) in the common exit hook. However, normally you'd query in collect-status the current state of things, rather than look at previously-set state attributes. Might be tricky for this though?

Copy link
Contributor Author

@sed-i sed-i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing @benhoyt and @dstathis!

I don't think this is ready to merge in light of the comments in the lib PR since:

  • we have custom events
  • multiple entry points into a lib
  • we shouldn't use collect_status outside of charm.py

My impression is that those constraints prevent us from a solid collect-status pattern.
I'm going to close this for now.

Relation,
WaitingStatus,
)
from ops.pebble import PathError, ProtocolError # type: ignore

logger = logging.getLogger(__name__)

PORTS = {"api": Port("tcp", 9093), "ha": Port("tcp", 9094)}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

# Resource limit
if not self.resources_patch.is_ready():
event.add_status(
MaintenanceStatus("[resource patch] Waiting for resource limit patch to apply")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but in the general case we could have multiple instances of the same lib.

  • Prometheus: prometheus_scrape is a requirer and provider (two different classes though)
  • Kratos uses two ingress requirers

MaintenanceStatus("[resource patch] Waiting for resource limit patch to apply")
)
if self.resources_patch.last_error:
event.add_status(BlockedStatus(f"[resource patch] {self.last_error}"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but this would also mean we'd need to chase potential changes of the 3rd party package.

Comment on lines +221 to +223
# Check open ports
if self.unit.opened_ports() != set(PORTS.values()):
event.add_status(WaitingStatus(f'Opening {", ".join(PORTS.keys())} ports'))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on both comments :)

# "pebble ready"
if not self.container.can_connect():
event.add_status(
MaintenanceStatus(f"Waiting for '{self.container.name}' container to become ready")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Comment on lines +231 to +238
# Make sure the external url is valid
if not _is_external_url_valid(self._external_url):
# This shouldn't happen
event.add_status(
BlockedStatus(
f"Invalid external url: '{self._external_url}'; must include scheme and hostname."
)
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a leftover from when external url was a config option. Should be removed.

self.assertIsInstance(self.harness.charm.unit.status, BlockedStatus)

# Manual cleanup, due to https://github.com/canonical/operator/issues/736
self.harness.charm.unit._collected_statuses.clear()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

@sed-i sed-i closed this Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants