Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing harness does not re-emit deferred event #392

Closed
justinmclark opened this issue Aug 25, 2020 · 11 comments
Closed

Testing harness does not re-emit deferred event #392

justinmclark opened this issue Aug 25, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@justinmclark
Copy link
Contributor

I have an event handler that should be deferring the event under certain conditions [1]. In a test case [2], I reproduced the conditions for event.defer() to be triggered but the test harness does not re-emit the deferred event.

My workaround is to manually emit the event in the testing harness with this line: [3]

[1] https://github.com/justinmclark/grafana-charm-base/blob/5d7f2a5210c27b5da6122704efaeb8e8ff6cec1e/src/charm.py#L127
[2] https://github.com/justinmclark/grafana-charm-base/blob/5d7f2a5210c27b5da6122704efaeb8e8ff6cec1e/test/unit/test_charm.py#L60
[3] https://github.com/justinmclark/grafana-charm-base/blob/5d7f2a5210c27b5da6122704efaeb8e8ff6cec1e/test/unit/test_charm.py#L84

@afreiberger
Copy link

If I'm reading this right, and my new-found understanding of defer/emit/reemit is correct, the problem you have is order of operations.

You defer the on_peer_joined, then you trigger on_grafana_relation_joined, which then pops and reemits on_peer_joined first (still having no db, and hence adding to the deferred queue again), then runs on_grafana_relation_joined, so you're actually sitting in a state of having a hook sitting at the top of the reemit queue but no hooks following it to trigger it.

I think you just need to emit an update_status call in your tests to hit the right state for the second deferral. If you want the peer check to be performed during the db relation joined process, you'll have to either add observation of on_database_joined to call on_peer_joined during that event, or add a call to on_peer_joined to the end of on_database_joined/changed, and check in on_peer_joined if you have any peer relations to respond to.

@jameinel
Copy link
Member

jameinel commented Aug 27, 2020 via email

@justinmclark
Copy link
Contributor Author

justinmclark commented Aug 27, 2020

@afreiberger I thought this might be the issue tested this in two ways:

  1. Added a print statement in on_peer_joined just to show when the event is triggered and on_peer_joined is never run after being deferred (even though the order may be incorrect).
  2. I triggered another event after the on_grafana_relation_joined event and the on_peer_joined event still did not re-fire.

This makes me think that the harness is not re-emitting the events. Regardless, I like your idea of emitting update_status.

@jameinel
Copy link
Member

For the original statement about Harness not triggering deferred events, that is true. Doing something like "Harness.add_relation()" which ends up deferred and then "Harness.update_config" won't currently fire the deferred event before processing the config-changed event.
We could certainly do so, and it is more how actual charm events would be triggered, it is more a case of refactoring how the Harness is emitting events so that we don't miss a case (eg, config-changed would fire deferred, but relation-changed wouldn't).

As for the other concern, it is still true that deferred events fire before the next event. so the order would be:

peer-relation-joined (deferred)
deferred-peer-relation-joined (deferred again)
db-relation-joined
deferred-peer-relation-joined
config-changed
etc.

Note that Juju does not make any guarantee about relation-joined ordering. It is as likely that peer-relation-joined will fire before or after db-relation-joined (I might be wrong about peer relations, but IIRC Juju is just iterating a map, which means they are not deterministic.)

Note also that "relation-joined" events often happen before a unit has been able to set any actual relation content. (Both sides join the relation, and get the opportunity to start telling the other side information.) So the real event that you probably want to deal with is "relation-changed".

To clarify, it is a matter of ordering. Once unit prometheus/0 has finished the 'start' hook, and postgresql/0 has finished the start hook, they join their respective relations. However, there is no guarantee whether postgresql/0 will see 'pgsql-relation-joined' before or after prometheus/0 sees 'db-relation-joined'. As that depends on lots of factors like what machine starts first, which one is more heavily loaded, how many other things are related, etc. So while it is perfectly acceptable to check if the data you need (like the database URL) is available during relation-joined, if it isn't present, then the recommendation is not to defer db-relation-joined, but instead to wait for db-relation-changed.

If you defer db-relation-joined, it will fire before every other hook (config-changed, peer-relation-joined, other-app-relation-changed, etc), with no expectation that the data on the relation has actually changed.

In your particular case about HA prometheus needing a database, I probably would set a BlockedStatus, but I probably would not defer peer-relation-joined. What you are really waiting on is an event that isn't peer-relation-joined, so it doesn't really make sense that you would want to be told about it yet-another-time before any other operation.

Does that make sense?

@justinmclark
Copy link
Contributor Author

@jameinel this makes perfect sense. Thank you.

@stub42
Copy link
Contributor

stub42 commented Oct 27, 2020

An explicit call to self.harness.framework.reemit() will reemit deferred events, which I'm using to confirm that non-peers are correctly deferring an event until the leader has published required data. https://github.com/canonical/ops-lib-pgsql/blob/f16114b3c8d7411c1bb15e86867f20282be34a36/tests/test_client.py#L893

@benhoyt benhoyt added the bug Something isn't working label Apr 28, 2023
@benhoyt
Copy link
Collaborator

benhoyt commented Apr 28, 2023

We should still fix this. There's a simple workaround with reemit() described above, but this problem still exists.

Be careful when fixing this, as when charm tests are calling reemit() manually we'd get a duplicate event.

@benhoyt benhoyt added the blocked Blocked on another team/project/item label Oct 5, 2023
@benhoyt
Copy link
Collaborator

benhoyt commented Oct 5, 2023

I've marked this blocked because we're thinking about deprecating event deferral more generally, so we wouldn't fix this if we do that. It's also another case of an issue where defer is causing confusion.

This is really an order-of-events problem, as John and others have pointed out, and should be solved as such.

But leaving open in the meantime, as Harness is behaving differently from prod here.

See also the Event handler dependencies issue #329.

@benhoyt
Copy link
Collaborator

benhoyt commented Mar 18, 2024

@tonyandrewmeyer is going to follow up on this issue after doing the (related) work for #736

@tonyandrewmeyer tonyandrewmeyer removed the blocked Blocked on another team/project/item label Apr 5, 2024
@tonyandrewmeyer
Copy link
Contributor

We decided against deprecating defer(), but we're not able to do the reinitialisation that #736 asked for.

Our longer term focus for unit tests is on Scenario, which already has good support for deferred events. In general, we recommend writing Scenario tests when there's a need to test behaviour around deferring events.

We're also considering (#1174) having deferred events behave more like regular (Juju) events, ie. running on a fresh charm instance (and clean framework).

Fixing this issue for events that harness itself emits would not be too complicated (and is basically the same fix as for emitting pre-commit/commit events, validating stored state data types, etc). However, to fully fix this with harness would require either users calling reemit() themselves (ie. the existing workaround) or having harness patch the emit() calls so that it does the deferred event first, for example in this sort of test:

harness.begin()
harness.charm.install.emit()
harness.charm.start.emit()
assert ...

(Or creating a new harness API, but as with #736, we'd rather focus our resources on Scenario).

Fixing the events that harness emits without changing emit generally would be more consistent with production, but less consistent across harness.

My inclination is to leave things as they are (spending the charm-tech energy on Scenario), but make documentation clearer on deferred events and how we recommend handling those (perhaps a how-to guide for deferring events that covers deferring in general as well as testing them).

@benhoyt
Copy link
Collaborator

benhoyt commented May 17, 2024

Decision is that this is pretty invasive in Harness, and we're focussing our energy on Scenario testing now. Given this issue has been open since 2020, we're going to close and focus on Scenario.

@benhoyt benhoyt closed this as not planned Won't fix, can't repro, duplicate, stale May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants