Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always reinitialize the charm before processing custom events #952

Closed
sed-i opened this issue Jun 14, 2023 · 9 comments
Closed

Always reinitialize the charm before processing custom events #952

sed-i opened this issue Jun 14, 2023 · 9 comments

Comments

@sed-i
Copy link
Contributor

sed-i commented Jun 14, 2023

As charm code evolves, we delegate responsibilities to charm libs.
A common use case for charm libs is relation management.
A common pattern in relation management is for a lib to observe relation events on behalf of the charm, and "convert" them to tailored custom events.

When different relation data depend on each other, we encounter situations of stale data. For example, let's take a look at the following scenario:

graph LR

ingress -->|url| scrape
tls -->|cert| scrape
Loading
# charm.py - MyCharm's __init__:

self.ingress = Ingress(self)
# also observes ingress-relation-changed etc. on behalf of the charm and emits "url-changed"

self.tls = Tls(self)
# also observes tls-relation-changed etc. on behalf of the charm and emits "cert-changed"

self.scrape = MetricsEndpoint(
    self,
    url=self.ingress.url or socket.getfqdn(),
    cert=self.tls.cert
)
# Completely misses out on the changes communicated over custom events
# MetricsEndpoint is init'ed with stale data
# Charm is at the mercy of update-status interval

Charm code misses out on the changes communicated over custom events because custom events do not result in charm re-init; they are processed already after self.scrape is init'ed with old data.

To circumvent this, we have in place various approaches:

If custom events reinit'ed the charm, then code order concerns would probably be simpler.

Related: #736

@PietroPasotti
Copy link
Contributor

PietroPasotti commented Jun 29, 2023

I'd like to add to the possible workarounds the property pattern I thought about some time ago:

At the moment our charm libs do their observer registration in init, but nobody said that's how it should be.
A more explicit solution would be:

def  __init__(self):
    # charm libs won't register any observers on init, instead...
    self.ingress.register_observers()
    self.tls.register_observers()
    self.scrape.register_observers()
    
    observe(self.ingress.on_foo, self._on_foo)
    observe(self.tls.on_bar, self._on_bar)
    observe(self.scrape.on_baz, self._on_baz)

@property
def tls(self):
    return Tls(self)
    
@property
def ingress(self):
    return Ingress(self)
    
@property
def scrape(self):
    return MetricsEndpoint(
        self,
        url=self.ingress.url or socket.getfqdn(),
        cert=self.tls.cert
    )

this would obviate some of the timing and init ordering issues as whenever you need access to scrape, ingress and tls would be reinitialized on the fly.

@ca-scribner
Copy link

To make sure I understand the issue with what @sed-i raises, the problem is that he defined self.scrape using the value of self.ingress.url at the time of init, not when it was needed, correct?

Then yeah I like @PietroPasotti's suggestion - put scrape into a property and the timing issue goes away. if scrape needed to do something expensive on instantiation maybe that's a different story, but that feels very edge-casey

@ca-scribner
Copy link

There is something to be said about intuitiveness here too though. The solution feels pretty straight forward, but the problem itself feels easy to miss. It doesn't feel intuitive that __init__ is called once between custom (and deferred?) events, but separately for juju events.

Although that really gets to how there are first-class (juju) and second-class (framework) events that sound the same but don't really behave the same. Maybe framework events shouldn't be talked about in the same way as juju events

@sed-i
Copy link
Contributor Author

sed-i commented Jun 30, 2023

@PietroPasotti this is interesting, but I think the same problem still remains even with the property pattern:

  1. self.scrape.register_observers() creates two new (!) ad-hoc instances of Tls, Ingress, which DO NOT have any observers registered (observers are registered on different instances of the Tls, Ingress).
  2. MetricsEndpoint goes on and updates reldata with stale info because ops will call ingress_rel_changed and tls_rel_changed only after MyCharm's init.
  3. To have a (new!) copy of MetricsEndpoint re-update rel-data, we would need to pass it a refresh event from the other object...

This approach makes me think about the garbage collector (yikes).

...Unless I'm missing something?

@PietroPasotti
Copy link
Contributor

yeah there are some technical challenges but nothing that a good singleton wouldn't solve. And maybe objects should be stateless too

@benhoyt
Copy link
Collaborator

benhoyt commented Jul 7, 2023

I've commented over at #736 (comment) about why I think re-initing the charm before every Juju event (deferred or otherwise) is a good idea. However, for custom events I don't think this is a good idea, or even feasible.

Custom events are emitted by the charm or a charm lib, and when emit is called, it synchronously calls the handler method. The charm instance is fixed at that point. So to do this, I think you'd have to save the state of the charm instance, reinitialise it, call the handler method, restore the state of the charm instance back to what it was, and then continue with the original hook call.

In any case, I don't think this is a good solution for the problem at hand in @sed-i's original message. Knowing that custom events are emitted synchronously during the handling of a hook (just like a Python function call), it's not reasonable to expect that those MetricsEndpoint attributes (url and cert) will be magically updated. They're just ordinary Python values (str or None) that are used during __init__ time.

I think we should be using regular programming techniques to do the updates -- any of the ones @sed-i listed in his original post, or any other techniques for propagating updating things to dependencies. I kind of like the simplicity of passing a get_url or get_cert callable to MetricsEndpoint instead of values:

self.scrape = MetricsEndpoint(
    self,
    get_url=lambda: self.ingress.url or socket.getfqdn(),
    get_cert=lambda: self.tls.cert,
)

Or you could observe the url-changed and cert-changed events and have them call new MetricsEndpoint.update_url and .update_cert methods:

def __init__(self, *args):
    ...
    self.ingress = Ingress(self)
    self.tls = Tls(self)
    self.scrape = MetricsEndpoint(self)
    self.framework.observe(self.ingress.on.url_changed, self._on_ingress_url_changed)
    self.framework.observe(self.tls.on.cert_changed, self._on_tls_cert_changed)

def _on_ingress_url_changed(self, event):
    self.scrape.update_url(event.url)

def _on_tls_cert_changed(self, event):
    self.scrape.update_cert(event.cert)

I guess you could think of the callback method as a "pull" model and the observe method as a "push" model.

In any case, I'm planning to close this as Not Planned, but I'll leave it open for a week or so for any further discussion.

@sed-i
Copy link
Contributor Author

sed-i commented Jul 7, 2023

the simplicity of passing a get_url or get_cert callable to MetricsEndpoint instead of values

It is mere coincidence that a callable worked for us in the past. There is no way of knowing under what circumstances that callback is used. What if it is used only during MetricsEndpoint.__init__? Same problem as passing values.

you could observe the url-changed and cert-changed events and have them call new MetricsEndpoint.update_url and .update_cert methods

Absolutely. A more complete example would be:

def _on_ingress_url_changed(self, event):
    self.scrape.update_url(event.url)
    self.tls.update_csr(event.url)

def _on_tls_cert_changed(self, event):
    self.scrape.update_cert(event.cert)

"Holistic on __init__, deltas on events". Sounds trivial.
But this way, users of the lib would need to have deeper understanding of how the lib works. We would need to document in the lib something like this:

# If you need to change the URL later on, use `update_url`.
# If you need to change the cert later on, use `update_cert`.
# If the URL changed and you're using TLS, be sure to call both `update_url` and `update_cert`.

Users would need to translate those 3 instructions above into observers + charm logic.
And this is for just two items in the config.

@benhoyt
Copy link
Collaborator

benhoyt commented Jul 17, 2023

There is no way of knowing under what circumstances that callback is used. What if it is used only during MetricsEndpoint.init? Same problem as passing values.
Users would need to translate those 3 instructions above into observers + charm logic.
And this is for just two items in the config.

I don't disagree that these are challenges. I see them more as charm lib documentation problems than fundamental issues.

Per #952 (comment), closing this issue as not planned, but we'll keep chipping away at #736.

@benhoyt benhoyt closed this as not planned Won't fix, can't repro, duplicate, stale Jul 17, 2023
@ca-scribner
Copy link

I can respect the technical challenges here, but also think it leads to some unexpected problems.

I burned myself today with this because I had a charm that basically keeps track of what happened this execution with x.executed attributes on different objects. At init, everything has x.executed=False, and then the charm runs things they turn to .executed=True. This works great until a custom event triggers the same execution loop for a second time and skips everything because they're still all .executed=True.

I have a feeling there's lots of charm code written expecting a fresh instantiation of the objects on each event and they just haven't noticed the issue. Even something as simple as:

class MyCharm:
  def __init__():
    self.processed_relations = 0

  def handle_config_changed():
    for [relation in relations if has_valid_data(relation)]:
      self.processed_relations += 1
    self.emit_status()
    
  def emit_status():
    print(f"I have processed {self.processed_relations}")

Would work differently on a custom event. Not that this is necessarily a good pattern, but I don't see in docs where we set an expectation that it is a charm syntax error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants