Need a way to determine the expected count for a related application (or peers) #165

majduk · 2020-03-04T10:47:47Z

It seems that the framework does not have any way to read goal-state. This is required for multiple OpenStack charms and also the MongoDB charm.

niemeyer · 2020-04-15T13:22:39Z

This omission is somewhat by design, but let me explain it more clearly:

We'd like to kill goal-state in the medium term, because goal state provides a confusing reality to the charm. Originally, the charm design was encapsulated from future state that could never be realized. The units in goal state, for example, may never show up, and the machinery of Juju itself does not expect to be communicated with about that data (relation-get, etc). In simpler terms, goal state breaks the abstraction that was created to make code inside the unit sane.

With that said, we want to support fixing the problems that goal-state was created to address. For example, the key use I see for goal-state is anticipating the number of units for a given application, so that the charm can wait a bit longer before taking some action. This is data that I think we can easily provide without breaking the encapsulation I described earlier. We might do that as a number of pending units in a specific relation, for example.

Does that use case reflect your needs in OpenStack and MongoDB, and would such a pending-units API address it?

niemeyer · 2020-04-22T12:26:44Z

@majduk Polite ping. :)

majduk · 2020-04-22T12:49:41Z

Polite pong ;-)

That is partially our use case.

Goal state for MongoDB is used to obtain a list of expected MongoDB units:

    import json
    import subprocess

    @property
    def goal_state_units(self):
        # Ask Juju's goal-state hook tool for the expected units, as JSON.
        cmd = ['goal-state', '--format=json']
        goal_state = json.loads(subprocess.check_output(cmd).decode('UTF-8'))
        return goal_state['units']

This is required to build a MongoDB URI, which contains unit names.

A solution based on the number of pending units within a relation should be enough to cover that story: the charm would just need to check whether there are any pending units and build the URI once there are none.

For this particular usage, a flag stating whether the relation has pending units would suffice.
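
The flag-based approach described above could be sketched roughly as follows. This is only an illustration, not the MongoDB charm's code: `build_mongodb_uri`, its parameters, and the default port are assumptions, and the pending count is passed as a plain integer because the framework offered no such API when this comment was written.

```python
from typing import Iterable, Optional


def build_mongodb_uri(hosts: Iterable[str], pending_units: int,
                      port: int = 27017) -> Optional[str]:
    """Return a MongoDB URI once no units are pending, else None.

    All names here are hypothetical; `pending_units` stands in for the
    per-relation pending count discussed in this thread.
    """
    if pending_units > 0:
        # Some units have not joined yet, so the host list is incomplete.
        return None
    # Sort so the URI is stable regardless of the order units joined in.
    members = sorted(f"{host}:{port}" for host in hosts)
    if not members:
        return None
    return "mongodb://" + ",".join(members)
```

A charm would call this on each relation event and only write its configuration once the function returns a URI rather than `None`.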

johnsca · 2020-04-22T16:55:41Z

@majduk @niemeyer I still see a concern even if it's just a pending units count, since the charm will never be notified (via a hook) if pending units fail to come online, potentially leaving the charm in a waiting state indefinitely when it could otherwise go to active. Would there be a change such that a hook could be triggered if the number of expected units changes?

majduk · 2020-04-22T18:41:23Z

At the moment, without goal-state or an explicit config setting, a charm does not know the expected unit count.
Additionally, Juju does not update the expected unit count, since units that fail to deploy can end up in a pending state forever.
Taking this into account, how would the hook trigger work?

jameinel · 2020-05-13T12:25:23Z

One of the comments that came up around this is that, for things that want to do clustering, it isn't always sufficient to know the count.
In particular, for MongoDB, the way you build the peer configuration needs the individual unit names.
A count would at least let you know that you've seen enough relation-joined events to proceed with the peer configuration, but wouldn't let you write the config.

The particular problem for K8s is that you have to set the pod's environment variable containing the names of the peers before the pods are actually started. However, I think Juju will start the correct number of unit agents (based on the user's deployment request), and you should still see relation-joined for all of the peers even if the pods themselves have not been configured.

(I'm pretty sure I've talked my way around and back to saying that count is sufficient, it is just that you have to wait to actually create the pod spec until after relation-joined has happened so that you know the names of the units to give to the pod.)

majduk · 2020-05-20T14:46:21Z

> (I'm pretty sure I've talked my way around and back to saying that count is sufficient, it is just that you have to wait to actually create the pod spec until after relation-joined has happened so that you know the names of the units to give to the pod.)

This approach is ok for me.

jameinel · 2020-05-21T11:06:02Z

(Copying the plan from #206)
I think we can use 'goal-state' to get the count and expose that as part of the Relation object.

Then you can use relation-created events on a peer relation to start tracking that "there should be 5 other units of this application" and as those units come up, you'll get 'relation-joined' for them, and you can use the Relation.units attribute to find the names for the units that actually start.

We could expose this as something like Relation.expected_unit_count. I'm not sure about that exact name, but that would be a start for exposing what 'goal-state' provides, but without the rest of the context of 'seeing the future that might not occur'.
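
The tracking described above could look roughly like this. It is a sketch in plain Python rather than framework code: `PeerTracker` and its methods are hypothetical, and `expected_peer_count` stands in for the proposed `Relation.expected_unit_count`, which did not exist when this was written.

```python
from typing import List, Optional


class PeerTracker:
    """Track which peers have joined against an expected peer count.

    Hypothetical helper: the expected count would come from something like
    the proposed Relation.expected_unit_count, and the unit names from
    relation-joined events.
    """

    def __init__(self, expected_peer_count: int):
        self.expected_peer_count = expected_peer_count
        self.joined = set()

    def on_relation_joined(self, unit_name: str,
                           expected_peer_count: Optional[int] = None) -> None:
        # An event may carry a refreshed count (for example after add-unit).
        if expected_peer_count is not None:
            self.expected_peer_count = expected_peer_count
        self.joined.add(unit_name)

    def on_relation_departed(self, unit_name: str) -> None:
        self.joined.discard(unit_name)

    @property
    def ready(self) -> bool:
        # Ready once at least as many peers have joined as were expected.
        return len(self.joined) >= self.expected_peer_count

    def peer_names(self) -> List[str]:
        # Names of the units that actually started, in a stable order.
        return sorted(self.joined)
```

The count only gates when to proceed; the names used for the peer configuration always come from units that have actually joined.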

There is still the problem that if someone does:

    juju deploy app -n 5
    # one of those fails to start
    juju remove-unit app/4
    # no event will be given by Juju to tell you that the expected unit count is now 4 not 5
    # but if they do
    juju add-unit app
    # to go back to 5, or to go up to 7, you'll get relation-joined events with updated counts.

facundobatista · 2020-11-13T18:49:47Z

We had yet another conversation about this topic.

We will not be providing access to Juju's goal-state, mainly because Juju will probably drift away from providing such information, in favor of the more consistent information of "pending units" (which is better for this kind of distributed system).

But we first need to understand whether having "pending units" is enough for all use cases. Some details to take into consideration:

- having a "pending units" count does NOT ensure that those units will be alive in the future; they may never come up, because of system resources; so the system should work OK with the currently available units (unless something specific forbids it: for example, you may need an odd number of units).
- there's no way to predict the names that future units will have, as those units may never appear.
- configuring a system that needs, say, 20 units, and not providing service until reaching that number, is a bad model; what if resources are not enough to get 20 units? Will the system lock up with 19 units, all wasted without providing service while waiting for the 20th?

For example, the operator may tell the system to scale to 15 units. The system must have an odd number of units, with a minimum of 3. So it's fine to defer initialization while there are only one or two, but once there are 3 it should be started. As more units appear, you can reconfigure it with 5, 7, etc. If you have 10, you need to leave it working with 9 until you get the 11th, then configure it to work with 11. But you cannot wait for 15 before running the whole system, because those units may never appear.
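
The odd-number rule in that example can be written down as a small function. This is just a sketch of the arithmetic; `usable_unit_count` and its parameters are made-up names, and it assumes the minimum itself is odd.

```python
def usable_unit_count(available: int, minimum: int = 3) -> int:
    """Return how many units to configure right now.

    The result is the largest odd number not exceeding `available`,
    or 0 (defer initialization) if that is below `minimum`.
    Hypothetical helper; assumes `minimum` is odd.
    """
    largest_odd = available if available % 2 == 1 else available - 1
    return largest_odd if largest_odd >= minimum else 0
```

With the numbers above, 1 or 2 available units give 0 (keep waiting), 3 or 4 give 3, 10 gives 9, and 11 gives 11. The target of 15 never enters the calculation, which is the point: the system runs on what is actually there.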

But beyond that specific model, will having a "pending units count" be enough?

If yes, we could add it to the Operator Framework fairly soon (using Juju's goal-state as the source, until Juju provides a better, more trustworthy way of knowing that).

If not, let's keep talking.

Thanks everybody!!

pengale · 2021-09-16T21:51:55Z

I believe that this issue is addressed by #597, which means that we probably can close it.

jnsgruk · 2021-09-17T14:20:08Z

Agreed. I think the crux of this conversation seems to be: planned units for a given application should be enough in most cases.


chipaca assigned niemeyer Apr 22, 2020

jameinel unassigned niemeyer May 13, 2020

jameinel mentioned this issue May 21, 2020

can not determine the number of instances of the application #206

Closed

jameinel changed the title ~~Missing goal-state~~ Need a way to determine the expected count for a related application (or peers) May 21, 2020

facundobatista mentioned this issue Nov 13, 2020

Expose goal-state #434

Closed

jnsgruk closed this as completed Sep 17, 2021

tonyandrewmeyer pushed a commit to tonyandrewmeyer/operator that referenced this issue Sep 27, 2024

Merge pull request canonical#165 from canonical/fix-juju-info-network…

7242fdd

…-get fixed juju-info network
