Need a way to determine the expected count for a related application (or peers) #165

Closed
majduk opened this issue Mar 4, 2020 · 11 comments

@majduk

majduk commented Mar 4, 2020

It seems that the framework does not have any way to read goal-state. This is required for multiple OpenStack charms and also for the MongoDB charm.

@niemeyer
Collaborator

This omission is somewhat by design, but let me explain it more clearly:

We'd like to kill goal-state in the medium term, because goal-state presents a confusing reality to the charm. Originally, the charm design was encapsulated from future state that might never be realized. The units in goal-state, for example, may never show up, and the machinery of Juju itself does not expect to be communicated with about that data (relation-get, etc.). In simpler terms, goal-state breaks the abstraction that was created to keep the code inside the unit sane.

With that said, we want to support fixing the problems that goal-state was created to address. For example, the key use I see for goal-state is anticipating the number of units for a given application, so that the charm can wait a bit longer before taking some action. This is data that I think we can easily provide without breaking the encapsulation I described earlier. We might do that as a number of pending units in a specific relation, for example.

Does that use case reflect your needs in OpenStack and MongoDB, and would such a pending-units API address it?

@niemeyer
Collaborator

@majduk Polite ping. :)

@majduk
Author

majduk commented Apr 22, 2020

Polite pong ;-)

That is partially our use case.

Goal state for MongoDB is used to obtain a list of expected MongoDB units:

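    # Defined inside the charm class; requires `import json` and `import subprocess`.
    @property
    def goal_state_units(self):
        # Run the Juju `goal-state` hook tool and parse its JSON output.
        cmd = ['goal-state', '--format=json']
        goal_state = json.loads(subprocess.check_output(cmd).decode('UTF-8'))
        # The 'units' entry holds the units expected for this application.
        return goal_state['units']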

This is required to build a MongoDB URI, which contains unit names.

A solution based on the number of pending units within a relation should be enough to cover that story: the charm would just need to check whether there are any pending units, and build the URI once there are none left.

For this particular usage, a flag stating whether the relation has pending units would suffice.
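
As a rough illustration of that flow, here is a minimal sketch that builds the URI only once no units are pending. The function name `build_mongodb_uri`, the `pending_units` argument, the default port, and the `rs0` replica set name are all assumptions made for this example; none of them come from the charm or from the framework.

```python
from typing import Iterable, Optional


def build_mongodb_uri(
    hosts: Iterable[str],
    pending_units: int,
    port: int = 27017,          # assumed default MongoDB port
    replica_set: str = "rs0",   # hypothetical replica set name
) -> Optional[str]:
    """Return a replica-set URI, or None while the cluster is incomplete."""
    # Defer while the (hypothetical) pending count says more units are expected.
    if pending_units > 0:
        return None
    # Sort so that every unit derives the same URI from the same members.
    members = ",".join(f"{host}:{port}" for host in sorted(hosts))
    if not members:
        return None
    return f"mongodb://{members}/?replicaSet={replica_set}"
```

A caller would pass the names or addresses of the units it has already seen on the peer relation, and treat a `None` result as "keep waiting".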

@johnsca
Contributor

johnsca commented Apr 22, 2020

@majduk @niemeyer I still see a concern even if it's just a pending units count, since the charm will never be notified (via a hook) if pending units fail to come online, potentially leaving the charm in a waiting state indefinitely when it could otherwise go to active. Would there be a change such that a hook could be triggered if the number of expected units changes?

@majduk
Author

majduk commented Apr 22, 2020

At the moment, without goal-state or an explicit config setting, the charm does not know the expected unit count.
Additionally, Juju does not update the expected unit count, since units that fail to deploy can remain in a pending state forever.
Taking this into account, how would the hook trigger work?

@jameinel
Member

One of the comments that came up around this is that, for things that want to do clustering, it isn't always sufficient to know the count.
In particular, for MongoDB, the way you build the peer configuration needs the individual unit names.
A count would at least let you know that you've seen enough relation-joined events to proceed with the peer configuration, but it wouldn't let you write the config.

The particular problem for K8s is that you have to set the pod's environment variable with the names of the peers before the pods are actually started. However, I think Juju will start the correct number of unit agents (based on the user's deployment request), and you should still see relation-joined for all of the peers even if the pods themselves have not been configured.

(I'm pretty sure I've talked my way around and back to saying that count is sufficient, it is just that you have to wait to actually create the pod spec until after relation-joined has happened so that you know the names of the units to give to the pod.)

@majduk
Author

majduk commented May 20, 2020

> (I'm pretty sure I've talked my way around and back to saying that count is sufficient, it is just that you have to wait to actually create the pod spec until after relation-joined has happened so that you know the names of the units to give to the pod.)

This approach is ok for me.

@jameinel
Member

(Copying the plan from #206)
I think we can use 'goal-state' to get the count and expose that as part of the Relation object.

Then you can use relation-created events on a peer relation to start tracking that "there should be 5 other units of this application" and as those units come up, you'll get 'relation-joined' for them, and you can use the Relation.units attribute to find the names for the units that actually start.

We could expose this as something like Relation.expected_unit_count. I'm not sure about that exact name, but that would be a start for exposing what 'goal-state' provides, but without the rest of the context of 'seeing the future that might not occur'.

There is still the problem that if someone does:

    juju deploy app -n 5
    # one of those fails to start
    juju remove-unit app/4
    # no event will be given by Juju to tell you that the expected unit count is now 4 not 5
    # but if they do
    juju add-unit app
    # to go back to 5, or to go up to 7, you'll get relation-joined events with updated counts.
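
The tracking described in this comment could be sketched roughly as follows. This is plain Python, not the framework's API: the `PeerTracker` class and its method names are hypothetical, and `expected_unit_count` is only the name floated above, not an existing attribute.

```python
class PeerTracker:
    """Toy model of counting joined peers against an expected unit count."""

    def __init__(self, expected_unit_count: int):
        # Hypothetical value, as it might be read when relation-created fires.
        self.expected_unit_count = expected_unit_count
        self.units = set()  # names of the peers that actually joined

    def on_relation_joined(self, unit_name: str, expected_unit_count=None):
        # A join may carry a refreshed expected count (e.g. after add-unit).
        if expected_unit_count is not None:
            self.expected_unit_count = expected_unit_count
        self.units.add(unit_name)

    def on_relation_departed(self, unit_name: str):
        self.units.discard(unit_name)

    @property
    def pending(self) -> int:
        # Never negative, even if more units joined than were expected.
        return max(self.expected_unit_count - len(self.units), 0)

    def all_joined(self) -> bool:
        return self.pending == 0
```

Note that in the `remove-unit` case above, nothing in this sketch would lower `expected_unit_count`, so `all_joined()` would stay false until some later event refreshed the count.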

@jameinel changed the title from "Missing goal-state" to "Need a way to determine the expected count for a related application (or peers)" on May 21, 2020

@facundobatista
Contributor

We had yet another conversation about this topic.

We will not be providing access to Juju's goal-state, mainly because Juju will probably drift away from providing such information, in favor of the more consistent notion of "pending units" (which is better for this kind of distributed system).

But we first need to understand whether having "pending units" is enough for all use cases. Some details to take into consideration:

  • having a "pending units" count does NOT ensure that those units will be alive in the future; they may never come up, for example because of system resources; so the system should work OK with the currently available units (unless something specific forbids it: for example, you may need an odd number of units).

  • there's no way to predict the names that future units will have, as those units may never appear.

  • configuring a system that needs, say, 20 units, and not providing service until reaching that number, is a bad model; what if resources are not enough to get 20 units? Will the system lock up with 19 units, all wasted without providing service while waiting for the 20th?

For example, the operator may specify that the system should scale to 15 units. The system must have an odd number of units, minimum 3. So it's fine to defer initialization when there are only one or two. But once there are 3, it should be started. As more units appear, you can configure it with 5, 7, etc. If you have 10, you need to leave it working with 9 until you get the 11th, then configure it to work with 11. But you cannot wait for 15 before running the whole system, because those units may never appear.
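
The sizing rule in that example can be written down as a small function. This is only a sketch of the arithmetic; the name `usable_cluster_size` is made up here, and it assumes the minimum is itself odd, as in the example.

```python
def usable_cluster_size(available_units: int, minimum: int = 3) -> int:
    """Return how many units to configure, or 0 to defer initialization.

    The result is the largest odd number that does not exceed the units
    currently available, provided the (odd) minimum has been reached.
    """
    if available_units < minimum:
        return 0  # too few units: defer initialization
    if available_units % 2 == 1:
        return available_units
    return available_units - 1  # leave one unit idle to stay odd
```

With 10 units available this configures 9, and it moves to 11 as soon as the 11th unit appears, without ever waiting for the full 15.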

But beyond that specific model, will having a "pending units count" be enough?

If yes, we could add it to the Operator Framework fairly soon (using Juju's goal-state as the source, until Juju provides a better, more trustworthy way of knowing that).

If not, let's keep talking.

Thanks everybody!!

@pengale
Contributor

pengale commented Sep 16, 2021

I believe that this issue is addressed by #597, which means that we probably can close it.

@jnsgruk
Member

jnsgruk commented Sep 17, 2021

Agreed. I think the crux of this conversation seems to be: planned units for a given application should be enough in most cases.

@jnsgruk closed this as completed on Sep 17, 2021
tonyandrewmeyer pushed a commit to tonyandrewmeyer/operator that referenced this issue Sep 27, 2024