Added "planned unit count" to model. #597
Conversation
This is the simplest possible implementation of goal state, designed to give folks a way to access goal state info, without implementing a more complete representation of goal state.
Overall looks good, but some minor changes requested (open to discussion, though :))
So the main caveat for "adding GoalState to the model" is that Gustavo felt
that what Juju was exposing with Goal State (especially wrt the unit names,
etc) was inappropriately 'peeking into a potential future that may never
exist'.
Having the number of units (planned_unit_count) was much more cohesive and
addressed the very real user need, without exposing too many details. (The
key one being "am I an HA cluster and should wait for quorum, or am I
standalone and should just get started")
The issue being that if you `juju deploy ha-app -n 3`, each unit comes up,
and until it has gotten to start, it hasn't joined its relations, so it
*doesn't* see the other units and *doesn't* show up to them. Each unit
therefore sees itself as standalone.
Obviously there were other discussions around goal-state wrt relations,
etc, which is why that data was exposed originally. But I'll also note that
we never 'completed' that work, as there isn't a hook that fires when
goal-state changes (has the related unit become unblocked?). Which means
you can't actually reliably trust those fields.
I think we wanted to be conservative with the initial implementation and
not yet expose the expected number of remote units, nor their state. And
focus on the first necessary step. And charms that *need* to know about the
remote unit count can use the local information to put that into relation
data, which ensures that events get triggered for it.
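A minimal sketch of that pattern (the relation name "cluster" is hypothetical, and the planned-units API is shown with the Application-level name this thread later converges on):

```python
from ops.charm import CharmBase
from ops.main import main


class MyCharm(CharmBase):
    """Sketch: publish this application's planned unit count into relation data
    so that related units receive relation-changed events when it changes."""

    def __init__(self, *args):
        super().__init__(*args)
        # "cluster" is a hypothetical relation name used only for illustration.
        self.framework.observe(self.on.cluster_relation_created, self._publish_planned_units)
        self.framework.observe(self.on.config_changed, self._publish_planned_units)

    def _publish_planned_units(self, event):
        if not self.unit.is_leader():
            return  # only the leader may write application relation data
        relation = self.model.get_relation("cluster")
        if relation is None:
            return
        # Remote units get relation-changed when this value changes, which is
        # the "events get triggered for it" property described above.
        relation.data[self.app]["planned-units"] = str(self.app.planned_units())


if __name__ == "__main__":
    main(MyCharm)
```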
…On Thu, Sep 2, 2021 at 10:38 AM Jon Seager wrote:
In ops/model.py:
> @@ -176,6 +176,18 @@ def get_binding(self, binding_key: typing.Union[str, 'Relation']) -> 'Binding':
"""
return self._bindings.get(binding_key)
+ def get_planned_unit_count(self) -> int:
Hmm. Interesting for a couple of reasons:
- Each unit has a status, meaning we can easily derive planned_units and
pending_units, where planned units is len(self._units) and pending
units is something like len([u for u in self._units if u["status"] !=
"active"]), right? (A rough sketch of that derivation follows this list.)
- I'm still unconvinced this is a model-level construct, and not an
application-level construct, precisely because you can't interrogate the
goal state of *another application*, only yourself, if that makes sense?
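A rough, hypothetical sketch of that derivation against the raw goal-state hook tool output; the output shape assumed here (a top-level "units" mapping with per-unit "status") is an illustration, not a stable contract:

```python
import json
import subprocess


def _goal_state() -> dict:
    # goal-state is a Juju hook tool; with --format=json it returns a mapping
    # that (as assumed here) includes a "units" key of unit-name -> {"status": ...}.
    out = subprocess.run(
        ["goal-state", "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def planned_units() -> int:
    """All units Juju intends to run for this application, including this one."""
    return len(_goal_state().get("units", {}))


def pending_units() -> int:
    """Planned units that have not yet reached 'active'."""
    units = _goal_state().get("units", {})
    return len([u for u in units.values() if u.get("status") != "active"])
```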
@jameinel Yeah, I think any reference to relations/remote units should be kept out of scope for now.
My impression was that planned_unit_count is for reducing/preventing an "event storm", and was not intended to provide a reliable facility for executing some code only once. IIUC, the same charm code should run fine and produce the same result eventually, with or without relying on planned_unit_count.
Code should perform correctly. The most common failure without it is a unit
coming up thinking that it needed to initialize a node only to find out
later that it actually needed to be part of an HA system. They should,
certainly, eventually become correct either way, but you want to avoid
telling other charms to start sharing their data if you don't have
quorum (I believe rabbitmq fits here).
Ceph is a different example where it cannot do a good job of changing
layout on the fly. And *does* need to know the final layout.
Makes sense for a startup sequence, but wouldn't juju scale-application still have the same challenge? So given the charm must be robust enough to handle incremental scale-application, it seems like planned_unit_count is just for reducing the "event storm". Am I missing something?
get_planned_unit_count -> planned_units. Moved where we expose this to the Application class.
Hat tip to justinmclark's earlier PR, where he figured out the best way to test this.
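With that change, charm code reaches the count through the application object; a hypothetical handler (not code from this PR) might gate startup like this:

```python
from ops.charm import CharmBase
from ops.main import main
from ops.model import ActiveStatus, WaitingStatus


class HACharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.start, self._on_start)

    def _on_start(self, event):
        # planned_units() is exposed on Application after this change.
        if self.app.planned_units() > 1:
            # Expected to be an HA deploy: wait for peers before initialising.
            self.unit.status = WaitingStatus("waiting for peers to join")
        else:
            # Standalone deploy: safe to initialise straight away.
            self.unit.status = ActiveStatus()


if __name__ == "__main__":
    main(HACharm)
```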
It is, indeed, the case, with some caveats:
1) While the app is running, reconfiguring can be done, but it may also be
expensive.
2) Scaling is actually relatively infrequent, so 'quick' charms are likely to
not spend a lot of developer effort handling all the edge cases.
3) The biggest problem is the case where all 3 units are starting but each
considers itself to be the only unit, so you have a period of time where
you are, essentially, in split-brain mode without realizing it. This case
is quite different from scaling. It exists partly because Juju doesn't expose
other units to you until you have 'joined', and then we expose them
one-by-one, rather than exposing the most up-to-date information at each
point. But doing it the other way would lead to other issues about having
to evaluate too much of the model, and worrying about everything all the
time, instead of just incremental changes.
Once you say "hey, I'm ready!" from Elastic or Cassandra or whatever to some other application, that application will probably start trying to create keys/indexes/tables/whatever, and re-partitioning the data is an expensive process which may or may not block off or dramatically slow external communications, with the admin guide suggesting that you set up a maintenance window for it. Now your "other" charm fails startup and reports an error back to Juju because something went wrong which never should have started in the first place.

Similarly, adding an OSD to Ceph requires adding things to the keyring, adding it to the CRUSH map, and waiting for data to rebalance. This is expensive. If you wait until everything is ready, there is no rebalance. Or even if you wait until there's very little/no data, it's a cheap/fast operation. Telling Cinder/Glance/Swift that Ceph is ready cascades if you have a single node. Now, hypothetically, Swift comes up and thinks it's ready. And something which is related to Swift starts writing data. And it's a bottleneck/race between "how fast can application X which is writing data to Ceph through Swift perform when Ceph is trying to add a node and rebalance".

Whether or not a charm can handle …
On Fri, Sep 3, 2021 at 6:13 PM Ryan Barry wrote:
In ops/model.py:
> @@ -176,6 +176,18 @@ def get_binding(self, binding_key: typing.Union[str, 'Relation']) -> 'Binding':
"""
return self._bindings.get(binding_key)
+ def get_planned_unit_count(self) -> int:
Coming at it from another angle, I saw the "goal" of goal-state as "I'm
going to be HA and I should be ready for that".
For the Juju core goal state of "a mysql, a webserver", the *overall*
state of the model probably makes sense. From an OF perspective, there are
a couple of cases. They're all different from an application design
perspective, and we cannot plan for all of them, but let's say we have the
following cases:
- A Grafana 'cluster' which will want to know its end goal state so it
does not bother initializing data in a local sqlite database, only to
immediately need to shut down and migrate everything over to MySQL.
Grafana needs to wait for a relation to MySQL in any case, which can be
handled from the charm code itself, but instead of checking "do I have a
relation to a mysql || do I have peer relations" and branching the logic
out, it can interrogate "do I have more than one unit planned? If so, just
wait on DB initialization until I have a MySQL"
So the 'relation-created' hook was introduced to handle this particular
case. "Let me know that there *is* a relation to a database, even if the
database isn't up and running yet". 'relation-created' triggers just after
'install', so you can be informed very early on to expect that there will be
a relation (a small sketch of that pattern follows the list below).
- A Ceph cluster which "needs" to know how many initial nodes it
should have. Sure, Ceph *can* scale up and down, but it requires a
bunch of twiddling to add/remove OSDs on the fly, and waiting for monitor
initialization until all of the units are present is much easier.
- A Cassandra cluster which "needs" to know how many nodes will be
present before initialization so data can be appropriately sharded. Same
basic case as Ceph.
- An etcd cluster, or anything which uses Raft, which won't reach quorum
without a minimum number of nodes
This is an interesting one, as you can imagine that a good charm should be
able to initialize with a single node, and then grow the configuration as
additional units show up for HA. Having that ability also implies that it can handle
day 2 operations when you need to take a node out of the cluster, or scale
up from 1 to 3 to 5 over time.
Goal-state was somewhat designed to handle this case, but I think it
actually makes more sense to just use 'is_leader' instead. If you are the
leader, and you don't see any other nodes, that's fine, initialize with
N=1. When you see the other nodes, add them to the cluster. Raft has it as
an explicit config change that is coordinated with quorum of the current
cluster, but you should always be able to add/remove 1 node at a time.
I think the problem we saw in the past was actually because they weren't
looking at is_leader, and so each unit saw itself as a single-node
deploy.
- A RabbitMQ cluster where node start ordering matters if the entire
cluster goes down (this really doesn't apply to k8s charms, but still)
- Graylog "wants" both ElasticSearch and MongoDB, both of which "want"
to have at least 3 nodes before they are "ready"
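For reference, a minimal sketch of the relation-created pattern mentioned above (the "db" relation name and the Grafana-like framing are illustrative, not taken from any real charm):

```python
from ops.charm import CharmBase
from ops.main import main
from ops.model import WaitingStatus


class GrafanaLikeCharm(CharmBase):
    """Sketch: learn very early that a database relation exists,
    even before the remote database is up and running."""

    def __init__(self, *args):
        super().__init__(*args)
        # "db" is a hypothetical relation name; relation-created fires right
        # after install, before the remote application is necessarily running.
        self.framework.observe(self.on.db_relation_created, self._on_db_created)

    def _on_db_created(self, event):
        # A relation to a database exists, so skip any local-sqlite bootstrap
        # and wait for the remote database to become available instead.
        self.unit.status = WaitingStatus("waiting for database")


if __name__ == "__main__":
    main(GrafanaLikeCharm)
```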
All of these cases are slightly different, but all align on a couple of
points:
- Communicating to other charms that the overall application is "ready"
(that is, that it has reached the *Juju* goal state rather than just the
*charm* goal state) can be done by setting relation data once it actually is
"ready"
- For charms which support iterative startup (like Rabbit through
rabbitmqctl), this can be performed over peer relations, and, again,
readiness communicated to external relations via relation-set
Yes, goal state can help avoid the "event storm" Leon referenced, but
primarily, from my POV, by having whatever charm depends on its relation
counterpart (be that Graylog waiting for an Elastic and Mongo, Grafana
waiting for a MySQL) put itself into WaitingStatus. I used these two as
examples because Graylog requires multiple external relations to achieve
their goal state before it can be initialized, and Grafana needs to wait
for a single one (assuming a single MySQL).
From this perspective, we can avoid an "event storm" by, simply, not
dumping a bunch of relation data over to external relations until
quorum/initialization is complete. That's really up to charm authors, since
they know when application X or Y is "ready" to talk to the outside world,
and charms which require quorum before they can operate shouldn't be
blindly firing off relation data on startup/relation-joined without
checking whether they're operational first.
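A sketch of that guard, with hypothetical "db" (provides) and "cluster" (peer) relation names and an assumed readiness rule of "all planned peers have joined":

```python
from ops.charm import CharmBase
from ops.main import main


class DatabaseProviderCharm(CharmBase):
    """Sketch: only publish data to consumers once the cluster is actually ready."""

    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.db_relation_joined, self._maybe_publish)
        self.framework.observe(self.on.cluster_relation_changed, self._maybe_publish)

    def _cluster_ready(self) -> bool:
        # Hypothetical readiness check: all planned units have joined the peer relation.
        peers = self.model.get_relation("cluster")
        joined = 1 + (len(peers.units) if peers else 0)  # peers.units excludes this unit
        return joined >= self.app.planned_units()

    def _maybe_publish(self, event):
        if not self.unit.is_leader() or not self._cluster_ready():
            return  # stay quiet until the cluster can actually serve requests
        for relation in self.model.relations.get("db", []):
            relation.data[self.app]["ready"] = "true"


if __name__ == "__main__":
    main(DatabaseProviderCharm)
```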
From an OF perspective, it's not reasonable to provide a complete
abstraction for everything the entire application *model* may need, but
we *can* provide an abstraction for what a *single* application may need.
Whew, that got long. But in the end, I agree that this should probably be
on Application for the reasons above.
I feel like we definitely got to the same place. Ceph seems to be the case
that actually needs the unit count known a priori. Most of the others can
actually leverage leadership to make sure they don't get initialized in
split-brain, and then use the relation-joined hooks to grow the cluster.
And for ones that want to avoid the 'use sqlite instead of remote SQL db',
they have relation-created.
John
=:->
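A minimal sketch of that leadership-plus-relation-joined pattern (all names and the placeholder bootstrap/add-member steps are illustrative):

```python
from ops.charm import CharmBase
from ops.main import main


class RaftLikeCharm(CharmBase):
    """Sketch: the leader bootstraps a one-node cluster, and later units
    are added one at a time as they join the peer relation."""

    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.start, self._on_start)
        # "cluster" is a hypothetical peer relation name.
        self.framework.observe(self.on.cluster_relation_joined, self._on_peer_joined)

    def _on_start(self, event):
        if self.unit.is_leader():
            # Only the leader bootstraps, so there is no split-brain: N=1 to begin with.
            self._bootstrap_single_node()

    def _on_peer_joined(self, event):
        if self.unit.is_leader():
            # Grow the cluster incrementally, one node at a time.
            self._add_member(event.unit)

    def _bootstrap_single_node(self):
        ...  # placeholder: start the service as a single-node cluster

    def _add_member(self, unit):
        ...  # placeholder: add the new unit to the cluster membership


if __name__ == "__main__":
    main(RaftLikeCharm)
```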
This is looking good. Just some comments around comments and concepts:
ops/model.py (Outdated):
    planned unit count for foo would be 3.

    We deliberately do not attempt to inspect whether these units are actually running
    or not. That is a task left up to the future, when goal state is more mature.
It'd be nice to not refer to "goal state" here. We want to kill that idea altogether in juju, in the sense that the number of pending units is just part of the current state like everything else.
ops/model.py (Outdated):
    def planned_units(self) -> int:
        """Count of "planned" units that will run this application.

        We use goal-state here, in the simplest possible way. When we implement goal state
Same as above.
ops/model.py
Outdated
Includes the current unit in the count. | ||
|
||
""" | ||
goal_state = self._run('goal-state', return_output=True, use_json=True) |
Might also be worth adding a comment before that line so future readers can understand that perspective. Something along the lines of:
# goal-state as a concept in juju is dying in favor of it being simply the current state,
# so we must not use this API further outside of explicitly designed and agreed cases.
Makes it clear that goal state is deprecated.
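Putting the fragments above together, the method under review might look roughly like this; the final return expression and the shape of the goal-state output are assumptions, not the exact code merged in this PR:

```python
def planned_units(self) -> int:
    """Count of "planned" units that will run this application.

    Includes the current unit in the count.
    """
    # goal-state as a concept in juju is dying in favor of it being simply the
    # current state, so we must not use this API further outside of explicitly
    # designed and agreed cases.
    goal_state = self._run('goal-state', return_output=True, use_json=True)
    # Assumed output shape: a mapping with a top-level "units" key.
    return len(goal_state.get('units', {}))
```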
Nice work, thanks @petevg