dogswatch: Update Coordinator (Kubernetes Operator and Friends) #184
Comments
In the context of a Kubernetes Operator: I think we'll need to pair node labels, applied at service startup, with our annotations in order to scope queries and "watches" (with informers and/or listers) to these nodes. I wasn't able to get a watcher to filter well with annotations - you can do that on the client side, but label filtering is done on the server side - so pairing with labels allows the controller to provide a selector addressing systems that report a designated label (like …)
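To make the label-vs-annotation point concrete, here is a minimal sketch (assuming a client-go based controller; the `thar.amazonaws.com/platform-version` label key is only a placeholder, not a settled name) of scoping a Node watch with a label selector, which the API server evaluates server-side. An annotation-based filter would instead have to be applied after delivery, in the event handlers.

```go
package main

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Label selectors are applied by the API server, so the informer only
	// receives Nodes that carry the designated label. Annotations, by
	// contrast, can only be filtered client-side after delivery.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		30*time.Second,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			// Hypothetical label key; "key only" means "label is present".
			opts.LabelSelector = "thar.amazonaws.com/platform-version"
		}),
	)

	nodeInformer := factory.Core().V1().Nodes().Informer()
	_ = nodeInformer // event handlers for add/update would be registered here

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}
```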
1 & 2 (possibly 3) are definitely in the current scope of …
Is this something that would be covered via "wave" information in the update metadata? (e.g. #103 / bottlerocket-os/bottlerocket-update-operator#17)
Yes! This is the intended scoping in this design - the …
Yes and no, at least in the sense that I envision the policies here being completely tunable by the cluster's administrators, who know their workload best and can specify a finer-grained schedule. This may be a subset of the wave phase or a superset - pending discussion and incidental data structuring. I think the notion of a maintenance time-window doesn't preclude the usefulness of the waves, in that the waves may still limit the extent of the rollout. The administrator's policy may cause a cluster to be even more conservative, or to block until a greater number of its hosts are "eligible" for update, which may or may not be desirable! Maybe these admins DO want their clusters to make as much update progress as they can during their maintenance windows for business reasons, or perhaps they even want to update a single box first with a bake time to gate the cluster-wide acceptance of an update.
Interesting - in my initial survey I missed that the CLUO implementation relies on both labels and annotations: https://github.com/coreos/container-linux-update-operator/blob/4bb1486f482bc9c365c71e126129e806b5a0fc97/pkg/constants/constants.go#L13-L15 . The major difference between the usage in CLUO (where these are used for pinpointing peer nodes) and what's proposed here is that the Controller (Operator) in our case will be initiating updates rather than broadcasting and observing intent from a Pod's labels and annotations.
The DaemonSet used for …
As part of the … I propose that a utility, for the near term, be stood up to provide a stable interface through which updates can be instrumented on a Thar host. The set of functionality consolidates as much of the linear update process as possible while still retaining granular control of the progress made for a given update. These steps would, in the implementation, represent a complete and total update wherein migrations are run at the appropriate stages and updates are applied to their respective partitions by their authoritative tools.

Interface

N.B. …

Query Update state progression

To inspect the current state of affairs:

updatectl status

{
"status": "pending",
"action": "boot-update",
"id": "thar-aws-eks-x86_64-1.13.0-m1.20191006"
}

Query Update Information

updatectl list-available

This is roughly what's already being identified for use by the update system, augmented with metadata pertaining to the host.

{
"schema": "1.0.0",
"host": {
"wave": "42"
},
"updates": [
{
"id": "thar-aws-eks-x86_64-1.13.0-m1.20191006",
"applicable": false,
"flavor": "thar-aws-eks",
"arch": "x86_64",
"version": "1.13.0",
"status": "Ready",
"max_version": "1.20.0",
"waves": {
"0": "2019-10-06T15:00:00Z",
"500":"2019-10-07T15:00:00Z",
"1024":"2019-10-08T15:00:00Z"
},
"images": {
"boot": "stuff-boot-thar-aws-eks-1.13-m1.20191006.img",
"root": "stuff-boot-thar-aws-eks-1.13-m1.20191006.img",
"hash": "stuff-boot-thar-aws-eks-1.13-m1.20191006.img"
}
}
],
"migrations": {
"(0.1, 1.0)": ["migrate_1.0_foo"],
"(1.0, 1.1)": ["migrate_1.1_foo", "migrate_1.1_bar"]
},
"datastore_versions": {
"1.11.0": "0.1",
"1.12.0": "1.0",
"1.13.0": "1.1"
}
}

There will need to be a unique token or composite-token (a la N-V-R - let's call it F-A-V-R, for flavor-arch-version-release) to identify an update. Actions to be taken on an invalidated, half-committed update are yet to be defined.

Prepare host for an update

updatectl prepare-update --id $favr [--wait]

{
"status": "prepared",
"id": "thar-aws-eks-x86_64-1.13.0-m1.20191006"
}

Apply update to host

updatectl apply-update --id $favr [--wait]

{
"status": "applied",
"id": "thar-aws-eks-x86_64-1.13.0-m1.20191006"
}

Use update on next boot (and reboot)

updatectl boot-update --id $favr [--wait] [--reboot]

{
"status": "bootable",
"id": "thar-aws-eks-x86_64-1.13.0-m1.20191006"
}
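As a purely illustrative sketch of how a caller (likely the node-level agent) might drive this interface, here is one way to shell out to the proposed commands and parse the JSON shown above. The command names, statuses, and fields come from the proposal; the control flow and Go types are my assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// Status mirrors the JSON emitted by `updatectl status` in the proposal above.
type Status struct {
	Status string `json:"status"`
	Action string `json:"action,omitempty"`
	ID     string `json:"id"`
}

// updatectl shells out to the proposed utility and returns its stdout.
func updatectl(args ...string) ([]byte, error) {
	return exec.Command("updatectl", args...).Output()
}

func main() {
	out, err := updatectl("status")
	if err != nil {
		panic(err)
	}
	var st Status
	if err := json.Unmarshal(out, &st); err != nil {
		panic(err)
	}

	// Walk the update forward one stage at a time, keeping the caller in
	// control of each transition (prepare -> apply -> boot).
	switch st.Status {
	case "pending":
		_, err = updatectl("prepare-update", "--id", st.ID, "--wait")
	case "prepared":
		_, err = updatectl("apply-update", "--id", st.ID, "--wait")
	case "applied":
		_, err = updatectl("boot-update", "--id", st.ID, "--wait", "--reboot")
	default:
		fmt.Println("nothing to do for status:", st.Status)
	}
	if err != nil {
		panic(err)
	}
}
```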
This all sounds about in line with my understanding; some thoughts:
Do we think the tool would be running on its own or only be called as required?
The tool could just pass the metadata back directly I suppose, or maybe augment it as needed. Is there any extra information needed here that isn't mentioned in #103?
Not directly related, but I suppose we should split out the update-image and update-boot-flags steps. In general this all looks and smells like Updog - do we have scenarios that could influence whether we would need something else?
It would primarily be run on demand, invoked by …
I've added a few keys to encourage consistent usage and interpretation of the metadata. I think these are easily derived by the callee, but they shouldn't be ad-hoc interpreted by the caller without making a contract on the construction of each value ahead of time (I've opted for opaque values that are provided to "encourage" their use; nothing will stop other implementations from calculating these, I suppose 😄).
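For reference, a sketch of how a consumer might model the `list-available` document shown earlier in this thread; the field names are taken from that example, while the Go shapes (including keeping the wave identifier and wave seeds as opaque strings) are assumptions.

```go
package dogswatch

// UpdateInfo mirrors the `updatectl list-available` example document.
type UpdateInfo struct {
	Schema            string              `json:"schema"`
	Host              Host                `json:"host"`
	Updates           []Update            `json:"updates"`
	Migrations        map[string][]string `json:"migrations"`
	DatastoreVersions map[string]string   `json:"datastore_versions"`
}

// Host carries host-specific augmentation, such as the assigned wave,
// treated as an opaque value per the discussion above.
type Host struct {
	Wave string `json:"wave"`
}

// Update describes a single available update.
type Update struct {
	ID         string            `json:"id"`
	Applicable bool              `json:"applicable"`
	Flavor     string            `json:"flavor"`
	Arch       string            `json:"arch"`
	Version    string            `json:"version"`
	Status     string            `json:"status"`
	MaxVersion string            `json:"max_version"`
	Waves      map[string]string `json:"waves"`
	Images     Images            `json:"images"`
}

// Images names the artifacts referenced by an update.
type Images struct {
	Boot string `json:"boot"`
	Root string `json:"root"`
	Hash string `json:"hash"`
}
```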
I didn't expand on it above, but I had it in mind that the …
Awesome! We should enumerate the requirements to make sure the caller has the right environment to be running the listed commands. I can think of all kinds of terrible things we can do to make it work from within the Pod's container; knowing the list of accesses and permissions needed will help us identify the right approach.
I don't think there are others at the moment - I did call out that there's potentially undefined behavior for an update that's removed* (in whatever way would make it no longer available or possible to continue updating with) from the repository while we're performing these operations. The remaining aspects of integration, as they are currently known, are asks to control some settings for a cluster to enforce "policy", but that's all hand-wavy and not within the scope of the extension to …
I agree that (bar schema) these are all things that should be determined by the callee/Updog. So we may have two types of metadata, the one read by Updog, and another form that it outputs to callers, even if it is just an augmented form.
Yep, that's what I'm thinking of.
It is worth noting that this issue isn't fully resolved; there are some aspects that are still in need of development and discussion. Some are captured here and should be reviewed during those discussions.
@jahkeup Do you think we can split out issues for what remains? Dogswatch was a huge project and this is a huge issue, and besides, we accomplished what we needed for a milestone. I think it'd be easier to track now with smaller issues (or just one if you want) that we can assign to the next proper milestone.
Yes, there are definitely smaller issues to be had here, but really that depends largely on having a higher level discussion about the future of Dogswatch before we can get the "next steps" jotted down. I'll take some time to review the state of development & my thoughts on the path forward and work on putting together a coherent set of discussion points in a different issue.
This is a digested design proposal for the collective "dogswatch" system - Thar update coordinators for workload orchestrators.

Overview / Problem Space
There is potential for upgrades to be disruptive in a general-purpose cluster
running Thar that is configured to use phased upgrades. This is because the host
would not coordinate to remove its workload and would, instead, directly take
charge of removing itself from a compute pool. It is worth noting that this
problem isn't unique to contemporary orchestrators; it applies just as much to
bespoke clustered applications that may have their own mechanisms for scaling
themselves up and down.
It is possible to reduce, or outright prevent, impact by providing the on-host
update controls to the orchestrator. The exact architecture of this would depend
on the orchestrator itself; a k8s example architecture is outlined below.

Tools handle applying revisions in the way that their respective policies
indicate and would necessarily serve as the bridge between the host - using an
interface provided by Thar - and the orchestrator - using the primitives
provided there. Collectively, the implementations make up the collective
dogswatch system and are intended to resemble one another, as they'll be
standardized around an update interface with a small surface area.
Kubernetes
For Kubernetes, this design will closely resemble the one used for the
CoreOS Container Linux Update Operator, taking hints and inspiration from
Weaveworks' Kured among other similar projects.

The primitives provided by Kubernetes naturally lead to the approach - the
Operator pattern - pioneered by CoreOS, even more naturally given the
investments Kubernetes has made since then.
Thar's Update Operator has 2 components:

dogswatch-controller - cluster-level coordinator
dogswatch-node - node-level agent

dogswatch-controller
The dogswatch-controller process watches each node for state changes and applies
administrator-supplied policy when considering the next steps to take when an
update is available and ready to be made on a node. The controller process is
dependent upon the node process to gather and communicate its needed update
metadata and state.

Kubernetes' annotation facilities are used to report state from the node (see
dogswatch-node), informing the controller of its state. The controller utilizes
these annotations to make a coordinated decision based on configured policy.
Responsibilities:
dogswatch-node
The
dogswatch-node
process handles on-host signaling regarding availabilityand application of updates. This process does not directly control its workload
nor does it act of its own accord to stop workloads that are running on the
host. This process' primary function is to communicate state and handle
well-known bidirectional indicators in response.
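As a hedged illustration of the "communicate state" half, the node process could report its condition by annotating its own Node object; the annotation key and value below are placeholders rather than the final well-known indicators.

```go
package main

import (
	"context"
	"encoding/json"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The agent runs in a DaemonSet and learns its own Node name from the
	// downward API (spec.nodeName injected as NODE_NAME).
	nodeName := os.Getenv("NODE_NAME")

	// Placeholder annotation key and value; the real well-known indicators
	// are still to be settled in this design.
	patch, _ := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				"thar.amazonaws.com/update-state": "update-available",
			},
		},
	})

	_, err = client.CoreV1().Nodes().Patch(
		context.TODO(), nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
}
```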
Policy
To start, a single policy will be provided: a configurable time window in which
to execute updates. There will not be a parallelism control to begin with, as
the logic should account for Pods' replication, with further configurables to
tune such considerations as required by administrators.
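A minimal sketch of what the initial time-window policy check might look like; the configuration shape (a daily UTC window) is an assumption, not part of this proposal.

```go
package main

import (
	"fmt"
	"time"
)

// MaintenanceWindow is a daily window, expressed in UTC hours, during which
// the controller is allowed to progress updates. The shape of the real
// configuration is still open.
type MaintenanceWindow struct {
	StartHour int // inclusive, 0-23
	EndHour   int // exclusive, 0-23
}

// Permits reports whether updates may proceed at the given time. Windows
// that wrap past midnight (e.g. 22:00-04:00) are handled as well.
func (w MaintenanceWindow) Permits(t time.Time) bool {
	h := t.UTC().Hour()
	if w.StartHour <= w.EndHour {
		return h >= w.StartHour && h < w.EndHour
	}
	return h >= w.StartHour || h < w.EndHour
}

func main() {
	window := MaintenanceWindow{StartHour: 2, EndHour: 6}
	fmt.Println("updates permitted now:", window.Permits(time.Now()))
}
```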
Incremental Policies (in no specific order):
Update source/repository
Update wave/phasing
Rollback retries and failure tolerances
Rollback telemetry opt-in
Update Integration

It remains to be determined exactly what the mechanism and transport will be
to facilitate:

For contrast and consideration, the related projects have shown that the
components could:

yum update -y && reboot
logind

There's not an obvious deficiency with 2 or 3, but 1 is infeasible and out of
line with Thar's core competencies and tenets of being minimal and opaque to
its workloads. Both remaining options, 2 and 3, imply that a dedicated,
well-defined, and stable interface be exposed for dogswatch.

Methods (bare minimum):
Metadata:
boot data (partition details)
boot state (partition booted, update and rollback inferential data)
update data (update availability and details)
There's another issue for talking about these details and their relevance; see #184
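Since the concrete method list above is still open, here is one assumed shape for that bare-minimum surface, modeled on the updatectl verbs discussed earlier in this thread; names and signatures are illustrative only.

```go
package dogswatch

// HostUpdater is one possible shape for the small, stable on-host surface
// discussed above; method names follow the updatectl verbs proposed earlier
// in this thread and are not final.
type HostUpdater interface {
	// Status reports the current update state and the identifier of the
	// update being acted on, if any.
	Status() (state string, id string, err error)
	// ListAvailable returns the update metadata, augmented with host
	// details such as the assigned wave.
	ListAvailable() ([]byte, error)
	// PrepareUpdate stages migrations and other prerequisites for the
	// identified update.
	PrepareUpdate(id string) error
	// ApplyUpdate writes the update to its target partitions.
	ApplyUpdate(id string) error
	// BootUpdate marks the update for the next boot and optionally reboots.
	BootUpdate(id string, reboot bool) error
}
```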
Follow up work and questions
Cluster (by policy) configuration of updates:
The methods outlined in Update Integration offer the bare minimum of required
functionality to power dogswatch. Ideally, this integration would be extended
and enriched to allow cluster administrators to control policy at the cluster
and node level by ensuring nodes are correctly configured to participate in the
cluster's update strategy. This requires one of the following in a future
iteration:

An additional adaptation of the on-host component to provide an appropriate set
of configurables to dogswatch-node for it to reconfigure update settings as
needed.

A dogswatch-node process would be given access to the Thar API socket to be
able to configure and commit update settings consistent with the cluster's
settings (a rough sketch of this access pattern follows).
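For the second option, a very rough sketch of how the node process could talk to the host API over its unix socket; the socket path and the settings endpoint shown here are assumptions for illustration only.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
)

func main() {
	// Hypothetical socket path mounted into the dogswatch-node container.
	const socketPath = "/run/api.sock"

	// An http.Client that dials the unix socket instead of TCP; the host
	// API is served over this socket rather than a network address.
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}

	// Illustrative read of current settings; the real endpoints and the
	// settings that control update behavior are out of scope here.
	resp, err := client.Get("http://localhost/settings")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```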