Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
baremetal: Add strategy for upgrading CoreOS-based deploy image #879
baremetal: Add strategy for upgrading CoreOS-based deploy image #879
Changes from 1 commit
b79a7ba
0c77b52
81dd7e3
42ae607
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe a footnote explaining for ironic-python-agent in case somebody from the rest of openshift reads this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It's not a theoretical problem, and it's much worse for baremetal than for other platforms because real hardware evolves much quicker than virtual environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The information in brackets is probably too detailed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 85-94 sound like they are describing a rejected alternative, and should go down in that section of the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context here is that all of the stuff described above was ready to land in 4.9 (and in fact a lot of it did, we just didn't flip the switch in the installer or remove ironic-ipa-downloader yet), but for the fact that we don't have a way to safely upgrade existing clusters.
So from the perspective of this enhancement, the lack of an upgrade path to the work we've already done is the problem, and the proposal is the solution. We could rewrite it so that the existing work is incorporated into the proposal and the actual stuff we need input on is bumped to the 'Upgrade Strategy' section, but that doesn't seem conducive to getting the best feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
coreos-install
is our internal concept that is not defined above. Maybe "... away from using qcow2 images"?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This implies new fields that are not introduced in this spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's an API change proposed here, it would be good to spell out the new field structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fields are already present in 4.9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are these fields being updated? Spec or Status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems more natural to leave the Spec empty and update the Status to include the new default or derived values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Spec isn't empty, it's filled in by the installer. We are actually using the operator to override what the install put here. It is a bit icky, but it's a more accurate representation of what is happening (which is a bit icky).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing triggers an automated rollback, IIUC. The CVO just keeps trying the upgrade until it succeeds. @sdodson can you verify that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From support perspective, it introduces a version of CoreOS that is not current and is not known. If it causes a bug in IPA (e.g. unsupported hardware), debugging it will be problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we change the API to make it less necessary to guess the URLs? For example, could we ask for a URL base, to point to a directory in which files with names we control should be stored? So the user would give us
https://my.private.server/openshift
and we would add something like/4.y.z/coreos.kernel
? That would avoid them having to update the URL on every upgrade, but would still let the operator report that it is Degraded if the necessary files do not exist on the server after the upgrade.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the installer's API and I think it's common to every platform, so I'm not sure that changing it would be welcomed.
In any event, that wouldn't help us with clusters that were installed in the past.
Updating the URL ourselves is easy enough, it's the user actually putting the file where we need it that's the blocker. Being slightly more strict about renaming doesn't really move the needle much imho.
Ideally the CoreOS images would be provided automatically by the same tooling that mirrors the release payload (actually ideally the image would be in the release payload), but that's not where we are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ought to be able to automate those as unit tests for the operator, right? We don't need end-to-end tests for all of those scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously we do need unit tests, but I think it would also be nice to know that all the moving parts actually work together the way we expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very slow and memory-greedy. Won't work for disconnected installs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me wonder how disconnected installs update CoreOS (which is supposed to update itself before joining the cluster) now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow RHCOS gets written to disk, and then the system boots it from disk for the first time. At that point it upgrades the same way RHCOS gets updated in on an ongoing basis; it retrieves a particular container image from the payload that contains the current RHCOS ostree, applies that ostree to the filesystem on disk, then reboots again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was not aware that the ostree is already in the payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md