handling app config package in upgrade #617

Closed

joel-bluedata opened this issue Jun 9, 2022 · 2 comments

joel-bluedata commented Jun 9, 2022

Part of issue #588.

It's very likely that an image change will require a corresponding app config package change.

So far the live-app upgrade testing has been 1) with kdapps that use a constant file:// URL for their config package, and 2) with kdclusters that don't use persistent storage.

We need to make sure that the upgrade process works:

  • with and without the use of persistent storage

  • not only with a constant file:// URL, but also with a file URL that varies across upgrade and with http/https URLs too

There are some interesting details to get right here, around things like:

  • allowing kdapp spec changes for the config package specifications

  • dealing with the fact that extracted script contents and any setup artifacts in the extraction directory are preserved across container restart if using persistent storage

  • making sure the correct package contents are in place after upgrade (and after rollback), especially for http/https URLs that need to be pulled -- and which would normally NOT be pulled after container restart if using persistent storage

  • properly dealing with rollback in the persistent storage case -- e.g. should we just save the old config script contents in a persistent directory during upgrade, and then reinstate those exact contents after rollback?

  • also of course in general being aware that it's OK to NOT have a config package, so the member might lose or gain a config package altogether across upgrade or rollback

(I'm marking this issue as Bug type because I think the current live-app-upgrade branch, even with its limitation that only allows fixed file URLs, still probably has issues in the persistent storage case.)

First step I think will be to recap here exactly how the current app config package process works, both on initial setup and on container restart.

@joel-bluedata (Member Author) commented:

OK, so currently the flow for a new member goes as follows. There's also lots of logic to deal with the reconcile re-running during setup, since there can be errors to retry or long-running setup to keep checking the status of, but let's just look at the initial kickoff.

==========

*** For members first entering "creating" state:

If there's no setup package for this member's role, go immediately to "ready" state. Otherwise...

Call appConfig, which will do:

  • upload the current configmeta.json
  • install configcli if it is not yet installed
  • call setupAppConfig

In setupAppConfig, it will do:

  • bail out if "/opt/guestconfig/*/startscript" already exists, otherwise...
  • download/copy the package to "/opt/guestconfig/appconfig.tgz"
  • extract the package to a subdir of "/opt/guestconfig"
  • remove "/opt/guestconfig/appconfig.tgz"

Back in appConfig, we then run startscript for the initial config.
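
To make that concrete, here's a minimal Go sketch of the flow. Loud caveats: the names here are hypothetical stand-ins, and the real operator drives these steps inside the container via the Kubernetes exec API rather than running them locally; shelling out to curl and tar simply stands in for the download/copy and extract steps.

```go
package appconfig

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

const (
	guestConfig = "/opt/guestconfig"
	pkgPath     = guestConfig + "/appconfig.tgz"
)

// setupAppConfig sketches the steps described above, as if run locally
// inside the container.
func setupAppConfig(packageURL string) error {
	// Bail out if a startscript is already extracted, e.g. after a
	// container restart when the role uses persistent storage.
	if matches, _ := filepath.Glob(guestConfig + "/*/startscript"); len(matches) != 0 {
		return nil
	}
	if err := os.MkdirAll(guestConfig, 0o755); err != nil {
		return err
	}
	// Download/copy the package; curl handles file:// as well as http(s).
	if err := exec.Command("curl", "-fsSL", "-o", pkgPath, packageURL).Run(); err != nil {
		return fmt.Errorf("fetching %s: %w", packageURL, err)
	}
	// Extract into a subdirectory of /opt/guestconfig...
	if err := exec.Command("tar", "xzf", pkgPath, "-C", guestConfig).Run(); err != nil {
		return fmt.Errorf("extracting %s: %w", pkgPath, err)
	}
	// ...then remove the tarball.
	return os.Remove(pkgPath)
}
```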

*** For "rebooted" members:

A member can come back into "creating" state if the pod is deleted and re-created. This could be a manual delete, or a result of the pod image changing because of an upgrade. This will kick us back into the same handler as above, but it can run through a different code path.

The basic differences are that, for persistent clusters, configcli will not be re-installed (which is fine) and setupAppConfig will bail out (which is usually fine, but in the upgrade case...not).

==========

So here are the consequences as I see them.

If the config package stays null, or keeps the same URL, both before and after upgrade/rollback -- things work OK.

If the role is NOT using persistent storage, everything works OK even if the config package changes across upgrade/rollback, because the correct setup package for the app version will always be re-copied/downloaded and re-installed.

One thing to note there, BTW, is that the upgraded config package will receive the "you have been upgraded" startscript event, while the old/original config package will receive the "you have been rolled back" event. I think this is a good arrangement, but please think about whether there are possible problems with it. Something to make sure we document in any case.

OK now let's talk about persistent storage.

In the case of a successful upgrade, we'll still have the old config package installed (and it will receive the "you have been upgraded" event). That's a bug; I'll have some broad comments about a solution in a following comment.

@joel-bluedata (Member Author) commented:

Generally speaking I think we need to add this logic to the flow:

First of all, there's the spot where we completely avoid all of the setup logic in handleCreatingMembers when there's no config package at all. This is still fine, but we should remove any existing "/opt/guestconfig" directory in that case, since we might be transitioning from an app version where there was a config package.
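
Sketched as a hypothetical early-out (same caveats as the sketch in the previous comment: names are invented, and the real logic would exec inside the container):

```go
package appconfig

import "os"

// handleNoConfigPackage: if the member's role has no config package,
// setup is skipped entirely, but any /opt/guestconfig directory left
// behind by a previous app version's config package is removed.
func handleNoConfigPackage(hasConfigPackage bool) (skipSetup bool, err error) {
	if hasConfigPackage {
		return false, nil
	}
	return true, os.RemoveAll("/opt/guestconfig")
}
```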

OK, on to the main appConfig logic.

In the upgrade case, before we call setupAppConfig (or maybe at the beginning of that function), if /opt/guestconfig has any contents then that whole directory should be moved aside as a backup of the original config stuff. To make sure we keep it in persistent storage, we could do something like move it to /opt/guestconfig.backup and add "/opt/guestconfig.backup" to the list of appConfigDefaultMountFolders.
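
A rough sketch of that move-aside step (hypothetical helper; it also nukes any stale backup left over from an earlier completed upgrade, per the cleanup point two paragraphs down):

```go
package appconfig

import "os"

// backupGuestConfig stashes the current config artifacts before an
// upgrade installs new ones. "/opt/guestconfig.backup" would also be
// added to appConfigDefaultMountFolders so the backup lives on
// persistent storage.
func backupGuestConfig() error {
	if _, err := os.Stat("/opt/guestconfig"); os.IsNotExist(err) {
		return nil // nothing to stash
	}
	// Clear any stale backup from a previously completed upgrade.
	if err := os.RemoveAll("/opt/guestconfig.backup"); err != nil {
		return err
	}
	return os.Rename("/opt/guestconfig", "/opt/guestconfig.backup")
}
```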

If we're rolling back, and /opt/guestconfig.backup exists, then we should avoid the normal setupAppConfig logic. Just remove /opt/guestconfig and rename /opt/guestconfig.backup to /opt/guestconfig.
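
And the corresponding rollback path, sketched the same way:

```go
package appconfig

import "os"

// restoreGuestConfig reinstates the stashed package on rollback. If it
// returns true, the normal setupAppConfig logic should be skipped.
func restoreGuestConfig() (bool, error) {
	if _, err := os.Stat("/opt/guestconfig.backup"); os.IsNotExist(err) {
		// No backup present; fall through to normal setup.
		return false, nil
	}
	if err := os.RemoveAll("/opt/guestconfig"); err != nil {
		return false, err
	}
	if err := os.Rename("/opt/guestconfig.backup", "/opt/guestconfig"); err != nil {
		return false, err
	}
	return true, nil
}
```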

That leaves the question of what to do with /opt/guestconfig.backup when upgrade completes (for the entire kdcluster, so there is no longer a danger of rollback). IMO we can just leave it there; it's harmless. When we implement the whole-cluster-upgraded notification, the startscript could choose to remove it as a bit of housekeeping, but we shouldn't depend on that. We can just make sure, in the "upgrade case" described above, that we nuke any pre-existing /opt/guestconfig.backup directory.

==========

So, how would an app startscript deal with this situation?

On a "your pod has been upgraded" event, the (potentially new) startscript will get the event and will have the entire filesystem available to work with, including the previous contents of /opt/guestconfig, which it knows have been stashed as /opt/guestconfig.backup.

The startscript can make any changes it needs to, in case e.g. there are some configuration or format changes needed to work with an upgraded container image, but it should keep in mind that a rollback may happen and the old startscript will need to deal with it. If it is modifying files that are on persistent storage, it could for example make backups of those files. In any case it needs to follow a convention that will be expected by the old startscript. It should not store anything needed for rollback in the /opt/guestconfig directory since that will be removed on rollback.

If rollback happens, /opt/guestconfig.backup is moved back to /opt/guestconfig, and the old startscript will process the "your pod has been rolled back" event. At this point it could, for example, restore the backups of config files, databases, or whatever else the new startscript had previously made.

But if the whole kdcluster upgrade completes, the new startscript will get the (yet-to-be-implemented) "upgrade complete" notification. At this point it could delete anything that was to be used for rollback, such as /opt/guestconfig.backup and any other backups that it had explicitly created.

Seems plausible.

Kosta91 self-assigned this Aug 10, 2022