Rollouts for Porch #3688
Labels: area/site, enhancement, triaged
We have an early POC for rolling out Porch packages with the RootSyncDeployment and RootSyncRollouts controllers. However, there are still several open questions about how it should work for more realistic use cases, and it is still missing several important features.
Where does the rollout functionality live?
One important aspect of the rollout solution that we are still discussing is where the rollout should happen. There might not be a single answer that fits every situation.
The current POC assumes that the packages and the revisions are all available before the rollout starts, and the rollout controller sits between the deployment repository and the clusters. This means that the creation/generation of packages and revisions is independent of the sequence of the rollout.
Another alternative is to handle the rollouts as part of the creation/generation and publishing of packages and revisions. This means that the rollout will happen as a progressive publishing of packages and revisions in the deployment repository, and any changes to the deployment repository will always be immediately synced to the clusters. This solution is described in some more detail in #3455 (comment), but even this solution requires a separate component to create/update/delete RSyncs in the clusters and possibly some selector to specify which packages should go to which clusters. Nephio has a deployment repo per cluster, which avoids this issue, but that might not be the right solution for all use cases. In particular, that approach isn't feasible for OCI.
Related issues: #3387, #3348
Handling variants
The current POC assumes that the exact same version of the package should go into every cluster. This makes things much simpler, but in reality many use cases require separate variants for different clusters. We still want the rollout to happen across all variants of a package, so the rollout engine will have to understand which deployment packages are variants of the same source package and which are different. The rollout engine also needs a way to know which variant should go into each cluster, while maintaining a simple and concise API for cluster targeting.
This is related to how the variants are generated in the first place, as well as the previous section about where we put the rollout logic.
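To make the variant problem concrete, here is a minimal sketch of the two lookups the rollout engine would need: grouping deployment packages by their source package, and picking the variant for a given cluster. It assumes each deployment package records the upstream package it was derived from plus the labels a target cluster must carry. The names `DeploymentPackage`, `upstream`, and `target_labels` are hypothetical and are not part of the POC API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentPackage:
    name: str                  # hypothetical: variant name in the deployment repo
    upstream: str              # hypothetical: source package the variant derives from
    target_labels: tuple = ()  # hypothetical: (key, value) pairs a cluster must carry


def group_by_upstream(packages):
    """Group variants so one rollout can span all variants of a source package."""
    groups = defaultdict(list)
    for pkg in packages:
        groups[pkg.upstream].append(pkg)
    return dict(groups)


def variant_for_cluster(variants, cluster_labels):
    """Return the one variant whose target labels all match the cluster, or None."""
    matches = [
        v for v in variants
        if all(cluster_labels.get(key) == value for key, value in v.target_labels)
    ]
    if len(matches) > 1:
        # Ambiguous targeting is treated as an error rather than picking arbitrarily.
        names = ", ".join(v.name for v in matches)
        raise ValueError(f"multiple variants match the cluster: {names}")
    return matches[0] if matches else None
```

In this sketch a variant with no target labels matches every cluster, so it conflicts with any more specific variant of the same source package. Whether that should be an error or a fallback is one of the API questions still open.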
Related issues: #3347, #3488
Deletion of packages
As part of a solution for rollouts and declarative cluster targeting, we also need to make sure there is a way to remove packages from clusters. The approach for this is probably different depending on where in the stack we perform the rollout. If it happens between the deployment repository and the cluster, it should be pretty straightforward, as we just need to make sure the cluster is no longer targeted by the relevant selector, although this might be more difficult in practice depending on how users specify the selectors. If the rollout happens earlier, we need to determine whether deletion requires the package to be removed from the deployment repo and, if so, how that should work (for example, we need to avoid situations where only the latest revision is removed and we essentially end up with a rollback on the cluster).
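For the case where the rollout sits between the deployment repository and the clusters, removal can be sketched as a set difference between the clusters the selector currently targets and the clusters where the package is already installed. The selector here is a plain equality-based label map, and `plan_targets` and its arguments are hypothetical names used only for illustration.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def plan_targets(selector, clusters, installed):
    """Compare selector-targeted clusters with clusters that already have the package.

    clusters:  mapping of cluster name -> labels
    installed: set of cluster names where the package is currently installed
    """
    desired = {name for name, labels in clusters.items() if matches(selector, labels)}
    return {
        "install": sorted(desired - installed),
        "remove": sorted(installed - desired),
    }
```

Note that an empty selector matches every cluster in this sketch, so an accidentally emptied selector would install everywhere rather than remove everywhere. That is one way selectors can make removal harder in practice.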
Auth
The POC doesn't support configuring Config Sync to use any auth mechanism other than "none". This is obviously not sufficient for most cases. On the other hand, syncing credentials to clusters with RootSyncSet also seems questionable. So we need to decide how we want to handle private repos with rollouts, possibly by supporting only a subset of the auth options available in Config Sync.
Support for different RSyncs
The POC only supports installing packages through RootSyncs with git as the source. But we also want to be able to support RepoSyncs, OCI upstreams, and other configuration options for the RSyncs. Longer term, we might also want to support sync backends other than Config Sync, such as Argo. We need to figure out how to make the controllers extensible in this dimension without making them too complex.
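One possible shape for this extensibility is a small backend interface that turns a package revision into whatever sync resource the backend needs, with the controller choosing a backend by name. The sketch below is an assumption, not the POC design: `SyncBackend`, `GitRootSyncBackend`, `BACKENDS`, and `render_sync` are hypothetical, and the RootSync manifest follows my understanding of the Config Sync API, so it should be checked against the Config Sync reference before use.

```python
from abc import ABC, abstractmethod


class SyncBackend(ABC):
    """Hypothetical extension point: render a package revision as a sync resource."""

    @abstractmethod
    def render(self, name, repo, revision, directory):
        """Return the manifest (as a dict) to apply to the target cluster."""


class GitRootSyncBackend(SyncBackend):
    """Mirrors what the POC supports today: a RootSync with a git source and no auth."""

    def render(self, name, repo, revision, directory):
        return {
            "apiVersion": "configsync.gke.io/v1beta1",
            "kind": "RootSync",
            "metadata": {"name": name, "namespace": "config-management-system"},
            "spec": {
                "sourceFormat": "unstructured",
                "git": {
                    "repo": repo,
                    "revision": revision,
                    "dir": directory,
                    "auth": "none",
                },
            },
        }


# Hypothetical registry; RepoSync, OCI, or Argo backends would be added here.
BACKENDS = {"rootsync-git": GitRootSyncBackend()}


def render_sync(backend_name, **kwargs):
    """Look up a backend by name and render the sync resource for one package."""
    try:
        backend = BACKENDS[backend_name]
    except KeyError:
        raise ValueError(f"unsupported sync backend: {backend_name}") from None
    return backend.render(**kwargs)
```

Keeping the interface to a single `render` method keeps the controller simple, at the cost of pushing backend-specific options (auth, source type) into each implementation.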
Support for fleet and Gateway Connect
Currently, the implementation in RootSyncSet connects directly to the API server of each cluster. This doesn't work for private clusters, so we want to connect to clusters through Gateway Connect instead.
We also currently rely on clusters being represented as Config Connector resources within the same cluster as the rollout controllers. This doesn't work for all situations, so we want to be able to discover clusters through fleet as well.
Propagation of status for determining whether a rollout is successful
The POC only considers the sync status from Config Sync when determining if a package has been successfully updated. This is a start, but in practice we need more. I think we need three types of statuses:
We need to consider how we gather this information from multiple clusters.
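As a starting point for gathering status from multiple clusters, the sketch below folds per-cluster sync states into a single rollout phase. It covers only the sync status the POC already looks at. The state names (`synced`, `pending`, `error`) and the phase names are hypothetical placeholders, not values from Config Sync or the POC.

```python
from collections import Counter


def summarize_rollout(cluster_status):
    """Fold per-cluster sync states into one rollout phase plus per-state counts.

    cluster_status: mapping of cluster name -> state string (hypothetical values
    "synced", "pending", "error").
    """
    counts = Counter(cluster_status.values())
    if counts.get("error"):
        phase = "failed"        # any failing cluster fails the rollout
    elif counts and set(counts) == {"synced"}:
        phase = "succeeded"     # every reported cluster is synced
    else:
        phase = "progressing"   # still waiting on clusters, or nothing reported yet
    return phase, dict(counts)
```

A real implementation would likely need a tolerance for a small number of failing clusters and a timeout for clusters that never report, but both are policy decisions that this issue leaves open.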
Related issues: #3543