Versioned validation of referenced Pipelines/Tasks #6616
Comments
I'm not sure this is always feasible: we validate results/params with values once they have been resolved. |
I think this should be fine. To clarify, I don't think we need to do all validation before converting, only validation of the remote Task/Pipeline spec. |
I think a bigger problem might be trusted resources: we can only verify trusted resources before defaults are set, since the Task/Pipeline signature is computed before defaults are set. So the order of operations needs to be: 1. verification, 2. setting defaults, 3. validation, 4. conversion to the storage API version. |
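A minimal Go sketch of that ordering, assuming a hypothetical, heavily simplified remoteTask type (not Tekton's real types) and hypothetical helpers verify, setDefaults, validate, and convertToStorage. It only illustrates why validation has to run against the authored API version: a v1beta1 spec may use beta features under enable-api-fields=stable, a v1 spec may not, and that information is gone once the object has been converted.

```go
package main

import (
	"errors"
	"fmt"
)

// storageVersion is the API version objects are converted to (v1 after the swap).
const storageVersion = "tekton.dev/v1"

// remoteTask is a hypothetical, heavily simplified stand-in for a resolved
// remote Task; it is not Tekton's real type.
type remoteTask struct {
	APIVersion string   // version the Task was authored in, e.g. "tekton.dev/v1beta1"
	UsesBeta   bool     // whether the spec uses any beta features
	Signed     bool     // stand-in for a successful trusted-resources signature check
	Steps      []string // records the order in which operations ran
}

// verify must run first: the signature was computed before defaults were set.
func verify(t *remoteTask) error {
	t.Steps = append(t.Steps, "verify")
	if !t.Signed {
		return errors.New("trusted resources verification failed")
	}
	return nil
}

// setDefaults would fill in default values; here it only records that it ran.
func setDefaults(t *remoteTask) {
	t.Steps = append(t.Steps, "setDefaults")
}

// validate applies the rules of the version the Task was authored in:
// v1beta1 implicitly allows beta features, while v1 rejects them when
// enable-api-fields is "stable".
func validate(t *remoteTask, enableAPIFields string) error {
	t.Steps = append(t.Steps, "validate")
	if t.UsesBeta && t.APIVersion != "tekton.dev/v1beta1" && enableAPIFields == "stable" {
		return fmt.Errorf("%s Task uses beta features but enable-api-fields is %q", t.APIVersion, enableAPIFields)
	}
	return nil
}

// convertToStorage relabels the object with the storage version; after this,
// the authored version is no longer known.
func convertToStorage(t *remoteTask) {
	t.Steps = append(t.Steps, "convert")
	t.APIVersion = storageVersion
}

// prepare runs the four steps in order and stops at the first error.
func prepare(t *remoteTask, enableAPIFields string) error {
	if err := verify(t); err != nil {
		return err
	}
	setDefaults(t)
	if err := validate(t, enableAPIFields); err != nil {
		return err
	}
	convertToStorage(t)
	return nil
}
```

Because a failure at any step returns early, an object that fails verification or validation is never converted.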
Update: I realized there's a new issue with local PipelineRefs/TaskRefs: a user can create a perfectly valid v1beta1 Pipeline/Task that uses beta features while enable-api-fields is set to stable. This gets converted into the storage version of the API, and we don't know what API version it was originally created with. When we swap the storage version to v1, this can break existing local PipelineRefs/TaskRefs. There are multiple ways to address this problem, each with tradeoffs.
Would love to get some opinions on people's preferred approaches! |
Chatted with @dibyom and @JeromeJu. We believe option 6 is the best, and hope to mitigate the impact with clear release notes. Options 1 and 2 treat local Pipelines differently from remote Pipelines. I also don't think we should skip validation for local Pipelines, as in option 1. I think options 3 and 4 will be very difficult to explain to cluster operators: it'll be hard to understand which value of "enable-api-fields" is being applied and what happens when you change the cluster settings. We should still pursue and prioritize option 5. It's worth noting our only beta features right now are resolution, object params/results, and array results/array indexing. I think we can get per-feature flags for these 3 done reasonably quickly. Still would appreciate more feedback! |
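To make the per-feature flag idea concrete, here's a rough sketch contrasting today's single enable-api-fields gate with independent per-feature toggles for the three beta features listed above. The flag keys (e.g. "object-params-and-results") and all function names are hypothetical, not existing Tekton config keys or APIs.

```go
package main

import "sort"

// feature identifies a beta feature named in the discussion. The string
// values are hypothetical flag keys, not Tekton's actual config keys.
type feature string

const (
	resolution   feature = "resolution"
	objectParams feature = "object-params-and-results"
	arrayResults feature = "array-results-and-indexing"
)

// betaFeatures lists the features currently gated behind "beta".
var betaFeatures = []feature{resolution, objectParams, arrayResults}

// enabledByAPIFields models the current coarse gate: every beta feature is
// on or off together, depending only on enable-api-fields ("alpha" also
// enables beta features).
func enabledByAPIFields(enableAPIFields string) map[feature]bool {
	on := enableAPIFields == "beta" || enableAPIFields == "alpha"
	m := make(map[feature]bool, len(betaFeatures))
	for _, f := range betaFeatures {
		m[f] = on
	}
	return m
}

// enabledPerFeature models per-feature flags: each feature is toggled
// independently of the API version and of the other features.
func enabledPerFeature(flags map[feature]bool) map[feature]bool {
	m := make(map[feature]bool, len(betaFeatures))
	for _, f := range betaFeatures {
		m[f] = flags[f] // unset flags default to off
	}
	return m
}

// enabledList returns the names of the enabled features in sorted order.
func enabledList(m map[feature]bool) []string {
	var out []string
	for f, on := range m {
		if on {
			out = append(out, string(f))
		}
	}
	sort.Strings(out)
	return out
}
```

With per-feature flags, turning on resolution alone no longer implies turning on object params/results or array results.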
I’m a bit skeptical of 6 - moving users from stable to beta feature flags by default feels like it may end up causing more problems. 😅 I don't think we want to make stable opt-in, and as you pointed out, if you set the stable flag you’re still stuck and unable to migrate. I think we should generally prioritize compatibility of stable over beta, though I understand this is a bit of a special case since it sounds like most (all?) of our v1beta1 usage relies on beta features today. For existing v1beta1 resources already under this behavior it seems fine and needed to ensure compatibility, but I’m not sure it makes sense to do for all new resources as well. |
How does 3. work for existing resources that have already been created? |
@wlynch I just want to clarify that we plan to move off enable-api-fields in favor of per-feature flags to avoid these problems-- it's just not clear how long this will take.
There's some discussion of this on #6592. The general consensus seems to be that setting beta flags by default for right now will help address confusion with migration to v1, and that the best solution is to move to per-feature flags as soon as we can. Can you elaborate on the problems you're anticipating?
This problem doesn't affect users who use enable-api-fields=stable and are using only stable features; i.e. they will not experience problems upgrading or when we swap to v1 as our storage version. This problem only affects users who are setting "enable-api-fields" to stable, but also using beta features in beta CRDs. If they are already using beta features, it's likely that they'd want to continue doing so, so we think setting the default value to beta will help mitigate the impact on these users.
I'm not sure exactly what you mean here. Is the concern that we are swapping from disabling beta features by default in GA CRDs to enabling them by default? I anticipate that most people are using beta CRDs with enable-api-fields set to stable.
I disagree for a few reasons:
If I'm misunderstanding your idea for part 3 please correct me! |
The thought was only apply it on conversion from v1beta1 -> v1. (I may be misunderstanding how CRD conversion works though, so lmk if this isn't feasible).
Yes. More specifically:
I think what I'd like to see is if we can scope the change down to only the resources it affects. 🤔
👍 This makes sense, so SGTM to not proceed with this option as-is. |
I would like to ➕ the aim that users should neither be affected by nor notice the version swap, and to clarify that for option #6 the goal was also to avoid action items (AIs) for end users. Since we are now defaulting the flag on line 75 of pipeline/config/config-feature-flags.yaml (at commit 96212a1) to beta, using beta features in v1 after the migration should not lead to any breaking changes by default.
For beta features, i.e. resolvers/resolutions, as they are defaulted to be apparent in |
I brought up one more idea to address this in today's API WG, which I think is the best option found so far. It is similar to option 1, but separates validation for beta features from the rest of pipeline/task spec validation. This is implemented in #6701. |
It seems to me that the problem comes from implicitly allowing beta features for v1beta1 resources. I think we should stop doing that and rely only on the config map setting. If the config map is set to stable, beta features won't be available, regardless of the API version with which a resource was originally defined. This may affect users: once we start storing v1, users who want to continue using beta features will have to set the config map to beta. |
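A small sketch of the difference, under simplifying assumptions: betaAllowedToday, betaAllowedProposed, and outcomeAcrossSwap are hypothetical helpers, and API versions are plain strings. outcomeAcrossSwap shows why the current rule breaks on the storage swap (the same v1beta1-authored object flips from valid to invalid), while the proposed rule gives the same answer before and after.

```go
package main

// betaAllowedToday models the current behavior: v1beta1 resources may use
// beta features even when enable-api-fields is "stable"; v1 resources may not.
func betaAllowedToday(apiVersion, enableAPIFields string) bool {
	if apiVersion == "tekton.dev/v1beta1" {
		return true
	}
	return enableAPIFields != "stable"
}

// betaAllowedProposed models the proposal: only the config map decides,
// regardless of the version a resource was originally defined with.
func betaAllowedProposed(enableAPIFields string) bool {
	return enableAPIFields != "stable"
}

// outcomeAcrossSwap reports whether a v1beta1-authored resource that uses
// beta features passes validation before and after the storage version
// moves from v1beta1 to v1 (the stored object is relabeled as v1).
func outcomeAcrossSwap(enableAPIFields string, proposed bool) (before, after bool) {
	if proposed {
		return betaAllowedProposed(enableAPIFields), betaAllowedProposed(enableAPIFields)
	}
	return betaAllowedToday("tekton.dev/v1beta1", enableAPIFields),
		betaAllowedToday("tekton.dev/v1", enableAPIFields)
}
```

The trade-off is visible in the stable case: the proposed rule is consistent across the swap, but it rejects beta usage up front, which is the user-facing change mentioned above.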
@afrittoli just to make sure I understand correctly, it sounds like you're suggesting updating how we validate enable-api-fields, so that beta features cannot be used in beta CRDs when "enable-api-fields" is set to "stable"? That would solve this problem although I'm concerned about the breaking changes. There was some interest in option #3 from #6616 (comment) in API WG this week. I've opened #6716 as a draft PR demonstrating what this could look like. However, I still strongly believe that #6701 is a better approach, for reasons detailed in the PR description of #6716. |
I think this is the most sustainable option moving forward. Perhaps we could combine it with some strategy to avoid breaking existing users, e.g.:
About #6701, validation of beta features for remote resources is something we need for sure, it sounds like a bug if we are not doing that today - perhaps that could be a PR on its own. If #6701 works, I would still consider decoupling the API version from the feature version, and we could do that for future versions only, like I proposed above, to avoid breaking users. |
@afrittoli thanks for your response!
There's some discussion about this in #6592. There seems to be consensus that our current approach with enable-api-fields is confusing/causing problems, and we should decouple feature versioning from api versioning, as you're suggesting. I'm hoping to keep the discussion of how we do that contained to that issue, but the approach that has the most buy-in at the moment is 1. short term, set "enable-api-fields" to "beta" by default, and 2. move to per-feature flags. #6701 is just looking for a short term way to make progress on our v1 swap without creating breaking changes. I absolutely want to move to per-feature flags or another solution that decouples api versioning from feature versioning-- I just don't want that to block our v1 storage swap, and I don't think it should have to, since the storage swap shouldn't include user-facing changes.
It might be better to discuss this type of strategy on #6592. #6701 aims to address the issue of referenced tasks/pipelines when swapping to v1 without creating any user facing breakages, but #6592 is more geared towards the goal of decoupling feature versioning from API versioning which it seems there is consensus on.
👍 -- PTAL at #6719, which is a refactoring PR that's the first step here.
I'm not sure I understand your question exactly. The problem is that in the reconciler, we're calling TaskSpec.Validate after converting to the storage version, but once our storage version is v1, existing definitions of beta resources will fail this call to TaskSpec.Validate. #6701 moves beta feature validation out of TaskSpec.Validate, and applies it only when a v1 resource is created on the cluster or referenced as a remote pipeline, not after conversion.
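Roughly, the split in #6701 could look like the sketch below. It is an illustration with hypothetical names (taskSpec, validateSpec, validateBetaFeatures, validateReferenced) and a made-up "at least one step" rule, not the code in the PR: version-independent checks always run, while beta-feature checks run only where the original API version is known to be v1.

```go
package main

import (
	"errors"
	"fmt"
)

// taskSpec is a hypothetical, minimal stand-in for a Task spec.
type taskSpec struct {
	Steps    int  // number of steps; must be > 0
	UsesBeta bool // whether any beta feature is used
}

// validateSpec holds version-independent checks and always runs.
func validateSpec(s taskSpec) error {
	if s.Steps == 0 {
		return errors.New("task must have at least one step")
	}
	return nil
}

// validateBetaFeatures is split out of validateSpec; it is only applied to
// objects whose original API version is known to be v1.
func validateBetaFeatures(s taskSpec, enableAPIFields string) error {
	if s.UsesBeta && enableAPIFields == "stable" {
		return fmt.Errorf("beta features are not allowed when enable-api-fields is %q", enableAPIFields)
	}
	return nil
}

// source describes where a Task being validated came from.
type source int

const (
	createdAsV1 source = iota // v1 object submitted to the cluster
	remoteV1                  // remote v1 Task fetched for a reference
	localStored               // local ref, already converted to the storage version
)

// validateReferenced skips beta-feature checks for locally stored objects:
// their original version may have been v1beta1, and objects created as v1
// were already checked when they were created on the cluster.
func validateReferenced(s taskSpec, from source, enableAPIFields string) error {
	if err := validateSpec(s); err != nil {
		return err
	}
	if from == createdAsV1 || from == remoteV1 {
		return validateBetaFeatures(s, enableAPIFields)
	}
	return nil
}
```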
100% agree! |
We currently allow users to mix API versions of TaskRuns and the Tasks they reference, and likewise for PipelineRuns and the Pipelines they reference. We convert any referenced Tasks and Pipelines into the storage version of the API.
Later, after setting defaults and performing other operations, we validate the referenced object's spec.
For Pipelines, we do this by calling pipelineSpec.Validate, which validates the referenced Pipeline as if its API version is the storage version.
For Tasks, we do not call taskSpec.Validate; instead, we have customized validation functions that aren't defined in pkg/apis, e.g. ValidateResolvedTask, validateTaskSpecRequestResources, and a few others. The only validation function used from the apis package is taskSpec.ValidateParamArrayIndex, which, again, validates the referenced Task as if its API version is the storage version. This is the cause of #6607.
This presents a problem when swapping the storage version of the API to v1. If a user previously defined a v1beta1 referenced Task or Pipeline, it gets converted to v1 and validated as if it were a v1 object. However, if the referenced object used beta features, and "enable-api-fields" is set to "stable", validation fails. This means that swapping the storage version to v1 would cause PipelineRuns and TaskRuns that previously worked to fail. Example logs from #6608.
We should perform validation before we convert the object into the storage version.
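A minimal sketch contrasting the current order (convert, then validate as the storage version) with the proposed order (validate, then convert). The object type and the validateAs rule are hypothetical simplifications of the behavior described in this issue, not Tekton's actual API.

```go
package main

import "fmt"

// object is a hypothetical, minimal stand-in for a referenced Task or Pipeline.
type object struct {
	APIVersion string
	UsesBeta   bool
}

// validateAs mimics version-aware validation: v1beta1 implicitly allows beta
// features, while v1 rejects them when enable-api-fields is "stable".
func validateAs(o object, enableAPIFields string) error {
	if o.UsesBeta && o.APIVersion == "tekton.dev/v1" && enableAPIFields == "stable" {
		return fmt.Errorf("beta features used in %s with enable-api-fields=%q", o.APIVersion, enableAPIFields)
	}
	return nil
}

// convert mimics conversion to the storage version; only the version label changes here.
func convert(o object, storageVersion string) object {
	o.APIVersion = storageVersion
	return o
}

// convertThenValidate is the current order: the object is validated as if it
// had been authored in the storage version.
func convertThenValidate(o object, storageVersion, enableAPIFields string) error {
	return validateAs(convert(o, storageVersion), enableAPIFields)
}

// validateThenConvert is the proposed order: validate against the authored
// version, then convert for the rest of reconciliation.
func validateThenConvert(o object, storageVersion, enableAPIFields string) (object, error) {
	if err := validateAs(o, enableAPIFields); err != nil {
		return object{}, err
	}
	return convert(o, storageVersion), nil
}
```

With the storage version still v1beta1, the current order happens to pass; once it is v1, the same v1beta1 object fails, while validating first keeps it working.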
Other options: