Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .spec.version #142

Merged
merged 12 commits into from
Apr 14, 2023
Merged

Conversation

perdasilva
Copy link
Contributor

@perdasilva perdasilva commented Mar 28, 2023

Adds .spec.version property to the Operator CRD. Some small changes were made to the resolver and top level variable sources to accommodate for this change and, hopefully, future ones. When reviewing, we should be asking ourselves how easy it would be to add the next property.

This PR also adds a small admissions unit test, so we can validate the OpenApi validation rules we apply using the kubebuilder annotations.

I've also nerfed the stackdumps down to panic. Otherwise we just get logs pointing to internal controller-runtime code...

@perdasilva perdasilva changed the title Operator crd version Add .spec.version Mar 28, 2023
@@ -27,6 +27,11 @@ type OperatorSpec struct {
//+kubebuilder:validation:MaxLength:=48
//+kubebuilder:validation:Pattern:=^[a-z0-9]+(-[a-z0-9]+)*$
PackageName string `json:"packageName"`

//+kubebuilder:validation:MaxLength:=64
//+kubebuilder:validation:Pattern:=^(\|\||\s+)?([\s~^><=]*)v?(\d+)(\.(\d+))?(\.(\d+))?(\-(.+))?$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this match the full set of (but not more than) version ranges supported by semver.ParseRange?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hoping for two things:

  1. We support the full syntax of semver.ParseRange
  2. We block invalid ranges at admission time, and we never return an error from semver.ParseRange at reconcile time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFACT, this doesn't support the full syntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semver package explicitly mentions no regex... Where did that one come from?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://ihateregex.io/expr/semver/ then add the comparison operators

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, it's not correct as-is, and needs to account for the range prefix elements.

Copy link
Member

@joelanford joelanford Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, based on the milestone 3 consensus in #162, it sounds like we're going with spec.version and a single explicit version for now. So I think we can literally use the linked regex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our upstream convo yesterday, I've left the name as "version" (I'm also happy with PackageVersion, if that's prefered - I personally like the succinctness of version. I've added some comments with examples and a link to the semver.org site.

var pkgName string
BeforeEach(func() {
By("initializing cluster state")
pkgName = fmt.Sprintf("non-existent-%s", rand.String(6))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a package name that does exist, but one that doesn't have a bundle in this range?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to address this before we can merge. I'll come back to this in a bit. I need to pick up my son from footy camp =P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha now. I've addressed it. Changed the name of the package and put a comment to say the version of the package is doesn't exist

@joelanford
Copy link
Member

I've also nerfed the stackdumps down to panic. Otherwise we just get logs pointing to internal controller-runtime code...

Typically those logs contain the error message our reconciler returns (if we return a non-nil error). Do those errors still come through, and we're just no longer also barfing out the stack trace? If so, 👍

Signed-off-by: perdasilva <perdasilva@redhat.com>
Signed-off-by: perdasilva <perdasilva@redhat.com>
Signed-off-by: perdasilva <perdasilva@redhat.com>
Signed-off-by: perdasilva <perdasilva@redhat.com>
@joelanford
Copy link
Member

Just as an update, we discussed the tradeoffs of versionRange, version, versionConstraint and -- for now -- landed on spec.version and it being optional and a single version (no range). The semantics are hopefully implicitly defined by the demo script in #162.

func (o *OLMVariableSource) requiredPackageFromOperator(operator *operatorsv1alpha1.Operator) (*required_package.RequiredPackageVariableSource, error) {
var opts []required_package.RequiredPackageOption
if operator.Spec.Version != "" {
opts = append(opts, required_package.InVersionRange(operator.Spec.Version))
Copy link
Member

@joelanford joelanford Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@perdasilva given the direction set out in #162, we're saying that only individual versions will be allowed in the API. However, an individual version is a valid range, and I am still of the belief that we'll want to give users the ability to specify a range at some point.

So my advice is:

  1. Let's make the spec.version regex permit only individual versions
  2. Let's keep InVersionRange as the under-the-hood implementation of the constraint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with everything you said. The original regex was for a single version (got my wires crossed - managed to get a pretty decent one for ranges, but I don't think there's a regex that matches all cases - it could be that we'll need to match most but still verify under the hood. ChatGPT ftw XD

Copy link
Contributor Author

@perdasilva perdasilva Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also nerfed the stackdumps down to panic. Otherwise we just get logs pointing to internal controller-runtime code...

Typically those logs contain the error message our reconciler returns (if we return a non-nil error). Do those errors still come through, and we're just no longer also barfing out the stack trace? If so, +1

Just to sanity check, I reverted the changes locally to test, before:

1.6813110527752454e+09  ERROR   Reconciler error        {"controller": "operator", "controllerGroup": "operators.operatorframework.io", "controllerKind": "Operator", "Operator": {"name":"operator-sample"}, "namespace": "", "name": "operator-sample", "reconcileID": "995fa1ab-8119-4059-9e3c-7ccfe06fbedf", "error": "package 'prometheus' at version '0.33.0' not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /home/perdasilva/repos/perdasilva/operator-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /home/perdasilva/repos/perdasilva/operator-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /home/perdasilva/repos/perdasilva/operator-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234

After:

1.6813111185853214e+09  ERROR   Reconciler error        {"controller": "operator", "controllerGroup": "operators.operatorframework.io", "controllerKind": "Operator", "Operator": {"name":"operator-sample"}, "namespace": "", "name": "operator-sample", "reconcileID": "c4b4e0ff-a639-440e-bc95-94d21675aca1", "error": "package 'prometheus' at version '0.33.0' not found"}

So, the error is still there. But, the one with the stacktrace isn't particularly helpful. Should I keep this change? We can always change it back later. Rn, this feels much cleaner.

… annotations

Signed-off-by: perdasilva <perdasilva@redhat.com>
Signed-off-by: perdasilva <perdasilva@redhat.com>
@perdasilva perdasilva force-pushed the operator_crd_version branch 2 times, most recently from f160ce9 to d1b5345 Compare April 13, 2023 08:01
Signed-off-by: perdasilva <perdasilva@redhat.com>
@perdasilva
Copy link
Contributor Author

I've gone ahead and made the semver validation more robust. I also decided to add validation in the reconciler in case the regex fails. I haven't managed to find a combination that does. But, as soon as we release someone will XD. I've made the validation additive, so we can just add new validation functions.

//+kubebuilder:validation:MaxLength:=64
//+kubebuilder:validation:Pattern=^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-(0|[1-9]\d*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)(\.(0|[1-9]\d*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*)?(\+([0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*))?$
//+kubebuilder:Optional
// Version is an optional is semver constraint on the package version. If not specified, the latest version available of the package will be installed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: grammar:
Version is an optional is semver constraint

@@ -39,6 +39,16 @@ spec:
maxLength: 48
pattern: ^[a-z0-9]+(-[a-z0-9]+)*$
type: string
version:
description: "Version is an optional is semver constraint on the package
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same grammar nit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed ^^ nice catch. I also modified the "Example" because the rendering in the description looked funky

Signed-off-by: perdasilva <perdasilva@redhat.com>
Signed-off-by: perdasilva <perdasilva@redhat.com>
Signed-off-by: perdasilva <perdasilva@redhat.com>
Signed-off-by: perdasilva <perdasilva@redhat.com>
Copy link
Member

@m1kola m1kola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing this one. Just a few questions so far

}
for _, validator := range validators {
if err := validator(operator); err != nil {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it is ok as there is only one field which requires validation. Just curious how you see UX for it once we have more validators - Will we bereturning one error at a time or use some sort of aggregated error (so we surface all issues at once)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great call out. Personally, I think we should try as much as possible to add all of the validation errors as conditions. It would suck to keep going back and forth trying to get it right. What I'm not sure of right now is if it's "acceptable" to add multiple conditions of the same type (I don't see why not, but I don't know what the best practice is here). According to my good friend ChatGPT:

It is generally best to keep the status of a Kubernetes resource concise and clear. If you have multiple conditions of the same type, it might be better to consolidate them into a single condition that provides an accurate representation of the overall state of the resource.

Here are some guidelines to consider when working with conditions in the status of a Kubernetes resource:

Each condition should have a distinct type that describes the aspect of the resource being observed.
Conditions should be used to provide insights into the state of the resource and any issues that might be affecting it.
Use the Reason and Message fields within a condition to provide more detailed information about the state of the resource and any issues.
Update conditions as the state of the resource changes, ensuring that the status stays up-to-date and accurate.
By following these guidelines, you can maintain a clear and concise status that communicates the state of the resource effectively. In cases where you need to represent multiple aspects of the resource's state, it's better to use separate conditions with different types. However, if you have multiple instances of the same type of condition, consider consolidating them into a single condition with a clear reason and message.

It seems to suggest we should try to group everything into a single condition. But that seems ugly at first sight. Another option from the top of my head might be to create a condition for each validation error. That also seems ugly. Personally, I'm in favor of just having multiple conditions of the same type each describing its own reason for failure.

I suggest we leave it as it is for now. I will add a comment and a ticket so we can follow up on this after a more thorough discussion upstream. wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have definitevely seen conditions like ValidSomething true/false with message containing concatenated errors. I think it is very common in OCP. But I also seen condition per issue approach.

IMO - if condition is meant to be consumed by humans - we might just want to concatenate errors. We will later be able to split them into separate conditions. I personally not a fan of scrolling through 10s of conditions: I would rather see all validation errors in one place.

However if we want to programmatically consume these conditions or want to give our users this ability - then it will definitevely be easier to consume if we have conditions for each issue. But in this specific case I do not see a lot of value in this feature (but I'm happy to be proven wrong).

I suggest we leave it as it is for now. I will add a comment and a ticket so we can follow up on this after a more thorough discussion upstream. wdyt?

Agreed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition type needs to be unique in the list of conditions.

Type and reason are part of the API so once we GA, those can't be removed or changed.

My recommendation is generally to keep the number of types and reasons as small as possible based on what clients specifically require.

In this case, I'd start with whatever type we're already using, maybe even use the existing failure reason, and then put the validation errors concatenated in the message.

But +1 to capturing this in a separate issue.

"github.com/operator-framework/operator-controller/controllers/validators"
)

var _ = Describe("Validators", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something - why not to use simple unit tests here? Looks like a classic use case for a table driven test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we decided to go for ginkgo for the unit tests. However, I'm starting to think it might have been a mistake. They aren't as easy to execute as go tests, and carry the cruft of the *_suite.go files. From my perspective at the moment, I'm not a huge fan of table driven tests. I think they do have a nice additive property to them and can make it easier to change many tests at once. But from previous experience (which is just anecdotal at best) they carry a higher complexity in trying to understand whats going later on in the project. Especially if you have massive test case structures with huge inputs. On the other hand, singular tests are easier to wrap your head around later on. However, they can carry a higher maintenance burden during refactoring/maintenance. So, I find it to be a tradeoff between readability and and maintainability (which are also interrelated XD). Now with copilot and chatGPT it's easier to write singular tests hahahah. On a serious side, I'm easy either way. I'd say we should do the following:

  • I'll create a github discussion around this so we can decide what we should do (I think it's important to have some consistency)
  • Let's leave it as it is for now, make a join decision, and we can always refactor out ginkgo (at least for the unit tests) in favor of gotests and make a decision on whether we want to do tabular or singular tests and keep that decision as a compass bearing

wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ginkgo supports table driven test fwiw. Maybe not quite as straightforward as a hand-written one, but maybe a reasonable compromise?

// this validation should already be happening at the CRD level. But, it depends
// on a regex that could possibly fail to validate a valid SemVer. This is added as an
// extra measure to ensure a valid spec before the CR is processed for resolution
func validateSemver(operator *operatorsv1alpha1.Operator) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the CRD regex validation fails, will the value even be set? So, is this necessary? Are you concerned about false positives?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For false positives. It's in case a bad semver somehow slips through the regex. It's so complicated and hard to parse I couldn't guarantee that nothing would slip through. So, I thought it might be a good idea to be defensive, even if 99.999% of the time this code won't probably be executed.

Copy link
Contributor Author

@perdasilva perdasilva Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the CRD regex validation fails, will the value even be set?

If I understood the question correctly (and please forgive me if I'm going over stuff you already know), my understanding of the way it works is: there's both client-side and server-side validation of the field value against the regex before it reaches the controller. The intent of putting that the validation is to automatically reject anything that isn't a semver so it doesn't even make it to the controller. But, after looking at the regex, I couldn't get any confidence that it would work 100% of the time. I tried to validate with different negative cases, but I couldn't convince myself that it will definitely catch everything. It could be possible that an invalid spec slips through, in which case it would be set. So, I thought it best to handle that case in code (even if it's unlikely that it will ever be executed). Hopefully, for most of the fields that come along in the API we'll be able to get away with relatively simple regexs and length limits that should be sufficient enough for us use of this controller validation sparingly.

Since I'm rubber ducking this with you now, it occured to me that should a false positive slip through, we'd get a resolution failure on trying to parse the semver. So, maybe I'm being overzealous? Though a part of me thinks that ensuring that the input is clean before reaching the resolver might be worthwhile.

I'm totally open for suggestions here. If we feel we're adding more complexity for relatively little gain, I can remove this 2nd layer of validation. wdyt?

Copy link
Member

@m1kola m1kola Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It indeed feels like belt and braces, but I think I'm in favour of keeping this validation as it prevents us hittng rest of the code in case of discrepancies between the regex and the library (e.g. bugs in library, for example).

Copy link
Contributor

@tmshort tmshort Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you got the regex from a reliable source, this scenario should never be hit, but better safe than sorry. Even regex's and libraries can have bugs.

@@ -11,3 +11,4 @@ metadata:
spec:
# TODO(user): Add fields here
packageName: prometheus
version: 3.0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need/want to add a comment here that this is an invalid version? And should remain an invalid version if prometheus ever gets to version 3.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry - that shouldn't have been committed - I'll omit it.

It("should filter by version range", func() {
// recreate source with version range option
var err error
rpvs, err = required_package.NewRequiredPackage(packageName, required_package.InVersionRange(">=1.0.0 !1.0.0 <3.0.0"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this valid range? i.e. include 1.0.0 then exclude it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. I think it's a valid range, even if it's superfluous (just collapses to >1.0.0 <3.0.0. I'll change it to !2.0.0 to avoid this confusion.

Signed-off-by: perdasilva <perdasilva@redhat.com>
@perdasilva
Copy link
Contributor Author

demo

Copy link
Member

@m1kola m1kola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my very limited experience with v1 - looks good.

@tmshort
Copy link
Contributor

tmshort commented Apr 14, 2023

The demo image...

Although it might have been easier to use a script with set -x

@perdasilva perdasilva merged commit 2f839d0 into operator-framework:main Apr 14, 2023
@perdasilva perdasilva deleted the operator_crd_version branch April 14, 2023 15:40
@perdasilva
Copy link
Contributor Author

The demo image...

Although it might have been easier to use a script with set -x

Yeh, I did that originally - but it didn't look as nice =(

@joelanford
Copy link
Member

I thing I didn't see in the demo (or maybe missed?) is what happens when you have an existing version already installed and set spec.version to a non-existent version (Items 10 and 11 from #162)?

Another thing we could show (and yes I realize this isn't technically in the scope of #162): what happens when trying to create or update the Operator with an spec.version that is not parseable semver?

@perdasilva perdasilva mentioned this pull request May 9, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants