Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server Side Field Validation #2

Closed

Conversation

kevindelgado
Copy link

What type of PR is this?

/kind feature

What this PR does / why we need it:

Performs strict server side schema validation via the fieldValidation=[Strict,Warn,Ignore] query parameter.

This change is dependent on liggitt/json-stdlib and so this PR is opened on this fork in order to begin collecting feedback on it until liggitt/json-stdlib is merged into upstream k/k.

It introduces the fieldValidation query parameter that when set to Strict causes requests to error when unknown fields are present, when set to Warn succeeds the request but returns a warning, and when set to Ignore it ignores unknown fields.

For create/update requests, we use strict json unmarshalling to determine the unknown fields and error/warn based on those fields.

For JSON Patch requests (i.e. CRDs), we use the prune mechanism in the customresource handler to resolve any unknown fields.

For SMP Patch request, we use the the unstructured converter to resolve unknown fields.

Apply Patch requests already require strict schemas and will error on unknown fields.

Which issue(s) this PR fixes:

First step in solving kubernetes#39434 and kubernetes#5889

Does this PR introduce a user-facing change?

Performs strict server side schema validation requests via the `fieldValidation=[Strict,Warn,Ignore]` query parameter.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

KEP-2885

[KEP-2885](https://github.com/kubernetes/enhancements/issues/2885)

@kevindelgado
Copy link
Author

cc @liggitt @apelisse (I don't think I can add reviewers on a fork).

This is ready for review when y'all are free.

@liggitt liggitt force-pushed the json-stdlib branch 5 times, most recently from fbe25a5 to 434ce43 Compare October 20, 2021 17:12
@@ -855,6 +855,7 @@ var defaultKubernetesFeatureGates = map[featuregate.Feature]featuregate.FeatureS
genericfeatures.WarningHeaders: {Default: true, PreRelease: featuregate.GA, LockToDefault: true}, // remove in 1.24
genericfeatures.OpenAPIEnums: {Default: false, PreRelease: featuregate.Alpha},
genericfeatures.CustomResourceValidationExpressions: {Default: false, PreRelease: featuregate.Alpha},
genericfeatures.FieldValidation: {Default: false, PreRelease: featuregate.Alpha},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely sure about the name yet, what about ... StrictValidation, or both StrictFieldsValidation?

Comment on lines +1243 to 1245
return obj, gvk, err
}
return nil, gvk, err

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen this type of return signature where one doesn't check for err != nil but obj != nil and I can see how it'd be confusing for people.

@@ -1321,10 +1327,12 @@ func (v *unstructuredSchemaCoercer) apply(u *unstructured.Unstructured) error {
if err != nil {
return err
}

pruned := map[string]bool{}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could possibly have used an empty struct here, struct{} rather than bool, though I don't know if it matters.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also prefer var pruned map[string]... which avoids an unused map allocation

Comment on lines +1335 to 1337
pruned = structuralpruning.Prune(u.Object, v.structuralSchemas[gv.Version], false)
structuraldefaulting.PruneNonNullableNullsWithoutDefaults(u.Object, v.structuralSchemas[gv.Version])
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we're not pruning, we can't fail on unknown fields?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems correct... if we're preserving fields not mentioned in the schema, we're effectively saying all field names are permissible

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, this mostly exists for compatibility reason, doesn't mean people necessarily wanted arbitrary fields? I think it's useful even as a warning for example.

scope.err(err, w, req)
return
}

decoder := scope.Serializer.DecoderToVersion(s.Serializer, scope.HubGroupVersion)
trace.Step("About to convert to expected version")
obj, gvk, err := decoder.Decode(body, &defaultGVK, original)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That answers one of my question above. It doesn't look like ignore and warn are fundamentally different, do we need to keep ignore then? Or if we keep it, does it need to be the default?

return
}
if validationDirective == runtime.WarnFieldValidation {
warning.AddWarning(req.Context(), "", err.Error())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected one warning per unknown field, this looks like it would build a very large warning

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made an attempt at doing that now. Let me know if it's correct


// smpTestSetup applies an object that will later be patched
// in the actual test/benchmark.
func smpTestSetup(t testing.TB, client clientset.Interface) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of this function for tests in general

// smpTestSetup applies an object that will later be patched
// in the actual test/benchmark.
func smpTestSetup(t testing.TB, client clientset.Interface) {
bodyBase, err := os.ReadFile("./testdata/deploy-small.json")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also ideally avoid having files read from disk for tests, since it makes it harder to understand what the test does.

Comment on lines +149 to +153
var testcases = []struct {
name string
opts metav1.UpdateOptions
errContains string
warnContains string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this test (and another), I don't know if I enjoy the value of table-driven testing here, as I feel this obfuscates what the test does for very limited code re-use.

@apelisse
Copy link

I don't have a strong opinion about any of my comments and this is super good and could probably be merged as is (except maybe we should answer to a few TODOs ;-). Thanks Kevin!

@kevindelgado
Copy link
Author

kevindelgado commented Oct 22, 2021

I spent all of today focused on the SMP unstructured converter because that is where I felt the least comfortable and wanted to get some resolution on first.

In addressing your feedback @apelisse , I think I finally understood the "pre-flattening" algo we had discussed before that I was previously failing to grok. I've updated the converter to use this now which gets rid of the wonky "fields" field we had to add to the various method signatures that you had commented on (and clears up some of the other concerns you had).

On the performance side , the way I had implemented converter strict validation was about 300% slower and more memory intensive than what's currently in master "no validation" (and Ignore also suffered the penalty because it was doing the unknown fields check and just not returning the error). Now, I'm seeing roughly parity between strict validation and the old "no validation" (actually, strict appears to be slightly faster and less memory (which is weird and I need to look into it on monday)).

One thing I did not implement was the "full path to unknown field" that we discussed offline. I noticed that the json deserializer also does not indicate the full path, so I wasn't sure if we actually wanted to do that here in SMP. I think it's easy enough to add (although I think it will require changing the method signatures to store the path down successive recursive calls of fromUnstructured), but wanted to hold off until we talk to Jordan and agree we actually want to preserve the full path in the errors.

Anyways, I know this is a lot of text, so let's sync on Monday. If you want, feel free to take another look at converter.go, otherwise I will address the rest of your feedback on Monday.

@liggitt
Copy link
Owner

liggitt commented Oct 22, 2021

kubernetes#105030 is merged if you want to transition this to k/k before the next round of feedback

if len(pruned) > 0 && v.persistStrictDecodingErrors {
allStrictErrs := make([]error, len(pruned))
i := 0
for unknownField, _ := range pruned {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterating over a map makes the order non-deterministic, which is not ideal

also see suggestion above about making this low-level function just return the list of unknown field paths and letting the caller turn that into warnings/messages

@@ -1321,10 +1327,12 @@ func (v *unstructuredSchemaCoercer) apply(u *unstructured.Unstructured) error {
if err != nil {
return err
}

pruned := map[string]bool{}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also prefer var pruned map[string]... which avoids an unused map allocation

Comment on lines +1335 to 1337
pruned = structuralpruning.Prune(u.Object, v.structuralSchemas[gv.Version], false)
structuraldefaulting.PruneNonNullableNullsWithoutDefaults(u.Object, v.structuralSchemas[gv.Version])
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems correct... if we're preserving fields not mentioned in the schema, we're effectively saying all field names are permissible

structuralSchemas map[string]*structuralschema.Structural
structuralSchemaGK schema.GroupKind
preserveUnknownFields bool
persistStrictDecodingErrors bool
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"persist" reads strangely... maybe returnUnknownFieldPaths?

structuralSchemas map[string]*structuralschema.Structural
structuralSchemaGK schema.GroupKind
preserveUnknownFields bool
persistStrictDecodingErrors bool
}

func (v *unstructuredSchemaCoercer) apply(u *unstructured.Unstructured) error {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this signature is private, would it be clearer to make the list of unknown fields be returned as a separate return value?

func (v *unstructuredSchemaCoercer) apply(u *unstructured.Unstructured) (unknownFieldPaths []string, err error)

That would let the caller determine how to use the list of unknown field paths, would preserve order, and would distinguish between validation errors and unknown fields

trace.Step("About to convert to expected version")
obj, gvk, err := decoder.Decode(body, &defaultGVK, original)

validationDirective, err := fieldValidation(req)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason not to do the validation in ValidateCreateOptions above (which presumably already extracted the desired directive from the req parameters), and use options.FieldValidation instead of a second local var? same question applies to all the places fieldValidation(req) is being used

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow I completely missed the Validate*Options. I've done that now

@@ -424,7 +438,7 @@ type applyPatcher struct {
userAgent string
}

func (p *applyPatcher) applyPatchToCurrentObject(obj runtime.Object) (runtime.Object, error) {
func (p *applyPatcher) applyPatchToCurrentObject(_ context.Context, obj runtime.Object) (runtime.Object, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a TODO? duplicate and unknown fields are detectable here, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not meant as a TODO, because SSA already defaults to strict validation on unknown fields. But I just realized that we had spoke about how SSA does not error on duplicate fields, so I’ve added a TODO to capture this.

if contentType != "" {
supported = false
for _, v := range strings.Split(contentType, ",") {
t, _, err := mime.ParseMediaType(v)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParseMediaType has non-trivial cost and is already done in NegotiateInputSerializer for create/update, and is unused for patch (we require Content-Type to be exactly one of the supported patch types) ... I didn't expect to be doing this parsing a second time here

is there a reason we can't define a strict version of each deserializer and select that in create.go / update.go if FieldValidation is set to Warn or Strict?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this as part of the Validate*Options work

}
}

validationParam := req.URL.Query()["fieldValidation"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull this from create/update/patch options rather than re-extracting from the query

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case 0:
return runtime.IgnoreFieldValidation, nil
case 1:
switch strings.ToLower(validationParam[0]) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we'd want to be case-insensitive here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to be case-sensitive. Wasn’t sure what the right way was based on this comment kubernetes#104433 (comment)

@liggitt liggitt deleted the branch liggitt:json-stdlib October 23, 2021 13:43
@liggitt liggitt closed this Oct 23, 2021
@apelisse
Copy link

Has this been re-opened in k/k? I wanted to do another pass.

Copy link
Author

@kevindelgado kevindelgado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed most of the feedback. Still a couple TODOs around test formatting and defaulting to warning, but wanted to get this up for another round of review ASAP.

Currently opening up a PR on k/k, will have it up in a few minutes.

Comment on lines +552 to +554
// - Ignore: ignore's unknown/duplicate fields
// - Strict: fail the request on unknown/duplicate fields
// - Warn: respond with a warning, but successfully serve the request.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reordered the values. Waiting on another round of feedback before defaulting to warn to confirm that is what we want to do

@@ -23,7 +23,8 @@ import (
// Prune removes object fields in obj which are not specified in s. It skips TypeMeta and ObjectMeta fields
// if XEmbeddedResource is set to true, or for the root if isResourceRoot=true, i.e. it does not
// prune unknown metadata fields.
func Prune(obj interface{}, s *structuralschema.Structural, isResourceRoot bool) {
// It returns the set of fields that it prunes.
func Prune(obj interface{}, s *structuralschema.Structural, isResourceRoot bool) map[string]bool {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -424,7 +438,7 @@ type applyPatcher struct {
userAgent string
}

func (p *applyPatcher) applyPatchToCurrentObject(obj runtime.Object) (runtime.Object, error) {
func (p *applyPatcher) applyPatchToCurrentObject(_ context.Context, obj runtime.Object) (runtime.Object, error) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not meant as a TODO, because SSA already defaults to strict validation on unknown fields. But I just realized that we had spoke about how SSA does not error on duplicate fields, so I’ve added a TODO to capture this.

return runtime.IgnoreFieldValidation, nil
}

// TODO: Should we blocklist unsupportedContentTypes (just protobuf) rather than allowlisting everything that isn't protobuf?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed as part of the Validation*Options work

if contentType != "" {
supported = false
for _, v := range strings.Split(contentType, ",") {
t, _, err := mime.ParseMediaType(v)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this as part of the Validate*Options work

}
}

validationParam := req.URL.Query()["fieldValidation"]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case 0:
return runtime.IgnoreFieldValidation, nil
case 1:
switch strings.ToLower(validationParam[0]) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to be case-sensitive. Wasn’t sure what the right way was based on this comment kubernetes#104433 (comment)

liggitt pushed a commit that referenced this pull request Mar 10, 2023
 This is the commit message for patch #2 (refresh-temp):
liggitt pushed a commit that referenced this pull request Oct 31, 2023
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>

fix (#2)

Signed-off-by: Monis Khan <mok@microsoft.com>
liggitt pushed a commit that referenced this pull request Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants