

@mjbonifacio (Owner) commented Oct 9, 2025

SchemaValidationFailure Struct Refactoring

tl;dr I wanted to get full clarity on how things will change before making all the changes to re-map fields, locations, values, etc.

This PR clarifies the purpose of each struct field on SchemaValidationFailure so we can get buy-in on that before making a large batch of changes.

Summary

Refactored SchemaValidationFailure struct to align with JSON Schema specification terminology, improve clarity, and eliminate ambiguity in field naming.

Changes Made

1. Renamed Fields for JSON Schema Alignment

DeepLocation → KeywordLocation

  • Before: DeepLocation string - Ambiguous name, unclear what "deep" referred to
  • After: KeywordLocation string - Matches JSON Schema spec terminology
  • Rationale:
    • KeywordLocation is the term used by the JSON Schema output format and by the jsonschema/v6 library
    • Refers to the path in the schema to the violated keyword (e.g., /properties/email/pattern)
    • Makes it clear this is about the schema rule, not the data
    • Consistent with er.KeywordLocation from jsonschema.OutputUnit

AbsoluteLocation → AbsoluteKeywordLocation

  • Before: AbsoluteLocation string - Lost the "Keyword" context
  • After: AbsoluteKeywordLocation string - Full, unambiguous name
  • Rationale:
    • Matches JSON Schema spec terminology exactly
    • Pairs naturally with KeywordLocation (relative vs absolute)
    • Consistent with er.AbsoluteKeywordLocation from jsonschema.OutputUnit
    • Makes it clear this is also a schema location, just expressed as an absolute URI
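The renamed fields are meant to be a straight copy of the jsonschema output. The sketch below flattens a tree of output units into failures and copies both keyword locations unchanged. It makes a few assumptions. `outputUnit` is a hypothetical local mirror of the `KeywordLocation`, `AbsoluteKeywordLocation`, `InstanceLocation` and nested `Errors` fields of `jsonschema.OutputUnit`. Its `Error` is simplified to a plain string. `failure` and `flatten` are illustrative names, not library API.

```go
package main

import "fmt"

// outputUnit is a hypothetical local mirror of the location fields on
// jsonschema.OutputUnit; the real type carries more information.
type outputUnit struct {
	KeywordLocation         string
	AbsoluteKeywordLocation string
	InstanceLocation        string
	Error                   string // assumption: error message as plain text
	Errors                  []outputUnit
}

// failure keeps only the fields relevant to this sketch.
type failure struct {
	Reason                  string
	KeywordLocation         string
	AbsoluteKeywordLocation string
}

// flatten walks the unit tree depth-first and emits one failure per unit
// that carries an error message, copying the keyword locations verbatim.
func flatten(u outputUnit, out []failure) []failure {
	if u.Error != "" {
		out = append(out, failure{
			Reason:                  u.Error,
			KeywordLocation:         u.KeywordLocation,
			AbsoluteKeywordLocation: u.AbsoluteKeywordLocation,
		})
	}
	for _, child := range u.Errors {
		out = flatten(child, out)
	}
	return out
}

func main() {
	// Illustrative data only; the schema URI is a placeholder.
	root := outputUnit{
		Errors: []outputUnit{{
			KeywordLocation:         "/properties/email/pattern",
			AbsoluteKeywordLocation: "https://example.com/user.json#/properties/email/pattern",
			InstanceLocation:        "/email",
			Error:                   "does not match pattern",
		}},
	}
	for _, f := range flatten(root, nil) {
		fmt.Println(f.KeywordLocation, f.AbsoluteKeywordLocation)
	}
}
```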

2. Field Organization Improvements

Moved to Better Logical Grouping

// Data instance location (where in the data being validated):
InstancePath []string // Raw path segments: ["user", "email"]
FieldName    string   // Last segment: "email"
FieldPath    string   // JSONPath format: "$.user.email"

// Schema location (where in the schema validation failed):
KeywordLocation         string // Relative: "/properties/email/pattern"
AbsoluteKeywordLocation string // Absolute: "https://..."

This separation makes the distinction explicit:

  • InstancePath, FieldPath, FieldName → Where in the DATA
  • KeywordLocation, AbsoluteKeywordLocation → Where in the SCHEMA

3. Removed Unused Field

ReferenceExample - DELETED

  • Rationale:
    • A grep found zero usages across the entire codebase
    • The library never populates or reads the field; only library consumers could
    • Removing it reduces struct size and complexity

4. Improved Documentation

ReferenceSchema Comment

  • Unchanged: "The schema that was referenced in the validation failure"
  • The existing description is already clear and accurate

ReferenceObject Comment

  • Before: "The object that was referenced in the validation failure"
  • After: "The object that failed schema validation"
  • Rationale: More specific - this is the actual failed data, not just "referenced"

5. Deprecated Ambiguous Field

Location - DEPRECATED

  • Status: Moved to the bottom of the struct and marked as deprecated
  • Deprecation Comment: // DEPRECATED in favor of explicit use of FieldPath & InstancePath
  • Rationale:
    • It was set inconsistently throughout the codebase:
      • Sometimes er.InstanceLocation (data location)
      • Sometimes er.KeywordLocation (schema location)
      • Sometimes static strings like "unavailable", "schema compilation", "/required"
    • This ambiguity made it unreliable for consumers
    • Explicit fields now exist: use FieldPath/InstancePath for data locations and KeywordLocation for schema locations
  • Kept for backward compatibility: existing consumers can still read it, but are warned to migrate
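One caveat on the deprecation comment. By Go convention, tooling such as staticcheck's SA1019 check only recognizes a deprecation when the doc comment has a paragraph that starts with exactly `Deprecated: `, so an all-caps `DEPRECATED` prefix may go unnoticed. Below is a minimal sketch of the field written that way. It also includes a hypothetical consumer-side helper, `dataLocation`, that prefers `FieldPath` and falls back to `Location` during migration. The struct is trimmed to the fields the sketch needs.

```go
package main

import "fmt"

// SchemaValidationFailure is trimmed to the fields this sketch needs.
type SchemaValidationFailure struct {
	// FieldPath is the JSONPath to the failing field, e.g. "$.user.email".
	FieldPath string

	// Location is the XPath-like location of the validation failure.
	//
	// Deprecated: use FieldPath and InstancePath for data locations and
	// KeywordLocation for schema locations.
	Location string `json:"location,omitempty" yaml:"location,omitempty"`
}

// dataLocation is a hypothetical migration helper: it returns the explicit
// data location when present, otherwise the deprecated Location value.
func dataLocation(f SchemaValidationFailure) string {
	if f.FieldPath != "" {
		return f.FieldPath
	}
	return f.Location
}

func main() {
	legacy := SchemaValidationFailure{Location: "/user/email"}
	migrated := SchemaValidationFailure{FieldPath: "$.user.email", Location: "/user/email"}
	fmt.Println(dataLocation(legacy))   // /user/email
	fmt.Println(dataLocation(migrated)) // $.user.email
}
```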

6. Fixed Comment Typo

Line 99 in ValidationError struct

  • Before: "This is only populated whe the validation type is against a schema"
  • After: "This is only populated when the validation type is against a schema"

Verification

Compilation Check

  • ✅ All usages of DeepLocation → KeywordLocation updated
  • ✅ All usages of AbsoluteLocation → AbsoluteKeywordLocation updated
  • ✅ No remaining references to old field names
  • ✅ All tests pass

Updated Files

  1. errors/validation_error.go - Struct definitions
  2. schema_validation/validate_schema.go - Sets KeywordLocation and AbsoluteKeywordLocation
  3. schema_validation/validate_document.go - Sets KeywordLocation and AbsoluteKeywordLocation

Backward Compatibility

  • Location field still exists (deprecated but functional)
  • ✅ JSON/YAML tags unchanged for non-renamed fields
  • ✅ New field names have appropriate JSON tags (keywordLocation, absoluteKeywordLocation)

Key Insights Discovered

HTTP Component Context Already Exists

During this refactoring, we discovered that ValidationError already provides clear HTTP component context via:

  • ValidationType: "parameter", "requestBody", "response"
  • ValidationSubType: "path", "query", "header", "cookie", "schema"

This means consumers can easily determine if an error is from:

  • Path parameters: ValidationType == "parameter" && ValidationSubType == "path"
  • Query parameters: ValidationType == "parameter" && ValidationSubType == "query"
  • Headers: ValidationType == "parameter" && ValidationSubType == "header"
  • Cookies: ValidationType == "parameter" && ValidationSubType == "cookie"
  • Request body: ValidationType == "requestBody"
  • Response body: ValidationType == "response"
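The checks above can be folded into one small function. This is a sketch under two assumptions: `validationError` is a hypothetical local mirror of just the two `ValidationError` fields, and the string literals are taken from the list above rather than from any constants the library may define. Pairs not listed above map to "unknown".

```go
package main

import "fmt"

// validationError is a hypothetical mirror of the two ValidationError
// fields discussed here; the real struct has many more fields.
type validationError struct {
	ValidationType    string
	ValidationSubType string
}

// httpComponent maps a type/subtype pair to the HTTP component it refers to,
// following the combinations listed in this PR.
func httpComponent(e validationError) string {
	switch e.ValidationType {
	case "requestBody":
		return "request body"
	case "response":
		return "response body"
	case "parameter":
		switch e.ValidationSubType {
		case "path", "query", "header", "cookie":
			return e.ValidationSubType + " parameter"
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(httpComponent(validationError{"parameter", "header"})) // header parameter
	fmt.Println(httpComponent(validationError{"requestBody", "schema"})) // request body
}
```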

The SchemaValidationFailure.Location field was never meant to indicate the HTTP component; it describes the location within the validated value.

Future Work (Separate Commit)

1. Add Tests for KeywordLocation Invariant

After the schema refactoring is complete, add tests to validate the following invariant:

  • When OriginalJsonSchemaError is nil, both KeywordLocation and AbsoluteKeywordLocation should be empty strings
  • When OriginalJsonSchemaError is not nil, both fields may be populated (when the error originates from JSON Schema validation)

Rationale: This documents the expected behavior that keyword location fields are only relevant when the validation failure originated from JSON Schema validation, not from other types of validation (e.g., parameter encoding errors, schema compilation errors).

Test scenarios:

func TestSchemaValidationFailure_KeywordLocations_WhenNotFromJsonSchema(t *testing.T) {
    // When OriginalJsonSchemaError is nil, KeywordLocation fields should be empty
}

func TestSchemaValidationFailure_KeywordLocations_WhenFromJsonSchema(t *testing.T) {
    // When OriginalJsonSchemaError is set, KeywordLocation fields may be populated
}

2. Remove Deprecated Location Field

Completely remove the deprecated Location field from SchemaValidationFailure in favor of its more specific counterparts:

  • Use FieldName for the specific field that failed
  • Use FieldPath for the JSONPath representation

Current state:

// DEPRECATED in favor of explicit use of FieldPath & InstancePath
Location string `json:"location,omitempty" yaml:"location,omitempty"`

Actions required:

  1. Remove the Location field from the struct
  2. Update all code that sets Location to use FieldPath or FieldName instead
  3. Update the Error() method to use non-deprecated fields

3. Update SchemaValidationFailure.Error() Method

The Error() method currently uses the deprecated Location field:

// Current (uses deprecated field):
func (s *SchemaValidationFailure) Error() string {
    return fmt.Sprintf("Reason: %s, Location: %s", s.Reason, s.Location)
}

Should be updated to use non-deprecated fields:

// Proposed:
func (s *SchemaValidationFailure) Error() string {
    if s.FieldPath != "" && s.KeywordLocation != "" {
        return fmt.Sprintf("Reason: %s, Field: %s, Keyword: %s", 
            s.Reason, s.FieldPath, s.KeywordLocation)
    }
    if s.FieldPath != "" {
        return fmt.Sprintf("Reason: %s, Field: %s", s.Reason, s.FieldPath)
    }
    if s.KeywordLocation != "" {
        return fmt.Sprintf("Reason: %s, Keyword: %s", s.Reason, s.KeywordLocation)
    }
    return fmt.Sprintf("Reason: %s", s.Reason)
}

Note: This change should be done in conjunction with removing the Location field to ensure all error messages remain informative.

Benefits

  1. Alignment with Standards: Now uses official JSON Schema specification terminology
  2. Clarity: Clear separation between data locations and schema locations
  3. Consistency: Field names match the source (jsonschema.OutputUnit)
  4. Reduced Ambiguity: Deprecated the confusing Location field
  5. Better Documentation: Improved comments make purpose clear
  6. Cleaner Code: Removed unused ReferenceExample field

Breaking Changes

⚠️ This is a breaking change for consumers who use:

  • DeepLocation field (now KeywordLocation)
  • AbsoluteLocation field (now AbsoluteKeywordLocation)
  • ReferenceExample field (removed)

However, this is justified because:

  1. These names were misleading and caused confusion
  2. New names align with industry standard terminology
  3. The change makes the API more intuitive and self-documenting
  4. Consumers should be migrating away from Location anyway (now deprecated)

Related Context

Ongoing Work: Path Parameter Validation Errors

This refactoring is part of a larger effort to ensure all validation errors include comprehensive SchemaValidationErrors. We're systematically reviewing all error paths in ValidatePathParamsWithPathItem to populate schema validation details consistently.

// DeepLocation is the path to the validation failure as exposed by the jsonschema library.
DeepLocation string `json:"deepLocation,omitempty" yaml:"deepLocation,omitempty"`
// KeywordLocation is the relative path to the JsonSchema keyword that failed validation
KeywordLocation string `json:"keywordLocation,omitempty" yaml:"keywordLocation,omitempty"`


What about just doing SchemaLocation and AbsoluteSchemaLocation? Although that may be somewhat confusing if someone reads it as "where was my schema located" rather than "where did it fail in my schema"?

Owner Author


My idea was that they match up to this: https://json-schema.org/understanding-json-schema/keywords

I stole it directly from the jsonschema error

ReferenceObject string `json:"referenceObject,omitempty" yaml:"referenceObject,omitempty"`

// ReferenceExample is an example object generated from the schema that was referenced in the validation failure.
ReferenceExample string `json:"referenceExample,omitempty" yaml:"referenceExample,omitempty"`


Looks like this example was removed?

Owner Author


Yeah, this field isn't written anywhere in the library, so it would only be read or written by users of the library. IMO it should be deleted, since it's effectively a custom user field and just adds confusion.

OriginalError *jsonschema.ValidationError `json:"-" yaml:"-"`
// DEPRECATED in favor of explicit use of FieldPath & InstancePath
// Location is the XPath-like location of the validation failure
Location string `json:"location,omitempty" yaml:"location,omitempty"`


IMO if we're already making a breaking change, we should just delete this rather than calling it deprecated.

Owner Author


I like it.

// Location is the XPath-like location of the validation failure
Location string `json:"location,omitempty" yaml:"location,omitempty"`
// InstancePath is the raw path segments from the root to the failing field
InstancePath []string `json:"instancePath,omitempty" yaml:"instancePath,omitempty"`

@its-hammer-time commented Oct 9, 2025


Should this include the actual field itself?

Example:

{
  "InstancePath": [ "user" ],
  "FieldName": "email",
  "FieldPath": "$.user.email"
}

Additionally, I think it's a bit confusing when someone should use InstancePath vs FieldPath. What do you think about consolidating this a bit to something like

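package main

import (
	"fmt"
	"strings"
)

type SchemaValidationFailure struct {
	InstancePath []string // Inclusive of the field name
}

// GetJSONPathToField builds the JSONPath string from the InstancePath slice.
func (svf *SchemaValidationFailure) GetJSONPathToField() string {
	if len(svf.InstancePath) == 0 {
		return "$"
	}
	return "$." + strings.Join(svf.InstancePath, ".")
}

// GetField returns the last path segment (the field name), or "" for an empty path.
func (svf *SchemaValidationFailure) GetField() string {
	if len(svf.InstancePath) == 0 {
		return ""
	}
	return svf.InstancePath[len(svf.InstancePath)-1]
}

func main() {
	failure := &SchemaValidationFailure{
		InstancePath: []string{"user", "email"},
	}

	fmt.Println(failure.GetJSONPathToField()) // $.user.email
	fmt.Println(failure.GetField())           // email
}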

Owner Author


didn't change anything here, just moved this struct field up to group stuff that's on the incoming payload
