Skip to content

Conversation

michaelasp
Copy link

  • One-line PR description: Adding new KEP for comparable resource version
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 28, 2025
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 28, 2025
@michaelasp
Copy link
Author

/cc @liggitt
/cc @jpbetz


## Alternatives

N/A
Copy link

@lmktfy lmktfy Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have alternatives, and I'm about to suggest one. There may be more.

We have a rich discovery API. I think we can use that, drawing inspiration from Rust and its concept of traits.

Rust traits are like interfaces, but can also be kind of zero sized. For example, the PartialEq trait from Rust's standard library means that you can use = to check for equality. The Eq trait (which requires also implementing PartialEq) makes a promise that all values can be compared against other values. Some nullable types only implement PartialEq. There are also traits to say if some, or all, values can be compared against each other: PartialOrd and Ord.

(Eq and Ord are just marker traits; they don't add any API surface, they are just making a stronger promise than their supertrait).

We can mark our APIs with something similar to traits and declare, per API, whether resourceVersion comparisons are meaningful for sorting
Doing it like this means we can have easy soft adoption; well, relatively easy.

Every in-tree API would implement resourceVersion ordering and can mark itself as such. A metrics adapter that doesn't implement it would not need to worry about clients making the wrong assumption.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add something like traits, I think we could mark our stable APIs as stable, too. It wouldn't be much extra code.

There might be some other traits-like things we can think up, such as marking if an API kind uses the .metadata.generation / .status.observedGeneration pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listing that as an alternative is fine, but I think it's reasonable to proceed without needing opt-in comparability indicators in discovery.

There are three categories of APIs

1. Servers where resourceVersion is a comparable integer

  • built-in kube types served by kube-apiserver
  • built-in kube types served by something other than kube-apiserver (ensured via the conformance test proposed in this KEP)
  • CRD-backed types served by kube-apiserver
  • CRD-backed types served by something other than kube-apiserver (ensured via the conformance test proposed in this KEP)
  • extension API servers built with k8s.io/apiserver using normal CRUD storage
  • extension API servers built with other technologies but following kube-apiserver behavior

2. Servers where resourceVersion is not an integer

For servers like this, a local well-formedness check when comparing two resourceVersion strings triggering a comparison error (as proposed in the function signature) would be enough to guard against misinterpretation, without needing to augment discovery and make the client check discovery.

All the examples of unusual aggregated servers I'm aware of fall in this category, or have other server-side correctness issues that already mishandle resource versions in ways inconsistent with client-expected behavior:

3. Servers where resourceVersion is an integer, but is not comparable

Obviously this is possible, but do we have any actual known examples of real servers that do this?

I would really want to push extension API server authors to live in category 1 or 2. Either make your integer resourceVersions comparable like everyone expects (preferred), or make them clearly not integers if that's not possible.

Unless there's evidence that there are actually servers like this in use that won't adjust to be in category 1 or 2, I'd push back on the complexity of adding something like "integerResourceVersionIsComparable: false" to discovery per-type and making clients check it.


## Summary

Resource version is currently defined as an opaque string from the view of a client, with the only operation that is supported being equality comparisons. This differs from the internal apiserver implementation, where it is clearly defined as a monotonically incrementing integer. There are increasing requirements being required from clients consuming object metadata, where stronger comparisons than just equality are required.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The apiserver" is not the only implementation of .metadata, though: you can also integrate your control plane with an extension API server via the aggregation layer.


### Goals

The goals for this KEP are fairly straightforward, firstly we will expose a utility function that clients can use on the resource version to check comparisons between resource versions. This will take the opaque resource version string and return a boolean and an error if it occurs. Along with that we will update the documentation to specify that a ResourceVersion must be a monotonically increasing integer.
Copy link

@lmktfy lmktfy Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This risks implying that all APIs support this comparison. I'd prefer to make that promise on an API-by-API basis. See #5505 (comment) for the specifics of what I have in mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that not-monotonic RV is ok for an API without Watch like Metrics API, however the advantage of having monotonic RV for informers is huge. Being able to do consistent reads, monitor staleness is deal breaker that will make non-compliant mostly API useless and force them to implement it.

I would also point out that it would force us to maintain two different informer implementations based on whether API has monotonic RV, because fully utilizing monotonic RV will motivate non trivial code changes.


## Alternatives

N/A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listing that as an alternative is fine, but I think it's reasonable to proceed without needing opt-in comparability indicators in discovery.

There are three categories of APIs

1. Servers where resourceVersion is a comparable integer

  • built-in kube types served by kube-apiserver
  • built-in kube types served by something other than kube-apiserver (ensured via the conformance test proposed in this KEP)
  • CRD-backed types served by kube-apiserver
  • CRD-backed types served by something other than kube-apiserver (ensured via the conformance test proposed in this KEP)
  • extension API servers built with k8s.io/apiserver using normal CRUD storage
  • extension API servers built with other technologies but following kube-apiserver behavior

2. Servers where resourceVersion is not an integer

For servers like this, a local well-formedness check when comparing two resourceVersion strings triggering a comparison error (as proposed in the function signature) would be enough to guard against misinterpretation, without needing to augment discovery and make the client check discovery.

All the examples of unusual aggregated servers I'm aware of fall in this category, or have other server-side correctness issues that already mishandle resource versions in ways inconsistent with client-expected behavior:

3. Servers where resourceVersion is an integer, but is not comparable

Obviously this is possible, but do we have any actual known examples of real servers that do this?

I would really want to push extension API server authors to live in category 1 or 2. Either make your integer resourceVersions comparable like everyone expects (preferred), or make them clearly not integers if that's not possible.

Unless there's evidence that there are actually servers like this in use that won't adjust to be in category 1 or 2, I'd push back on the complexity of adding something like "integerResourceVersionIsComparable: false" to discovery per-type and making clients check it.

Copy link
Member

@liggitt liggitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did another sweep, this is looking good

@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Sep 3, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Sep 3, 2025
Copy link
Member

@stlaz stlaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does merging this KEP effectively turn the InformerResourceVersion feature gate GA?

The feature gate is not very well documented but as far as I understand, it was in place just to wait until ResourceVersions are comparable to make sure the invocation of cache.Controller{}.LastSyncResourceVersion() returns a sensible result.

cc @nilekhc

@liggitt
Copy link
Member

liggitt commented Sep 4, 2025

Does merging this KEP effectively turn the InformerResourceVersion feature gate GA?

Um.... not quite? (I didn't actually know about that gate). This KEP defines how client-side comparisons should happen, but those comparisons can return errors... I don't see any comparison or error-handling path in the spots where the InformerResourceVersion gate is used (or where the underlying LastSyncResourceVersion() is set or retrieved). InformerResourceVersion would have to be updated to use the defined comparison approach, and handle encountered errors properly, I think.

@liggitt
Copy link
Member

liggitt commented Sep 4, 2025

I'm pretty happy with the current state. I'd like @deads2k and @jpbetz to give it a pass.

/assign
/assign @deads2k @jpbetz

@michaelasp
Copy link
Author

Does merging this KEP effectively turn the InformerResourceVersion feature gate GA?

Um.... not quite? (I didn't actually know about that gate). This KEP defines how client-side comparisons should happen, but those comparisons can return errors... I don't see any comparison or error-handling path in the spots where the InformerResourceVersion gate is used (or where the underlying LastSyncResourceVersion() is set or retrieved). InformerResourceVersion would have to be updated to use the defined comparison approach, and handle encountered errors properly, I think.

I think these are two separate things, although I may be misunderstanding the feature gate. All I think that InformerResourceVersion does is return the last seen resource version of the informer, not much else. Since the dummy informer returns an empty string when this is called and we still need the last resource version the feature gate was added. I think it's unrelated to any comparison logic.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: michaelasp
Once this PR has been reviewed and has the lgtm label, please ask for approval from deads2k. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants