This repository has been archived by the owner on Mar 19, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 16
Treat resourceVersion as opaque string #467
Closed
Closed
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So part of the problem with this is that I believe we can actually corrupt our cache if we're simultaneously reconciling resources and only doing equality checks (i.e. an old resource is going to potentially override a newer resource). Since a lot of our internal state storage has changed fairly substantially, I'm not entirely sure if this is still an issue, but it clearly was early on -- which is why we explicitly made the decision early on to ignore the warning against resource version parsing (it has been a monotonically increasing global 64 bit integer based on etcd's
revision
field since day one IIRC).I'm fine if we can remove the reliance on that, but just be aware that we may get
Update
calls we're doingAll of this said, while this fix seems fairly straightforward, it can potentially have some pretty large consequences, so I would want to thoroughly vet it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewstucki if I understand this logic correctly, the caller of
K8sGatewayClasses.Upsert()
wants to mutate if and only if the cached version of the class they have is up-to-date, right? Instead of doing the resource version comparison manually here, why not pass the resource version through the client to the server as a precondition and allow the server to reject the call if it's out-of-date? Store ing.gatewayClasses[class.class.Name]
if successful and ignore the conflict error if you want your client not to re-try with an updated version from the cache.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Letting the server do the
resourceVersion
comparison for you would free you from having to carefully manage when you do and do not add to your client-side cache, meaning your option 2 & 3 would not occur. Although, one more question - are you using the controller-runtime cache-backed clients, and, if so, what is the additional utility to the in-memory cache in question here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the concern is less about where we do the comparing and more that the meaning of "out of date" with this PR changes from "a lower incrementing resourceVersion integer than the one I currently have" to "a resource version that exactly matches the one I currently have".
The current state deals with a watch trigger series for versions 1->3->2 by rejecting the last version because
2 < 3
, ending up holding resourceVersion3
.The proposed state would accept the last version because
2 != 3
, ending up holding resourceVersion2
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I don't think I follow (yet!).
In the current implementation, the reconciler has a cached version of an object, with some
resourceVersion=a
. The intent with the currentUpsert()
semantic is that this reconciler's client call should only go through if the cache has not seen a more-recent versionresourceVersion=b
of the object yet.If, instead of doing the check in-memory, the reconclier unconditionally made the client call to the sever, and passed that
resourceVersion=a
in the precondition on theUpdateStatus()
, the server would ensure that the there had not been any committed changes to the object pasta
- so the invariant remains that theUpdateStatus()
succeeds if and only if the client's cached version (a
) is the latest. In fact, the invariant becomes stronger, since you are guaranteed not only thata
is newer than any revision you have happened to observe and store in your in-memory cache, but that it is the newest revision available in the database. Storing the object only on a successful call would allow your cache to stay coherent.With the current implementation, if you want to make sure your upsert is not clobbering newer data on the server (perhaps from other actors), you still need to pass the precondition to the server to guard against the following case:
resourceVersion=a
resourceVersion
comparison is successful, as no new data exists in the cacheresourceVersion=b
resourceVersion=b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I mentioned the
controller-runtime
cache-backed client, as well, is that you have a full view of the objects your reconclier is operating over in an in-memory cache as part of thecontroller-runtime
infrastructure. Therefore, a simple pattern to ensure correctness is to pass theresourceVersion
on objects you see in areconcile.Request
to the server as a precondition. If the server call fails with a conflict, the cached view that your reconclier used was out-of-date. However, you'll see the up-to-date version at some point in the future, at which point the reconciler's client call will succeed and you've made forward progress.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevekuznetsov The cache/out-of-k8s storage is required for shipping SDS and syncing all of our data to Consul -- having a working set of how things need to be and actually enforcing that when users change things externally (simple example of enforcing intentions updates when a user may add or remove an additional source or destination to a service that we're routing to). RE your comment in your issue about the in-memory store, the original design of this external storage layer was to be pluggable with future implementations, including using durable storage backends, in part to pave the way for working with both on k8s and off-k8s execution modes -- some of that is no longer true though.
RE the current use of
resourceVersion
I'm thinking that part of our use ofresourceVersion
could be ameliorated by actually doing the external storage sync after delegating to Kubernetes for the version check during our status updates, if the update succeeds, then as you mentioned, we have made forward progress since we were using a fresh entry and we can go ahead and do the rest of the sync. This does make tracking whether or not we were able to sync state to our external sources as part of the user reflectedStatus
more difficult though, as that now happens after the initial remote update and would require a second pass through the reconciler to actually update the remote sync status.Just to set expectations, given that there's likely a good amount of work involved with both changing this resourceVersion check and ensuring correctness with both Kubernetes usage and our external state syncing code and the current workload/priorities of our team, I'm not sure that we'll be able to change this behavior in short order. I would like to fix it, but it could take some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely - if you have appetite for it, I am more than happy to take on the work. By no means did I mean to show up and ask you to do more - I am sure you are very busy! If that sounds like a good idea to you, I'd love to take more time to learn the code and perhaps sync with you to understand the problem space a little better. I'm sure there's a solution here that does not require breaking that particular API convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree that this is tractable and thanks for your desire to contribute! If you do wind up tackling this, let me know if you have any questions or need some feedback.