-
Notifications
You must be signed in to change notification settings - Fork 517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "Fix for the SDS update failure (#615)" as no longer needed on top of #559 #657
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Query]
I understand that this fix is not required as we are maintaining the separate list of subscribed resources. However, after removing the below change the race condition is still applicable and will result in removing the entry from the resource versionMap, which was already notified to the client ?.
Now during the next setSnapshot, when we are creating the delta response, the below implementation still relies on the GetResourceVersions() which is invalid(resource which was successfully notified to the client was overwritten) due to race condition. Since GetResourceVersions() is incorrect, we will unnecessarily be sending the resource to the client again even though the resource was already notified, and there was no change in the hash value.
I agree that notifying the resource again is not harmful, but might not be an efficient approach with large configuration. IMO we should address the race condition so that resources which are already notified remains intact in the resourceVersionMap.
@valerian-roche @haoruolei Please correct me if I am wrong in my findings. Thanks.
go-control-plane/pkg/cache/v3/delta.go
Lines 63 to 78 in b8aa9fb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi
Can you provide a test case for the aforementioned race condition?
It is also not clear to me where you expect this race to occur? Is it that a new request is handled prior to the first response, and its response might be created prior to the previous one being returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@valerian-roche
Yes, the race condition was because the new request from client is handled prior to the first response, as a result the resourceVersionMap was not having the correct subscription list (as resourceVersionMap was being used to maintain the subscription list).
As you are maintaining the separate map for the subscription list, the above race condition will not overwrite or corrupt the subscription list, but the race condition still exist and will corrupt/overwrite the hash of the resource which is already notified to the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a test showing the issue?
From my understanding, you have:
I am not fully clear what you mean by "corrupted" here, as the version map should be good in the end. The xDS protocol clearly states that the control-plane can send updates of resources which have not changed or have not been explicitly subscribed to: in https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#how-the-client-specifies-what-resources-to-return:
The opposite case where we would not be sending an update for a resource that was removed then readded is a big part of why the logic is like this, as this will eventually converge to the latest version being sent for all requested resources. The change in #615 is potentially creating this case, as the version map no longer reflects what is the last objects returned (an object removed then re-added while still at the same version will not be sent). This is highly problematic in the case of delta-eds and cds updates that can create this specific setup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that envoy protocol is eventual consistency. The data will not be corrupted with a separate subscription list and resourceversion map. The logic above always compares the resource in the cache with the resource map. So even if responses are not in the order of requests, in the end, the newest version will be recorded in the version map and sent to the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the current code ensures that the latest response will be run against the latest subscription set and with the version map latest set by a response. In a case where envoy would send very quickly subscribe to something, then remove, then re-subscribe, there is a chance that we would not send again the resources if all three are processed before a response is first sent. This can happen due to the queueing at https://github.com/envoyproxy/go-control-plane/blob/main/pkg/server/delta/v3/server.go#L198 not guaranteeing how many responses can be pending prior to the select running, though realistically I would not expect this case to occur with more than one pending response in the general case (as the queueing time would have to be longer than the request processing time).
A known remaining issue is when a request reports a NACK. I have another PR to properly identify what has been returned and what has been acknowledged (which can be very different in some cases), but it is pending on the rework of the Cache interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on the sub, remove, re-sub case above. I'm not sure I follow what could go wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that the VersionMap used to build a response is set when the last response was sent.
To be clear this will only happen if envoy sends both requests in a very short timeframe (~ms), compared to the current state after 615 when it will occur every time, which is fully breaking delta-eds
Overall I'm not really willing to change too much prior to #584 and #586 which are explicitly designed to fix this case and require far more logic update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. This seems a rare scenario but looking forward to the future fix work! BTW do you mean that setting version to "" when subscribing to a new resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@valerian-roche You are correct, with your new changes of separate subscription list, the version Map at the end will be correct. It is just that client sending requests in very short span would result in resending the resource, and as you said that it should be OK to resend. Infact resending the same object again was the behaviour with even #615 as well. I think we should be good to remove the changes done as part of #615.