Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: Fix flaky test Test/ServerSideXDS_WithValidAndInvalidSecurityConfiguration #7411

Merged
merged 1 commit into from
Jul 23, 2024

Conversation

arjan-bal
Copy link
Contributor

@arjan-bal arjan-bal commented Jul 13, 2024

In the logs of failing runs for TestServerSideXDS_WithValidAndInvalidSecurityConfiguration, we see that the resource snapshot update request is sent to the xds management server before the xds client is able to connect to it. Every time a gRPC server receives the updated snapshot resource from the xds server, it sends a NACK as the configuration is invalid. When the server receives more than one NACK, it gets stuck while writing to a buffered channel here:

This results in the gRPC server timing out while getting the new configuration.

This change ensures that the server writes to the buffered channel at most once and doesn't get stuck.

Example failure: https://github.com/grpc/grpc-go/actions/runs/9790522733/job/27032314481?pr=7390

Tested

Verified that the test no longer fails in 100000 runs

Updates: #6914

RELEASE NOTES: None

@arjan-bal arjan-bal added this to the 1.66 Release milestone Jul 13, 2024
Copy link

codecov bot commented Jul 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.48%. Comparing base (bdd707e) to head (7360d9c).
Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7411      +/-   ##
==========================================
+ Coverage   81.42%   81.48%   +0.06%     
==========================================
  Files         348      350       +2     
  Lines       26744    26846     +102     
==========================================
+ Hits        21775    21875     +100     
- Misses       3779     3783       +4     
+ Partials     1190     1188       -2     

see 40 files with indirect coverage changes

@arjan-bal arjan-bal changed the title Prevent flakes in xDS server cert provider tests xds: Prevent flakes in server cert provider tests Jul 13, 2024
@arjan-bal arjan-bal requested a review from easwars July 15, 2024 07:50
@arjan-bal arjan-bal changed the title xds: Prevent flakes in server cert provider tests xds: Fix flaky Test/ServerSideXDS_WithValidAndInvalidSecurityConfiguration Jul 16, 2024
@arjan-bal arjan-bal changed the title xds: Fix flaky Test/ServerSideXDS_WithValidAndInvalidSecurityConfiguration xds: Fix flaky test Test/ServerSideXDS_WithValidAndInvalidSecurityConfiguration Jul 16, 2024
@arjan-bal arjan-bal force-pushed the deflake-server-cert-provider-tests branch from 7b23548 to f681bd7 Compare July 16, 2024 08:56
@arjan-bal arjan-bal force-pushed the deflake-server-cert-provider-tests branch from f681bd7 to 7360d9c Compare July 16, 2024 08:57
@arjan-bal arjan-bal linked an issue Jul 17, 2024 that may be closed by this pull request
@easwars
Copy link
Contributor

easwars commented Jul 23, 2024

It looks the ADS rpc handler on the go-control-plane is essentially processing one request (which includes ACK/NACK as well) at a time, serially. And it calls the OnStreamRequest callback as part of this serial processing and if the callback blocks, then the go-control-plane cannot handle any more requests. This seems ridiculous. I'm ok with fixing this on our end, but we should probably open an issue on the go-control-plane repository to discuss this. They are notoriously unresponsive though.

@easwars easwars assigned arjan-bal and unassigned easwars Jul 23, 2024
@arjan-bal arjan-bal merged commit ac5a7fe into grpc:master Jul 23, 2024
13 checks passed
@arjan-bal arjan-bal deleted the deflake-server-cert-provider-tests branch July 23, 2024 17:12
printchard pushed a commit to printchard/grpc-go that referenced this pull request Jul 30, 2024
printchard pushed a commit to printchard/grpc-go that referenced this pull request Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test package: test/xds
2 participants