-
Notifications
You must be signed in to change notification settings - Fork 4.6k
xdsdepmgr: Avoid using actual DNS lookups and fix race during test failure #8805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #8805 +/- ##
==========================================
+ Coverage 83.30% 83.40% +0.10%
==========================================
Files 418 417 -1
Lines 32897 32978 +81
==========================================
+ Hits 27404 27505 +101
+ Misses 4093 4078 -15
+ Partials 1400 1395 -5 🚀 New features to boost your workflow:
|
|
Were you able to reproduce this on your setup (either on a debian linux workstation or on forge) and ensure that the fix actually solves the problem? |
Yes, I was able to reproduce the hang by making the test fail on purpose , and have verified that it is fixed with this change. |
arjan-bal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with some nits.
| t.Fatalf("received unexpected error from dependency manager: %v", err) | ||
| } | ||
| dnsR := replaceDNSResolver(t) | ||
| dnsR.InitialState(resolver.State{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You can call UpdateState now; after last year's changes, it correctly buffers updates if the resolver hasn't been built yet. The InitialState method is now redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| case err := <-watcher.errorCh: | ||
| t.Fatalf("received unexpected error from dependency manager: %v", err) | ||
| } | ||
| dnsR := replaceDNSResolver(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please move the DNS resolver replacement near the top of the test, before any xDS configuration is sent? This would avoid potential races where the real DNS resolver gets used before replacement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| { | ||
| Addresses: []resolver.Address{ | ||
| {Addr: "127.0.0.1:8081"}, | ||
| }, | ||
| }, | ||
| { | ||
| Addresses: []resolver.Address{ | ||
| {Addr: "[::1]:8081"}, | ||
| }, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please revert the formatting changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Fixes: #8801
There were 2 problems here :
The actual DNS lookup calls return different resolved addresses for "localhost" on different machines. Changed the code to replace the DNS resolver with a manual resolver in test to mock the DNS resolver.
In the case where the tests send a EDS or cluster error , the xds management server keeps sending resource error continuously which in turn triggers updates from dependency manager which pushes the update to a channel. To mitigate this situation we had a done channel , that we close when the test ends and the update function check for the done channel , if it is close, it returns. In TestAggregateCLusterChildError test , (and other tests too) , the channel was being closed at the end of the test , which meant it will close only when the test passes. And if the tests fails, the done channel was not getting closed , which made the update function to block.
Changed the close(done channel) to be a defer function , but this should be the first thing that happens after the test is done , before the dependency manager closes, so this is the last defer called in the test.
RELEASE NOTES: None