Connect: Make Connect health queries unblock correctly #5508
Fixes #5506.
Addresses an issue found while investigating Connect proxy service discovery. The issue is described in #5506, but in short: watching the target service index is not the right thing to do in most cases when requesting `/health/connect/:service` results. This PR fixes the issue and adds much more robust tests that explicitly catch the cases where the old implementation failed.
This extends the optimization in #5449. In the Connect query case we can't always guarantee just a single chan to watch, because the proxies returned might be registered under multiple different service names. In addition, since we don't know the set of all proxy service names that may be used, we always need to watch the actual index iterator for new instances being registered with a name we didn't already watch.
In the typical case, though, where all Connect proxies are named consistently, Connect queries still only require two watch chans. If there are both (consistently named) proxies and connect-native instances for a service, that increases to three. If Connect proxies are added with random names it may increase further, but this seems an extremely unlikely case and would likely be unmanageable for other reasons too.
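
To make the chan counts above concrete, here is a minimal sketch of the counting logic only. It is not the code in this PR: the `Instance` type, the `watchKeys` helper, and the example service names are hypothetical, and strings stand in for real watch chans. It assumes one watch for the index iterator plus one per distinct service name in the result set.

```go
package main

import "fmt"

// Instance is a hypothetical, simplified stand-in for one entry in a
// /health/connect/:service result. It is not Consul's real type.
type Instance struct {
	ServiceName string // name the instance is registered under
	Native      bool   // true for a connect-native instance of the target service
}

// watchKeys returns a hypothetical set of things a blocking query would need
// to watch for the given results: one entry for the index iterator itself
// (so instances registered under a not-yet-seen name still wake the query),
// plus one entry per distinct service name present in the results.
func watchKeys(instances []Instance) []string {
	keys := []string{"index-iterator"}
	seen := make(map[string]bool)
	for _, inst := range instances {
		if seen[inst.ServiceName] {
			continue // this name is already being watched
		}
		seen[inst.ServiceName] = true
		keys = append(keys, "service:"+inst.ServiceName)
	}
	return keys
}

func main() {
	// All proxies share one (assumed) name: iterator + one service watch.
	consistent := []Instance{
		{ServiceName: "web-sidecar-proxy"},
		{ServiceName: "web-sidecar-proxy"},
	}
	// Adding a connect-native instance registered as the target service
	// itself introduces one more distinct name to watch.
	withNative := append(consistent, Instance{ServiceName: "web", Native: true})

	fmt.Println(len(watchKeys(consistent))) // 2
	fmt.Println(len(watchKeys(withNative))) // 3
}
```

Each additional distinct proxy name adds one more entry, which is why randomly named proxies would make the count grow.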