Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry logic #292

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

edwarnicke
Copy link
Member

Signed-off-by: Ed Warnicke hagbard@gmail.com

Signed-off-by: Ed Warnicke <hagbard@gmail.com>
@@ -119,7 +120,6 @@ func main() {
grpcfd.WithChainStreamInterceptor(),
grpcfd.WithChainUnaryInterceptor(),
grpc.WithDefaultCallOptions(
grpc.WaitForReady(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edwarnicke
I think there will be a problem with this heal implementation - networkservicemesh/sdk#1107
Because, we have only one attempt to heal the connection. If this attempt fails - link - we don't start a new eventLoop and therefore no new attempt will occur .
Let's take a look at local nsmgr restart scenario. nsmgr will not be able to recreate instantly.
grpc.WaitForReady (or probably WithBlock) ensures that we wait for a new nsmgr to start and successfully complete the first healing attempt.
I think that's the reason why ci is red here...
Am I thinking right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glazychev-art Good eye, yes... I've been thinking on that too. I think the fix belongs in the heal implementation though. Even retaining the WithBlock and WaitForReady we'd need to fix the single try in heal.

The issue with WithBlock and WaitForReady is they are the moral equivalent of putting long sleeps in your code... its better to fail back to a definitive point that can act intelligently for the chain as soon as possible than to linger in the middle.

As to the CI, that could in fact be why its red. Interestingly, it doesn't fail for me locally... but I do believe there is something genuinely wrong we need to fix in heal that has been caught by the CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants