[qfix] Add refresh retries on failure #1081

Merged Sep 20, 2021 (1 commit)

Bolodya1997

Description

Add refresh retries on failure.

How Has This Been Tested?

  • Added unit testing to cover
  • Tested manually
  • Tested by integration testing
  • Have not tested

Types of changes

  • Bug fix
  • New functionality
  • Documentation
  • Refactoring
  • CI

Signed-off-by: Vladimir Popov <vladimir.popov@xored.com>
continue
}
return
}
Member

Why retry instead of letting the next refresh catch it?

Member

So... perhaps I'm simply confused... but this looks to me like it will call the refresh once, that will cancel the context (see line 72 above) and then it will return based on line 86 ... meaning... I don't see what this does for us at all. I can see some of the other fixes in the PR... but this just introduces complexity I don't think will ever be used...

Member

Ah wait... I see.. the context isn't canceled if our call to the downstream gets an error.

Member

OK... this is clever... it will tick at roughly the 1/3 marks even if we get an error... so it's not overwhelming... but it keeps trying.

}
}(cancelCtx, afterCh)
}()
Member

Please pass the variables to the go routine explicitly, rather than by closure when possible, it precludes a variety of potential issues and bugs.

@@ -74,17 +75,25 @@ func (t *refreshClient) Request(ctx context.Context, request *networkservice.Net
store(ctx, metadata.IsClient(t), cancel)

eventFactory := begin.FromContext(ctx)
- timeClock := clock.FromContext(ctx)
+ clockTime := clock.FromContext(ctx)
Member

I do like 'clockTime' better than 'timeClock'

@denis-tingaikin
Member

@Bolodya1997 Could you also remind us why it is critical to have retries for refresh?

@edwarnicke edwarnicke merged commit c7b15ba into networkservicemesh:main Sep 20, 2021
nsmbot pushed a commit to networkservicemesh/cmd-nsc-init that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/cmd-registry-memory that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/cmd-nsmgr that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/sdk-kernel that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/sdk-k8s that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/cmd-registry-proxy-dns that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/cmd-nse-icmp-responder that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/cmd-nsmgr-proxy that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
nsmbot pushed a commit to networkservicemesh/cmd-nse-vfio that referenced this pull request Sep 20, 2021
…k@main

PR link: networkservicemesh/sdk#1081

Commit: c7b15ba
Author: Vladimir Popov
Date: 2021-09-21 03:26:12 +0700
Message:
  - Add refresh retries on failure (#1081)
Signed-off-by: NSMBot <nsmbot@networkservicmesh.io>
@edwarnicke
Member

@Bolodya1997 One question... and maybe I'm just tired, but how does the Refresh terminate?

Refresh is only for running in the real clients (not the passthroughs)... so the real clients never hit a 'timeout'... it feels like we need a termination clause in your loop that isn't just the cancel, something where we don't try past the point the client side runs out of runway on its own timeout... thoughts?

@Bolodya1997
Author

> @Bolodya1997 One question... and maybe I'm just tired, but how does the Refresh terminate?
>
> Refresh is only for running in the real clients (not the passthroughs)... so the real clients never hit a 'timeout'... it feels like we need a termination clause in your loop that isn't just the cancel, something where we don't try past the point the client side runs out of runway on its own timeout... thoughts?

It terminates only on passthrough clients, which have a timeout. Perhaps we could add a timeout on the client, so that it closes the chain if it fails to send a Request for some period of time?

@edwarnicke
Member

edwarnicke commented Sep 27, 2021

Passthrough clients don't originate refreshes :) Only 'leaf' clients originate refreshes :)

@Bolodya1997
Author

@edwarnicke
Please take a look at this issue - #1092.
I suppose it should fix the infinite refresh loop.

3 participants