-
Notifications
You must be signed in to change notification settings - Fork 340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(pushsync): Don't account for failed retries due to accounting #1662
Conversation
Current changes include:
|
pkg/pushsync/pushsync.go
Outdated
} | ||
if err != nil { | ||
logger.Debugf("could not push to peer %s: %v", peer.String(), err) | ||
resultC <- &pushResult{err: err} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error does not seem to be read anywhere.
Consider using the errgroup package.
pkg/pushsync/pushsync.go
Outdated
Stamp: stamp, | ||
}); err != nil { | ||
_ = streamer.Reset() | ||
return nil, true, fmt.Errorf("chunk %s deliver to peer %s: %w", ch.Address().String(), peer.String(), err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the String()
call is not needed on %s arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @aloknerurkar, @metacertain, and @zelig)
pkg/pushsync/pushsync.go, line 354 at r2 (raw file):
Previously, zelig (Viktor Trón) wrote…
this error does not seem to be read anywhere.
Consider using the errgroup package.
errgroup is not needed since there is no concurrency here
pkg/pushsync/pushsync.go, line 365 at r2 (raw file):
select { case r := <-resultC: if r.receipt != nil {
check if r.err != nil
, log the error if not and continue
to the next retry
pkg/pushsync/pushsync.go, line 396 at r2 (raw file):
streamer, err := ps.streamer.NewStream(ctx, peer, nil, protocolName, protocolVersion, streamName) if err != nil { return nil, true, fmt.Errorf("new stream for peer %s: %w", peer.String(), err)
this should be nil,false, error
since there was no attempt yet (nothing has been sent)
pkg/pushsync/pushsync.go, line 440 at r2 (raw file):
Previously, zelig (Viktor Trón) wrote…
if you use errgroup or separate errs channel, you wont need this struct
errgroup won't help, since it will result in an error if one error or more has occurred. but since subsequent retries might resolve the problem, we would not like to error. i would personally refrain from adding an errgroup here. also, errgroup helps when havin concurrent requests, which is not the case here, since we wait for one request to expire before trying the next i.e. no parallel requests here
pkg/pushsync/pushsync.go, line 396 at r2 (raw file): Previously, acud (acud) wrote…
If NewStream returns an error, we are not able to connect to a peer right? I was assuming this might take some time to return in which case we can call it an attempt. No? In the subsequent change, we will filter peers which return this error. Which should mitigate the issue (by not considering these peers) in my view. If we don't consider this an attempt, we will not consider this node to be blocked for future requests and retry this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 2 files reviewed, 5 unresolved discussions (waiting on @acud, @metacertain, and @zelig)
pkg/pushsync/pushsync.go, line 354 at r2 (raw file):
Previously, acud (acud) wrote…
errgroup is not needed since there is no concurrency here
Agree with @acud We only do 1 request at a time and only if it fails we retry.
pkg/pushsync/pushsync.go, line 365 at r2 (raw file):
Previously, acud (acud) wrote…
check if
r.err != nil
, log the error if not andcontinue
to the next retry
This error is returned from pushPeer. The error is already logged there once.
pkg/pushsync/pushsync.go, line 407 at r2 (raw file):
Previously, zelig (Viktor Trón) wrote…
the
String()
call is not needed on %s arguments
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 2 files reviewed, 4 unresolved discussions (waiting on @acud, @aloknerurkar, @metacertain, and @zelig)
pkg/pushsync/pushsync.go, line 365 at r2 (raw file):
Previously, aloknerurkar (Alok Nerurkar) wrote…
This error is returned from pushPeer. The error is already logged there once.
OK 👍 note here that the happy path is nested, usually you would check for an error and if there's an error do something, otherwise the happy path is assumed to be at the root of the method/block
pkg/pushsync/pushsync.go, line 396 at r2 (raw file):
If NewStream
returns an error it is most probably that we are no longer connected to this peer or that some other problem may exist. It should not take too long to return here.
If we don't consider this an attempt, we will not consider this node to be blocked for future requests and retry this.
As far as I remember it was agreed that the boolean returned from the method signifies whether there was an actual successful attempt to communicate the chunk. It should not be true
unless the chunk actually got sent successfully, which is not the case here. In any case, a failed attempt to send to the peer (i.e. if we did not manage to get around to sending the chunk to the peer) must always result in a failed attempt logged for the trace in the LRU cache and the peer added to the skip list
pkg/pushsync/pushsync.go, line 407 at r2 (raw file):
Previously, aloknerurkar (Alok Nerurkar) wrote…
Done.
that actually doesn't work with chunk addresses in all cases. a manual test to confirm would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 2 files reviewed, 4 unresolved discussions (waiting on @acud, @aloknerurkar, @metacertain, and @zelig)
pkg/pushsync/pushsync.go, line 365 at r2 (raw file):
Previously, acud (acud) wrote…
OK 👍 note here that the happy path is nested, usually you would check for an error and if there's an error do something, otherwise the happy path is assumed to be at the root of the method/block
There will be an error check once the next PR will be merged.
The comment on the bottom was meant to clarify that we will be continuing the loop. We will revisit this once I merge the next PR and if more details are required will add.
pkg/pushsync/pushsync.go, line 407 at r2 (raw file):
Previously, acud (acud) wrote…
that actually doesn't work with chunk addresses in all cases. a manual test to confirm would be nice
DEBU[2021-05-11T13:25:40+05:30] could not push to peer 6000000000000000000000000000000000000000000000000000000000000000: new stream for peer 6000000000000000000000000000000000000000000000000000000000000000: peer not reachable
This does work in the test. Not sure what you mean by 'all' cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @metacertain and @zelig)
pkg/pushsync/pushsync.go, line 407 at r2 (raw file):
Previously, aloknerurkar (Alok Nerurkar) wrote…
DEBU[2021-05-11T13:25:40+05:30] could not push to peer 6000000000000000000000000000000000000000000000000000000000000000: new stream for peer 6000000000000000000000000000000000000000000000000000000000000000: peer not reachable
This does work in the test. Not sure what you mean by 'all' cases?
we had some formatting problems with using %s
and only swarm.Address
in the argument, I think this happens with the logger for some reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @metacertain and @zelig)
Just for documenting: We have decided to classify the NewStream error as an attempt for now. Idea is that it should not happen often and if it does, we can know using the new metrics. The metrics once run on production env, should tell us how the cache is being utilized. Once we have a better understanding, we can tweak these things in the future. Will check in as-is for now. |
#1638
This change is