swarm/network: measure time of messages in priority queue #19250
Conversation
swarm/network/stream/peer.go (outdated)

```diff
@@ -154,7 +154,7 @@ func (p *Peer) Deliver(ctx context.Context, chunk storage.Chunk, priority uint8,
 	}
 
 	ctx = context.WithValue(ctx, "stream_send_tag", nil)
-	return p.SendPriority(ctx, msg, priority)
+	return p.Send(ctx, msg)
```
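As an aside, the context-value tagging pattern in the hunk above can be illustrated in isolation. This is a minimal sketch only: the `withSendTag`/`sendTag` helpers and the unexported key type are illustrative, not the actual swarm API (the real code attaches a plain string key, `"stream_send_tag"`).

```go
package main

import (
	"context"
	"fmt"
)

// tagKey is an unexported key type; using it instead of a plain string
// avoids collisions between context values set by different packages.
type tagKey struct{}

// withSendTag attaches a tag to the context at the send site.
func withSendTag(ctx context.Context, tag string) context.Context {
	return context.WithValue(ctx, tagKey{}, tag)
}

// sendTag reads the tag back downstream, e.g. inside a tracing hook.
func sendTag(ctx context.Context) (string, bool) {
	tag, ok := ctx.Value(tagKey{}).(string)
	return tag, ok
}

func main() {
	ctx := withSendTag(context.Background(), "stream_send_tag")
	if tag, ok := sendTag(ctx); ok {
		fmt.Println("send tagged as:", tag)
	}
}
```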
Why change it here? Either leave it, or change it everywhere by temporarily redefining sendPriority.
I changed it so that we can use tracing to debug this problem; SendPriority doesn't support traces.

Furthermore, I posted measurements of the priority queue in the chat yesterday, which show that messages could stay in the priority queue for tens of seconds, which is really bad.

Bottom line: I am trying to reduce the number of packages that behave weirdly, in order to understand the timeout issue.

Smoke tests have been passing consistently for the past 16 hours with this PR, which means that using Send here is just fine, at least according to our current test suite.

@zelig, do we have any actual tests that show we must use a priority queue for send RPC calls? Generally all our RPC handlers are very fast and the handleIncoming loop is really tight, so the priority queue doesn't make much sense to me.

Note that when messages stayed in the queue for that long, there were no error messages logged that would suggest queue contention.
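To make the measurement in the PR title concrete, here is a minimal, self-contained sketch of how queue-residency time can be measured: each message is stamped on enqueue, and the elapsed time is recorded on dequeue, right before the handler runs. The types and names are illustrative, not swarm's actual priority-queue implementation.

```go
package main

import (
	"fmt"
	"time"
)

// timedMsg wraps a queued message with its enqueue timestamp.
type timedMsg struct {
	msg        interface{}
	enqueuedAt time.Time
}

// priorityQueue holds one buffered FIFO channel per priority level.
type priorityQueue struct {
	queues []chan timedMsg
}

func newPriorityQueue(levels, capacity int) *priorityQueue {
	pq := &priorityQueue{queues: make([]chan timedMsg, levels)}
	for i := range pq.queues {
		pq.queues[i] = make(chan timedMsg, capacity)
	}
	return pq
}

// push stamps the message before it enters the queue.
func (pq *priorityQueue) push(msg interface{}, prio int) {
	pq.queues[prio] <- timedMsg{msg: msg, enqueuedAt: time.Now()}
}

// pop drains the highest non-empty priority and reports how long the
// message sat in the queue before f (e.g. a p2p send) was invoked.
func (pq *priorityQueue) pop(f func(interface{})) {
	for prio := len(pq.queues) - 1; prio >= 0; prio-- {
		select {
		case tm := <-pq.queues[prio]:
			fmt.Printf("prio %d: waited %v in queue\n", prio, time.Since(tm.enqueuedAt))
			f(tm.msg)
			return
		default:
		}
	}
}

func main() {
	pq := newPriorityQueue(3, 16)
	pq.push("hello", 1)
	time.Sleep(10 * time.Millisecond)
	pq.pop(func(m interface{}) { fmt.Println("sent:", m) })
}
```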
Good job @nonsense! Could you try setting the syncer streamer priority from top to mid, to understand what was happening?
Force-pushed from 65adf5a to 3b1b81b.
Force-pushed from 807bd39 to d972357.
one minor comment
```go
err := trackChunks(randomBytes[:])
if err != nil {
	log.Error(err.Error())
}
// trigger debug functionality on randomBytes
```
Do we need to trigger this even if err is nil?
Yes, so that we can compare successful runs against failed runs.
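In other words, the debug trigger sits outside the error branch, so it runs regardless of the outcome. A self-contained sketch of that shape follows; trackChunks and debugChunks here are stubs standing in for the smoke-test helpers under discussion, not the real implementations.

```go
package main

import (
	"errors"
	"log"
	"math/rand"
)

// trackChunks is a stub for the smoke test's chunk-tracking helper.
func trackChunks(data []byte) error {
	if len(data) == 0 {
		return errors.New("no data to track")
	}
	return nil
}

// debugChunks is an illustrative name for the debug functionality.
func debugChunks(data []byte) {
	log.Printf("debugging %d bytes of upload data", len(data))
}

func main() {
	randomBytes := make([]byte, 32)
	rand.Read(randomBytes)

	// The debug step runs on both success and failure, so the two
	// kinds of runs can be compared afterwards.
	if err := trackChunks(randomBytes[:]); err != nil {
		log.Print(err.Error())
	}
	debugChunks(randomBytes[:])
}
```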
This PR:
- adds an inputSeed flag to the smoke tests, so that we can reproduce an exact upload to a given Swarm deployment (a sketch of the idea follows this list).
- triggers trackChunks both on successful and on failed smoke test runs, so that we can compare between them.
- adds more Debug and Trace logs in some packages.
- measures the time spent in f() (generally a p2p.Send() call) in our priority queue.
- makes the stream API private.
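As a hedged illustration of the first bullet: the flag name inputSeed comes from the PR, but everything else below is an assumption about how such a flag could make the upload payload reproducible, namely by seeding the pseudo-random generator that produces the uploaded bytes.

```go
package main

import (
	"flag"
	"fmt"
	"math/rand"
	"time"
)

// inputSeed makes the uploaded payload reproducible: the same seed
// yields the same bytes, so a failing upload can be replayed against
// a given Swarm deployment.
var inputSeed = flag.Int64("inputSeed", 0, "seed for reproducible smoke-test uploads")

// seededBytes returns n pseudo-random bytes. A zero seed falls back
// to the current time, so unrelated runs still differ by default.
func seededBytes(n int) []byte {
	seed := *inputSeed
	if seed == 0 {
		seed = time.Now().UnixNano()
	}
	r := rand.New(rand.NewSource(seed))
	b := make([]byte, n)
	r.Read(b) // (*rand.Rand).Read always returns len(b), nil
	return b
}

func main() {
	flag.Parse()
	fmt.Printf("first upload bytes: %x\n", seededBytes(8))
}
```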