-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tweaks for libp2p message delivery rate improvements #1285
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This version will let us set peer peer outbound queue size to avoid problems with messages dropped when this queue is full during DKG. libp2p/go-libp2p-pubsub#230
The default value which is 32 may not be enough to send all messages from the given state of DKG. Setting higher queue size we expect to avoid 'dropping message to peer X: queue is full' failures.
We don't want to use more validate workers for synchronous validation (signature) than the number of CPUs. This can be only slower because of the context switching. Instead, we'll increase the size of validate queue (the next commit). Also, we are increasing outbound queue size from 64 to 128 in case two retransmission ticks are very close to another and we need to accommodate 2*64 messages in that queue.
In the forked version, we added WithValidationQueueSize function allowing us to increase the buffer for synchronous validation and avoid validator throttling.
Signature validation is done synchronously and it is a bottleneck for received messages. We increase the validate queue buffer from the default 32 to 128 to be able to accommodate messages from all group members even if two retransmissions were very close to each other.
We need this change: https://golang.org/doc/go1.13#crypto/ed25519 after a transitive to-openssl update.
lukasz-zimnoch
approved these changes
Jan 21, 2020
pdyraga
commented
Jan 21, 2020
@@ -1,4 +1,4 @@ | |||
FROM golang:1.12-alpine3.10 AS runtime | |||
FROM golang:1.13.6-alpine3.10 AS runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it require any tweaks in provisioning? @sthompson22
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Close: #1286
This PR contains changes allowing to improve message delivery rate for libp2p by tweaking some libp2p buffers and changing how do we read messages from the topic. There were three warnings from libp2p we were observing locally and on testnet. All of them are addressed here.
Previously, in the broadcast channel, we had just one goroutine reading from pubsub subscription and spawning goroutines handling messages. This one goroutine is not enough. We need multiple workers ready to take the message out of the queue immediately. For this reason, we now instantiate 32 workers reading from the pubsub subscription. It is unclear if we need to spawn sepearate goroutines for message handling now but this will be addressed separately.
We increased peer outbound queue size from 32 to 128. Each floodsub should publish 64 original messages per state (not counting retransmissions). Increasing the buffer to 128 messages will allow us to handle all original messages plus potential retransmissions if the retransmissions were sent immediately after the original messages.
Addressing this problem required a contribution to go-libp2p-pubsub: libp2p/go-libp2p-pubsub#255. Until it is accepted, we had to use
replace
directive in go modules to point to our fork allowing to callWithValidateQueueSize
and increase the size of validation queue from 32 to 128 elements.validateWorker()
reads fromvalidateQ
and invokesvalidate
function that performs validation of the message. Signature validation is performed synchronously. The number of validate workers defaults to the number of CPUs and can be updated withWithValidateWorkers
function. With no additional user validators, signature validation is the bottleneck when receiving new messages.Increasing the number of validating workers does not help given the context switching and bottleneck nature of this spot. As stated in
WithValidateWorkers
documentation, this function should be used rather to limit the number of workers to devote less CPU time for synchronous validation. On the other hand, with the default size ofvalidateQ
, and 64 messages (with no retransmissions) send for each phase, we experience throttled validation. Increasing the buffer to 128 messages will allow us to handle all original messages plus potential retransmissions if the retransmissions were sent immediately after the original messages and don't congest with too many workers wasting time on context switching.