-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: retry user agent get from libp2p peerstore #2482
Conversation
0c6d519
to
642f07e
Compare
pkg/p2p/libp2p/libp2p.go
Outdated
) | ||
// Peerstore may not contain all keys and values right after the connections is created. | ||
// This retry mechanism ensures more reliable user agent propagation. | ||
for deadline := time.Now().Add(2 * time.Second); time.Now().Before(deadline); { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my only question would be: could it be that the retry mechanism would make some context with deadline in the enclosing context expire?
example:
func (k *Kad) connect(ctx context.Context, peer swarm.Address, ma ma.Multiaddr) error {
ctx, cancel := context.WithTimeout(ctx, peerConnectionAttemptTimeout)
defer cancel()
switch i, err := k.p2p.Connect(ctx, ma); {
kademlia dial has a 5 seconds timeout in this case. so the user agent printing might delay this and make it expire. I wonder if we could either decrease this timespan to be shorter so that it becomes insignificant comparing to the total connection context deadline or simply do this polling in a different goroutine and print it eventually in an unrelated logline that would have the overlay and user agent together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. I've added context handling to the function.
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
v interface{} | ||
err error | ||
) | ||
// Peerstore may not contain all keys and values right after the connections is created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: *connection
This pr fixes the flaky test that checks for libp2p user agent in logs in high load environments like CI. It also ensures higher probability that the user agent is logged in high load environments.
The core problem is that the peerstore may not contain all keys and values right after the connections is created.
This retry mechanism ensures more reliable user agent propagation.
This change is