
fix: retry user agent get from libp2p peerstore #2482

Merged
merged 6 commits into master from libp2p-useragent-flakytest
Sep 8, 2021

Conversation

janos
Member

@janos janos commented Sep 7, 2021

This PR fixes the flaky test that checks the logs for the libp2p user agent in high-load environments such as CI. It also makes it more likely that the user agent is logged at all under high load.

The core problem is that the peerstore may not contain all keys and values right after the connection is created.
The retry mechanism makes user agent propagation more reliable.


@janos janos added the in progress ongoing development , hold out with review label Sep 7, 2021
@janos janos self-assigned this Sep 7, 2021
@janos janos changed the title from "chore: return errors by libp2p peer user agent function" to "fix: retry user agent get from libp2p peerstore" Sep 8, 2021
@janos janos requested a review from acud September 8, 2021 12:31
@janos janos added ready for review The PR is ready to be reviewed and removed in progress ongoing development , hold out with review labels Sep 8, 2021
@acud acud force-pushed the libp2p-useragent-flakytest branch from 0c6d519 to 642f07e Compare September 8, 2021 14:00
)
// Peerstore may not contain all keys and values right after the connections is created.
// This retry mechanism ensures more reliable user agent propagation.
for deadline := time.Now().Add(2 * time.Second); time.Now().Before(deadline); {
Member

My only question would be: could the retry mechanism cause a context with a deadline in the enclosing scope to expire?
Example:

func (k *Kad) connect(ctx context.Context, peer swarm.Address, ma ma.Multiaddr) error {
	ctx, cancel := context.WithTimeout(ctx, peerConnectionAttemptTimeout)
	defer cancel()

	switch i, err := k.p2p.Connect(ctx, ma); {

The kademlia dial has a 5-second timeout in this case, so waiting on the user agent might delay it enough to make it expire. I wonder if we could either shorten this retry window so that it becomes insignificant compared to the total connection context deadline, or simply do the polling in a separate goroutine and eventually print the overlay and user agent together in an unrelated log line.

Member Author

Good suggestion. I've added context handling to the function.

@janos janos marked this pull request as ready for review September 8, 2021 15:39

sonarqubecloud bot commented Sep 8, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (A)
Vulnerabilities: 0 (A)
Security Hotspots: 0 (A)
Code Smells: 0 (A)

No coverage information
Duplication: 0.0%

@janos janos requested a review from notanatol September 8, 2021 16:30
v interface{}
err error
)
// Peerstore may not contain all keys and values right after the connections is created.
Contributor

typo: *connection

@janos janos merged commit a9a0fe3 into master Sep 8, 2021
@janos janos deleted the libp2p-useragent-flakytest branch September 8, 2021 17:31
@acud acud added this to the v1.2.0 milestone Sep 30, 2021
Labels
pull-request ready for review The PR is ready to be reviewed