Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discv5: only store crawl error if no query succeeded #55

Merged
merged 2 commits into from
Feb 1, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 13 additions & 3 deletions discv5/crawler.go
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ import (
"github.com/ethereum/go-ethereum/p2p/enode"
"github.com/libp2p/go-libp2p/core/network"
"github.com/libp2p/go-libp2p/core/peer"
"github.com/libp2p/go-libp2p/p2p/host/basic"
basichost "github.com/libp2p/go-libp2p/p2p/host/basic"
ma "github.com/multiformats/go-multiaddr"
log "github.com/sirupsen/logrus"
"go.uber.org/atomic"
@@ -25,6 +25,8 @@ import (
"github.com/dennis-tra/nebula-crawler/discvx"
)

const MAX_CRAWL_RETRY_AFTER_TIMEOUT = 2 // magic

type CrawlerConfig struct {
DialTimeout time.Duration
AddrDialType config.AddrType
@@ -460,18 +462,19 @@ func (c *Crawler) crawlDiscV5(ctx context.Context, pi PeerInfo) chan DiscV5Resul
result.RespondedAt = &now
}

success := false
// loop through the buckets sequentially because discv5 is also doing that
// internally, so we won't gain much by spawning multiple parallel go
// routines here. Stop the process as soon as we have received a timeout and
// don't let the following calls time out as well.
for i := 0; i <= discvx.NBuckets; i++ { // 15 is maximum
for i := 0; i <= discvx.NBuckets; i++ { // 17 is maximum
var neighbors []*enode.Node
neighbors, err = c.listener.FindNode(pi.Node, []uint{uint(discvx.HashBits - i)})
if err != nil {

if errors.Is(err, discvx.ErrTimeout) {
timeouts += 1
if timeouts < 2 {
if timeouts < MAX_CRAWL_RETRY_AFTER_TIMEOUT {
continue
}
}
@@ -480,6 +483,8 @@ func (c *Crawler) crawlDiscV5(ctx context.Context, pi PeerInfo) chan DiscV5Resul
err = fmt.Errorf("getting closest peer with CPL %d: %w", i, err)
break
}
success = true
Copy link
Owner

@dennis-tra dennis-tra Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, could you use the errorBits to determine if there is at least one successful FIND_NODE RPC? Then you wouldn't need the success variable.

However, errorBits aren't tracked here yet (just saw that), so this also needs fixing. 🙈 Could you also do that? The drainBuckets in crawler_p2p.go in the libp2p package would have the logic of how to do that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated:

nebula/discv5/crawler.go

Lines 516 to 521 in 9a1a685

// if we have at least a successful result, delete error
// bitwise operation checks whether errorBits is a power of 2 minus 1,
// if not, then there was at least one successful result
if result.Error != nil && (result.RoutingTable.ErrorBits&(result.RoutingTable.ErrorBits+1)) == 0 {
result.Error = nil
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry, I meant errorBits isn't even used in this routine. So it will always be 0 as there's no statement that updates the errorBits variable. In case of an error the corresponding bit should be set (even though it's inaccurate because we might exit the loop early)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errorBits is updated at

errorBits.Add(1 << i)

You mean, if we exit the loop early we need to set the error bits to 1 for the buckets that were skipped?

I'll update the ErrorBits to uint32 as it can contain up to 17 bits.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, I was ctrl+f searching the page for errorBits and was afterward expanding the code lines in the GitHub diff viewer. However, this didn't highlight the errorBits variable there so I was under the impression that it wasn't used. Sorry for the noise!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the ErrorBits to uint32 as it can contain up to 17 bits.

ErrorBits is defined in core and I don't want to mess everything up, so we can assume that bucket 17 (or 239 in discv5 naming) will always be empty and never cause issues 👍🏻

timeouts = 0

if result.RespondedAt == nil {
now := time.Now()
@@ -510,6 +515,11 @@ func (c *Crawler) crawlDiscV5(ctx context.Context, pi PeerInfo) chan DiscV5Resul
result.RoutingTable.Neighbors = append(result.RoutingTable.Neighbors, n)
}

// if we have at least a successful result, delete error
if success && result.Error != nil {
result.Error = nil
}

// if there was a connection error, parse it to a known one
if result.Error != nil {
result.ErrorStr = db.NetError(result.Error)