Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Substraction overflow when polling Swarm with custom behaviour #1290

Closed
lamafab opened this issue Oct 28, 2019 · 2 comments · Fixed by #1291
Closed

Substraction overflow when polling Swarm with custom behaviour #1290

lamafab opened this issue Oct 28, 2019 · 2 comments · Fixed by #1291
Assignees
Labels

Comments

@lamafab
Copy link

lamafab commented Oct 28, 2019

In our program we have a Swarm which uses a custom NetworkBehaviour. After executing that program, within 5 seconds tokio panics.

Result

thread 'tokio-runtime-worker-2' panicked at 'attempt to subtract with overflow', /home/anon/.cargo/git/checkouts/rust-libp2p-98135dbcf5b63918/732becd/protocols/kad/src/behaviour.rs:1316:39
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

How to reproduce
I tried to narrow down the issue, but I cannot give you an absolute certain cause of that panic. For me, it seems to occur in the following case (both conditions apply):

  • The custom behavior implements Kademlia, mDNS and Ping.
  • tokio::run() repeatedly calls kademlia.bootstrap().

Workaround
For our program, we can ensure that kademlia.bootstrap() gets called exactly once, after at least one node has been added to the DHT.

from:

swarm.kademlia.bootstrap()

to:

if !bootstrap_started && swarm.kademlia.kbuckets_entries().count() > 0 {
    swarm.kademlia.bootstrap();
    bootstrap_started = true;
}

Alternatively, in the rust-libp2p file protocols/kad/src/behaviour.rs:1316 the following type can be changed from usize to isize:

// Calculate the available capacity for queries triggered by background jobs.
let mut jobs_query_capacity = JOBS_MAX_QUERIES - self.queries.size();

This seems to solve the issue, but this might some kind of bad hack.

Our full code:
https://hastebin.com/udofedajor.php

@romanb
Copy link
Contributor

romanb commented Oct 28, 2019

Thanks for reporting the issue. This is fixed in #1291. Note though that calling bootstrap inside the poll_fn as you're doing in the linked code snippet does not make much sense and will lead to an ever growing number of concurrent bootstrap queries, because a new query is started whenever the swarm is NotReady, i.e. whenever the swarm is waiting for I/O, timers or some other wakeup, which happens a lot in an application. The initialisation code for Kademlia should contain one or more calls to Kademlia::add_address to let Kademlia know about at least one other node (unless it is the very first node of the DHT being started, of course), followed by Kademlia::bootstrap once. Calling Kademlia methods to start queries does not actually perform any I/O at that point, the queries are only "run" once you start polling the swarm, so you can start any query that is supposed to be run once at startup outside of the loop that polls the swarm.

@lamafab
Copy link
Author

lamafab commented Oct 28, 2019

@romanb , thanks for the info.

Note though that calling bootstrap inside the poll_fn as you're doing in the linked code snippet does not make much sense...

Yeah, I've made sure that .bootstrap() gets called exactly once:

    let mut listening = false;
    let mut bootstrap_started = false;
    tokio::run(futures::future::poll_fn(move || -> Result<_, ()> {
        loop {
            match swarm.poll().unwrap() {
                Async::Ready(_) => {},
                Async::NotReady => {
                    if !listening {
                        if let Some(a) = Swarm::listeners(&swarm).next() {
                            info!("Listening locally on {:?}", a);
                            listening = true;
                        }
                    }
                    break
                }
            }
        }

        if !bootstrap_started && swarm.kademlia.kbuckets_entries().count() > 0 {
            swarm.kademlia.bootstrap();
            bootstrap_started = true;
        }

        trace!("Amount of Kademlia entries: {}", swarm.kademlia.kbuckets_entries().count());

        Ok(Async::NotReady)
    }));

But I agree that it should be called outside of poll_fn(), I will adjust that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants