tokio-runtime-worker panicked #2099

Closed
deni64k opened this issue Dec 17, 2020 · 5 comments

Labels
bug Something isn't working

Comments

@deni64k

deni64k commented Dec 17, 2020

Description

Hello,

We see this error, after which Lighthouse stalls. Unfortunately, there is no backtrace.

lighthouse[224353]: Dec 16 23:22:43.234 INFO New block received                      hash: 0xd161…16bc, slot: 205013
lighthouse[224353]: Dec 16 23:22:49.000 INFO Synced                                  slot: 205013, block: 0xd161…16bc, epoch: 6406, finalized_epoch: 6404, finalized_root: 0x6fa5…4666, peers: 513, service: slot_notifier
lighthouse[224353]: Dec 16 23:22:56.086 INFO New block received                      hash: 0x7d77…413c, slot: 205014
lighthouse[224353]: Dec 16 23:23:01.005 INFO Synced                                  slot: 205014, block: 0x7d77…413c, epoch: 6406, finalized_epoch: 6404, finalized_root: 0x6fa5…4666, peers: 515, service: slot_notifier
lighthouse[224353]: Dec 16 23:23:07.359 INFO New block received                      hash: 0x7cde…2c10, slot: 205015
lighthouse[224353]: thread 'tokio-runtime-worker' panicked at 'elapsed=37085312; when=37085312', /opt/rust/registry/src/github.com-1ecc6299db9ec823/tokio-util-0.5.1/src/time/wheel/mod.rs:245:5
lighthouse[224353]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
lighthouse[224353]: Dec 16 23:23:13.000 INFO Synced                                  slot: 205015, block: 0x7cde…2c10, epoch: 6406, finalized_epoch: 6404, finalized_root: 0x6fa5…4666, peers: 514, service: slot_notifier
lighthouse[224353]: Dec 16 23:23:15.009 WARN Error processing HTTP API request       method: POST, path: /eth/v1/validator/aggregate_and_proofs, status: 500 Internal Server Error, elapsed: 6.123981ms
lighthouse[224353]: Dec 16 23:23:15.010 WARN Error processing HTTP API request       method: POST, path: /eth/v1/validator/aggregate_and_proofs, status: 500 Internal Server Error, elapsed: 7.22892ms
lighthouse[224353]: Dec 16 23:23:23.036 WARN Error processing HTTP API request       method: POST, path: /eth/v1/beacon/pool/attestations, status: 500 Internal Server Error, elapsed: 1.734088ms
lighthouse[224353]: Dec 16 23:23:23.047 WARN Error processing HTTP API request       method: POST, path: /eth/v1/beacon/pool/attestations, status: 500 Internal Server Error, elapsed: 1.697393ms
lighthouse[224353]: Dec 16 23:23:23.056 WARN Error processing HTTP API request       method: POST, path: /eth/v1/beacon/pool/attestations, status: 500 Internal Server Error, elapsed: 1.653093ms
lighthouse[224353]: Dec 16 23:23:23.080 WARN Error processing HTTP API request       method: POST, path: /eth/v1/beacon/pool/attestations, status: 500 Internal Server Error, elapsed: 2.599251ms
lighthouse[224353]: Dec 16 23:23:23.100 WARN Error processing HTTP API request       method: POST, path: /eth/v1/beacon/pool/attestations, status: 500 Internal Server Error, elapsed: 1.501371ms
lighthouse[224353]: Dec 16 23:23:25.000 INFO Synced                                  slot: 205016, block:    …  empty, epoch: 6406, finalized_epoch: 6404, finalized_root: 0x6fa5…4666, peers: 514, service: slot_notifier

Version

$ lighthouse --version
Lighthouse v1.0.4-1abc70e
BLS Library: blst
Specs: mainnet (true), minimal (false), v0.12.3 (false)
$ CARGO_HOME=/opt/rust RUSTUP_HOME=/opt/rust RUSTUP_TOOLCHAIN=stable /opt/rust/bin/rustc --version
rustc 1.48.0 (7eac88abb 2020-11-16)

Present Behaviour

The Lighthouse beacon node stops all network activity and does not submit attestations.

Expected Behaviour

Lighthouse Beacon Node performs network activity and submits attestations.

Steps to resolve

We simply restarted the node.

@realbigsean
Member

realbigsean commented Dec 17, 2020

This seems related to #1067.

> peers: 515

I'd guess it's got something to do with the very high peer count

@AgeManning
Member

Yes. The previous issue was in the delay queues that manage the ping and metadata intervals. There must still be a bug in tokio when entries are pushed into these queues at very high peer counts.

It looks to me like it is currently being addressed in tokio by this PR: tokio-rs/tokio#3270

Hopefully we can resolve this issue by updating tokio once that PR is merged. It seems to me that the panic is currently triggered probabilistically, depending on whether specific delay timeouts are hit, and that the probability is exacerbated by large peer counts.

An interim solution would be to lower the peer count to reduce the chances of hitting this. In the meantime, I think we need to wait for the upstream fix.
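The panic message has the shape `elapsed=N; when=N`: the deadline being placed in the timer wheel equals the wheel's current tick. The minimal Rust sketch below illustrates why that case is special. It assumes (it is not copied from tokio-util 0.5.1) that the assertion at `wheel/mod.rs:245` guards the level computation of tokio's hierarchical timer wheel, with 64 slots (6 bits) per level, where the level is chosen from the highest bit in which `elapsed` and `when` differ. Equal values have no differing bit, so no level exists. The names `level_for_strict`, `level_for_clamped`, and `LEVEL_BITS` are illustrative only.

```rust
/// Bits of the deadline resolved by each wheel level (assumed: 64 slots per level).
const LEVEL_BITS: usize = 6;
/// Mask covering the slot bits of level 0.
const SLOT_MASK: u64 = (1 << LEVEL_BITS) - 1;

/// Strict form: picks the level from the highest bit in which `elapsed` and
/// `when` differ. If they are equal, no bit differs and there is no valid
/// level; this sketch returns `None` where a strict assertion would panic.
fn level_for_strict(elapsed: u64, when: u64) -> Option<usize> {
    let masked = elapsed ^ when;
    if masked == 0 {
        return None;
    }
    let significant = 63 - masked.leading_zeros() as usize;
    Some(significant / LEVEL_BITS)
}

/// Clamped form: OR-ing in the level-0 slot bits guarantees at least one set
/// bit, so a deadline equal to `elapsed` maps to level 0 instead of failing.
fn level_for_clamped(elapsed: u64, when: u64) -> usize {
    let masked = (elapsed ^ when) | SLOT_MASK;
    let significant = 63 - masked.leading_zeros() as usize;
    significant / LEVEL_BITS
}

fn main() {
    // The tick value from the panic message above.
    let tick = 37_085_312;
    println!("{:?}", level_for_strict(tick, tick)); // None
    println!("{}", level_for_clamped(tick, tick)); // 0
    println!("{:?}", level_for_strict(tick, tick + 100)); // Some(1)
}
```

Under this assumption, every connected peer contributes ping and metadata deadlines to the queue, so more peers means a higher chance that some deadline lands exactly on the current tick, which fits the observation that high peer counts make the panic more likely. The clamped variant is just one way to make the computation total; the actual upstream fix in tokio-rs/tokio#3270 may differ.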

@paulhauner
Member

paulhauner commented Jan 21, 2021

The proposed fix at tokio-rs/tokio#3270 has been merged into tokio master so I assume we'll see a release soon.

@paulhauner paulhauner added the bug Something isn't working label Jan 21, 2021
@paulhauner
Member

It seems `tokio-util` was updated to 0.6.2, which contains tokio-rs/tokio#3270.

bors bot pushed a commit that referenced this issue Feb 10, 2021
## Issue Addressed

resolves #2129
resolves #2099 
addresses some of #1712
unblocks #2076
unblocks #2153 

## Proposed Changes

- Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. 

- Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1.

- We haven't made a discv5 release since merging the tokio 1.0 updates, so I'm using a commit rather than a release for now. **Edit:** I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release, because it has panic fixes in `DelayQueue` -> PR in discv5: sigp/discv5#58

## Additional Info

tokio 1.0 changes that required changes in Lighthouse:

- `interval.next().await.is_some()` -> `interval.tick().await`
- `sleep` future is now `!Unpin` -> tokio-rs/tokio#3028
- `try_recv` has been temporarily removed from `mpsc` -> tokio-rs/tokio#3350
- stream features have moved to `tokio-stream`, and `broadcast::Receiver::into_stream()` has been temporarily removed -> tokio-rs/tokio#2870
- I've copied over the `BroadcastStream` wrapper from tokio-rs/tokio#3384, but can update to use `tokio-stream` once that PR is merged

Co-authored-by: realbigsean <seananderson33@gmail.com>
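For the `sleep` bullet above: in tokio 1.0 the `Sleep` future is `!Unpin`, so code that polled it through `&mut` must pin it first, typically with `Box::pin` or the `tokio::pin!` macro. The std-only sketch below shows the same constraint with a hypothetical `Countdown` future (a stand-in, not tokio's `Sleep`) made `!Unpin` via `PhantomPinned`, and a hypothetical hand-rolled driver, `block_on_counting`, that pins it with `Box::pin` before polling.

```rust
use std::cell::Cell;
use std::future::Future;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical stand-in for tokio 1.0's `Sleep`: the `PhantomPinned` marker
/// makes it `!Unpin`, so it can only be polled through a pinned pointer.
struct Countdown {
    remaining: Cell<u32>,
    _pin: PhantomPinned,
}

impl Countdown {
    fn new(steps: u32) -> Self {
        Countdown { remaining: Cell::new(steps), _pin: PhantomPinned }
    }
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Reading through the pin is allowed; `Cell` lets us update the counter
        // without obtaining `&mut Countdown` (which would require `Unpin`).
        let left = self.remaining.get();
        if left == 0 {
            Poll::Ready("done")
        } else {
            self.remaining.set(left - 1);
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// A waker whose operations do nothing; enough for polling by hand in a loop.
fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: none of the vtable functions dereference the data pointer.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Polls a future to completion, returning its output and the number of polls.
fn block_on_counting<F: Future>(fut: F) -> (F::Output, u32) {
    // `Pin::new(&mut fut)` would not compile for a `!Unpin` future such as
    // `Countdown`; `Box::pin` pins it on the heap instead.
    let mut fut = Box::pin(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (out, polls) = block_on_counting(Countdown::new(3));
    println!("{out} after {polls} polls"); // done after 4 polls
}
```

`Box::pin` costs one heap allocation; the in-place alternative in tokio code is `tokio::pin!`, which pins the future on the stack for the rest of the enclosing scope.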
@paulhauner
Member

Hopefully resolved in #2172, which includes tokio-util 0.6.3.
