This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Panic in libp2p after ~24 hours #18

Closed
chevdor opened this issue Sep 12, 2018 · 15 comments
@chevdor
Contributor

chevdor commented Sep 12, 2018

Version 0.2.15-3720d74-x86_64-linux-gnu
The node crashed after ~24 hours

Node start logs
2018-09-11 07:13:17 Parity ·:· Polkadot
2018-09-11 07:13:17   version 0.2.15-3720d74-x86_64-linux-gnu
2018-09-11 07:13:17   by Parity Technologies, 2017, 2018
2018-09-11 07:13:17 Chain specification: Krumme Lanke
2018-09-11 07:13:17 Node name: Acid Burn
2018-09-11 07:13:17 Roles: AUTHORITY
2018-09-11 07:13:17 Best block: #1323709
Node crash logs
2018-09-12 06:16:26 BFT agreement error: Message sender 3d5db678afac4a72b36a25f998732911d854238d0071667c5993f5edc903116f is not a valid authority.
2018-09-12 06:16:27 BFT agreement error: Message sender 3d5db678afac4a72b36a25f998732911d854238d0071667c5993f5edc903116f is not a valid authority.

====================

stack backtrace:
   0:     0x55935fe8db2c - backtrace::backtrace::trace::ha18aa6ab54e5a876
   1:     0x55935fe8cb02 - <backtrace::capture::Backtrace as core::default::Default>::default::h4b1e8887c11facf6
   2:     0x55935fe8cb78 - backtrace::capture::Backtrace::new::h9650079cf787b5a3
   3:     0x55935f74f760 - substrate_cli::panic_hook::panic_hook::h3b42c0eac81fca14
   4:     0x55935f74f478 - core::ops::function::Fn::call::hdad2bc9320d93013
   5:     0x55935ffbe883 - std::panicking::rust_panic_with_hook::he4c3a67f6258a8f9
                        at libstd/panicking.rs:515
   6:     0x55935faa5406 - std::panicking::begin_panic::h43f6d7d030843f98
   7:     0x55935fab07a1 - ring::agreement::agree_ephemeral::h8f671873f8174a10
   8:     0x55935fb44bc4 - <futures::future::chain::Chain<A, B, C>>::poll::h165aff8d34002d8a
   9:     0x55935fb64dcb - <futures::future::chain::Chain<A, B, C>>::poll::h95f05b4b843d58f8
  10:     0x55935fb6f538 - <futures::future::chain::Chain<A, B, C>>::poll::hc5503a98c6150948
  11:     0x55935faa35e8 - <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll::h415fbfb3c2449ef1
  12:     0x55935fbed30b - <futures::future::map::Map<A, F> as futures::future::Future>::poll::headc5df01320a280
  13:     0x55935fbed69b - <futures::future::map::Map<A, F> as futures::future::Future>::poll::hedd70b616c3d397c
  14:     0x55935fc3dc6b - <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll::hcfeab84f7b6d577e
  15:     0x55935fbe7906 - <futures::future::map::Map<A, F> as futures::future::Future>::poll::h57f0ef49a333020a
  16:     0x55935fc157ba - <libp2p_core::upgrade::apply::UpgradeApplyFuture<C, U, Maf> as futures::future::Future>::poll::h15d03021237b1f85
  17:     0x55935fb6b051 - <futures::future::chain::Chain<A, B, C>>::poll::hb48188e680ae1193
  18:     0x55935faa3668 - <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll::h8a2a227adfbf4679
  19:     0x55935fb5683d - <futures::future::chain::Chain<A, B, C>>::poll::h5457174e39ff311b
  20:     0x55935faa3588 - <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll::h0f73a8cc10531f29
  21:     0x55935fb4519a - <futures::future::chain::Chain<A, B, C>>::poll::h185c4fcf25108a88
  22:     0x55935faa3638 - <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll::h7a2e8626dfaed67e
  23:     0x55935fc55586 - <libp2p_core::connection_reuse::ConnectionReuseDial<T, D, M> as futures::future::Future>::poll::h275e27db89f86091
  24:     0x55935fbe9140 - <futures::future::map::Map<A, F> as futures::future::Future>::poll::h983cf9280cce7688
  25:     0x55935faef203 - <libp2p_transport_timeout::TokioTimerMapErr<InnerFut> as futures::future::Future>::poll::hdc4df65e936842d3
  26:     0x55935fbe7560 - <futures::future::map::Map<A, F> as futures::future::Future>::poll::h57969ae0efa3f797
  27:     0x55935fbf77e8 - <futures::future::map::Map<A, F> as futures::future::Future>::poll::hee8d79230727f90e
  28:     0x55935fc3bf3f - <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll::h702d76dca6d894d9
  29:     0x55935fb5a593 - <futures::future::chain::Chain<A, B, C>>::poll::h65533c7600372e53
  30:     0x55935faa35b8 - <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll::h2db59248e1acf7d9
  31:     0x55935fb63037 - <futures::future::chain::Chain<A, B, C>>::poll::h8c4485741198a223
  32:     0x55935faa35c8 - <futures::future::and_then::AndThen<A, B, F> as futures::future::Future>::poll::h3e11973aa0c5e8f0
  33:     0x55935fb5dfdf - <futures::future::chain::Chain<A, B, C>>::poll::h7e046121850794aa
  34:     0x55935fbf6758 - <futures::future::then::Then<A, B, F> as futures::future::Future>::poll::h9f97db5d9cb80b3b
  35:     0x55935faa2229 - <libp2p_core::swarm::SwarmEvents<T, F, H> as futures::stream::Stream>::poll::hd798e4e2b8003735
  36:     0x55935fbcf490 - <futures::stream::for_each::ForEach<S, F, U> as futures::future::Future>::poll::hf56650476d71b6a6
  37:     0x55935fbf5dee - <futures::future::select_all::SelectAll<A> as futures::future::Future>::poll::h87e972ae0393706d
  38:     0x55935fb60c2c - <futures::future::chain::Chain<A, B, C>>::poll::h87c8d2c6d4bce5ff
  39:     0x55935fc3c49e - <futures::future::map_err::MapErr<A, F> as futures::future::Future>::poll::h77675111afe2a287
  40:     0x55935fc65381 - futures::task_impl::std::set::h41d1a2e71aefbe57
  41:     0x55935fc5bcb7 - <std::thread::local::LocalKey<T>>::with::h434807c0a17c731f
  42:     0x55935faea286 - <tokio::executor::current_thread::Entered<'a, P>>::block_on::h22a55d6f12a791de
  43:     0x55935fc5c3e9 - <std::thread::local::LocalKey<T>>::with::ha2ab375c862d8531
  44:     0x55935fc5cac9 - <std::thread::local::LocalKey<T>>::with::hede92d2d25fd28a5
  45:     0x55935fc5c739 - <std::thread::local::LocalKey<T>>::with::he3bb956a37fd1d50
  46:     0x55935fc5c110 - <std::thread::local::LocalKey<T>>::with::h9fce941262b48e2d
  47:     0x55935fc2e1fd - tokio::runtime::current_thread::runtime::Runtime::block_on::h725d4e9adacda691
  48:     0x55935fbafcf1 - std::sys_common::backtrace::__rust_begin_short_backtrace::h6aee95d9b4aaa435
  49:     0x55935faa5434 - std::panicking::try::do_call::hf360a7f1f04b7e62
  50:     0x55935ffe1c49 - __rust_maybe_catch_panic
                        at libpanic_unwind/lib.rs:105
  51:     0x55935fbe0fc3 - <F as alloc::boxed::FnBox<A>>::call_box::he287d19513816397
  52:     0x55935ffbbe6a - <alloc::boxed::Box<alloc::boxed::FnBox<A, Output$u3d$R$GT$$u20$$u2b$$u20$$u27$a$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h75e539106a648d39
                        at /checkout/src/liballoc/boxed.rs:650
                         - std::sys_common::thread::start_thread::h88a639c99862a9f5
                        at libstd/sys_common/thread.rs:24
  53:     0x55935ffbf3f5 - std::sys::unix::thread::Thread::new::thread_start::h7d7a420a78cfa84d
                        at libstd/sys/unix/thread.rs:90
  54:     0x7f92f530f6b9 - start_thread
  55:     0x7f92f4e2f41c - clone
  56:                0x0 - <unknown>

Thread '<unnamed>' panicked at 'explicit panic', /root/.cargo/git/checkouts/rust-libp2p-98135dbcf5b63918/304e9c7/protocols/secio/src/handshake.rs:465

This is a bug. Please report it at:

    https://github.com/paritytech/polkadot/issues/new
@chevdor changed the title from "Polkadot crash" to "Panic in libp2p after ~24 hours" on Sep 12, 2018

@tomaka
Contributor

tomaka commented Sep 12, 2018

This panic can happen if the SHA-256 hash of the local public key plus the remote nonce is the same as the SHA-256 hash of the remote public key plus the local nonce.
The nonce is 16 bytes and is randomly generated at every handshake.

In other words, this is extremely unlikely, as in "will likely never happen in the history of the Universe".
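
To put a rough number on that claim, here is a back-of-the-envelope sketch. It assumes the two SHA-256 outputs behave like independent, uniformly random 256-bit values, which is an idealisation and not something taken from the libp2p code. The function names and the handshake rate used in the note afterwards are made up for illustration.

```rust
/// Probability that two independent, uniformly random `bits`-bit values are equal.
/// Assumption: hash outputs are treated as uniform random values.
fn collision_probability(bits: i32) -> f64 {
    2f64.powi(-bits)
}

/// Expected number of equal-hash handshakes when performing `rate` handshakes
/// per second for `seconds` seconds (hypothetical workload).
fn expected_collisions(bits: i32, rate: f64, seconds: f64) -> f64 {
    collision_probability(bits) * rate * seconds
}

/// Rough age of the Universe in seconds (about 13.8 billion years).
fn age_of_universe_secs() -> f64 {
    13.8e9 * 365.25 * 24.0 * 3600.0
}
```

Under those assumptions, `expected_collisions(256, 1e9, age_of_universe_secs())` is on the order of 1e-51: even a billion handshakes per second for the entire age of the Universe would not be expected to produce a single equal pair by chance.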

@chevdor
Contributor Author

chevdor commented Sep 12, 2018 via email

@chevdor
Contributor Author

chevdor commented Sep 13, 2018

More seriously, I will keep monitoring for that one. If no feedback after a reasonable amount of time, we will close it. If the issue happens again, I will start guessing BTC addresses with this machine :)

@chevdor
Contributor Author

chevdor commented Sep 13, 2018

I ran into this issue 3 times today.

@tomaka
Contributor

tomaka commented Sep 13, 2018

Is it possible that you have the same network public key (the secret file in chains/krummelanke/network) as another node in the network (for example, you copy-pasted it between several of your nodes)?

It still shouldn't panic even if that is the case, but knowing this may help pinpoint the exact problem.

@tomaka
Contributor

tomaka commented Sep 14, 2018

libp2p/rust-libp2p#480 should fix the issue. It will be pulled into substrate soon-ish.

@chevdor
Contributor Author

chevdor commented Sep 14, 2018

> Is it possible that you have the same network public key (the secret file in chains/krummelanke/network) as another node in the network (for example, you copy-pasted it between several of your nodes)?

I don't think so, but I can check if you still need it. Do you?

@dbpatty

dbpatty commented Sep 16, 2018

I think I have the same problem...

@dbpatty

dbpatty commented Sep 17, 2018

Is there anything I can do? Right now I hit this problem several times a day, so I am thinking of writing a script that restarts the process once an hour. Is that a good solution for now?

@tomaka
Contributor

tomaka commented Sep 17, 2018

@dbpatty If you are building from source, using the latest libp2p should fix it.

Find the line that says:

libp2p = { git = "https://github.com/libp2p/rust-libp2p", rev = "<something>", default-features = false, features = ["libp2p-secio", "libp2p-secio-secp256k1"] }

And replace the <something> with 3e53a9dcc728d2e932d731bf90a8e81e0e4257ab. That should fix it.

@dbpatty

dbpatty commented Sep 17, 2018

OK, thank you, I'll try.
I followed the README instructions in your git repository to install our node (Polkadot version: parity-polkadot v0.2.16). Where is this line?

@dbpatty

dbpatty commented Sep 21, 2018

> @dbpatty If you are building from source, using the latest libp2p should fix it.
>
> Find the line that says:
>
> libp2p = { git = "https://github.com/libp2p/rust-libp2p", rev = "<something>", default-features = false, features = ["libp2p-secio", "libp2p-secio-secp256k1"] }
>
> And replace the <something> with 3e53a9dcc728d2e932d731bf90a8e81e0e4257ab. That should fix it.

I can't find this line. Which file is it in?

@rphmeier
Contributor

@tomaka this is fixed, yeah?

@tomaka
Contributor

tomaka commented Dec 21, 2018

Yes!

@tomaka closed this as completed on Dec 21, 2018

@tomaka
Contributor

tomaka commented Dec 21, 2018

The cause of the underlying issue is still unknown (libp2p/rust-libp2p#479), but we replaced the panic with an error that disconnects the remote.
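
For readers curious what "replacing the panic with an error" can look like in general, here is a minimal sketch. It is not the actual rust-libp2p patch: `HandshakeOrder`, `HandshakeError`, and `decide_order` are hypothetical names, and the hashes are taken as plain byte slices so the example needs only the standard library.

```rust
use std::cmp::Ordering;
use std::fmt;

/// Hypothetical outcome of comparing the two ordering hashes.
#[derive(Debug, PartialEq, Eq)]
enum HandshakeOrder {
    LocalFirst,
    RemoteFirst,
}

/// Hypothetical error returned instead of panicking.
#[derive(Debug, PartialEq, Eq)]
enum HandshakeError {
    /// Both sides produced the same hash, so no order can be decided.
    EqualOrderingHashes,
}

impl fmt::Display for HandshakeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HandshakeError::EqualOrderingHashes => {
                write!(f, "local and remote ordering hashes are equal")
            }
        }
    }
}

impl std::error::Error for HandshakeError {}

/// Compares the two hashes lexicographically. The equal case, which used to
/// be an explicit panic, becomes an `Err` the caller can handle, for example
/// by dropping the connection to the remote.
fn decide_order(
    local_hash: &[u8],
    remote_hash: &[u8],
) -> Result<HandshakeOrder, HandshakeError> {
    match local_hash.cmp(remote_hash) {
        Ordering::Greater => Ok(HandshakeOrder::LocalFirst),
        Ordering::Less => Ok(HandshakeOrder::RemoteFirst),
        Ordering::Equal => Err(HandshakeError::EqualOrderingHashes),
    }
}
```

With this shape, a call such as `decide_order(&local, &remote)?` inside the handshake future would surface the equal-hash case as an ordinary failure of that one connection rather than taking down the whole node.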
