
Panic in libp2p-websocket at 'SinkImpl::poll_ready called after error.' #5471

Open
hrxi opened this issue Jun 17, 2024 · 4 comments

hrxi (Contributor) commented Jun 17, 2024

CC #2598
CC nimiq/core-rs-albatross#732

2024-06-10T15:15:27.806153220Z ERROR panic                | thread 'tokio-runtime-worker' panicked at 'SinkImpl::poll_ready called after error.': /home/nimiq/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libp2p-websocket-0.43.0/src/quicksink.rs:159
    0: <backtrace::capture::Backtrace as core::default::Default>::default
    1: log_panics::Config::install_panic_hook::{{closure}}
    2: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/alloc/src/boxed.rs:2021:9
       std::panicking::rust_panic_with_hook
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:735:13
    3: std::panicking::begin_panic_handler::{{closure}}
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:601:13
    4: std::sys_common::backtrace::__rust_end_short_backtrace
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys_common/backtrace.rs:170:18
    5: rust_begin_unwind
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/panicking.rs:597:5
    6: core::panicking::panic_fmt
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/core/src/panicking.rs:72:14
    7: <libp2p_websocket::quicksink::SinkImpl<S,F,T,A,E> as futures_sink::Sink<A>>::poll_ready
    8: <rw_stream_sink::RwStreamSink<S> as futures_io::if_std::AsyncWrite>::poll_write
    9: <asynchronous_codec::framed::Framed<T,U> as futures_sink::Sink<<U as asynchronous_codec::encoder::Encoder>::Item>>::poll_ready
   10: <libp2p_noise::io::Output<T> as futures_io::if_std::AsyncWrite>::poll_write
   11: <yamux::frame::io::Io<T> as futures_sink::Sink<yamux::frame::Frame<()>>>::poll_ready
   12: yamux::connection::Connection<T>::poll_next_inbound
   13: libp2p_yamux::Muxer<C>::poll_inner
   14: <libp2p_yamux::Muxer<C> as libp2p_core::muxing::StreamMuxer>::poll
   15: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
   16: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
   17: libp2p_swarm::connection::Connection<THandler>::poll
   18: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
   19: tokio::runtime::task::core::Core<T,S>::poll
   20: tokio::runtime::task::raw::poll
   21: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   22: tokio::runtime::scheduler::multi_thread::worker::Context::run
   23: tokio::runtime::context::scoped::Scoped<T>::set
   24: tokio::runtime::context::runtime::enter_runtime
   25: tokio::runtime::scheduler::multi_thread::worker::run
   26: tokio::runtime::task::core::Core<T,S>::poll
   27: tokio::runtime::task::harness::Harness<T,S>::poll
   28: tokio::runtime::blocking::pool::Inner::run
   29: std::sys_common::backtrace::__rust_begin_short_backtrace
   30: core::ops::function::FnOnce::call_once{{vtable.shim}}
   31: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/alloc/src/boxed.rs:2007:9
       <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/alloc/src/boxed.rs:2007:9
       std::sys::unix::thread::Thread::new::thread_start
              at /rustc/a28077b28a02b92985b3a3faecf92813155f1ea1/library/std/src/sys/unix/thread.rs:108:17
   32: <unknown>
   33: <unknown>

Again, I don't have a reproducer, but this seems likely to be fixable without one.
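For reference, the panic is the one-shot error invariant of the vendored quicksink code being violated: the sink stores its error, hands it out exactly once, and expects never to be polled again afterwards. The panic therefore implies that some layer in the stack above (Framed/noise/yamux) polled the sink again after it had already reported an error. Below is a minimal sketch of that behaviour, assuming quicksink-like semantics; OneShotErrorSink is illustrative, not the vendored type.

use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::sink::Sink;

// Illustrative stand-in for the quicksink-style state machine.
struct OneShotErrorSink {
    failed: bool,             // set once the underlying write has failed
    error: Option<io::Error>, // stored error, taken exactly once
}

impl Sink<Vec<u8>> for OneShotErrorSink {
    type Error = io::Error;

    fn poll_ready(mut self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<Result<(), io::Error>> {
        if self.failed {
            // A second poll after the error was already delivered hits this
            // expect: the panic from the backtrace. The contract is that the
            // caller must stop polling the sink after the first Err.
            let err = self
                .error
                .take()
                .expect("SinkImpl::poll_ready called after error.");
            return Poll::Ready(Err(err));
        }
        Poll::Ready(Ok(()))
    }

    fn start_send(mut self: Pin<&mut Self>, _item: Vec<u8>) -> Result<(), io::Error> {
        // Simulate the underlying websocket write failing.
        self.failed = true;
        self.error = Some(io::Error::new(
            io::ErrorKind::ConnectionReset,
            "connection is closed",
        ));
        Ok(())
    }

    fn poll_flush(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Result<(), io::Error>> {
        self.poll_ready(cx)
    }

    fn poll_close(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Result<(), io::Error>> {
        self.poll_ready(cx)
    }
}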

libp2p 0.53.2
libp2p-websocket 0.43.0
libp2p-yamux 0.45.1
yamux 0.12.1
yamux 0.13.2
dariusc93 (Member) commented

Could you provide your configuration for how you're using yamux, or are you using the default configuration? Are there any logs you could provide leading up to the panic?

Eligioo (Contributor) commented Jun 17, 2024

@dariusc93
For Yamux, we use the default configuration (yamux::Config::default()), and the transport for the swarm is constructed as follows:

// `transport` is the base transport and `yamux` the default config from
// above; `keypair` is the node's identity keypair.
transport
    .upgrade(core::upgrade::Version::V1)
    .authenticate(noise::Config::new(keypair).unwrap())
    .multiplex(yamux)
    .timeout(std::time::Duration::from_secs(20))
    .boxed()

Unfortunately we don't have more relevant logs at the moment.

Eligioo (Contributor) commented Jun 18, 2024

@dariusc93 I checked the logs of our nodes around the multiple times we noticed the panics, and looked at what our other peers, the ones that didn't panic, were doing around those times. What I noticed is that the peers that didn't panic closed the connection due to an error (see below).

The direction I'm currently looking at is that, the moment a publish fails, the Connection<THandler> gets polled again and starts processing muxing.poll_unpin for a ConnectionEvent::AddressChange, which shouldn't happen because the connection has already errored. But this isn't backed by anything yet, because I'm not seeing failed publishes anywhere.
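If that hypothesis holds, one defensive fix would be to latch the error state so the muxer is never driven again once it has failed, instead of letting a later poll reach the already-broken websocket sink. A rough sketch of the idea (FusedMuxer and its poll signature are illustrative, not the actual libp2p_swarm internals):

use std::io;
use std::task::{Context, Poll};

// Illustrative wrapper: remember the terminal error instead of re-polling
// the inner muxer, which would otherwise reach the failed sink and trip the
// expect in SinkImpl::poll_ready.
struct FusedMuxer<M> {
    inner: M,
    errored: bool,
}

impl<M> FusedMuxer<M> {
    fn poll(
        &mut self,
        cx: &mut Context<'_>,
        poll_inner: impl FnOnce(&mut M, &mut Context<'_>) -> Poll<io::Result<()>>,
    ) -> Poll<io::Result<()>> {
        if self.errored {
            // The connection already failed; report a clean error instead of
            // driving the broken sink again.
            return Poll::Ready(Err(io::Error::new(
                io::ErrorKind::ConnectionAborted,
                "muxer polled after error",
            )));
        }
        match poll_inner(&mut self.inner, cx) {
            Poll::Ready(Err(e)) => {
                self.errored = true; // latch the terminal state
                Poll::Ready(Err(e))
            }
            other => other,
        }
    }
}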

Non-panicked peer:

Jun 10 15:15:26: 2024-06-10T15:15:26.676079329Z DEBUG block_queue          | Broadcasting on BlockHeaderTopic block="#6668451:MI:e88944e85d"
Jun 10 15:15:27: 2024-06-10T15:15:27.243774270Z INFO  network              | Connection closed with peer connection_id=2165980 peer_id=12D3KooWHEyf9LRTEwDXkXWyZGz9T1pv9ciHC1V2wXiGbjNbjhJe endpoint=Listener { local_addr: "/ip4/46.101.171.193/tcp/8443/wss", send_back_addr: "/ip4/168.119.103.198/tcp/33580/wss" } connections=0
Jun 10 15:15:27: 2024-06-10T15:15:27.243845529Z INFO  network              | Connection closed because cause=Connection error: I/O error: connection is closed

The panicked peers were also broadcasting to the BlockHeaderTopic gossipsub topic just before.

bkchr commented Jul 3, 2024

@jxs can you help here? We are experiencing the same: paritytech/polkadot-sdk#4934
