packet_listener thread exit abnormally #166

Closed
iBuddha opened this issue Dec 5, 2024 · 2 comments · Fixed by #167

Comments

iBuddha commented Dec 5, 2024

I'm using DataFusion with hdfs-native-object-store. When DataFusion's target partitions setting is greater than 1, a query such as "select * from xxx limit 100" causes tokio worker threads to panic:

thread 'tokio-runtime-worker' panicked at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hdfs-native-0.10.3/src/hdfs/block_reader.rs:243:55:
called `Result::unwrap()` on an `Err` value: SendError { .. }
stack backtrace:
   0:        0x11213e896 - std::backtrace_rs::backtrace::libunwind::trace::hc4ed1a64850536ff
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:        0x11213e896 - std::backtrace_rs::backtrace::trace_unsynchronized::hb5186e43ac69528f
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:        0x11213e896 - std::sys::backtrace::_print_fmt::h7fc6a14ac3a0a722
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:66:9
   3:        0x11213e896 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h0fd572ca60ee9a4e
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:39:26
   4:        0x1121613b3 - core::fmt::rt::Argument::fmt::hdbc762bbbe87f170
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/fmt/rt.rs:177:76
   5:        0x1121613b3 - core::fmt::write::hbc078725bba6692a
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/fmt/mod.rs:1186:21
   6:        0x11213b272 - std::io::Write::write_fmt::h100f2ae009a3df53
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/io/mod.rs:1839:15
   7:        0x11213e6d2 - std::sys::backtrace::BacktraceLock::print::hf0d5a155265a9dd5
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:42:9
   8:        0x11213f712 - std::panicking::default_hook::{{closure}}::h5499fb85b118791b
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:268:22
   9:        0x11213f55c - std::panicking::default_hook::h3be9b9b36bd75e8f
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:295:9
  10:        0x11213ff27 - std::panicking::rust_panic_with_hook::h10014b4a7f4c072b
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:801:13
  11:        0x11213fbd8 - std::panicking::begin_panic_handler::{{closure}}::hc871510d12acad65
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:674:13
  12:        0x11213ed79 - std::sys::backtrace::__rust_end_short_backtrace::hca0d49bc0c1e56d3
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:170:18
  13:        0x11213f81c - rust_begin_unwind
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
  14:        0x11229a8cf - core::panicking::panic_fmt::haa8e13f18984c8e5
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14
  15:        0x11229ade5 - core::result::unwrap_failed::hfdf6e863ce0640aa
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/result.rs:1700:5
  16:        0x10e376ecd - core::result::Result<T,E>::unwrap::h94250bf84e7382f2
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1104:23
  17:        0x10e376ecd - hdfs_native::hdfs::block_reader::ReplicatedBlockStream::start_packet_listener::{{closure}}::hf43f980c93783595
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hdfs-native-0.10.3/src/hdfs/block_reader.rs:243:17
  18:        0x10e2f9a8a - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::hea89db671a4e99d9
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/core.rs:331:17
  19:        0x10e2f7df3 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h822aee1224c4cc94
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/loom/std/unsafe_cell.rs:16:9
  20:        0x10e2f7df3 - tokio::runtime::task::core::Core<T,S>::poll::hadc0a9b14599e6ad
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/core.rs:320:13
  21:        0x10e31ec85 - tokio::runtime::task::harness::poll_future::{{closure}}::h300de3eea72cb95a
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/harness.rs:499:19
  22:        0x10e3149db - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h95c79b121257ea66
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
  23:        0x10e268f19 - std::panicking::try::do_call::h1202315522359266
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  24:        0x10e3406dd - ___rust_try
  25:        0x10e335bb7 - std::panicking::try::h522c8e9d49417c0d
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  26:        0x10e335bb7 - std::panic::catch_unwind::hcdfb373817d51399
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  27:        0x10e31a9d3 - tokio::runtime::task::harness::poll_future::h19bc0832b41cd0b3
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/harness.rs:487:18
  28:        0x10e3218c6 - tokio::runtime::task::harness::Harness<T,S>::poll_inner::ha86a582e56e60667
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/harness.rs:209:27
  29:        0x10e326615 - tokio::runtime::task::harness::Harness<T,S>::poll::h23a89db44e0585d2
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/harness.rs:154:15
  30:        0x10e33c92d - tokio::runtime::task::raw::poll::hee5e70cb653b875c
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/raw.rs:271:5
  31:        0x110bd91f6 - tokio::runtime::task::raw::RawTask::poll::h648f0a465026c57f
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/raw.rs:201:18
  32:        0x110b7d212 - tokio::runtime::task::LocalNotified<S>::run::h2294911204ebbcb5
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/mod.rs:435:9
  33:        0x110ba2fb9 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}::h020970b16f718208
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/multi_thread/worker.rs:596:13
  34:        0x110ba2e7b - tokio::runtime::coop::with_budget::h0f73542c31de7261
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/coop.rs:107:5
  35:        0x110ba2e7b - tokio::runtime::coop::budget::he670b467e984b399
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/coop.rs:73:5
  36:        0x110ba2e7b - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::hdf4ae45dce6ebba0
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/multi_thread/worker.rs:595:9
  37:        0x110ba2712 - tokio::runtime::scheduler::multi_thread::worker::Context::run::h9d203a0e1434c067
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/multi_thread/worker.rs:546:24
  38:        0x110ba2420 - tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::{{closure}}::hdc058a0532907eaf
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/multi_thread/worker.rs:511:21
  39:        0x110b9cb45 - tokio::runtime::context::scoped::Scoped<T>::set::h913710046e8ef186
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context/scoped.rs:40:9
  40:        0x110b7f5db - tokio::runtime::context::set_scheduler::{{closure}}::hb1b7ea56f12b0dcf
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context.rs:180:26
  41:        0x110bcdb79 - std::thread::local::LocalKey<T>::try_with::h368f22f4ee5eb682
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:283:12
  42:        0x110bcd5a1 - std::thread::local::LocalKey<T>::with::h307fa1549b0cf535
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:260:9
  43:        0x110b7f550 - tokio::runtime::context::set_scheduler::he68f9f276833bda2
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context.rs:180:9
  44:        0x110ba2353 - tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::h6013a88d4ef68a63
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/multi_thread/worker.rs:506:9
  45:        0x110b9c63c - tokio::runtime::context::runtime::enter_runtime::heda6c5987ff8b174
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/context/runtime.rs:65:16
  46:        0x110ba214b - tokio::runtime::scheduler::multi_thread::worker::run::h6939e274170ecc5d
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/multi_thread/worker.rs:498:5
  47:        0x110ba1f11 - tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}::he8108a0757ba967a
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/scheduler/multi_thread/worker.rs:464:45
  48:        0x110bdd146 - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::h2f41d8d659f79397
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/blocking/task.rs:42:21
  49:        0x110b82b35 - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::h7834147d3e0d7443
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/core.rs:331:17
  50:        0x110b827b8 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::hd19b00335ec734e0
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/loom/std/unsafe_cell.rs:16:9
  51:        0x110b827b8 - tokio::runtime::task::core::Core<T,S>::poll::h414d73523d34644f
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/core.rs:320:13
  52:        0x110b69817 - tokio::runtime::task::harness::poll_future::{{closure}}::ha1a11a89c25b384d
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/harness.rs:499:19
  53:        0x110b784c2 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hd0a78b7184ec6648
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
  54:        0x110bbefe6 - std::panicking::try::do_call::hafcdb5482d40b1e1
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  55:        0x110b97b3d - ___rust_try
  56:        0x110b96e92 - std::panicking::try::h33e3a6c9a6b3e125
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  57:        0x110b96e92 - std::panic::catch_unwind::hf8c999bf62cf9b49
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  58:        0x110b68e23 - tokio::runtime::task::harness::poll_future::h53e33a4277da1c96
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/harness.rs:487:18
  59:        0x110b67116 - tokio::runtime::task::harness::Harness<T,S>::poll_inner::hd2dcc0e17bb9acf0
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/harness.rs:209:27
  60:        0x110b66cb5 - tokio::runtime::task::harness::Harness<T,S>::poll::hadabd8b5b19bc0c8
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/harness.rs:154:15
  61:        0x110bd94ad - tokio::runtime::task::raw::poll::h3663f55fde5e9e73
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/raw.rs:271:5
  62:        0x110bd91f6 - tokio::runtime::task::raw::RawTask::poll::h648f0a465026c57f
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/raw.rs:201:18
  63:        0x110b7d2be - tokio::runtime::task::UnownedTask<S>::run::h014f27de6d041177
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/task/mod.rs:472:9
  64:        0x110bd9c59 - tokio::runtime::blocking::pool::Task::run::hae8ca8a05eac2517
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/blocking/pool.rs:161:9
  65:        0x110bdc7d9 - tokio::runtime::blocking::pool::Inner::run::h957fa0088e1080e3
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/blocking/pool.rs:511:17
  66:        0x110bdc576 - tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::hbb47be995877d381
                               at /Users/xing.huang/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.42.0/src/runtime/blocking/pool.rs:469:13
  67:        0x110bbeb3d - std::sys::backtrace::__rust_begin_short_backtrace::hacdb09cda89133ce
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:154:18
  68:        0x110b99cc0 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::h3892a2f7c88b1c31
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/mod.rs:538:17
  69:        0x110b78390 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h7366fedc2e818054
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272:9
  70:        0x110bbf320 - std::panicking::try::do_call::hf9967fc2fc95cdd7
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40
  71:        0x110b9e1cd - ___rust_try
  72:        0x110b99af6 - std::panicking::try::he3c1ed2da4f92a42
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19
  73:        0x110b99af6 - std::panic::catch_unwind::h0a2f3426498829f4
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:358:14
  74:        0x110b99af6 - std::thread::Builder::spawn_unchecked_::{{closure}}::h5f6e072f7a44efd5
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/mod.rs:537:30
  75:        0x110bc3c21 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hebadc0b7c77020f8
                               at /Users/xing.huang/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
  76:        0x11214330b - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8ebb884f876fafe1
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/boxed.rs:2454:9
  77:        0x11214330b - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h87d3fdd28d9c67cc
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/boxed.rs:2454:9
  78:        0x11214330b - std::sys::pal::unix::thread::Thread::new::thread_start::h08338ace24ce5652
                               at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/pal/unix/thread.rs:105:17
  79:     0x7ff817730253 - __pthread_start

It seems that DataFusion reads many different parts of the queried file concurrently. Because the query has "limit 100", DataFusion appears to close its readers once 100 records have been returned. The packet_listener task then fails to send its next message, and the unwrap panics.

    fn start_packet_listener(
        mut connection: DatanodeConnection,
        checksum_info: Option<ReadOpChecksumInfoProto>,
        sender: Sender<Result<(PacketHeaderProto, Bytes)>>,
    ) -> JoinHandle<Result<DatanodeConnection>> {
        tokio::spawn(async move {
            loop {
                let packet = connection.read_packet().await?;
                let header = packet.header.clone();
                let data = packet.get_data(&checksum_info)?;

                // If the packet is empty it means it's the last packet
                // so tell the DataNode the read was a success and finish this task
                if data.is_empty() {
                    connection.send_read_success().await?;
                    break;
                }

                sender.send(Ok((header, data))).await.unwrap();
            }
            Ok(connection)
        })
    }

Maybe it would be better to check whether the sender's channel has been closed when the send fails, instead of calling 'unwrap'?

sender.send(Ok((header, data))).await.unwrap();
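
For illustration, here is a minimal sketch of that idea, using only the standard library rather than tokio so it runs without extra dependencies. The names `start_listener` and `read_with_limit` are hypothetical and this is not the actual change made in hdfs-native. It only shows the general pattern: treat a failed `send` as "the reader went away" and stop quietly instead of unwrapping.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the packet listener. It tries to send `packets`
/// chunks and returns how many sends succeeded before the receiver went away.
fn start_listener(packets: usize, sender: SyncSender<Vec<u8>>) -> JoinHandle<usize> {
    thread::spawn(move || {
        let mut delivered = 0;
        for i in 0..packets {
            let data = vec![i as u8; 4];
            // `send` returns Err only when the receiving side has been dropped.
            // That is an expected outcome for a partially consumed read,
            // so stop the loop instead of panicking.
            if sender.send(data).is_err() {
                break;
            }
            delivered += 1;
        }
        delivered
    })
}

/// Hypothetical reader that consumes at most `limit` packets and then drops
/// the receiver, similar to a query with a LIMIT clause finishing early.
fn read_with_limit(packets: usize, limit: usize) -> usize {
    let (tx, rx) = sync_channel(1);
    let handle = start_listener(packets, tx);
    for _ in 0..limit {
        if rx.recv().is_err() {
            break; // the listener finished before the limit was reached
        }
    }
    drop(rx); // the reader closes early
    handle.join().expect("listener thread should not panic")
}
```

Calling `read_with_limit(100, 3)` returns once the listener notices the dropped receiver, without a panic. In the async version, the same check would apply to the `Result` returned by `sender.send(...).await`.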

@Kimahriman (Owner)

Hmm I didn't think about a partially consumed read being a normal use-case. I can update to handle that a little better. I definitely have too many unwraps in general 😅

iBuddha commented Dec 5, 2024

hdfs-native really helped us a lot, thanks for your work.
