
[Bug]: panic during shutdown with TokioCurrentThread batch tracer #1963

Closed
howardjohn opened this issue Jul 24, 2024 · 3 comments · Fixed by #1964
Labels
bug Something isn't working · triage:todo Needs to be triaged.

Comments

@howardjohn
Contributor

What happened?

thread '<unnamed>' panicked at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.1/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Cargo.toml:

[package]
name = "repro"
version = "0.1.0"
edition = "2021"

[dependencies]
opentelemetry = "0.24.0"
opentelemetry-otlp = "0.17.0"
opentelemetry_sdk = { version = "0.24.1", features = ["rt-tokio-current-thread"] }
tokio = { version = "1.39.1", features = ["full"] }
tracing = "0.1.40"

main.rs:

use opentelemetry::global;
use opentelemetry::trace::Tracer;
use opentelemetry_otlp::WithExportConfig;
use opentelemetry_sdk::Resource;
use std::time::Duration;

#[tokio::main]
async fn main() {
    let result = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_trace_config(
            opentelemetry_sdk::trace::Config::default().with_resource(Resource::new(vec![])),
        )
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://127.0.0.1:4317"),
        )
        // .install_simple();
        .install_batch(opentelemetry_sdk::runtime::TokioCurrentThread);
    let r = result.unwrap();
    global::set_tracer_provider(r.clone());

    let tracer = global::tracer("my_tracer");
    tracer.in_span("doing_work", |_cx| {
        tracing::error!("hello world");
    });

    tokio::time::sleep(Duration::from_secs(1)).await;
}
$ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/repro`
thread '<unnamed>' panicked at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.1/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

RUST_BACKTRACE doesn't produce a backtrace for this crash for some reason; I'm not sure why. Running under a debugger, though, shows the stack:

tokio::runtime::time::entry::TimerEntry::poll_elapsed entry.rs:569
tokio::time::sleep::Sleep::poll_elapsed sleep.rs:415
<tokio::time::sleep::Sleep as core::future::future::Future>::poll sleep.rs:447
<core::pin::Pin<P> as core::future::future::Future>::poll future.rs:123
tokio::time::interval::Interval::poll_tick interval.rs:465
<tokio_stream::wrappers::interval::IntervalStream as futures_core::stream::Stream>::poll_next interval.rs:32
<futures_util::stream::stream::map::Map<St,F> as futures_core::stream::Stream>::poll_next map.rs:58
futures_util::stream::select_with_strategy::poll_side select_with_strategy.rs:220
futures_util::stream::select_with_strategy::poll_inner select_with_strategy.rs:234
<futures_util::stream::select_with_strategy::SelectWithStrategy<St1,St2,Clos,State> as futures_core::stream::Stream>::poll_next select_with_strategy.rs:270
<futures_util::stream::select::Select<St1,St2> as futures_core::stream::Stream>::poll_next select.rs:115
<core::pin::Pin<P> as futures_core::stream::Stream>::poll_next stream.rs:120
futures_util::stream::stream::StreamExt::poll_next_unpin mod.rs:1638
<futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll next.rs:32
<core::pin::Pin<P> as core::future::future::Future>::poll future.rs:123
futures_util::future::future::FutureExt::poll_unpin mod.rs:558
opentelemetry_sdk::trace::span_processor::BatchSpanProcessorInternal<R>::run::{{closure}}::{{closure}}::{{closure}} select_mod.rs:321
core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut function.rs:294
opentelemetry_sdk::trace::span_processor::BatchSpanProcessorInternal<R>::run::{{closure}}::{{closure}} select_mod.rs:321
<futures_util::future::poll_fn::PollFn<F> as core::future::future::Future>::poll poll_fn.rs:56
opentelemetry_sdk::trace::span_processor::BatchSpanProcessorInternal<R>::run::{{closure}} span_processor.rs:427
<core::pin::Pin<P> as core::future::future::Future>::poll future.rs:123
<core::pin::Pin<P> as core::future::future::Future>::poll future.rs:123
tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}} mod.rs:673
[Inlined] tokio::runtime::coop::with_budget coop.rs:107
[Inlined] tokio::runtime::coop::budget coop.rs:73
tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}} mod.rs:673
tokio::runtime::scheduler::current_thread::Context::enter mod.rs:412
tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}} mod.rs:672
tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}} mod.rs:751
tokio::runtime::context::scoped::Scoped<T>::set scoped.rs:40
tokio::runtime::context::set_scheduler::{{closure}} context.rs:180
std::thread::local::LocalKey<T>::try_with local.rs:284
std::thread::local::LocalKey<T>::with local.rs:260
tokio::runtime::context::set_scheduler context.rs:180
tokio::runtime::scheduler::current_thread::CoreGuard::enter mod.rs:751
tokio::runtime::scheduler::current_thread::CoreGuard::block_on mod.rs:660
tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}} mod.rs:180
tokio::runtime::context::runtime::enter_runtime runtime.rs:65
tokio::runtime::scheduler::current_thread::CurrentThread::block_on mod.rs:168
tokio::runtime::runtime::Runtime::block_on_inner runtime.rs:361
tokio::runtime::runtime::Runtime::block_on runtime.rs:335
<opentelemetry_sdk::runtime::TokioCurrentThread as opentelemetry_sdk::runtime::Runtime>::spawn::{{closure}} runtime.rs:101
std::sys_common::backtrace::__rust_begin_short_backtrace backtrace.rs:155
std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}} mod.rs:528
<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once unwind_safe.rs:272
std::panicking::try::do_call panicking.rs:552
__rust_try 0x00005ff9af070c5b
std::panicking::try panicking.rs:516
[Inlined] std::panic::catch_unwind panic.rs:146
std::thread::Builder::spawn_unchecked_::{{closure}} mod.rs:527
core::ops::function::FnOnce::call_once{{vtable.shim}} function.rs:250
[Inlined] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once boxed.rs:2020
[Inlined] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once boxed.rs:2020
std::sys::pal::unix::thread::Thread::new::thread_start thread.rs:108
<unknown> 0x0000713c4b37dded
<unknown> 0x0000713c4b4010dc
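
For context, the frame at runtime.rs:101 near the bottom of the stack is the SDK's TokioCurrentThread::spawn: the batch worker future is handed to a dedicated OS thread that builds and drives its own current-thread tokio runtime. A paraphrased sketch of that mechanism (not the exact SDK source; the thread name is illustrative):

use futures_util::future::BoxFuture;
use std::thread;

fn spawn(future: BoxFuture<'static, ()>) {
    let _ = thread::Builder::new()
        .name("otel-batch-worker".to_string()) // illustrative name
        .spawn(move || {
            // A fresh current-thread runtime, separate from the one
            // running #[tokio::main].
            let rt = tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .expect("failed to build worker runtime");
            rt.block_on(future); // <- runtime.rs:101 in the stack above
        });
}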

API Version

0.24.0

SDK Version

0.24.1

What Exporter(s) are you seeing the problem on?

No response

Relevant log output

thread '<unnamed>' panicked at /home/john/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.1/src/runtime/time/entry.rs:568:9:
A Tokio 1.x context was found, but it is being shutdown.
howardjohn added the bug and triage:todo labels Jul 24, 2024
@howardjohn
Contributor Author

Note: we don't use the multi-threaded runtime due to #868. I'll do some more investigation into whether that has been resolved, though, as that issue is a bit stale.

@howardjohn
Contributor Author

howardjohn commented Jul 24, 2024

I believe the root cause here is that we build a ticker tied to the outer runtime:

        let ticker = runtime
            .interval(config.scheduled_delay)
            .map(|_| BatchMessage::Flush(None));

This timer is bound to the outer runtime's time driver, but we poll it from the batch worker's own runtime; once the outer runtime shuts down, the next poll panics.

I will attempt a fix
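
For anyone following along, here is a minimal standalone sketch of the same failure mode, independent of the SDK (the names and timing are mine; exactly when the panic fires depends on tokio's timer internals): an interval created while one runtime is current, polled on a separate current-thread runtime, with the owning runtime then shut down.

use std::time::Duration;

fn main() {
    // Stands in for the #[tokio::main] runtime.
    let outer = tokio::runtime::Runtime::new().unwrap();

    // Create the interval while the outer runtime is current, mirroring how
    // install_batch() builds the ticker before handing it to the worker.
    let mut ticker = {
        let _guard = outer.enter();
        tokio::time::interval(Duration::from_secs(5))
    };

    // Dedicated worker thread with its own current-thread runtime,
    // mirroring TokioCurrentThread::spawn.
    let worker = std::thread::spawn(move || {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        rt.block_on(async move {
            loop {
                // The timer entry is registered with the *outer* runtime's
                // time driver, not this runtime's.
                ticker.tick().await;
            }
        });
    });

    // Shutting down the outer runtime invalidates its time driver; the next
    // poll of the interval on the worker thread panics with
    // "A Tokio 1.x context was found, but it is being shutdown."
    std::thread::sleep(Duration::from_millis(100));
    drop(outer);
    let _ = worker.join();
}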

howardjohn added a commit to howardjohn/opentelemetry-rust that referenced this issue Jul 24, 2024
@cijothomas
Member

@howardjohn Thanks for reporting the issue along with a clean repro and the fix! Really appreciate it.

One thing: you should still call shutdown() at the end to ensure telemetry is flushed out:
https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/examples/basic-otlp/src/main.rs#L154
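
For reference, the tail of the repro's main() with that call added would look roughly like this (a sketch against the 0.24 API; global::shutdown_tracer_provider() is the helper the linked example uses, which shuts down the installed provider and flushes pending spans):

    tokio::time::sleep(Duration::from_secs(1)).await;

    // Flush and shut down the batch span processor before main() returns.
    global::shutdown_tracer_provider();
}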
