SIGSEGV in LogErrorEventFilter::globalEndLuminosityBlock #44413

Closed
iarspider opened this issue Mar 15, 2024 · 44 comments

@iarspider
Contributor

iarspider commented Mar 15, 2024

In the CMSSW_14_1_X_2024-03-14-2300 IB for el9_amd64_gcc12, two RelVals (140.009 and 140.021) fail with a segmentation fault:

Thread 3 (Thread 0x149160109640 (LWP 4137850) "cmsRun"):
#0  0x00001491b35dc6ff in poll () from /lib64/libc.so.6
#1  0x00001491af821a9f in full_read.constprop () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x00001491af7d60ac in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  0x00001491af7d6230 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  std::local_Rb_tree_rotate_left (__root=@0x14915d749f78: 0x1490958006c0, __x=0x149092f17d00) at ../../../../../libstdc++-v3/src/c++98/tree.cc:138
#6  std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x14908d1d80c0, __p=<optimized out>, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:278
#7  0x00001491574ca785 in LogErrorEventFilter::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so
#8  0x00001491574ca9c2 in virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so
#9  0x00001491b555dbfb in edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#10 0x00001491b5555270 in edm::WorkerT<edm::global::EDFilterBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#11 0x00001491b54a825f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#12 0x00001491b54950cb in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#13 0x00001491b545b7de in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#14 0x00001491b3aaa91b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x1490b7516400, waiter=..., this=0x1491b2589400) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#15 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x1491b2589400) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#16 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:137
#17 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/market.cpp:599
#18 0x00001491b3aacace in tbb::detail::r1::rml::private_worker::run (this=0x1491b04c4080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#19 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x1491b04c4080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#20 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#21 0x00001491b34d9314 in clone () from /lib64/libc.so.6
<details>
<summary>Full stack trace</summary>

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Fri Mar 15 07:11:08 CET 2024
Thread 8 (Thread 0x149120fff640 (LWP 4137871) "cmsRun"):
#0 0x00001491b353639a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00001491b3538ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2 0x0000149179a12cbe in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tsl::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#3 0x0000149179a13223 in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#4 0x0000149179a10a38 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#5 0x0000149168e4f422 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_framework.so.2
#6 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#7 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x1491217ff640 (LWP 4137870) "cmsRun"):
#0 0x00001491b353639a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00001491b3538ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2 0x0000149179a12cbe in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tsl::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#3 0x0000149179a13223 in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#4 0x0000149179a10a38 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#5 0x0000149168e4f422 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_framework.so.2
#6 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#7 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x149121ed9640 (LWP 4137869) "cmsRun"):
#0 0x00001491b353639a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00001491b3538ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2 0x0000149179a12cbe in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tsl::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#3 0x0000149179a13223 in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#4 0x0000149179a10a38 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_cc.so.2
#5 0x0000149168e4f422 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libtensorflow_framework.so.2
#6 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#7 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x14915e7ff640 (LWP 4137852) "cmsRun"):
#0 0x00001491b35ad975 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1 0x00001491b35b2527 in nanosleep () from /lib64/libc.so.6
#2 0x00001491b35b245e in sleep () from /lib64/libc.so.6
#3 0x00001491af7d3be0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00001491b34d8e5d in syscall () from /lib64/libc.so.6
#6 0x00001491b3aacdb2 in tbb::detail::r1::futex_wait (comparand=2, futex=0x1491b04c4024) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/semaphore.h:100
#7 tbb::detail::r1::binary_semaphore::P (this=0x1491b04c4024) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/semaphore.h:253
#8 tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x1491b04c4020) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/rml_thread_monitor.h:235
#9 tbb::detail::r1::rml::private_worker::run (this=0x1491b04c4000) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:273
#10 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x1491b04c4000) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#11 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#12 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x14915f708640 (LWP 4137851) "cmsRun"):
#0 0x00001491b35ad975 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1 0x00001491b35b2527 in nanosleep () from /lib64/libc.so.6
#2 0x00001491b35b245e in sleep () from /lib64/libc.so.6
#3 0x00001491af7d3be0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007ffc61d9cbba in clock_gettime ()
#6 0x00001491b35ad84d in clock_gettime@GLIBC_2.2.5 () from /lib64/libc.so.6
#7 0x00001491b014f2c1 in boost::chrono::thread_clock::now() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libboost_chrono.so.1.80.0
#8 0x00001491ae9f8783 in FastTimerService::Measurement::measure_and_accumulate(FastTimerService::AtomicResources&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginHLTriggerTimerPlugins.so
#9 0x00001491b54a8297 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#10 0x00001491b54950cb in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#11 0x00001491b545b7de in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#12 0x00001491b3aaa91b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x1490b72e1b00, waiter=..., this=0x1491b2589480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#13 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x1491b2589480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#14 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:137
#15 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/market.cpp:599
#16 0x00001491b3aacace in tbb::detail::r1::rml::private_worker::run (this=0x1491b04c4100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#17 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x1491b04c4100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#18 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#19 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x149160109640 (LWP 4137850) "cmsRun"):
#0 0x00001491b35dc6ff in poll () from /lib64/libc.so.6
#1 0x00001491af821a9f in full_read.constprop () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2 0x00001491af7d60ac in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3 0x00001491af7d6230 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 std::local_Rb_tree_rotate_left (__root=@0x14915d749f78: 0x1490958006c0, __x=0x149092f17d00) at ../../../../../libstdc++-v3/src/c++98/tree.cc:138
#6 std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x14908d1d80c0, __p=<optimized out>, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:278
#7 0x00001491574ca785 in LogErrorEventFilter::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so
#8 0x00001491574ca9c2 in virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so
#9 0x00001491b555dbfb in edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#10 0x00001491b5555270 in edm::WorkerT<edm::global::EDFilterBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#11 0x00001491b54a825f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#12 0x00001491b54950cb in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#13 0x00001491b545b7de in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#14 0x00001491b3aaa91b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x1490b7516400, waiter=..., this=0x1491b2589400) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#15 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x1491b2589400) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#16 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:137
#17 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/market.cpp:599
#18 0x00001491b3aacace in tbb::detail::r1::rml::private_worker::run (this=0x1491b04c4080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#19 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x1491b04c4080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#20 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#21 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x149189dbe640 (LWP 4137784) "cmsRun"):
#0 0x00001491b35b230f in wait4 () from /lib64/libc.so.6
#1 0x00001491af7d3d37 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2 0x00001491af7d5fda in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3 0x00001491b38784d3 in std::execute_native_thread_routine (__p=0x1491a7b09590) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#4 0x00001491b3539802 in start_thread () from /lib64/libc.so.6
#5 0x00001491b34d9314 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x1491b2e9d640 (LWP 4134141) "cmsRun"):
#0 0x00001491b35ad975 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1 0x00001491b35b2527 in nanosleep () from /lib64/libc.so.6
#2 0x00001491b35b245e in sleep () from /lib64/libc.so.6
#3 0x00001491af7d3be0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00001491b34d86bb in sched_yield () from /lib64/libc.so.6
#6 0x00001491b3ab1516 in __gthread_yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/x86_64-redhat-linux-gnu/bits/gthr-default.h:693
#7 std::this_thread::yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/std_thread.h:353
#8 tbb::detail::r1::stealing_loop_backoff::pause (this=0x7ffc61d72038) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/scheduler_common.h:266
#9 tbb::detail::r1::waiter_base::pause (this=0x7ffc61d72030) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/waiters.h:35
#10 tbb::detail::r1::external_waiter::pause (this=0x7ffc61d72030) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/waiters.h:138
#11 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::external_waiter> (this=<optimized out>, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:231
#12 0x00001491b3ab3342 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x1491b2589380) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:350
#13 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x1491b2589380) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#14 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#15 0x00001491b546ba0b in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#16 0x00001491b547518a in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#17 0x00001491b54756e1 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#18 0x00000000004074f5 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#19 0x00001491b3a9f96d in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:688
#20 0x0000000000408ee2 in main::{lambda()#1}::operator()() const ()
#21 0x000000000040517c in main ()

Current Modules:

Module: LogErrorEventFilter:logErrorTooManyClusters (crashed)
%MSG-w BeamFitter: AlcaBeamMonitor:AlcaBeamMonitor@endLumi 15-Mar-2024 07:11:48 CET Run: 353015 Lumi: 78
No event read! No Fitting!
%MSG
%MSG-w BeamFitter: AlcaBeamMonitor:AlcaBeamMonitor@endLumi 15-Mar-2024 07:11:48 CET Run: 353015 Lumi: 77
No event read! No Fitting!
%MSG

Module: L1TStage2MuonShowerComp:l1tStage2uGMTMuonShowerVsuGMTMuonShowerCopy1
Module: none
Module: none

</details>

I didn't manage to reproduce the issue when running locally.

@cmsbuild
Contributor

cmsbuild commented Mar 15, 2024

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @iarspider.

@smuzaffar, @rappoccio, @Dr15Jones, @sextonkennedy, @antoniovilela, @makortel can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@iarspider
Contributor Author

assign core

The only PR that could've caused this failure is #43522

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

@wddgit

@makortel
Contributor

Interesting that the failure is specific to el9.

@makortel
Contributor

#43522 should not have impacted globalEndLuminosityBlock()...

@makortel
Contributor

Secondary question to @cms-sw/pdmv-l2

140.009, 140.021:

Are these two workflows intentionally not processing any Events? In a way it's great to reveal problems when LuminosityBlocks don't contain any Events, but nevertheless I wonder.

@makortel
Contributor

On 140.009 the segfault is not easily reproducible, which, together with the problem appearing in only one of the IB flavors, suggests a threading problem of some kind.

@makortel
Contributor

#5  std::local_Rb_tree_rotate_left
#6  std::_Rb_tree_insert_and_rebalance

in the stack trace point to an insertion into either a std::map or a std::set. The candidates in the module are either

statsPerLumi_[std::pair<uint32_t, uint32_t>(lumi.run(), lumi.luminosityBlock())] =
    std::pair<size_t, size_t>(npass, nfail);

or

template <typename Collection>
void LogErrorEventFilter::increment(ErrorSet &scoreboard, Collection &list) {
  for (auto const &err : list) {
    std::pair<ErrorSet::iterator, bool> result = scoreboard.insert(err);
    // need the const_cast as set elements are const
    if (!result.second)
      const_cast<unsigned int &>(result.first->count) += err.count;
  }
}

(via
increment(runC->errorCollection_, lumiC->errorCollection_);

)

In all cases the modifications seem to be protected with spinlocks.
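
For illustration only, here is a minimal sketch of the pattern described above: several threads insert into one shared std::map, and every insertion is serialized by a spinlock. `LumiStats`, `SpinGuard`, and `insertFromThreads` are hypothetical names invented for this sketch, not the CMSSW implementation. Without such a guard, concurrent insertions race on the red-black tree rebalancing, which is undefined behavior and could plausibly crash in frames like `_Rb_tree_insert_and_rebalance`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-lumi statistics map keyed by (run, lumi).
using LumiKey = std::pair<std::uint32_t, std::uint32_t>;
using LumiStats = std::map<LumiKey, std::pair<std::size_t, std::size_t>>;

// Minimal RAII spinlock guard built on std::atomic_flag (illustrative only).
class SpinGuard {
public:
  explicit SpinGuard(std::atomic_flag& flag) : flag_(flag) {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // busy-wait until the current holder clears the flag
    }
  }
  ~SpinGuard() { flag_.clear(std::memory_order_release); }
  SpinGuard(const SpinGuard&) = delete;
  SpinGuard& operator=(const SpinGuard&) = delete;

private:
  std::atomic_flag& flag_;
};

// Each thread inserts `lumisPerThread` distinct keys into one shared map.
// Every insertion happens while holding the spinlock, so the tree is only
// ever rebalanced by one thread at a time.
std::size_t insertFromThreads(unsigned nThreads, std::uint32_t lumisPerThread) {
  LumiStats stats;
  std::atomic_flag lock = ATOMIC_FLAG_INIT;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nThreads; ++t) {
    workers.emplace_back([&stats, &lock, t, lumisPerThread] {
      for (std::uint32_t lumi = 0; lumi < lumisPerThread; ++lumi) {
        SpinGuard guard(lock);
        stats[LumiKey(t, lumi)] = std::make_pair(std::size_t{1}, std::size_t{0});
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return stats.size();
}
```

Calling `insertFromThreads(4, 1000)` should leave 4000 entries in the map. The guard only helps if every code path that touches the container takes the same lock, including readers that run concurrently with an insertion.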

The in-place update of the std::set element after std::set::insert() looked suspicious at first, but the count is not used in the comparison operator for the ErrorSet:

struct ErrorSort {
  bool operator()(const Error &e1, const Error &e2) const {
    if (e1.severity.getLevel() != e2.severity.getLevel())
      return e1.severity.getLevel() > e2.severity.getLevel();
    if (e1.module != e2.module)
      return e1.module < e2.module;
    if (e1.category != e2.category)
      return e1.category < e2.category;
    return false;
  }
};
using ErrorSet = std::set<edm::ErrorSummaryEntry, ErrorSort>;
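
To make the argument concrete, here is a small self-contained sketch of why updating `count` through a `const_cast` does not disturb the set ordering, as long as the comparator ignores `count`. `Entry`, `EntrySort`, and `mergeCounts` are simplified, hypothetical stand-ins for this sketch; the real `edm::ErrorSummaryEntry` uses a severity object rather than a plain int.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified entry (assumption: plain int severity).
struct Entry {
  std::string category;
  std::string module;
  int severity;
  unsigned int count;
};

// Orders by severity (descending), then module, then category.
// `count` deliberately takes no part in the ordering.
struct EntrySort {
  bool operator()(const Entry& a, const Entry& b) const {
    if (a.severity != b.severity)
      return a.severity > b.severity;
    if (a.module != b.module)
      return a.module < b.module;
    if (a.category != b.category)
      return a.category < b.category;
    return false;
  }
};
using EntrySet = std::set<Entry, EntrySort>;

// Merge `list` into `scoreboard`, summing the counts of equivalent entries.
void mergeCounts(EntrySet& scoreboard, const std::vector<Entry>& list) {
  for (const auto& e : list) {
    auto result = scoreboard.insert(e);
    if (!result.second) {
      // Set elements are exposed as const. The cast leaves the tree valid
      // only because `count` is not part of the key seen by EntrySort.
      const_cast<unsigned int&>(result.first->count) += e.count;
    }
  }
}
```

Merging an entry that is equivalent under `EntrySort` to an existing one leaves the set size unchanged and only bumps the stored `count`. If `count` were ever added to the comparator, the same cast would silently break the ordering invariant of the tree.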

@makortel
Contributor

The same crash occurred in workflow 140.006 step 3 in the CMSSW_14_1_ROOT6_X_2024-03-15-2300 IB on el8_amd64_gcc12:

Thread 5 (Thread 0x14c42abff700 (LWP 382910) "cmsRun"):
#3  0x000014c47d05ff40 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  std::local_Rb_tree_rotate_left (__root=@0x14c423225d78: 0x14c3760862c0, __x=0x14c3b7681680) at ../../../../../libstdc++-v3/src/c++98/tree.cc:138
#6  std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x14c35e3d6700, __p=<optimized out>, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:278
#7  0x000014c4239b6105 in LogErrorEventFilter::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#8  0x000014c4239b6332 in virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#9  0x000014c48515ea5b in edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#10 0x000014c4851561a0 in edm::WorkerT<edm::global::EDFilterBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#11 0x000014c4850a7f9f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x000014c4850967fb in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 4 (Thread 0x14c42c004700 (LWP 382909) "cmsRun"):
#2  0x000014c47d05c4e0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014c48303b3b4 in ?? () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib64/libstdc++.so.6
#5  0x000014c4830e22fe in std::char_traits<char>::compare (__n=<optimized out>, __s2=0x14c3feaa9160 "PrimaryVertex-DataBase", __s1=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/char_traits.h:389
#6  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (this=0x14c40851f480, __s=0x14c3feaa9160 "PrimaryVertex-DataBase") at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:895
#7  0x000014c3fea969ba in AlcaBeamMonitor::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginAlcaBeamMonitor.so
#8  0x000014c3feaa4032 in virtual thunk to edm::one::impl::LuminosityBlockCacheHolder<edm::one::EDProducerBase, alcabeammonitor::BeamSpotInfo>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginAlcaBeamMonitor.so
#9  0x000014c48516b2c8 in edm::one::EDProducerBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 3 (Thread 0x14c42ca05700 (LWP 382908) "cmsRun"):
#2  0x000014c47d05c4e0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014c483894e08 in _mm_pause () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/xmmintrin.h:1334
#5  tbb::detail::d0::machine_pause (delay=10) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/../../include/oneapi/tbb/detail/_machine.h:97
#6  tbb::detail::d0::atomic_backoff::bounded_pause (this=<synthetic pointer>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/../../include/oneapi/tbb/detail/_utils.h:77
#7  tbb::detail::r1::prolonged_pause_impl () at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/scheduler_common.h:202
#8  tbb::detail::r1::prolonged_pause () at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/scheduler_common.h:234
#9  tbb::detail::r1::stealing_loop_backoff::pause (this=0x14c42c9fef38) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/scheduler_common.h:263
#10 tbb::detail::r1::waiter_base::pause (this=0x14c42c9fef30) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/waiters.h:35
#11 tbb::detail::r1::outermost_worker_waiter::pause (this=0x14c42c9fef30) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/waiters.h:69
#12 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::outermost_worker_waiter> (this=<optimized out>, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/task_dispatcher.h:231
#13 0x000014c4838a0ad2 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x14c480fc9480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/task_dispatcher.h:350
#14 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x14c480fc9480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#15 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/arena.cpp:137
#16 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/market.cpp:599
#17 0x000014c4838a2b0e in tbb::detail::r1::rml::private_worker::run (this=0x14c47e247080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#18 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x14c47e247080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#19 0x000014c4829df1ca in start_thread () from /lib64/libpthread.so.0
#20 0x000014c48264be73 in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x14c481b66640 (LWP 382796) "cmsRun"):
#2  0x000014c47d05c4e0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007ffd63995bba in clock_gettime ()
#5  0x000014c48270658a in clock_gettime@GLIBC_2.2.5 () from /lib64/libc.so.6
#6  0x000014c47ded22c1 in boost::chrono::thread_clock::now() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/external/el8_amd64_gcc12/lib/libboost_chrono.so.1.80.0
#7  0x000014c47c4bb7b3 in FastTimerService::Measurement::measure_and_accumulate(FastTimerService::AtomicResources&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginHLTriggerTimerPlugins.so
#8  0x000014c4850a7f7f in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#9  0x000014c4850967fb in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_ROOT6_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

Current Modules:
Module: LogErrorEventFilter:logErrorTooManyClusters (crashed)
Module: AlcaBeamMonitor:AlcaBeamMonitor
Module: none
Module: L1TStage2RegionalMuonCandComp:l1tStage2OmtfOutVsuGMTIn

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_14_1_ROOT6_X_2024-03-15-2300/pyRelValMatrixLogs/run/140.006_RunDisplacedJet2022A/step3_RunDisplacedJet2022A.log#/

Here, too, the job seemed to process 0 events.

@makortel
Contributor

Occurred in wf 141.009 step 3 in CMSSW_14_1_NONLTO_X_2024-03-15-2300

Thread 5 (Thread 0x148adc5ff700 (LWP 148528) "cmsRun"):
#3  0x0000148b307a6178 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  std::local_Rb_tree_rotate_left (__root=@0x148adb580378: 0x148a3dcaa980, __x=0x1489f8a81500) at ../../../../../libstdc++-v3/src/c++98/tree.cc:138
#6  std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x1489e725f840, __p=<optimized out>, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:278
#7  0x0000148ad1b6742b in LogErrorEventFilter::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#8  0x0000148ad1b6bed2 in virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#9  0x0000148b383f3615 in edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#10 0x0000148b383e7180 in edm::WorkerT<edm::global::EDFilterBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#11 0x0000148b382e0fb4 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x0000148b382e12b7 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#13 0x0000148b382e18e3 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 4 (Thread 0x148add55d700 (LWP 148527) "cmsRun"):
#2  0x0000148b307a1f80 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000148b35858e1f in write () from /lib64/libc.so.6
#5  0x0000148b357ca9dd in _IO_file_write@@GLIBC_2.2.5 () from /lib64/libc.so.6
#6  0x0000148b357c9d4f in new_do_write () from /lib64/libc.so.6
#7  0x0000148b357cb10e in __GI__IO_file_xsputn () from /lib64/libc.so.6
#8  0x0000148b357c019c in fwrite () from /lib64/libc.so.6
#9  0x0000148b361f527d in std::basic_streambuf<char, std::char_traits<char> >::sputn (__n=5, __s=0x148b32d532c4 "\n%MSG", this=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/streambuf:455
#10 std::__ostream_write<char, std::char_traits<char> > (__n=5, __s=0x148b32d532c4 "\n%MSG", __out=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/ostream_insert.h:51
#11 std::__ostream_insert<char, std::char_traits<char> > (__out=..., __s=0x148b32d532c4 "\n%MSG", __n=5) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_2_0_pre2-el8_amd64_gcc12/build/CMSSW_13_2_0_pre2-build/BUILD/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/gcc-12.3.1/obj/x86_64-redhat-linux-gnu/libstdc++-v3/include/bits/ostream_insert.h:102
#12 0x0000148b32d33a86 in edm::service::ELoutput::log(edm::ErrorObj const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreMessageService.so
#13 0x0000148b32d2f245 in edm::service::ELadministrator::log(edm::ErrorObj&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreMessageService.so
#14 0x0000148b32d44cac in edm::service::ThreadSafeLogMessageLoggerScribe::log(edm::ErrorObj*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreMessageService.so
#15 0x0000148b32d4c1d3 in edm::service::ThreadSafeLogMessageLoggerScribe::runCommand(edm::MessageLoggerQ::OpCode, void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreMessageService.so
#16 0x0000148b385b22c8 in edm::MessageSender::ErrorObjDeleter::operator()(edm::ErrorObj*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreMessageLogger.so
#17 0x0000148b385b5271 in std::_Sp_counted_deleter<edm::ErrorObj*, edm::MessageSender::ErrorObjDeleter, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreMessageLogger.so
#18 0x0000148b385b3911 in edm::MessageSender::~MessageSender() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreMessageLogger.so
#19 0x0000148ae171d1da in BeamFitter::runFitterNoTxt() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libRecoVertexBeamSpotProducer.so
#20 0x0000148ae17202c0 in BeamFitter::runPVandTrkFitter() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libRecoVertexBeamSpotProducer.so
#21 0x0000148a9f3bf555 in AlcaBeamMonitor::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginAlcaBeamMonitor.so
#22 0x0000148a9f3cd3d2 in virtual thunk to edm::one::impl::LuminosityBlockCacheHolder<edm::one::EDProducerBase, alcabeammonitor::BeamSpotInfo>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginAlcaBeamMonitor.so
#23 0x0000148b38400139 in edm::one::EDProducerBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#24 0x0000148b383e61a0 in edm::WorkerT<edm::one::EDProducerBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#25 0x0000148b382e0fb4 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#26 0x0000148b382e12b7 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#27 0x0000148b382e168a in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#28 0x0000148b382e1b7f in void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#29 0x0000148b382e1bf1 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 3 (Thread 0x148addf5e700 (LWP 148526) "cmsRun"):
#2  0x0000148b307a1f80 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x0000148b3577212b in sched_yield () from /lib64/libc.so.6
#5  0x0000148b369b7c83 in __gthread_yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/x86_64-redhat-linux-gnu/bits/gthr-default.h:693
#6  std::this_thread::yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/std_thread.h:353
#7  tbb::detail::r1::stealing_loop_backoff::pause (this=0x148addf57ef8) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2bdfc55ee8e4ee32defbb06c66a16b5f/tbb-v2021.9.0/src/tbb/scheduler_common.h:266

Thread 1 (Thread 0x148b34c90680 (LWP 147149) "cmsRun"):
#2  0x0000148b307a1f80 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007ffe911dfbba in clock_gettime ()
#5  0x0000148b3582d58a in clock_gettime@GLIBC_2.2.5 () from /lib64/libc.so.6
#6  0x0000148b313f92c1 in boost::chrono::thread_clock::now() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/external/el8_amd64_gcc12/lib/libboost_chrono.so.1.80.0
#7  0x0000148b2fa58523 in FastTimerService::Measurement::measure_and_accumulate(FastTimerService::AtomicResources&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginHLTriggerTimerPlugins.so
#8  0x0000148b382e0fe3 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#9  0x0000148b382e12b7 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#10 0x0000148b382e18e3 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

Current Modules:
Module: LogErrorEventFilter:logErrorTooManyClusters (crashed)
Module: none
Module: AlcaBeamMonitor:AlcaBeamMonitor
Module: L1TStage2MuonComp:l1tStage2uGMTMuonVsuGMTMuonCopy3

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_14_1_NONLTO_X_2024-03-15-2300/pyRelValMatrixLogs/run/141.009_RunCosmics2023B/step3_RunCosmics2023B.log#/
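In the traces above, the crashing thread is inside `std::_Rb_tree_insert_and_rebalance`, called from the `const` method `LogErrorEventFilter::globalEndLuminosityBlock`. One common way to get a segfault during red-black tree rebalancing is an unsynchronized insert into a shared `std::map` from more than one thread. Whether that is what happens here is an assumption, not something the logs confirm. The sketch below is not the actual module code: `LumiSummary`, `endLumi`, `fillConcurrently` and the key/value types are hypothetical names chosen for illustration. It only shows the general shape of guarding a `mutable` map with a mutex in a `const` method.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module whose const end-of-lumi method
// updates a shared std::map; NOT the actual LogErrorEventFilter code.
class LumiSummary {
public:
  using Key = std::pair<unsigned int, unsigned int>;   // illustrative: (run, lumi)
  using Counts = std::pair<std::size_t, std::size_t>;  // illustrative: (seen, passed)

  // The method is const, so the map and its mutex have to be mutable.
  void endLumi(Key key, std::size_t seen, std::size_t passed) const {
    // Serialize inserts: without this lock, two threads may rebalance
    // the tree at the same time and corrupt its node pointers.
    std::lock_guard<std::mutex> guard(mutex_);
    Counts& c = perLumi_[key];
    c.first += seen;
    c.second += passed;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return perLumi_.size();
  }

private:
  mutable std::mutex mutex_;
  mutable std::map<Key, Counts> perLumi_;
};

// Inserts nThreads * nLumis distinct keys from concurrent threads and
// returns the final number of entries in the map.
std::size_t fillConcurrently(unsigned int nThreads, unsigned int nLumis) {
  LumiSummary summary;
  std::vector<std::thread> workers;
  for (unsigned int t = 0; t < nThreads; ++t) {
    workers.emplace_back([&summary, t, nLumis] {
      for (unsigned int l = 0; l < nLumis; ++l) {
        summary.endLumi({t, l}, 1, 1);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return summary.size();
}
```

Removing the `lock_guard` in `endLumi` turns this into a data race on the map. Such a race does not fail on every run, so it would likely show up as an intermittent crash.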

@makortel
Contributor

Occurred in wf 140.008 step 3 in CMSSW_14_1_CLANG_X_2024-03-15-2300

Thread 4 (Thread 0x146f36b68700 (LWP 2294562) "cmsRun"):
#2  0x0000146f887e18f4 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007fffa2f57bba in clock_gettime ()
#5  0x0000146f8d88458a in clock_gettime@GLIBC_2.2.5 () from /lib64/libc.so.6
#6  0x0000146f885e92c1 in boost::chrono::thread_clock::now() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/external/el8_amd64_gcc12/lib/libboost_chrono.so.1.80.0
#7  0x0000146f87c21fcf in FastTimerService::preModuleGlobalEndLumi(edm::GlobalContext const&, edm::ModuleCallingContext const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginHLTriggerTimerPlugins.so
#8  0x0000146f902d2953 in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::call(edm::Worker*, edm::StreamID, edm::LumiTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::GlobalContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#9  0x0000146f902d280a in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#10 0x0000146f902d2621 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#11 0x0000146f902d12a5 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x0000146f902d0e7d in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 3 (Thread 0x146f37569700 (LWP 2294561) "cmsRun"):
#3  0x0000146f887e1c6f in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  std::local_Rb_tree_rotate_left (__root=@0x146f34cf2778: 0x146e7d5b9100, __x=0x146e3c663340) at ../../../../../libstdc++-v3/src/c++98/tree.cc:138
#6  std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x146e72093940, __p=<optimized out>, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:278
#7  0x0000146f2e3761a1 in std::_Rb_tree_iterator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > std::_Rb_tree<std::pair<unsigned int, unsigned int>, std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> >, std::_Select1st<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<std::pair<unsigned int, unsigned int>&&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::piecewise_construct_t const&, std::tuple<std::pair<unsigned int, unsigned int>&&>&&, std::tuple<>&&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#8  0x0000146f2e371cc0 in LogErrorEventFilter::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#9  0x0000146f2e37485f in virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#10 0x0000146f903f4441 in edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#11 0x0000146f903ea59d in edm::WorkerT<edm::global::EDFilterBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x0000146f902d2971 in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::call(edm::Worker*, edm::StreamID, edm::LumiTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::GlobalContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#13 0x0000146f902d280a in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x0000146f902d2621 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x0000146f902d12a5 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x0000146f902d0e7d in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

Current Modules:
Module: LogErrorEventFilter:logErrorTooManyClusters (crashed)
Module: L1TStage2RegionalMuonCandComp:l1tStage2BmtfOutVsuGMTIn
Module: none
Module: none

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_14_1_CLANG_X_2024-03-15-2300/pyRelValMatrixLogs/run/140.008_RunEGamma2022A/step3_RunEGamma2022A.log#/

@makortel
Contributor

makortel commented Mar 17, 2024

Wf 140.004 step 3 in CMSSW_14_1_CLANG_X_2024-03-15-2300 shows a likely related, but different, crash at job shutdown:

#3  0x000014e957cf4c6f in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000014e95e38b9a0 in edata_list_inactive_remove (item=0x14e903f42600, list=0x14e903e06038) at include/jemalloc/internal/edata.h:254
#6  je_eset_remove (eset=eset@entry=0x14e903e03a98, edata=edata@entry=0x14e903f42600) at src/eset.c:145
#7  0x000014e95e38d226 in je_ecache_evict (tsdn=tsdn@entry=0x14e95c1f0738, pac=pac@entry=0x14e903e039f0, ehooks=ehooks@entry=0x14e903e000c0, ecache=ecache@entry=0x14e903e03a28, npages_min=npages_min@entry=10526) at src/extent.c:182
#8  0x000014e95e39a4d2 in pac_stash_decayed (result=<synthetic pointer>, npages_decay_max=<optimized out>, npages_limit=<optimized out>, ecache=0x14e903e03a28, pac=0x14e903e039f0, tsdn=0x14e95c1f0738) at src/pac.c:351
#9  pac_decay_to_limit (tsdn=0x14e95c1f0738, pac=0x14e903e039f0, decay=0x14e903e11f08, decay_stats=<optimized out>, ecache=0x14e903e03a28, fully_decay=<optimized out>, npages_limit=<optimized out>, npages_decay_max=<optimized out>) at src/pac.c:449
#10 0x000014e95e39ad06 in pac_decay_try_purge (npages_limit=<optimized out>, current_npages=<optimized out>, ecache=0x14e903e03a28, decay_stats=0x14e903e010b8, decay=0x14e903e11f08, pac=0x14e903e039f0, tsdn=0x14e95c1f0738) at src/pac.c:474
#11 je_pac_maybe_decay_purge (eagerness=PAC_PURGE_ON_EPOCH_ADVANCE, ecache=0x14e903e03a28, decay_stats=0x14e903e010b8, decay=0x14e903e11f08, pac=0x14e903e039f0, tsdn=0x14e95c1f0738) at src/pac.c:512
#12 je_pac_maybe_decay_purge (tsdn=tsdn@entry=0x14e95c1f0738, pac=pac@entry=0x14e903e039f0, decay=decay@entry=0x14e903e11f08, decay_stats=decay_stats@entry=0x14e903e010b8, ecache=ecache@entry=0x14e903e03a28, eagerness=PAC_PURGE_ON_EPOCH_ADVANCE) at src/pac.c:481
#13 0x000014e95e33f980 in arena_decay_impl (all=false, is_background_thread=false, ecache=0x14e903e03a28, decay_stats=0x14e903e010b8, decay=0x14e903e11f08, arena=0x14e903e01040, tsdn=0x14e95c1f0738) at src/arena.c:439
#14 arena_decay_dirty (all=false, is_background_thread=false, arena=0x14e903e01040, tsdn=0x14e95c1f0738) at src/arena.c:459
#15 je_arena_decay (tsdn=tsdn@entry=0x14e95c1f0738, arena=arena@entry=0x14e903e01040, is_background_thread=is_background_thread@entry=false, all=all@entry=false) at src/arena.c:485
#16 0x000014e95e3a5f2f in arena_decay_ticks (nticks=<optimized out>, arena=0x14e903e01040, tsdn=0x16) at include/jemalloc/internal/arena_inlines_b.h:135
#17 tcache_bin_flush_impl (small=true, nflush=<optimized out>, ptrs=0x7fffe468fe20, binind=<optimized out>, cache_bin=<optimized out>, tcache=<optimized out>, tsd=0x16) at src/tcache.c:469
#18 tcache_bin_flush_bottom (small=true, rem=<optimized out>, binind=<optimized out>, cache_bin=<optimized out>, tcache=<optimized out>, tsd=0x16) at src/tcache.c:519
#19 je_tcache_bin_flush_small (tsd=tsd@entry=0x14e95c1f0738, tcache=<optimized out>, cache_bin=<optimized out>, binind=<optimized out>, rem=<optimized out>) at src/tcache.c:529
#20 0x000014e95e33c140 in tcache_dalloc_small (slow_path=<optimized out>, binind=<optimized out>, ptr=<optimized out>, tcache=<optimized out>, tsd=<optimized out>) at include/jemalloc/internal/tcache_inlines.h:157
#21 arena_sdalloc (slow_path=<optimized out>, caller_alloc_ctx=<optimized out>, tcache=<optimized out>, size=<optimized out>, ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:418
#22 isdalloct (slow_path=<optimized out>, alloc_ctx=<optimized out>, tcache=<optimized out>, size=<optimized out>, ptr=<optimized out>, tsdn=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:133
#23 isfree (slow_path=<optimized out>, tcache=<optimized out>, usize=<optimized out>, ptr=<optimized out>, tsd=<optimized out>) at src/jemalloc.c:2988
#24 je_sdallocx_default (ptr=0x14e88041f500, size=<optimized out>, flags=<optimized out>) at src/jemalloc.c:3928
#25 0x000014e95e3aa7fe in sizedDeleteImpl (size=<optimized out>, ptr=<optimized out>) at src/jemalloc_cpp.cpp:195
#26 operator delete (ptr=<optimized out>, size=<optimized out>) at src/jemalloc_cpp.cpp:200
#27 0x000014e8fd865f7d in std::_Rb_tree<std::pair<unsigned int, unsigned int>, std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> >, std::_Select1st<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#28 0x000014e8fd865f6c in std::_Rb_tree<std::pair<unsigned int, unsigned int>, std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> >, std::_Select1st<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
...
#2321 0x000014e8fd865f6c in std::_Rb_tree<std::pair<unsigned int, unsigned int>, std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> >, std::_Select1st<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#2322 0x000014e8fd866612 in LogErrorEventFilter::~LogErrorEventFilter() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#2323 0x000014e8fd865d5d in virtual thunk to LogErrorEventFilter::~LogErrorEventFilter() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#2324 0x000014e95898a640 in std::_Sp_counted_deleter<edm::global::EDFilterBase*, std::default_delete<edm::global::EDFilterBase>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginIOMCRandomEnginePlugins.so
#2325 0x000014e95898a6dc in edm::maker::ModuleHolderT<edm::global::EDFilterBase>::~ModuleHolderT() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/pluginIOMCRandomEnginePlugins.so
#2326 0x000014e95f8ac53c in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, edm::propagate_const<std::shared_ptr<edm::maker::ModuleHolder> > > >*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_CLANG_X_2024-03-15-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_14_1_CLANG_X_2024-03-15-2300/pyRelValMatrixLogs/run/140.004_RunBTagMu2022A/step3_RunBTagMu2022A.log#/

This is probably memory corruption left behind by earlier data races, with jemalloc being picky about it.

@makortel
Contributor

Occurred in

In CMSSW_14_1_NONLTO_X_2024-03-16-1100, wf 140.009 step 3 showed a new kind of stack trace:

Thread 5 (Thread 0x1461139ff700 (LWP 2069951) "cmsRun"):
#3  0x0000146167c67178 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000014610bf73d4d in std::_Rb_tree<std::pair<unsigned int, unsigned int>, std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> >, std::_Select1st<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::pair<unsigned int, unsigned int> const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#6  0x000014610bf6f3f7 in LogErrorEventFilter::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#7  0x000014610bf73ed2 in virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginDPGAnalysisSkims.so
#8  0x000014616f8b6615 in edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#9  0x000014616f8aa180 in edm::WorkerT<edm::global::EDFilterBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#10 0x000014616f7a3fb4 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#11 0x000014616f7a42b7 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x000014616f7a48e3 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 4 (Thread 0x14611496c700 (LWP 2069950) "cmsRun"):
#2  0x0000146167c62f80 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014616cfd282d in __lll_lock_wait () from /lib64/libpthread.so.0
#5  0x000014616cfcbba4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#6  0x000014616893fa1b in dqm::implementation::DQMStore::leaveLumi(unsigned int, unsigned int, unsigned long) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libDQMServicesCore.so
#7  0x000014616f8d01db in edm::stream::ProducingModuleAdaptorBase<edm::stream::EDProducerBase>::doStreamEndLuminosityBlock(edm::StreamID, edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#8  0x000014616f8aa440 in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDoStreamEnd(edm::StreamID, edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#9  0x000014616f795a39 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#10 0x000014616f795d13 in edm::Worker::doWorkNoPrefetchingAsync<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2> >(edm::WaitingTaskHolder, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2>::TransitionInfoType const&, edm::ServiceToken const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)2>::Context const*)::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so

Thread 3 (Thread 0x14611536d700 (LWP 2069949) "cmsRun"):
#2  0x0000146167c62f80 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00001461667fb6ae in reco::BeamSpot::BeamSpot() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libDataFormatsBeamSpot.so
#5  0x00001460e6bd31f6 in PVFitter::resetAll() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginAlcaBeamMonitor.so
#6  0x00001460e6bca863 in AlcaBeamMonitor::globalEndLuminosityBlock(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginAlcaBeamMonitor.so
#7  0x00001460e6bd83d2 in virtual thunk to edm::one::impl::LuminosityBlockCacheHolder<edm::one::EDProducerBase, alcabeammonitor::BeamSpotInfo>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/pluginAlcaBeamMonitor.so
#8  0x000014616f8c3139 in edm::one::EDProducerBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#9  0x000014616f8a91a0 in edm::WorkerT<edm::one::EDProducerBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so
#10 0x000014616f7a3fb4 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_NONLTO_X_2024-03-16-1100/lib/el8_amd64_gcc12/libFWCoreFramework.so

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_14_1_NONLTO_X_2024-03-16-1100/pyRelValMatrixLogs/run/140.009_RunTau2022A/step3_RunTau2022A.log#/

@makortel
Contributor

The stack traces from the CLANG IB and the latest one from the NONLTO IB, both involving std::_Rb_tree<std::pair<unsigned int, unsigned int>, ...>, point towards this code:

{
  auto guard = make_guard(statsGuard_);
  statsPerLumi_[std::pair<uint32_t, uint32_t>(lumi.run(), lumi.luminosityBlock())] =
      std::pair<size_t, size_t>(npass, nfail);
}
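
For readers unfamiliar with the pattern, the guarded update above can be sketched roughly as follows. This is only a minimal, self-contained illustration: `SpinGuard`, `LumiStats`, `recordLumi`, and `fillConcurrently` are hypothetical names, and the real `make_guard` in `LogErrorEventFilter` may well be implemented differently. The sketch only shows that every write to the shared `std::map` must happen while the atomic flag is held, and that the scheme assumes the flag starts out as `false`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical RAII spin guard over an atomic flag (not the CMSSW implementation).
class SpinGuard {
public:
  explicit SpinGuard(std::atomic<bool>& flag) : flag_(flag) {
    bool expected = false;
    // Spin until this thread flips the flag from false to true.
    while (!flag_.compare_exchange_weak(expected, true, std::memory_order_acquire)) {
      expected = false;  // compare_exchange overwrote it with the observed value
    }
  }
  ~SpinGuard() {
    assert(flag_.load());  // we must still hold the flag when releasing it
    flag_.store(false, std::memory_order_release);
  }
  SpinGuard(const SpinGuard&) = delete;
  SpinGuard& operator=(const SpinGuard&) = delete;

private:
  std::atomic<bool>& flag_;
};

// Hypothetical stand-in for the per-lumi bookkeeping of the filter.
struct LumiStats {
  // Explicitly initialized: the guard logic assumes the flag starts as false.
  mutable std::atomic<bool> statsGuard{false};
  mutable std::map<std::pair<std::uint32_t, std::uint32_t>, std::pair<std::size_t, std::size_t>> statsPerLumi;

  void recordLumi(std::uint32_t run, std::uint32_t lumi, std::size_t npass, std::size_t nfail) const {
    SpinGuard guard(statsGuard);  // held until the end of this scope
    statsPerLumi[{run, lumi}] = {npass, nfail};
  }
};

// Each of nThreads threads inserts nLumis distinct keys; returns the final map size.
std::size_t fillConcurrently(unsigned nThreads, std::uint32_t nLumis) {
  LumiStats stats;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nThreads; ++t) {
    workers.emplace_back([&stats, t, nLumis] {
      for (std::uint32_t l = 0; l < nLumis; ++l) {
        stats.recordLumi(t + 1, l + 1, l, 0);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return stats.statsPerLumi.size();
}
```

With the guard in place, calling `fillConcurrently(4, 1000)` should leave exactly one map entry per (run, lumi) key. Without mutual exclusion, concurrent insertions into a `std::map` are a data race and can corrupt the red-black tree, which would be consistent with crashes inside `_Rb_tree` functions.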

@makortel
Contributor

makortel commented Mar 18, 2024

I ran valgrind on step 3 of wf 140.009, which showed

==24701== Thread 4:
==24701== Conditional jump or move depends on uninitialised value(s)
==24701==    at 0x6A5E3362: .LTHUNK17.lto_priv.0 (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so)
==24701==    by 0x6A5E39C1: virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so)
==24701==    by 0x4A87BFA: edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A7F26F: edm::WorkerT<edm::global::EDFilterBase>::implDoEnd(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x49D225E: std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3>::Context const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x49BF0CA: edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)3> >::execute() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x49857DD: tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x63FE91A: UnknownInlinedFun (task_dispatcher.h:322)
==24701==    by 0x63FE91A: UnknownInlinedFun (task_dispatcher.h:458)
==24701==    by 0x63FE91A: UnknownInlinedFun (arena.cpp:137)
==24701==    by 0x63FE91A: tbb::detail::r1::market::process(rml::job&) (market.cpp:599)
==24701==    by 0x6400ACD: UnknownInlinedFun (private_server.cpp:271)
==24701==    by 0x6400ACD: tbb::detail::r1::rml::private_worker::thread_routine(void*) (private_server.cpp:221)
==24701==    by 0x68B4801: start_thread (in /usr/lib64/libc.so.6)
==24701==  Uninitialised value was created by a heap allocation
==24701==    at 0x4844ED1: operator new(unsigned long) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/external/valgrind/3.22.0-390bf50f6ee4c321d331c491bff126fd/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==24701==    by 0x6A5E83D0: edm::WorkerMaker<LogErrorEventFilter>::makeModule(edm::ParameterSet const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDPGAnalysisSkims.so)
==24701==    by 0x4A7B997: edm::Maker::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x49F3527: edm::Factory::makeModule(edm::MakeModuleParams const&, edm::ModuleTypeResolverMaker const*, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A021E7: edm::ModuleRegistry::getModule(edm::MakeModuleParams const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A7C16F: edm::WorkerRegistry::getWorker(edm::WorkerParams const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A7C537: edm::WorkerManager::getWorker(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A57C81: edm::(anonymous namespace)::getWorker(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, edm::ParameterSet&, edm::WorkerManager&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>) [clone .lto_priv.0] (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A61EB3: edm::StreamSchedule::fillWorkers(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::vector<edm::WorkerInPath, std::allocator<edm::WorkerInPath> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, edm::ConditionalTaskHelper const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A64755: edm::StreamSchedule::fillTrigPath(edm::ParameterSet&, edm::ProductRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<edm::HLTGlobalStatus>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, edm::ConditionalTaskHelper const&, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A5AA38: edm::StreamSchedule::StreamSchedule(std::shared_ptr<edm::TriggerResultInserter>, std::vector<edm::propagate_const<std::shared_ptr<edm::PathStatusInserter> >, std::allocator<edm::propagate_const<std::shared_ptr<edm::PathStatusInserter> > > >&, std::vector<edm::propagate_const<std::shared_ptr<edm::EndPathStatusInserter> >, std::allocator<edm::propagate_const<std::shared_ptr<edm::EndPathStatusInserter> > > >&, std::shared_ptr<edm::ModuleRegistry>, edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::PreallocationConfiguration const&, edm::ProductRegistry&, edm::ExceptionToActionTable const&, std::shared_ptr<edm::ActivityRegistry>, std::shared_ptr<edm::ProcessConfiguration const>, edm::StreamID, edm::ProcessContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A413EE: edm::Schedule::Schedule(edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::ProductRegistry&, edm::ExceptionToActionTable const&, std::shared_ptr<edm::ActivityRegistry>, std::shared_ptr<edm::ProcessConfiguration const>, edm::PreallocationConfiguration const&, edm::ProcessContext const*, edm::ModuleTypeResolverMaker const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x4A55730: edm::ScheduleItems::initModules(edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::PreallocationConfiguration const&, edm::ProcessContext const*, edm::ModuleTypeResolverMaker const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x49D3A15: tbb::detail::d1::function_task<edm::EventProcessor::init(std::shared_ptr<edm::ProcessDesc>&, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) [clone .lto_priv.0] (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so)
==24701==    by 0x63FE91A: UnknownInlinedFun (task_dispatcher.h:322)
==24701==    by 0x63FE91A: UnknownInlinedFun (task_dispatcher.h:458)
==24701==    by 0x63FE91A: UnknownInlinedFun (arena.cpp:137)
==24701==    by 0x63FE91A: tbb::detail::r1::market::process(rml::job&) (market.cpp:599)
==24701==    by 0x6400ACD: UnknownInlinedFun (private_server.cpp:271)
==24701==    by 0x6400ACD: tbb::detail::r1::rml::private_worker::thread_routine(void*) (private_server.cpp:221)
==24701==    by 0x68B4801: start_thread (in /usr/lib64/libc.so.6)

and I found that

mutable std::atomic<bool> statsGuard_;

is not initialized (in C++17, in C++20 atomic<T> would be value-initialized...). I opened #44447 to initialize statsGuard_. With the PR the valgrind warning above is gone.

It is still strange that none of the crashes showed concurrent activity in LogErrorEventFilter.
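For reference, a minimal sketch of the kind of initialization described above. The struct name `FilterSketch` and the helper `initialGuardState` are hypothetical, and this is not necessarily how #44447 wrote it. The point is only that before C++20 a default-constructed `std::atomic<bool>` holds an indeterminate value, so the member needs an explicit initializer.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the filter class; only the guard member is shown.
struct FilterSketch {
  // Explicit initializer: well-defined in both C++17 and C++20.
  mutable std::atomic<bool> statsGuard_{false};
};

// Returns the guard's state for a freshly constructed object.
inline bool initialGuardState() {
  FilterSketch f;
  return f.statsGuard_.load();
}
```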

@dan131riley

I got 140.009 to crash in gdb with line numbers. The result is consistent, but still not terribly enlightening. Here are the only active threads; thread 11 is the one that crashed:

Thread 12 (Thread 0x7f68a49ff640 (LWP 3367569) "cmsRun"):
#0  0x00007f694c566f39 in ?? () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libz.so.1
#1  0x00007f694c5677b6 in deflate () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libz.so.1
#2  0x00007f694d040f45 in R__zipMultipleAlgorithm () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libCore.so
#3  0x00007f694da195dc in TBasket::WriteBuffer() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libTree.so
#4  0x00007f694da29178 in std::_Function_handler<void (), ROOT::Internal::TBranchIMTHelper::Run<TBranch::WriteBasketImpl(TBasket*, int, ROOT::Internal::TBranchIMTHelper*)::{lambda()#1}>(TBranch::WriteBasketImpl(TBasket*, int, ROOT::Internal::TBranchIMTHelper*)::{lambda()#1} const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libTree.so
#5  0x00007f694c1a75be in tbb::detail::d1::function_task<std::function<void ()> >::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libImt.so
#6  0x00007f694c5ce241 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f694a5b7a00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#7  tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f694a5b7a00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#8  tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#9  0x00007f694c1a7349 in ROOT::Experimental::TTaskGroup::Wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libImt.so
#10 0x00007f694daa2d86 in TTree::Fill() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/external/el9_amd64_gcc12/lib/libTree.so
#11 0x00007f6861665e48 in tbb::detail::d1::task_arena_function<(anonymous namespace)::TreeHelper<TH1F>::doFill(dqm::legacy::MonitorElement*)::{lambda()#1}, void>::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDQMServicesFwkIOPlugins.so
#12 0x00007f694c5bb24e in operator() (__closure=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:757
#13 tbb::detail::d0::try_call_proxy<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> >::on_completion<tbb::detail::r1::isolate_within_arena(tbb::detail::d1::delegate_base&, intptr_t)::<lambda()> > (on_completion_body=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/../../include/oneapi/tbb/detail/_template_helpers.h:230
#14 tbb::detail::r1::isolate_within_arena (d=warning: RTTI symbol not found for class 'tbb::detail::d1::task_arena_function<(anonymous namespace)::TreeHelper<TH1F>::doFill(dqm::legacy::MonitorElement*)::{lambda()#1}, void>'
..., isolation=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2021.9.0-31639470b6dca6dc015e76f14c6a5a7d/tbb-v2021.9.0/src/tbb/arena.cpp:758
#15 0x00007f686166e13c in (anonymous namespace)::TreeHelper<TH1F>::doFill(dqm::legacy::MonitorElement*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDQMServicesFwkIOPlugins.so
#16 0x00007f686166b2e3 in DQMRootOutputModule::writeLuminosityBlock(edm::LuminosityBlockForOutput const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/pluginDQMServicesFwkIOPlugins.so
#17 0x00007f694dffe03b in edm::core::OutputModuleCore::doWriteLuminosityBlock(edm::LuminosityBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#18 0x00007f694dffe174 in edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeLumiAsync(edm::WaitingTaskHolder, edm::LuminosityBlockPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*)::{lambda()#1}::operator()() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so
#19 0x00007f694dffe308 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeLumiAsync(edm::WaitingTaskHolder, edm::LuminosityBlockPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*)::{lambda()#1}>(tbb::detail::d1::task_group&, edm::OutputModuleCommunicatorT<edm::one::OutputModuleBase>::writeLumiAsync(edm::WaitingTaskHolder, edm::LuminosityBlockPrincipal const&, edm::ProcessContext const*, edm::ActivityRegistry*)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so

Thread 11 (Thread 0x7f68a5bfc640 (LWP 3367568) "cmsRun"):
#0  std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=__x@entry=0x7f67ad926240, __p=__p@entry=0x7f67ad07bfc0, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:278
#1  0x00007f689fe73785 in std::_Rb_tree<std::pair<unsigned int, unsigned int>, std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> >, std::_Select1st<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::_M_insert_node (__z=<optimized out>, __p=<optimized out>, __x=<optimized out>, this=<optimized out>) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_tree.h:2386
#2  std::_Rb_tree<std::pair<unsigned int, unsigned int>, std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> >, std::_Select1st<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::_Auto_node::_M_insert (__p=..., this=<optimized out>) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_tree.h:1658
#3  std::_Rb_tree<std::pair<unsigned int, unsigned int>, std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> >, std::_Select1st<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<std::pair<unsigned int, unsigned int>&&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > >, std::piecewise_construct_t const&, std::tuple<std::pair<unsigned int, unsigned int>&&>&&, std::tuple<>&&) (__pos=..., this=0x7f68a1bd4968) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_tree.h:2466
#4  std::map<std::pair<unsigned int, unsigned int>, std::pair<unsigned long, unsigned long>, std::less<std::pair<unsigned int, unsigned int> >, std::allocator<std::pair<std::pair<unsigned int, unsigned int> const, std::pair<unsigned long, unsigned long> > > >::operator[] (__k=..., this=0x7f68a1bd4968) at /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02828/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_map.h:530
#5  LogErrorEventFilter::globalEndLuminosityBlock (this=0x7f68a1bd4800, lumi=..., iSetup=...) at src/DPGAnalysis/Skims/src/LogErrorEventFilter.cc:207
#6  0x00007f689fe739c2 in virtual thunk to edm::global::impl::LuminosityBlockCacheHolder<edm::global::EDFilterBase, leef::LumiErrors>::doEndLuminosityBlock_(edm::LuminosityBlock const&, edm::EventSetup const&) () at /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/src/FWCore/Framework/interface/global/implementors.h:195
#7  0x00007f694e078bfb in edm::global::EDFilterBase::doEndLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el9_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-03-14-2300/lib/el9_amd64_gcc12/libFWCoreFramework.so

@wddgit
Contributor

wddgit commented Mar 18, 2024

I believe this is an incorrectly coded spin lock. If b is false on the first try, it works as it should. But if it is not, then expected is set to true and the exchange succeeds on the second try even if b is still true.

    bool expected = false;
    while (not b.compare_exchange_strong(expected, true))
      ;
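The failure mode can be reproduced single-threaded. The sketch below (function names are hypothetical, not from CMSSW) wraps the loop above with an attempt counter and contrasts it with a version that resets `expected` before every attempt. It illustrates the `compare_exchange_strong` semantics only, and is not necessarily how the CMSSW code was changed.

```cpp
#include <atomic>
#include <cassert>

// The pattern quoted above, plus an attempt counter.
// Returns the number of attempts needed to "acquire" the lock.
inline int buggyAcquire(std::atomic<bool>& b) {
  bool expected = false;
  int attempts = 1;
  while (not b.compare_exchange_strong(expected, true)) {
    // On failure, compare_exchange_strong overwrote `expected` with the
    // current value of b (true), so the next attempt compares true to true.
    ++attempts;
  }
  return attempts;
}

// Corrected attempt: `expected` is reset to false before every try.
// Bounded by maxAttempts so it can be exercised without a second thread.
inline bool tryAcquire(std::atomic<bool>& b, int maxAttempts) {
  for (int i = 0; i < maxAttempts; ++i) {
    bool expected = false;
    if (b.compare_exchange_strong(expected, true)) {
      return true;  // b was false and is now true: lock acquired
    }
  }
  return false;  // b stayed true for every attempt: lock still held elsewhere
}
```

Starting from `b == true` (lock held by someone else), `buggyAcquire` returns after its second attempt even though nobody released the lock, while `tryAcquire` keeps failing until the holder stores `false`.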

@dan131riley

the really obvious question, which I decided not to ask earlier: why are they coding their own spinlock?

@makortel
Contributor

the really obvious question, which I decided not to ask earlier: why are they coding their own spinlock?

It was done in #22329, so the question is more "why did we code our own spinlock there". There are also several other places where we have added an ad hoc spinlock as a similar loop over an atomic.

@wddgit
Contributor

wddgit commented Mar 18, 2024

While randomly looking for a correct spin lock, I stumbled on this one first. I was going to quote it as the correct way to do it, but looking closely I think it is also incorrect: it doesn't break out of the loop when it succeeds, but waits until the next iteration.

https://cmssdt.cern.ch/lxr/source/FWCore/Framework/src/Path.cc#0136

@wddgit
Contributor

wddgit commented Mar 18, 2024

@wddgit
Contributor

wddgit commented Mar 18, 2024

Just skimming through looking for obvious spin locks with compare_exchange_strong, all the rest of them look OK. Just luck that the first other one I looked at was bad... There are not that many of them (I found 6 total including the 3 discussed above) that are obvious spin locks (I was grepping for compare_exchange_strong in a while loop without some more complicated logic involved).

I am actually not aware of a standard library implementation of a spin lock.

I think the design intent is that we only use them in cases where the lock is almost never taken and the time it is held is intentionally extremely short.

@dan131riley

This one looks correct.

https://cmssdt.cern.ch/lxr/source/FWCore/Services/plugins/ConcurrentModuleTimer.cc#0206

Still wrong, atomic is not guaranteed to be lock-free. Better is to use std::atomic_flag, but even then, best is to just not do user-land spin locks. It’s just a bad idea. Unless you’re porting Doom to a CMSSW plugin, just trust the scheduler and take a mutex.
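As an illustration of the "just take a mutex" alternative, here is a minimal sketch of a mutex-protected statistics map. The class name `LumiStatsSketch` and its methods are hypothetical; only the key and value types are taken from the `std::map` visible in the gdb stack trace above. It is not the actual `LogErrorEventFilter` code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <utility>

class LumiStatsSketch {
public:
  using Key = std::pair<unsigned int, unsigned int>;       // (run, lumi)
  using Counts = std::pair<unsigned long, unsigned long>;  // two counters

  // const like a global module's transition method; safe to call concurrently.
  void add(unsigned int run, unsigned int lumi, unsigned long nFirst, unsigned long nSecond) const {
    std::lock_guard<std::mutex> guard(mutex_);
    Counts& c = stats_[Key(run, lumi)];  // inserts (0, 0) if the key is new
    c.first += nFirst;
    c.second += nSecond;
  }

  // Returns the counters for (run, lumi), or (0, 0) if never filled.
  Counts get(unsigned int run, unsigned int lumi) const {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = stats_.find(Key(run, lumi));
    return it == stats_.end() ? Counts(0, 0) : it->second;
  }

private:
  mutable std::mutex mutex_;
  mutable std::map<Key, Counts> stats_;
};
```

The lock is held only for the duration of one map lookup or insert, so the scope of blocking stays as narrow as with the spin lock.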

@dan131riley

#44447 does not fix the crash. Previously, it looks like the compiler was actually optimizing out the guard as undefined. With #44447 statsGuard_ now has a value--so the PR did something--but it still crashes at the same spot.

@Dr15Jones
Contributor

but it still crashes at the same spot.

I'm not completely surprised as the stack traces have never shown that the routine was being called concurrently. Maybe we need helgrind or valgrind?

@makortel
Contributor

My valgrind showed only #44413 (comment) as relevant to this issue.

@wddgit
Contributor

wddgit commented Mar 19, 2024

I could easily fix the two spin locks to work correctly. Ask and I'll do that. I suspect that will fix the problem... That is what I would suggest. We could fix it in some other way, either now or as a second PR later after we've thought about it more.

Some thoughts.

On the one hand, I see that avoiding spin locks eliminates the possibility for this kind of mistake. A spin lock is only about 5 lines of code, but it has to be just right and if it's wrong it might not be immediately obvious. The compiler is not going to find the issue and if it rarely locks, problems might be hard to notice, reproduce and debug. I've seen advice to not use spin locks ever. I certainly would not recommend them to most of our users.

On the other hand, the multi-threaded Framework is not built on simple things that are hard to get wrong. The Framework is fast partially because we avoid mutexes as much as possible and are using low level, non-locking, and difficult approaches.

The standard does not require atomics to be non-locking. They can be implemented using mutexes, and locks can also be made out of atomics. But my understanding is that on the platforms we use, atomics are in fact non-locking. I think that is the whole point of why the Framework uses them and why the Framework is fast.

I see nothing wrong in ConcurrentModuleTimer.cc. Are you worried about the OS interrupting the spin lock thread while it is spinning? Even that would almost always resolve itself after some delay, and would occur only very rarely. I suppose it is technically possible for such situations to lead to a deadlock, although I think that is rare enough that it will probably not happen in the lifetime of CMS. Also, there is plenty of other multi-threading code in our Framework susceptible to the same kinds of problems that is much more complicated than a spin lock.

Maybe I need to read about atomic_flag. I haven't used one of those yet...
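Whether `std::atomic<bool>` is lock-free on a given platform can be checked rather than assumed. A small sketch using the standard `is_always_lock_free` (C++17) and `is_lock_free()` members follows; the names `kAtomicBoolAlwaysLockFree` and `atomicBoolIsLockFree` are made up for this example.

```cpp
#include <atomic>
#include <cassert>

// Compile-time answer: true only if every std::atomic<bool> object is
// lock-free on this target.
constexpr bool kAtomicBoolAlwaysLockFree = std::atomic<bool>::is_always_lock_free;

// Run-time answer for one particular object.
inline bool atomicBoolIsLockFree() {
  std::atomic<bool> b{false};
  return b.is_lock_free();
}
```

A `static_assert(std::atomic<bool>::is_always_lock_free)` next to a hand-written spin lock would turn the assumption into a compile-time check. `std::atomic_flag` is the only atomic type the standard guarantees to be lock-free.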

@wddgit
Contributor

wddgit commented Mar 19, 2024

atomic_flag should also work.

https://en.cppreference.com/w/cpp/atomic/atomic_flag

If you don't use the C++20 extension, the interface looks simpler for a simple spin lock, maybe less prone to error. Guaranteed to be lock free is good, but I am not convinced the atomic<bool> alternative actually uses locks. I saw one post where someone looked at the assembly and said at least on his machine the assembly was identical. There seems to be some debate which is faster. I suspect in our 6 use cases, these are not significant performance-wise anyway. We would have to take the time to do benchmarks to really know.

Or we could go with mutex or something else...
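For comparison, a minimal sketch of a spin lock built on `std::atomic_flag`, along the lines of the example on the cppreference page linked above. The class name `SpinLockSketch` is hypothetical and this is not one of the existing CMSSW spin locks. Because `test_and_set` takes no `expected` argument, there is no variable to forget to reset.

```cpp
#include <atomic>
#include <cassert>

class SpinLockSketch {
public:
  // Spins until the flag was previously clear, i.e. until this caller set it.
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // busy-wait; intended only for very short critical sections
    }
  }

  // Single attempt: returns true if the lock was acquired.
  bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }

  void unlock() { flag_.clear(std::memory_order_release); }

private:
  // ATOMIC_FLAG_INIT works in C++17 and C++20 (deprecated in C++20, where a
  // default-constructed flag is already clear).
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

Since the class provides `lock()`, `try_lock()` and `unlock()`, it can be used with `std::lock_guard<SpinLockSketch>` in the same way as a `std::mutex`.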

@dan131riley

I could easily fix the two spin locks to work correctly. Ask and I'll do that. I suspect that will fix the problem... That is what I would suggest.

I'm okay with the suggestion from @wddgit to fix the existing spinlocks and see if that fixes the crashes.

I believe it is true that on all our current platforms, atomics of primitive types are lock free. (There is some code in the stack trace signal handler that actually cares about this, and I can be a bit pedantic about that.)

I do believe that the difference between a std::mutex and a user-land spinlock is usually relatively immaterial compared to taking any kind of lock vs. avoiding blocking, and fine-grained vs. coarse-grained locking. We spend a lot of effort on minimizing the occurrence and scope of blocking locks, but I'm not convinced that spinlock vs. mutex makes much difference, except possibly in highly contested code paths.

@wddgit
Contributor

wddgit commented Mar 21, 2024

@makortel I could probably implement and submit that tomorrow morning if you would like, just a minimal fix. We could follow that up with other changes later if we decide some other approach is better...

I searched for spin locks with atomic_flag and found CMSSW already has two of those also. Both looked OK.

I'm off on vacation tomorrow afternoon and all next week.

@makortel
Contributor

@wddgit Please do the minimal fix. I think the other points raised by you and @dan131riley need more discussion.

@wddgit
Contributor

wddgit commented Mar 22, 2024

I just submitted #44517 which contains the minimal fixes.

@wddgit
Contributor

wddgit commented Mar 22, 2024

One other comment here. The failure probably requires the job be configured for more than 1 concurrent LuminosityBlocks and that there is actually more than 1 LuminosityBlock in the job. I suspect the probability that multiple global end lumi transitions are executing at the same time is much higher if there aren't any events. One of the comments above mentions the job was processing 0 events.

@makortel
Contributor

Looks like the failure rate decreased after #44447 was merged in CMSSW_14_1_X_2024-03-22-2300. Since then the failures were:

  • CMSSW_14_1_UBSAN_X_2024-03-22-2300 140.008, 140.011, 141.009
  • CMSSW_14_1_CLANG_X_2024-03-22-2300 140.006
  • CMSSW_14_1_ASAN_X_2024-03-22-2300 140.007
  • CMSSW_14_1_NONLTO_X_2024-03-24-0000 140.007
  • CMSSW_14_1_X_2024-03-24-2300 140.006
  • CMSSW_14_1_X_2024-03-24-2300 141.009
  • CMSSW_14_1_MULTIARCHS_X_2024-03-24-2300 140.009
  • CMSSW_14_1_CUDART_X_2024-03-24-2300 140.008

Earlier, pretty much every IB had up to a few failures.

@makortel
Contributor

I believe a data race would still be a possible cause (especially given the remaining flaw fixed in #44517, and the observation in #44413 (comment)). The mechanism could be (e.g.) the data race itself occurring silently, but the tree data structure getting corrupted such that a subsequent insert() leads to a segfault.

@dan131riley

I ran a 24-hour stress test with the fix from @wddgit and did not observe any failures, so I think it is correct that #44517 should fix the issue.

@makortel
Contributor

I ran a 24-hour stress test with the fix from @wddgit and did not observe any failures, so I think it is correct that #44517 should fix the issue.

Thanks Dan!

@makortel
Contributor

makortel commented Mar 27, 2024

Based on the two IBs since #44517 was merged, it indeed seems the issue got fixed.

@makortel
Contributor

Secondary question to @cms-sw/pdmv-l2

140.009, 140.021:

Are these two workflows intentionally not processing any Events? In a way it's great to reveal problems when LuminosityBlocks don't contain any Events, but nevertheless I wonder.

I'll follow up on this in a separate issue.

@makortel
Contributor

+core

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.

@makortel
Contributor

@cmsbuild, please close
