Implement support for concurrent runs in the Framework #38801

Merged (1 commit) on Oct 4, 2022


wddgit
Contributor

@wddgit wddgit commented Jul 20, 2022

PR description:

Implement support for concurrent runs in the Framework.

The design is analogous to what was done to implement
concurrent luminosity blocks as much as possible, although
there are unavoidable differences.

One can configure how many concurrent runs are allowed.
The default is one. With that setting, almost nothing externally
visible should change in the behavior of cmsRun. There
are significant and complex changes in the Framework
implementation to support this new ability.

Even with the number of concurrent runs configured to 1, the
Framework can execute some transitions concurrently
that could not be executed concurrently before:

  • The streamBeginRun transition can now run concurrently with global begin lumi and, on other streams, with stream begin lumi, events, stream end lumi, and stream end run.
  • The streamEndRun transition can now run concurrently with global end lumi and, on other streams, with stream begin lumi, events, stream end lumi, and stream begin run.

If the number of concurrent runs is configured greater than one,
then global end run can run concurrently with any transitions from
another run and global begin run can run concurrently with any transitions
from another run except global begin run and global begin lumi.
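
The cross-run rules in the paragraph above can be restated as a small predicate. This is only an illustrative sketch in Python, not Framework code: the transition labels and the function `may_overlap_across_runs` are hypothetical names chosen for this example, and it covers only the case where the number of concurrent runs is greater than one.

```python
# Hypothetical labels for the global transitions named in the description.
GLOBAL_BEGIN_RUN = "globalBeginRun"
GLOBAL_END_RUN = "globalEndRun"
GLOBAL_BEGIN_LUMI = "globalBeginLumi"


def may_overlap_across_runs(run_transition, other_transition):
    """Return True if `run_transition` may execute while `other_transition`,
    which belongs to a different run, is executing.

    Encodes only the two rules stated in the PR description.
    """
    if run_transition == GLOBAL_END_RUN:
        # Global end run may overlap with any transition from another run.
        return True
    if run_transition == GLOBAL_BEGIN_RUN:
        # Global begin run may overlap with anything from another run
        # except global begin run and global begin lumi.
        return other_transition not in (GLOBAL_BEGIN_RUN, GLOBAL_BEGIN_LUMI)
    raise ValueError("only global begin run and global end run are modeled")
```

For example, `may_overlap_across_runs(GLOBAL_BEGIN_RUN, GLOBAL_BEGIN_LUMI)` is false, while the same call with `GLOBAL_END_RUN` as the first argument is true.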

This pull request does NOT upgrade modules and services outside
the Framework to support concurrent runs. We expect many of them
will fail if the number of concurrent runs is configured to be
more than one in an existing production configuration. We have not
surveyed existing code to see which modules and services cannot
support concurrent runs. Most should be OK because they do not
depend on run transitions. But for example, a module designed
to create per run histograms might have problems with concurrent
runs.

One configures the "numberOfConcurrentRuns" in the top level
options parameter set. If it is 0 or greater than the number
of concurrent lumis, then it will be reset to equal the
number of concurrent lumis.
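
The reset rule above can be written as a short function. This is a sketch of the stated rule only, assuming non-negative integer inputs; the function name `effective_concurrent_runs` and its argument names are hypothetical and do not come from the Framework.

```python
def effective_concurrent_runs(requested_runs, concurrent_lumis):
    """Apply the reset rule from the description: a requested value of 0,
    or one larger than the number of concurrent lumis, is replaced by the
    number of concurrent lumis. Other values are kept unchanged."""
    if requested_runs == 0 or requested_runs > concurrent_lumis:
        return concurrent_lumis
    return requested_runs
```

With 4 concurrent lumis, a request of 0 or 8 would yield 4, while a request of 2 would stay at 2.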

If an EventSetup IOV changes at a run boundary, then one also
would need to configure concurrent IOVs for that record to two
to actually have the runs on both sides of that run boundary process
concurrently. Without that, cmsRun would still execute correctly, but the
IOVs would block concurrent execution. In addition, it is technically
possible for the sequence of transitions beginLumi, endRun,
beginRun, and beginLumi to all have different IOVs. An EventSetup
record with such IOVs would need to be configured to allow
4 concurrent IOVs to process both runs concurrently across
such a run boundary.
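
One way to read the two cases above is that the record needs as many concurrent IOVs as there are distinct IOVs across the four transitions at the boundary. The sketch below illustrates that reading; it is an assumption made for this example, not the Framework's actual bookkeeping, and the function name and the IOV labels are hypothetical.

```python
def concurrent_iovs_needed(iovs_at_transitions):
    """Count the distinct IOVs seen across the transition sequence
    beginLumi, endRun, beginRun, beginLumi around one run boundary.

    `iovs_at_transitions` holds one hashable IOV label per transition.
    """
    return len(set(iovs_at_transitions))


# IOV changes exactly at the run boundary: two concurrent IOVs.
boundary_change = ["iovA", "iovA", "iovB", "iovB"]
# Every transition falls into a different IOV: four concurrent IOVs.
all_different = ["iovA", "iovB", "iovC", "iovD"]
```

Under this reading, `concurrent_iovs_needed(boundary_change)` gives 2 and `concurrent_iovs_needed(all_different)` gives 4, matching the two cases in the description.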

It is the design intent that the rest of the changes are transparent
to the user (beyond what is discussed above).

PR validation:

There are a few new unit tests. Existing unit tests pass. In fact,
existing unit tests already covered most of the features one might be
concerned about with this pull request. Of the new tests, these two
configurations are the most significant:

FWCore/Integration/test/testConcurrentIOVsAndRuns_cfg.py
FWCore/Integration/test/testConcurrentIOVsAndRunsRead_cfg.py

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38801/31156

@cmsbuild
Contributor

A new Pull Request was created by @wddgit (W. David Dagenhart) for master.

It involves the following packages:

  • FWCore/Framework (core)
  • FWCore/Integration (core)
  • FWCore/ParameterSet (core)
  • FWCore/TFWLiteSelector (core)
  • FWCore/TestProcessor (core)
  • IOPool/Input (core)
  • Mixing/Base (simulation)

@smuzaffar, @civanch, @Dr15Jones, @makortel, @mdhildreth, @cmsbuild can you please review it and eventually sign? Thanks.
@makortel, @fabiocos this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy, @rappoccio you are the release manager for this.


@wddgit
Contributor Author

wddgit commented Jul 20, 2022

please test

@cmsbuild
Contributor

-1

Failed Tests: ClangBuild
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-555ed7/26346/summary.html
COMMIT: bedd109
CMSSW: CMSSW_12_5_X_2022-07-20-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/38801/26346/install.sh to create a dev area with all the needed externals and cmssw changes.

Clang Build

I found a compilation warning while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' scram build -k -j 32 COMPILER='llvm compile'

See details on the summary page.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-38801/31160

@cmsbuild
Contributor

Pull request #38801 was updated. @smuzaffar, @civanch, @Dr15Jones, @makortel, @mdhildreth, @cmsbuild can you please check and sign again.

@wddgit
Contributor Author

wddgit commented Jul 20, 2022

please test

@wddgit
Contributor Author

wddgit commented Oct 5, 2022

At least at first glance, the other 34 relVal failures look unrelated to this PR.

@makortel
Contributor

makortel commented Oct 5, 2022

Thanks David. I found that crash in two jobs so far (over all the flavors of two IBs)

el8_amd64_gcc10/CMSSW_12_6_CLANG_X_2022-10-04-2300 workflow 10808.0 step 6

Thread 5 (Thread 0x2b649d400700 (LWP 29705) "cmsRun"):
#3  0x00002b645e2af75f in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b64553c17b7 in edm::EventProcessor::readEvent(unsigned int) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#6  0x00002b64553c15ee in edm::EventProcessor::readNextEventForStream(edm::WaitingTaskHolder const&, unsigned int, edm::LuminosityBlockProcessingStatus&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#7  0x00002b64553dad1e in void edm::SerialTaskQueueChain::actionToRun<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::$_55&>(edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::$_55&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#8  0x00002b64553dabd1 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::$_55>(tbb::detail::d1::task_group&, edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::$_55&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so

Thread 1 (Thread 0x2b64589abc80 (LWP 29366) "cmsRun"):
#2  0x00002b645e2af3d0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b6457b8d08b in sched_yield () from /lib64/libc.so.6
#5  0x00002b6456ccf436 in __gthread_yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/x86_64-redhat-linux-gnu/bits/gthr-default.h:693
#6  std::this_thread::yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/thread:379
#7  tbb::detail::d0::timed_spin_wait_until<tbb::detail::d1::waitable_atomic<int>::wait_until(int, unsigned long, std::memory_order)::{lambda()#1}>(tbb::detail::d1::waitable_atomic<int>::wait_until(int, unsigned long, std::memory_order)::{lambda()#1}) (condition=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/../../include/oneapi/tbb/detail/_utils.h:129
#8  tbb::detail::d1::waitable_atomic<int>::wait_until (order=std::memory_order_relaxed, context=18, expected=<optimized out>, this=0x2b645967cf18) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/../../include/oneapi/tbb/detail/_waitable_atomic.h:74
#9  tbb::detail::r1::market::adjust_demand (this=0x2b645967f580, a=..., delta=3, mandatory=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/market.cpp:584
#10 0x00002b6456cd3b86 in tbb::detail::r1::arena::advertise_new_work<(tbb::detail::r1::arena::new_work_type)0> (this=0x2b645967cd80) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/arena.h:547
#11 0x00002b645538c66d in edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#12 0x00002b64553bde2b in edm::EventProcessor::beginLumiAsync(edm::IOVSyncValue const&, std::shared_ptr<edm::RunProcessingStatus>, edm::WaitingTaskHolder) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#13 0x00002b64553bb539 in edm::EventProcessor::handleNextItemAfterMergingRunEntries(std::shared_ptr<edm::RunProcessingStatus>, edm::WaitingTaskHolder) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#14 0x00002b64553c8c4c in edm::FunctorWaitingTask<edm::waiting_task::detail::WaitingTaskChain<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::$_4::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(std::__exception_ptr::exception_ptr const*, auto:1)#1}, edm::waiting_task::detail::Conditional<edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::$_4::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#7}> >, edm::waiting_task::detail::Conditional<edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::$_4::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#6}> >, edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::$_4::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#5}>, edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::$_4::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#4}>, edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::$_4::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#3}> >::runLast(edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*)#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#15 0x00002b645538c6bb in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_CLANG_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so

(other threads were just waiting for work)
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc10/CMSSW_12_6_CLANG_X_2022-10-04-2300/pyRelValMatrixLogs/run/10808.0_SingleMuPt100+2018+SingleMuPt100_Eta2p85_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano/step6_SingleMuPt100+2018+SingleMuPt100_Eta2p85_GenSimINPUT+Digi+RecoFakeHLT+HARVESTFakeHLT+ALCA+Nano.log#/

el8_amd64_gcc10/CMSSW_12_6_X_2022-10-05-1100 workflow 139.005 step 3

Thread 1 (Thread 0x2b4c472b0800 (LWP 2389) "cmsRun"):
#3  0x00002b4c4cdd8d3b in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el8_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b4c43ca6d9e in edm::EventProcessor::readEvent(unsigned int) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#6  0x00002b4c43caea8b in edm::EventProcessor::readNextEventForStream(edm::WaitingTaskHolder const&, unsigned int, edm::LuminosityBlockProcessingStatus&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#7  0x00002b4c43cb72f3 in edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}::operator()() () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so
#8  0x00002b4c43cb7738 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}>(tbb::detail::d1::task_group&, edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02753/el8_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el8_amd64_gcc10/libFWCoreFramework.so

(the other threads are just waiting for work)
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc10/CMSSW_12_6_X_2022-10-05-1100/pyRelValMatrixLogs/run/139.005_AlCaPhiSym2021+AlCaPhiSym2021+RECOALCAECALPHISYMDR3+ALCAECALPHISYM/step3_AlCaPhiSym2021+AlCaPhiSym2021+RECOALCAECALPHISYMDR3+ALCAECALPHISYM.log#/

@makortel
Contributor

makortel commented Oct 5, 2022

I was finally able to catch the (correct) exception in gdb; here is the stack trace (for the relevant threads):

Thread 9 (Thread 0x7fffa2bff700 (LWP 16055) "cmsRun"):
#0  0x00007fffc43a6ab0 in edm::Exception::throwThis(edm::errors::ErrorCodes, char const*, char const*, char const*, char const*, char const*)@plt () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#1  0x00007fffc43cf151 in cond::persistency::getConnectionParams(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#2  0x00007fffc43b44f6 in cond::persistency::ConnectionPool::createCoralSession(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#3  0x00007fffc43b47dd in cond::persistency::ConnectionPool::createSession(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#4  0x00007fffc43b4b8d in cond::persistency::ConnectionPool::createReadOnlySession(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#5  0x00007fffa9274165 in CondDBESSource::setIntervalFor(edm::eventsetup::EventSetupRecordKey const&, edm::IOVSyncValue const&, edm::ValidityInterval&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/pluginCondCoreESSourcesPlugins.so
#6  0x00007ffff7bfe101 in edm::EventSetupRecordIntervalFinder::findIntervalFor(edm::eventsetup::EventSetupRecordKey const&, edm::IOVSyncValue const&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#7  0x00007ffff7bfede8 in edm::eventsetup::EventSetupRecordProvider::setValidityIntervalFor(edm::IOVSyncValue const&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#8  0x00007ffff7be9b7f in edm::eventsetup::EventSetupProvider::setAllValidityIntervals(edm::IOVSyncValue const&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#9  0x00007ffff7c05e9f in edm::eventsetup::EventSetupsController::eventSetupForInstanceAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder const&, edm::WaitingTaskList&, std::vector<std::shared_ptr<edm::EventSetupImpl const>, std::allocator<std::shared_ptr<edm::EventSetupImpl const> > >&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#10 0x00007ffff7c06150 in edm::eventsetup::EventSetupsController::runOrQueueEventSetupForInstanceAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder&, edm::WaitingTaskList&, std::vector<std::shared_ptr<edm::EventSetupImpl const>, std::allocator<std::shared_ptr<edm::EventSetupImpl const> > >&, edm::SerialTaskQueue&, edm::ActivityRegistry*, bool)::{lambda(edm::IOVSyncValue const&, edm::WaitingTaskHolder&)#1}::operator()(edm::IOVSyncValue const&, edm::WaitingTaskHolder&) const () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#11 0x00007ffff7c06455 in edm::SerialTaskQueue::QueuedTask<edm::eventsetup::EventSetupsController::runOrQueueEventSetupForInstanceAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder&, edm::WaitingTaskList&, std::vector<std::shared_ptr<edm::EventSetupImpl const>, std::allocator<std::shared_ptr<edm::EventSetupImpl const> > >&, edm::SerialTaskQueue&, edm::ActivityRegistry*, bool)::{lambda()#2}>::execute() () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#12 0x00007ffff7e27175 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreConcurrency.so

Thread 1 (Thread 0x7ffff338b740 (LWP 15682) "cmsRun"):
#0  0x00007ffff5233e29 in syscall () from /lib64/libc.so.6
#1  0x00007ffff6318ba2 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffffff2180) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:103
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffffff2180) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:290
#3  tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>::wait (this=0x7fffffff2150) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:171
#4  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (this=<optimized out>, node=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:233
#5  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (node=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:229
#6  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::wait<tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&>(tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&, tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>&&) (node=..., pred=<synthetic pointer>..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:263
#7  tbb::detail::r1::sleep_waiter::sleep<tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}>(unsigned long, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}) (this=0x7fffffff2260, this=0x7fffffff2260, wakeup_condition=..., uniq_tag=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:118
#8  tbb::detail::r1::external_waiter::pause (this=0x7fffffff2260) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:144
#9  tbb::detail::r1::external_waiter::pause (this=0x7fffffff2260) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:137
#10 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::external_waiter> (this=<optimized out>, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:231
#11 0x00007ffff6319fa2 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x7ffff1d47900) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:350
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff1d47900) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:463
#13 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.cpp:168
#14 0x00007ffff7bb3fed in edm::FinalWaitingTask::wait() () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so

@makortel
Contributor

makortel commented Oct 5, 2022

The exception would be thrown from

edm::Service<edm::SiteLocalConfig> localconfservice;
if (!localconfservice.isAvailable()) {

(although this detail probably doesn't matter much towards fixing the problem)

@perrotta
Contributor

perrotta commented Oct 5, 2022

I have prepared a revert PR, just in case the issue reported in #38801 (comment) cannot be fixed quickly and we want to cut 12_6_0_pre3.
Hopefully it will get closed without being merged.

@makortel
Contributor

makortel commented Oct 5, 2022

Full stack trace of #38801 (comment), at the request of @dan131riley:

05-Oct-2022 21:23:52 CEST  Successfully opened file file:RelVal_Raw_GRun_MC.root

Thread 12 (Thread 0x7fff901ff700 (LWP 16163) "cmsRun"):
#0  0x00007ffff5515a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff5b0ca7c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre2-build/BUILD/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/gcc-10.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007fffadbad15b in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#4  0x00007fffadbad9a8 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#5  0x00007fffadbaa218 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#6  0x00007fffaa9c3ed1 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_framework.so.2
#7  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7fff927ff700 (LWP 16162) "cmsRun"):
#0  0x00007ffff5515a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff5b0ca7c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre2-build/BUILD/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/gcc-10.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007fffadbad15b in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#4  0x00007fffadbad9a8 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#5  0x00007fffadbaa218 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#6  0x00007fffaa9c3ed1 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_framework.so.2
#7  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7fff934a8700 (LWP 16161) "cmsRun"):
#0  0x00007ffff5515a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff5b0ca7c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_3_0_pre2-slc7_amd64_gcc10/build/CMSSW_12_3_0_pre2-build/BUILD/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/gcc-10.3.0/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007fffadbad15b in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WaitForWork(Eigen::EventCount::Waiter*, tensorflow::thread::EigenEnvironment::Task*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#4  0x00007fffadbad9a8 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#5  0x00007fffadbaa218 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_cc.so.2
#6  0x00007fffaa9c3ed1 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/external/slc7_amd64_gcc10/lib/libtensorflow_framework.so.2
#7  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7fffa2bff700 (LWP 16055) "cmsRun"):
#0  0x00007fffc43a6ab0 in edm::Exception::throwThis(edm::errors::ErrorCodes, char const*, char const*, char const*, char const*, char const*)@plt () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#1  0x00007fffc43cf151 in cond::persistency::getConnectionParams(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#2  0x00007fffc43b44f6 in cond::persistency::ConnectionPool::createCoralSession(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#3  0x00007fffc43b47dd in cond::persistency::ConnectionPool::createSession(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#4  0x00007fffc43b4b8d in cond::persistency::ConnectionPool::createReadOnlySession(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libCondCoreCondDB.so
#5  0x00007fffa9274165 in CondDBESSource::setIntervalFor(edm::eventsetup::EventSetupRecordKey const&, edm::IOVSyncValue const&, edm::ValidityInterval&) () from /cvmfs/cms-ib.cern.ch/week1/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/pluginCondCoreESSourcesPlugins.so
#6  0x00007ffff7bfe101 in edm::EventSetupRecordIntervalFinder::findIntervalFor(edm::eventsetup::EventSetupRecordKey const&, edm::IOVSyncValue const&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#7  0x00007ffff7bfede8 in edm::eventsetup::EventSetupRecordProvider::setValidityIntervalFor(edm::IOVSyncValue const&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#8  0x00007ffff7be9b7f in edm::eventsetup::EventSetupProvider::setAllValidityIntervals(edm::IOVSyncValue const&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#9  0x00007ffff7c05e9f in edm::eventsetup::EventSetupsController::eventSetupForInstanceAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder const&, edm::WaitingTaskList&, std::vector<std::shared_ptr<edm::EventSetupImpl const>, std::allocator<std::shared_ptr<edm::EventSetupImpl const> > >&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#10 0x00007ffff7c06150 in edm::eventsetup::EventSetupsController::runOrQueueEventSetupForInstanceAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder&, edm::WaitingTaskList&, std::vector<std::shared_ptr<edm::EventSetupImpl const>, std::allocator<std::shared_ptr<edm::EventSetupImpl const> > >&, edm::SerialTaskQueue&, edm::ActivityRegistry*, bool)::{lambda(edm::IOVSyncValue const&, edm::WaitingTaskHolder&)#1}::operator()(edm::IOVSyncValue const&, edm::WaitingTaskHolder&) const () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#11 0x00007ffff7c06455 in edm::SerialTaskQueue::QueuedTask<edm::eventsetup::EventSetupsController::runOrQueueEventSetupForInstanceAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder&, edm::WaitingTaskList&, std::vector<std::shared_ptr<edm::EventSetupImpl const>, std::allocator<std::shared_ptr<edm::EventSetupImpl const> > >&, edm::SerialTaskQueue&, edm::ActivityRegistry*, bool)::{lambda()#2}>::execute() () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#12 0x00007ffff7e27175 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreConcurrency.so
#13 0x00007ffff6306ffc in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7ffff1d52c00, waiter=..., this=0x7ffff1d47b00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:322
#14 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7ffff1d47b00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:463
#15 tbb::detail::r1::arena::process (this=<optimized out>, tls=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/arena.cpp:138
#16 0x00007ffff6313593 in tbb::detail::r1::market::process (j=..., this=0x7ffff1d57580) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/market.cpp:597
#17 tbb::detail::r1::rml::private_worker::run (this=0x7fffef5bff00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:267
#18 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffef5bff00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:221
#19 0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7fffa39ff700 (LWP 16054) "cmsRun"):
#0  0x00007ffff5233e29 in syscall () from /lib64/libc.so.6
#1  0x00007ffff63012c7 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffef5bfe2c) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:103
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffef5bfe2c) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:290
#3  0x00007ffff63138f1 in tbb::detail::r1::rml::internal::thread_monitor::commit_wait (c=..., this=0x7fffef5bfe20) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/rml_thread_monitor.h:243
#4  tbb::detail::r1::rml::private_worker::run (this=0x7fffef5bfe00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:274
#5  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffef5bfe00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:221
#6  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7fffa49fe700 (LWP 16053) "cmsRun"):
#0  0x00007ffff5233e29 in syscall () from /lib64/libc.so.6
#1  0x00007ffff63012c7 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffef5bfeac) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:103
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffef5bfeac) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:290
#3  0x00007ffff63138f1 in tbb::detail::r1::rml::internal::thread_monitor::commit_wait (c=..., this=0x7fffef5bfea0) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/rml_thread_monitor.h:243
#4  tbb::detail::r1::rml::private_worker::run (this=0x7fffef5bfe80) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:274
#5  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffef5bfe80) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:221
#6  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7fffa53ff700 (LWP 16052) "cmsRun"):
#0  0x00007ffff5233e29 in syscall () from /lib64/libc.so.6
#1  0x00007ffff63012c7 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffef5c002c) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:103
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffef5c002c) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:290
#3  0x00007ffff63138f1 in tbb::detail::r1::rml::internal::thread_monitor::commit_wait (c=..., this=0x7fffef5c0020) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/rml_thread_monitor.h:243
#4  tbb::detail::r1::rml::private_worker::run (this=0x7fffef5c0000) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:274
#5  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffef5c0000) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:221
#6  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7fffa63ff700 (LWP 16051) "cmsRun"):
#0  0x00007ffff5233e29 in syscall () from /lib64/libc.so.6
#1  0x00007ffff63012c7 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffef5bffac) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:103
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffef5bffac) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:290
#3  0x00007ffff63138f1 in tbb::detail::r1::rml::internal::thread_monitor::commit_wait (c=..., this=0x7fffef5bffa0) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/rml_thread_monitor.h:243
#4  tbb::detail::r1::rml::private_worker::run (this=0x7fffef5bff80) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:274
#5  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffef5bff80) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:221
#6  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fffa72e8700 (LWP 16050) "cmsRun"):
#0  0x00007ffff5233e29 in syscall () from /lib64/libc.so.6
#1  0x00007ffff63012c7 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffef5c012c) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:103
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffef5c012c) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:290
#3  0x00007ffff63138f1 in tbb::detail::r1::rml::internal::thread_monitor::commit_wait (c=..., this=0x7fffef5c0120) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/rml_thread_monitor.h:243
#4  tbb::detail::r1::rml::private_worker::run (this=0x7fffef5c0100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:274
#5  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffef5c0100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:221
#6  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7fffa7ce9700 (LWP 16049) "cmsRun"):
#0  0x00007ffff5233e29 in syscall () from /lib64/libc.so.6
#1  0x00007ffff63012c7 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffef5c00ac) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:103
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffef5c00ac) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:290
#3  0x00007ffff63138f1 in tbb::detail::r1::rml::internal::thread_monitor::commit_wait (c=..., this=0x7fffef5c00a0) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/rml_thread_monitor.h:243
#4  tbb::detail::r1::rml::private_worker::run (this=0x7fffef5c0080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:274
#5  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffef5c0080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:221
#6  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fffc6a86700 (LWP 15939) "cmsRun"):
#0  0x00007ffff551875d in read () from /lib64/libpthread.so.0
#1  0x00007fffe914486f in full_read.constprop () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#2  0x00007fffe914518f in edm::service::InitRootHandlers::stacktraceHelperThread() () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  0x00007ffff5b11f90 in std::execute_native_thread_routine (__p=0x7fffeb75e9b0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4  0x00007ffff5511ea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ffff5239b0d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff338b740 (LWP 15682) "cmsRun"):
#0  0x00007ffff5233e29 in syscall () from /lib64/libc.so.6
#1  0x00007ffff6318ba2 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffffff2180) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:103
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffffff2180) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:290
#3  tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>::wait (this=0x7fffffff2150) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:171
#4  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (this=<optimized out>, node=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:233
#5  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (node=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:229
#6  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::wait<tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&>(tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&, tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>&&) (node=..., pred=<synthetic pointer>..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:263
#7  tbb::detail::r1::sleep_waiter::sleep<tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}>(unsigned long, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}) (this=0x7fffffff2260, this=0x7fffffff2260, wakeup_condition=..., uniq_tag=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:118
#8  tbb::detail::r1::external_waiter::pause (this=0x7fffffff2260) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:144
#9  tbb::detail::r1::external_waiter::pause (this=0x7fffffff2260) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:137
#10 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::external_waiter> (this=<optimized out>, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:231
#11 0x00007ffff6319fa2 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x7ffff1d47900) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:350
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7ffff1d47900) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:463
#13 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.cpp:168
#14 0x00007ffff7bb3fed in edm::FinalWaitingTask::wait() () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#15 0x00007ffff7b9c0f0 in edm::EventProcessor::processRuns() () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#16 0x00007ffff7ba8c81 in edm::EventProcessor::runToCompletion() () from /build/mkortela/debug/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#17 0x000000000040a266 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#18 0x00007ffff63080eb in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/arena.cpp:698
#19 0x000000000040b094 in main::{lambda()#1}::operator()() const ()
#20 0x000000000040971c in main ()

@makortel
Contributor

makortel commented Oct 6, 2022

A segfault in el9_amd64_gcc11/CMSSW_12_6_X_2022-10-05-2300, workflow 136.851 step 2:

Begin processing the 1st record. Run 315489, Event 7098541, LumiSection 12 on stream 3 at 06-Oct-2022 09:43:57.598 CEST

Thread 4 (Thread 0x2ae481d22640 (LWP 21211) "cmsRun"):
#3  0x00002ae44528fddb in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02753/el9_amd64_gcc11/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el9_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002ae43efc141d in edm::EventProcessor::readEvent(unsigned int) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el9_amd64_gcc11/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el9_amd64_gcc11/libFWCoreFramework.so
#6  0x00002ae43efc786b in edm::EventProcessor::readNextEventForStream(edm::WaitingTaskHolder const&, unsigned int, edm::LuminosityBlockProcessingStatus&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el9_amd64_gcc11/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el9_amd64_gcc11/libFWCoreFramework.so
#7  0x00002ae43efd1a23 in edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}::operator()() () from /cvmfs/cms-ib.cern.ch/nweek-02753/el9_amd64_gcc11/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el9_amd64_gcc11/libFWCoreFramework.so
#8  0x00002ae43efd1e48 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}>(tbb::detail::d1::task_group&, edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02753/el9_amd64_gcc11/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el9_amd64_gcc11/libFWCoreFramework.so
#9  0x00002ae43f56f125 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/el9_amd64_gcc11/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el9_amd64_gcc11/libFWCoreConcurrency.so
#10 0x00002ae440ada51c in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x2ae44213bd00, waiter=..., this=0x2ae4421d9480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/task_dispatcher.h:322

Thread 1 (Thread 0x2ae4416cbe80 (LWP 20982) "cmsRun"):
#3  0x00002ae44528c600 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/nweek-02753/el9_amd64_gcc11/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el9_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002ae440f23dfd in syscall () from /lib64/libc.so.6
#6  0x00002ae440ae0aaa in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fff58de6e50) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/semaphore.h:103
#7  tbb::detail::r1::binary_semaphore::P (this=0x7fff58de6e50) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/semaphore.h:290
#8  tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>::wait (this=0x7fff58de6e20) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:171
#9  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (this=<optimized out>, node=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:233
#10 tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (node=..., this=0x2ae4421db598) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:229
#11 tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::wait<tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&>(tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&, tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>&&) (node=..., pred=<synthetic pointer>..., this=0x2ae4421db598) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/concurrent_monitor.h:263
#12 tbb::detail::r1::sleep_waiter::sleep<tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}>(unsigned long, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}) (this=0x7fff58de6f30, this=0x7fff58de6f30, wakeup_condition=..., uniq_tag=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/waiters.h:118
#13 tbb::detail::r1::external_waiter::pause (this=0x7fff58de6f30) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/waiters.h:144
#14 tbb::detail::r1::external_waiter::pause (this=0x7fff58de6f30) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/waiters.h:137
#15 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::external_waiter> (this=<optimized out>, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/task_dispatcher.h:231
#16 0x00002ae440ae1e90 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x2ae4421d9380) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/task_dispatcher.h:350
#17 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x2ae4421d9380) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/task_dispatcher.h:463
#18 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc11/external/tbb/v2021.5.0-c0dbb6bd7407c1b3ad4cee87bb02cbc1/tbb-v2021.5.0/src/tbb/task_dispatcher.cpp:168
#19 0x00002ae43efe48f7 in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/nweek-02753/el9_amd64_gcc11/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/el9_amd64_gcc11/libFWCoreFramework.so

(other TBB threads are in tbb::detail::r1::futex_wait())
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el9_amd64_gcc11/CMSSW_12_6_X_2022-10-05-2300/pyRelValMatrixLogs/run/136.851_RunDoubleMuon2018A+RunDoubleMuon2018A+HLTDR2_2018+RECODR2_2018reHLT_Offline+HARVEST2018/step2_RunDoubleMuon2018A+RunDoubleMuon2018A+HLTDR2_2018+RECODR2_2018reHLT_Offline+HARVEST2018.log#/

Another segfault in slc7_amd64_gcc10/CMSSW_12_6_X_2022-10-05-2300, workflow 516.0 step 1:

Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 06-Oct-2022 09:58:38.068 CEST

Thread 5 (Thread 0x2ab4b8200700 (LWP 17138) "cmsRun"):
#2  0x00002ab48dd39420 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002ab487f6ee29 in syscall () from /lib64/libc.so.6
#5  0x00002ab487063dee in tbb::detail::r1::futex_wakeup_one (futex=0x2ab48cb2c12c) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:113
#6  tbb::detail::r1::binary_semaphore::V (this=0x2ab48cb2c12c) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/semaphore.h:299
#7  tbb::detail::r1::rml::internal::thread_monitor::notify (this=0x2ab48cb2c120) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/rml_thread_monitor.h:226
#8  tbb::detail::r1::rml::private_worker::wake_or_launch (this=0x2ab48cb2c100) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:314
#9  tbb::detail::r1::rml::private_server::wake_some (this=<optimized out>, additional_slack=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:407
#10 0x00002ab4870644d3 in tbb::detail::r1::rml::private_server::adjust_job_count_estimate (delta=3, this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:418
#11 tbb::detail::r1::market::adjust_demand (this=0x2ab48adef580, a=..., delta=3, mandatory=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/market.cpp:586
#12 0x00002ab487068b86 in tbb::detail::r1::arena::advertise_new_work<(tbb::detail::r1::arena::new_work_type)0> (this=0x2ab48adecd80) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/arena.h:547
#13 0x00002ab48573af65 in edm::EventProcessor::beginLumiAsync(edm::IOVSyncValue const&, std::shared_ptr<edm::RunProcessingStatus>, edm::WaitingTaskHolder) () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#14 0x00002ab48573f560 in edm::EventProcessor::handleNextItemAfterMergingRunEntries(std::shared_ptr<edm::RunProcessingStatus>, edm::WaitingTaskHolder) () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#15 0x00002ab4857405dc in edm::FunctorWaitingTask<edm::waiting_task::detail::WaitingTaskChain<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*, auto:1)#2}::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(std::__exception_ptr::exception_ptr const*, auto:1)#8}, edm::waiting_task::detail::Conditional<edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*, auto:1)#2}::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#7}> >, edm::waiting_task::detail::Conditional<edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*, auto:1)#2}::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#6}> >, edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*, auto:1)#2}::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#5}>, 
edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*, auto:1)#2}::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#4}>, edm::waiting_task::detail::AutoExceptionHandler<edm::EventProcessor::beginRunAsync(edm::IOVSyncValue const&, edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*, auto:1)#2}::operator()<edm::WaitingTaskHolder>(std::__exception_ptr::exception_ptr const*, edm::WaitingTaskHolder) const::{lambda(edm::LimitedTaskQueue::Resumer)#1}::operator()(edm::LimitedTaskQueue::Resumer)::{lambda()#1}::operator()()::{lambda(auto:1)#3}> >::runLast(edm::WaitingTaskHolder)::{lambda(std::__exception_ptr::exception_ptr const*)#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#16 0x00002ab48571114f in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so

Thread 4 (Thread 0x2ab4b73d3700 (LWP 17137) "cmsRun"):
#3  0x00002ab48dd3cd4b in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002ab485734dae in edm::EventProcessor::readEvent(unsigned int) () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#6  0x00002ab48573ca9b in edm::EventProcessor::readNextEventForStream(edm::WaitingTaskHolder const&, unsigned int, edm::LuminosityBlockProcessingStatus&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#7  0x00002ab485745303 in edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}::operator()() () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#8  0x00002ab485745748 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}>(tbb::detail::d1::task_group&, edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::{lambda()#1}&&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so
#9  0x00002ab485579175 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreConcurrency.so

Thread 3 (Thread 0x2ab4b69d2700 (LWP 17136) "cmsRun"):
#2  0x00002ab48dd39420 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002ab487f59917 in sched_yield () from /lib64/libc.so.6
#5  0x00002ab487057880 in __gthread_yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/x86_64-unknown-linux-gnu/bits/gthr-default.h:693
#6  std::this_thread::yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/thread:379
#7  tbb::detail::r1::stealing_loop_backoff::pause (this=0x2ab4b69cbe58) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/scheduler_common.h:266
#8  tbb::detail::r1::waiter_base::pause (this=0x2ab4b69cbe50) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:35
#9  tbb::detail::r1::outermost_worker_waiter::pause (this=0x2ab4b69cbe50) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:69
#10 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::outermost_worker_waiter> (this=0x2ab48aded480, tls=..., ed=..., waiter=..., isolation=0, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:231
#11 0x00002ab4870583c2 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x2ab48aded480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:350
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x2ab48aded480) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:463
#13 tbb::detail::r1::arena::process (this=<optimized out>, tls=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/arena.cpp:138
#14 0x00002ab487064593 in tbb::detail::r1::market::process (j=..., this=0x2ab48adef580) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/market.cpp:597
#15 tbb::detail::r1::rml::private_worker::run (this=0x2ab48cb2c080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:267
#16 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x2ab48cb2c080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/private_server.cpp:221
#17 0x00002ab487c60ea5 in start_thread () from /lib64/libpthread.so.0
#18 0x00002ab487f74b0d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2ab489ff75c0 (LWP 12349) "cmsRun"):
#2  0x00002ab48dd39420 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002ab487f59917 in sched_yield () from /lib64/libc.so.6
#5  0x00002ab48706972c in __gthread_yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/x86_64-unknown-linux-gnu/bits/gthr-default.h:693
#6  std::this_thread::yield () at /data/cmsbld/jenkins/workspace/build-any-ib/w/slc7_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/thread:379
#7  tbb::detail::r1::stealing_loop_backoff::pause (this=0x7ffc3ad0e828) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/scheduler_common.h:266
#8  tbb::detail::r1::waiter_base::pause (this=0x7ffc3ad0e820) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:35
#9  tbb::detail::r1::external_waiter::pause (this=0x7ffc3ad0e820) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/waiters.h:138
#10 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::external_waiter> (this=<optimized out>, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:231
#11 0x00002ab48706afa2 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x2ab48aded380) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:350
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x2ab48aded380) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.h:463
#13 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc10/external/tbb/v2021.5.0-36aff7df349e0716374b1668ccd18e17/tbb-v2021.5.0/src/tbb/task_dispatcher.cpp:168
#14 0x00002ab485757fed in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/nweek-02753/slc7_amd64_gcc10/cms/cmssw/CMSSW_12_6_X_2022-10-04-2300/lib/slc7_amd64_gcc10/libFWCoreFramework.so

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc10/CMSSW_12_6_X_2022-10-05-2300/pyRelValMatrixLogs/run/516.0_WTolNuJets_LO_Mad_13TeV_py8_taupinu+WTolNu01234Jets_5f_LO_MLM_Madgraph_LHE_13TeV+Hadronizer_TuneCP5_13TeV_MLM_5f_max4j_LHE_pythia8_taupinu+HARVESTGEN2/step1_WTolNuJets_LO_Mad_13TeV_py8_taupinu+WTolNu01234Jets_5f_LO_MLM_Madgraph_LHE_13TeV+Hadronizer_TuneCP5_13TeV_MLM_5f_max4j_LHE_pythia8_taupinu+HARVESTGEN2.log#/

@dan131riley

From UBSAN:

https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/el8_amd64_gcc11/CMSSW_12_6_UBSAN_X_2022-10-05-1100/pyRelValMatrixLogs/run/250399.0_FS_PREMIXUP15_PU25+FS_PREMIXUP15_PU25/step1_FS_PREMIXUP15_PU25+FS_PREMIXUP15_PU25.log

/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2274:56: runtime error: member call on null pointer of type 'struct element_type'
    #0 0x2abb783a8372 in edm::EventProcessor::readEvent(unsigned int) /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2274
    #1 0x2abb784233f5 in edm::EventProcessor::readNextEventForStream(edm::WaitingTaskHolder const&, unsigned int, edm::LuminosityBlockProcessingStatus&) /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2201
    #2 0x2abb7845df34 in operator() /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2212
    #3 0x2abb78460eb7 in actionToRun<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueueChain.h:113
    #4 0x2abb78460eb7 in operator() /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueueChain.h:78
    #5 0x2abb78460eb7 in execute /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueue.h:175
    #6 0x2abb76a74344 in operator() /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/src/SerialTaskQueue.cc:48
    #7 0x2abb76a74344 in task_ptr_or_nullptr_impl<const edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/tbb/v2021.5.0-26deaf86b02cf9ce10d1fb9d6400c40a/include/oneapi/tbb/task_group.h:118

followed by

/data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/RunProcessingStatus.h:83:19: runtime error: member access within null pointer of type 'struct RunProcessingStatus'
    #0 0x2abb783a803a in edm::RunProcessingStatus::updateLastTimestamp(edm::Timestamp const&) /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/RunProcessingStatus.h:83
    #1 0x2abb783a803a in edm::EventProcessor::readEvent(unsigned int) /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2274
    #2 0x2abb784233f5 in edm::EventProcessor::readNextEventForStream(edm::WaitingTaskHolder const&, unsigned int, edm::LuminosityBlockProcessingStatus&) /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2201
    #3 0x2abb7845df34 in operator() /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2212
    #4 0x2abb78460eb7 in actionToRun<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueueChain.h:113
    #5 0x2abb78460eb7 in operator() /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueueChain.h:78
    #6 0x2abb78460eb7 in execute /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueue.h:175
    #7 0x2abb76a74344 in operator() /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/src/SerialTaskQueue.cc:48
    #8 0x2abb76a74344 in task_ptr_or_nullptr_impl<const edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::<lambda()>&> /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc11/external/tbb/v2021.5.0-26deaf86b02cf9ce10d1fb9d6400c40a/include/oneapi/tbb/task_group.h:118

and finally

#3  0x00002abb88e3179e in (anonymous namespace)::sig_dostack_then_abort (sig=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Services/plugins/InitRootHandlers.cc:543
#4  <signal handler called>
#5  0x00002abb783a754b in edm::Timestamp::operator> (iRHS=..., this=0x2abbd46a59c8) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/DataFormats/Provenance/interface/Timestamp.h:62
#6  edm::RunProcessingStatus::updateLastTimestamp (iTime=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/RunProcessingStatus.h:83
#7  edm::EventProcessor::readEvent (this=0x2abb83952800, iStreamIndex=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2274
#8  0x00002abb784233f6 in edm::EventProcessor::readNextEventForStream (this=0x2abb83952800, iTask=..., iStreamIndex=0, iStatus=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2201
#9  0x00002abb7845df35 in operator() (__closure=0x2abc7bea2998) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Framework/src/EventProcessor.cc:2212
#10 0x00002abb78460eb8 in edm::SerialTaskQueueChain::actionToRun<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::<lambda()>&> (iAction=..., this=0x2abb83952be8) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueueChain.h:113
#11 operator() (__closure=0x2abc7bea2990) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueueChain.h:78
#12 edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::<lambda()> >(tbb::detail::d1::task_group&, edm::EventProcessor::handleNextEventForStreamAsync(edm::WaitingTaskHolder, unsigned int)::<lambda()>&&)::<lambda()> >::execute(void) (this=0x2abc7bea2980) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/interface/SerialTaskQueue.h:175
#13 0x00002abb76a74345 in operator() (__closure=0x2abb82937540) at /data/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/1501425fe95e73b124adecdddd969b97/opt/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_12_6_UBSAN_X_2022-10-05-1100/src/FWCore/Concurrency/src/SerialTaskQueue.cc:48

@makortel
Contributor

makortel commented Oct 6, 2022

Thanks Dan! The crash occurs on line

streamRunStatus_[iStreamIndex]->updateLastTimestamp(input_->timestamp());

The streamRunStatus_ is std::vector<std::shared_ptr<RunProcessingStatus>>, so the second message would imply that the streamRunStatus_[iStreamIndex] would be nullptr?
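To make the null-element reading concrete, here is a minimal sketch, assuming a hypothetical `ToyRunStatus` in place of `edm::RunProcessingStatus` and a plain `long` in place of `edm::Timestamp`. It only illustrates that a slot in a `std::vector<std::shared_ptr<T>>` is null until something is assigned to it; the `updateIfSet` helper is an illustration device, not a proposed change to `readEvent()`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for edm::RunProcessingStatus; only the shape matters here.
struct ToyRunStatus {
  long lastTimestamp = 0;
  void updateLastTimestamp(long t) {
    if (t > lastTimestamp) {
      lastTimestamp = t;
    }
  }
};

// Returns true if the slot held a status and was updated, false if the slot
// was still null. Calling a member function through a null shared_ptr is
// undefined behavior, which is the kind of access the UBSAN messages flag.
bool updateIfSet(std::vector<std::shared_ptr<ToyRunStatus>>& slots, std::size_t i, long t) {
  if (!slots[i]) {
    return false;
  }
  slots[i]->updateLastTimestamp(t);
  return true;
}

// A vector created with N value-initialized shared_ptrs holds N null pointers.
bool slotStartsNull() {
  std::vector<std::shared_ptr<ToyRunStatus>> slots(2);
  return slots[0] == nullptr;
}
```

Calling `updateIfSet` on a freshly sized vector returns `false`; after `slots[0] = std::make_shared<ToyRunStatus>()` the same call succeeds.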

@makortel
Contributor

makortel commented Oct 6, 2022

I see the streamRunStatus_ elements are set only in EventProcessor::streamBeginRunAsync()

streamRunStatus_[iStream] = std::move(status);

So somehow streamBeginRunAsync() and handleNextEventForStreamAsync() (which ends up calling readEvent()) can end up being called in the wrong order?

@Dr15Jones
Contributor

I think the problem is because beginLumiAsync (which calls handleNextEventForStreamAsync) is started at the same time as streamBeginRunAsync and there is a race to see which puts values into streamQueuesInserter_.

@makortel
Contributor

makortel commented Oct 6, 2022

Should the streamRunStatus_ element be set before this call?

// Call this before inserting into the stream queues so that stream begin run
// is executed before global begin lumi in a single threaded job. This is not
// required or necessary, but it is desirable to preserve the pre-concurrent run
// behavior. In a multi-threaded job these things might run concurrently.
handleNextItemAfterMergingRunEntries(status, holder);

(handleNextItemAfterMergingRunEntries() can call beginLumiAsync(), and streamBeginRunAsync() is queued a few lines below)
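The ordering concern can be modeled deterministically. This is a single-threaded sketch, assuming a plain `std::deque` as a stand-in for a FIFO stream queue and a hypothetical `ToyStatus` type; none of these names come from the Framework. It shows that whichever task is inserted first decides whether the reader finds the status set.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical placeholder for the per-stream run status object.
struct ToyStatus {};

// Runs two tasks through a FIFO queue in the given insertion order and reports
// whether the "read event" task found the status already set.
bool readerSawStatus(bool beginRunQueuedFirst) {
  std::shared_ptr<ToyStatus> slot;  // starts out null
  bool sawStatus = false;
  std::deque<std::function<void()>> fifo;

  // Stands in for stream begin run, which fills the slot.
  auto beginRun = [&slot]() { slot = std::make_shared<ToyStatus>(); };
  // Stands in for event reading, which expects the slot to be filled.
  auto readEvent = [&slot, &sawStatus]() { sawStatus = (slot != nullptr); };

  if (beginRunQueuedFirst) {
    fifo.push_back(beginRun);
    fifo.push_back(readEvent);
  } else {
    fifo.push_back(readEvent);
    fifo.push_back(beginRun);
  }

  // Drain the queue strictly in insertion order.
  while (!fifo.empty()) {
    auto task = std::move(fifo.front());
    fifo.pop_front();
    task();
  }
  return sawStatus;
}
```

`readerSawStatus(true)` returns `true` and `readerSawStatus(false)` returns `false`; in the real job the insertion order would be decided by a race rather than a flag.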

@wddgit
Contributor Author

wddgit commented Oct 6, 2022

I think maybe Chris is correct (not 100% sure, but probably). My first thought is to add a chain with the loop adding things to streamQueuesInserter_ in the first link of the chain and the call to handleNextItemAfterMergingRunEntries in the second link of the chain. The problem with that is that in the single-threaded case streamBeginRun does not run before globalBeginLumi. That is not actually incorrect under the multi-threading design, but it is a change in behavior and at the least will break some single-threaded unit tests (which are fixable...). Still thinking.

@wddgit
Contributor Author

wddgit commented Oct 6, 2022

@Dr15Jones @makortel @dan131riley

Below is a proposed fix. What do you think? Core tests pass.

Changing the order so that handleNextItemAfterMergingRunEntries is
called after the insertion into streamQueuesInserter_ guarantees that
the order in the stream queues is correct. The unpause of the
streamQueuesInserter_ is what causes stream begin run to execute before
global begin lumi in the single threaded case.

Mildly annoying that we have to pause here, but it is much better than it was
before we added streamQueuesInserter_ because there is only 1 pause instead
of 1 pause per stream.

+namespace {
+  class PauseQueueSentry {
+  public:
+    PauseQueueSentry(edm::SerialTaskQueue& queue) : queue_(queue) { queue_.pause(); }
+    ~PauseQueueSentry() { queue_.resume(); }
+  private:
+    edm::SerialTaskQueue& queue_;
+  };
+}
+
 namespace edm {
 
   namespace chain = waiting_task::chain;
@@ -1313,12 +1323,7 @@ namespace edm {
 
                           ServiceRegistry::Operate operate(serviceToken_);
 
-                          // Call this before inserting into the stream queues so that stream begin run
-                          // is executed before global begin lumi in a single threaded job. This is not
-                          // required or necessary, but it is desirable to preserve the pre-concurrent run
-                          // behavior. In a multi-threaded job these things might run concurrently.
-                          handleNextItemAfterMergingRunEntries(status, holder);
-
+                          PauseQueueSentry pauseQueueSentry(streamQueuesInserter_);
                           CMS_SA_ALLOW try {
                             streamQueuesInserter_.push(
                                 *holder.group(), [this, status, precedingTasksSucceeded, holder]() mutable {
@@ -1348,6 +1353,7 @@ namespace edm {
                             queueWhichWaitsForIOVsToFinish_.resume();
                             exceptionRunStatus_ = status;
                           }
+                          handleNextItemAfterMergingRunEntries(status, holder);
                         }) | runLast(postSourceTask);
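
For readers who want to see the pause-then-insert idea in isolation, here is a toy model. It assumes a hypothetical single-threaded `ToyQueue` that only mimics `pause()`, `resume()` and `push()`; it is not `edm::SerialTaskQueue` and says nothing about its real implementation. The two pushes stand in for the stream begin run insertion and for the lumi-side insertion that handleNextItemAfterMergingRunEntries can trigger.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded stand-in for a pausable serial queue.
class ToyQueue {
public:
  void pause() { ++pauseCount_; }
  void resume() {
    if (pauseCount_ > 0 && --pauseCount_ == 0) {
      drain();
    }
  }
  void push(std::function<void()> task) {
    pending_.push_back(std::move(task));
    if (pauseCount_ == 0) {
      drain();
    }
  }

private:
  // Runs pending tasks in FIFO order until the queue is empty or paused again.
  void drain() {
    while (pauseCount_ == 0 && !pending_.empty()) {
      auto task = std::move(pending_.front());
      pending_.pop_front();
      task();
    }
  }
  int pauseCount_ = 0;
  std::deque<std::function<void()>> pending_;
};

// RAII guard in the spirit of PauseQueueSentry from the diff above.
class ToyPauseSentry {
public:
  explicit ToyPauseSentry(ToyQueue& queue) : queue_(queue) { queue_.pause(); }
  ~ToyPauseSentry() { queue_.resume(); }
  ToyPauseSentry(const ToyPauseSentry&) = delete;
  ToyPauseSentry& operator=(const ToyPauseSentry&) = delete;

private:
  ToyQueue& queue_;
};

std::vector<std::string> runToySequence() {
  std::vector<std::string> log;
  ToyQueue inserter;
  {
    ToyPauseSentry sentry(inserter);
    // First insertion: stands in for queuing stream begin run.
    inserter.push([&log]() { log.push_back("stream begin run"); });
    // Second insertion: stands in for what the later call would queue.
    inserter.push([&log]() { log.push_back("begin lumi work"); });
    log.push_back("still paused");
  }  // the sentry resumes the queue here, so both tasks run in insertion order
  return log;
}
```

In this toy, `runToySequence()` records "still paused" first and then the two tasks in the order they were pushed, which is the property the reordering in the diff relies on.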

@makortel
Contributor

makortel commented Oct 7, 2022

Thanks David, I think your proposal makes sense (and I can't think of how to do it better). I have one question though. Should handleNextItemAfterMergingRunEntries() be called here in any case, or only if the code in the preceding try-block did not throw?

@wddgit
Contributor Author

wddgit commented Oct 7, 2022

If there is an exception, we want handleNextItemAfterMergingRunEntries() to be called. That function will notice an exception was thrown and saved in the WaitingTaskHolder. In that case, it will not call beginRunAsync; instead it will call endRunAsync, which will try to cleanly shut down everything, calling stream and global end run and eventually closing the file... This morning I'll generate the new PR with the commit from #38801 and the two bug fixes.

@wddgit
Contributor Author

wddgit commented Oct 7, 2022

Actually, I take it back, it will only call global end run because stream end run was never called.

@wddgit
Contributor Author

wddgit commented Oct 7, 2022

Sorry for the noise, I meant because stream begin run was never called.
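
As a rough illustration of the "notice a saved exception and go to the end-run path" idea, here is a sketch using only the standard library. The `NextStep` enum and `chooseNextStep` are hypothetical names for this example; the real decision lives inside handleNextItemAfterMergingRunEntries() and works on the WaitingTaskHolder, which is not modeled here.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical labels for the two paths discussed above.
enum class NextStep { ContinueProcessing, EndRun };

// If an earlier step saved an exception, skip further begin transitions
// and head for the end-run path; otherwise carry on.
NextStep chooseNextStep(const std::exception_ptr& saved) {
  return saved != nullptr ? NextStep::EndRun : NextStep::ContinueProcessing;
}

// Captures a thrown exception the way a try-block around the queue
// insertion might, returning it for later inspection.
std::exception_ptr captureFailure() {
  std::exception_ptr saved;
  try {
    throw std::runtime_error("toy failure");
  } catch (...) {
    saved = std::current_exception();
  }
  return saved;
}
```

With an empty `std::exception_ptr` the sketch continues processing; with the result of `captureFailure()` it selects the end-run path.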

@makortel
Contributor

makortel commented Oct 7, 2022

Thanks for the explanation.

> This morning I'll generate the new PR with the commit from #38801 and the two bug fixes.

Could you organize the PR such that the commit of #38801 stays as it is, and the fixes are added in one or two commits? (makes it easier to cross check)

@Dr15Jones
Contributor

I'm still reviewing David's proposal for the fix. However, I think we might have another difficulty. It seems to me that if we had a failure while doing streamBeginRun that streamBeginLuminosityBlock might still be run?

@wddgit
Contributor Author

wddgit commented Oct 7, 2022

@Dr15Jones It is an interesting question.

If there is an exception in streamBeginRun on one stream, then on other streams streamBeginLumi might or might not already have run and completed. I think that is the desired behavior. Events might or might not have already been processed in that lumi. I think that is the desired behavior.

The interesting question is whether we want streamBeginLumi to check for an exception and bail out, or whether we want consistency and run all the streamBeginLumi's, since some of them might have run already. Just reading the code, I think you are right that it currently will run all the streamBeginLumi's. It will not stop until it hits an event after the streamBeginRun exception occurs (which might not be the first event in the lumi; other streams might even finish all the events).

I will run some tests to double check the behavior is what I think it is. I think I did that already, but I will try them again.

@wddgit
Contributor Author

wddgit commented Oct 7, 2022

@Dr15Jones The modified PR is ready. Should I submit it and we move the discussion there or would you rather I make some modifications to the fix before I submit the new PR?

@Dr15Jones
Contributor

@wddgit
I think make the new PR and do any discussions there.

@makortel
Contributor

makortel commented Oct 7, 2022

> I think make the new PR and do any discussions there.

I concur

@wddgit wddgit deleted the implementConcurrentRuns11 branch October 28, 2022 16:08