Failure of Validation/Geometry unit test in CMSSW_11_0_ROOT6_X IBs #136

Closed
Dr15Jones opened this issue Oct 9, 2019 · 28 comments

@Dr15Jones

The unit test materialBudgetTrackerPlots in Validation/Geometry has been failing. I've tracked the problem down to doing import ROOT in the cmsRun configuration file. It fails even when doing the same import from within python itself. The import triggers an assert with the message:

[cdj@cmslpc122 src]$ python -i
Python 2.7.15+ (default, Apr 19 2019, 15:49:47) 
[GCC 8.3.1 20190225] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
error: file '/usr/include/linux/falloc.h' from the precompiled header has been overridden
python: /build/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc820/lcg/root/6.19.01-bealjp/root-6.19.01/interpreter/llvm/src/tools/clang/include/clang/Serialization/Module.h:72: clang::serialization::InputFile::InputFile(const clang::FileEntry*, bool, bool): Assertion `!(isOverridden && isOutOfDate) && "an overridden cannot be out-of-date"' failed.
Abort
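
To check for this failure without aborting the calling session, the import can be run in a child process. This is a minimal sketch that assumes Python 3.5 or later and only the standard library; `try_import` is a hypothetical helper name, not part of CMSSW or ROOT.

```python
import subprocess
import sys


def try_import(module, python=sys.executable):
    """Import `module` in a child interpreter and return (exit code, stderr)."""
    proc = subprocess.run(
        [python, "-c", "import " + module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # On POSIX a negative exit code means the child was killed by a signal,
    # which is what an abort() from a failed assertion would produce.
    return proc.returncode, proc.stderr
```

Calling `try_import("ROOT")` in an affected release would be expected to return a nonzero exit code, with the assertion message in the captured stderr.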
@cmsbuild

cmsbuild commented Oct 9, 2019

A new Issue was created by @Dr15Jones Chris Jones.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@smuzaffar

Thanks @Dr15Jones , I have opened a JIRA issue https://sft.its.cern.ch/jira/browse/ROOT-10353

@smuzaffar

@Dr15Jones, the import ROOT issue is fixed now, but the unit test still fails and gdb shows the backtrace below. Note that edm::Wrapper<edmNew::DetSetVector<PixelFEDChannel> > is defined in

DataFormats/SiPixelDetId/src/classes_def.xml:

0x00007fc870ae79dc in TList::FindObject (this=0x7fc85afd1060, name=0x7fc8574a32dd "edm::Wrapper<edmNew::DetSetVector<PixelFEDChannel> >") at /build/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc820/lcg/root/6.19.01/root-6.19.01/core/cont/src/TList.cxx:586
(gdb) where
#0  0x00007fc870ae79dc in TList::FindObject (this=0x7fc85afd1060, name=0x7fc8574a32dd "edm::Wrapper<edmNew::DetSetVector<PixelFEDChannel> >") at /build/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc820/lcg/root/6.19.01/root-6.19.01/core/cont/src/TList.cxx:586
#1  0x00007fc870ae3c8a in THashTable::FindObject (this=0x7fc86ce7e630, name=0x7fc8574a32dd "edm::Wrapper<edmNew::DetSetVector<PixelFEDChannel> >") at /build/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc820/lcg/root/6.19.01/root-6.19.01/core/cont/src/THashTable.cxx:244
#2  0x00007fc870ad4ce8 in ROOT::RemoveClass (cname=0x7fc8574a32dd "edm::Wrapper<edmNew::DetSetVector<PixelFEDChannel> >") at /build/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc820/lcg/root/6.19.01/root-6.19.01/core/cont/src/TClassTable.cxx:849
#3  0x00007fc871721e0e in ROOT::Internal::TDefaultInitBehavior::Unregister (this=0x7fc870d5c290 <ROOT::Internal::DefineBehavior(void*, void*)::theDefault>, classname=0x7fc8574a32dd "edm::Wrapper<edmNew::DetSetVector<PixelFEDChannel> >") at include/Rtypes.h:174
#4  0x00007fc870b54484 in ROOT::TGenericClassInfo::~TGenericClassInfo (this=0x7fc8574ae0c0 <ROOT::GenerateInitInstanceLocal(edm::Wrapper<edmNew::DetSetVector<PixelFEDChannel> > const*)::instance>, __in_chrg=<optimized out>) at /build/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc820/lcg/root/6.19.01/root-6.19.01/core/meta/src/TGenericClassInfo.cxx:221
#5  0x00007fc86ed42c99 in __run_exit_handlers () from /lib64/libc.so.6
#6  0x00007fc86ed42ce7 in exit () from /lib64/libc.so.6
#7  0x00007fc86ed2b50c in __libc_start_main () from /lib64/libc.so.6
#8  0x000000000040f3e8 in _start ()

@smuzaffar

@vgvassilev, @oshadura any idea about this? With the latest ROOT master the python import ROOT problem is fixed, but this unit test is still failing.

@vgvassilev

Looks like a teardown issue. Is this the full backtrace? ping @pcanal.

@smuzaffar

Yes, this is the full backtrace.

@pcanal

pcanal commented Oct 15, 2019

Which ROOT commit is that with?

@smuzaffar

These are the commits between which we noticed this failure: root-project/root@7256943...5c1a587

@pcanal

pcanal commented Oct 15, 2019

Well that's not good. Some of those were supposed to 'fix' this kind of problem. So I will need a reproducer :(

@smuzaffar

On any of the cmsdev machines (e.g. cmsdev21), do the following. Note that ROOT in this release is already built in debug mode. You might want to run it under gdb to see the stack trace.

scram p CMSSW_11_0_ROOT6_X_2019-10-14-2300
cd CMSSW_11_0_ROOT6_X_2019-10-14-2300/
cmsenv
cmsRun $CMSSW_RELEASE_BASE/src/Validation/Geometry/test/runP_Tracker_cfg.py geom=Extended2015 label=Tracker

@pcanal

pcanal commented Oct 17, 2019

The crash is caused by an exception thrown during a TClass construction (and the TClass constructor not being exception safe):

#0  0x00007f4ba225ad1d in __cxxabiv1::__cxa_throw (obj=0x7f4b9dfb8680, tinfo=0x7f4ba46171a8 <typeinfo for edm::Exception>, dest=0x7f4ba45dc990 <edm::Exception::~Exception()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:78
#1  0x00007f4b8ca9ae09 in (anonymous namespace)::RootErrorHandlerImpl(int, char const*, char const*) [clone .cold.382] ()
   from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#2  0x00007f4ba36035b1 in ErrorHandler(Int_t, const char *, const char *, typedef __va_list_tag __va_list_tag *) (level=3000, location=0x7f4ba3840256 "TClass::LoadClassInfo", 
    fmt=0x7f4ba383f358 "no interpreter information for class %s is available even though it has a TClass initialization routine.", ap=0x7ffe0bdddad8) at /afs/cern.ch/user/p/pcanal/root_working/master/core/base/src/TError.cxx:249
#3  0x00007f4ba360376a in Error (location=0x7f4ba3840256 "TClass::LoadClassInfo", fmt=0x7f4ba383f358 "no interpreter information for class %s is available even though it has a TClass initialization routine.")
    at /afs/cern.ch/user/p/pcanal/root_working/master/core/base/src/TError.cxx:292
#4  0x00007f4ba36ec11c in TClass::LoadClassInfo (this=0x7f4b8eb94a00) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:5642
#5  0x00007f4ba36e8b01 in TClass::GetClassMethodWithPrototype (this=0x7f4b8eb94a00, name=0x7f4ba3840618 "Streamer", proto=0x7f4ba384060f "TBuffer&", objectIsConst=false, mode=ROOT::kConversionMatch)
    at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:4417
#6  0x00007f4ba36ecc4e in TClass::Property (this=0x7f4b8eb94a00) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:5875
#7  0x00007f4ba36e6065 in TClass::GetListOfDataMembers (this=0x7f4b8eb94a00, load=false) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:3667
#8  0x00007f4ba371bae5 in TProtoClass::FindDataMember (cl=0x7f4b8eb94a00, index=0) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TProtoClass.cxx:533
#9  0x00007f4ba371b180 in TProtoClass::TProtoRealData::CreateRealData (this=0x7ffe0bdddf40, dmClass=0x7f4b8eb94a00, parent=0x7f4b8eb94600, prevData=0x7f4b8ac95e10, prevLevel=4) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TProtoClass.cxx:445
#10 0x00007f4ba371ab14 in TProtoClass::FillTClass (this=0x7f4b8dda4780, cl=0x7f4b8eb94600) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TProtoClass.cxx:356
#11 0x00007f4ba36de30c in TClass::Init (this=0x7f4b8eb94600, name=0x7f4b8c9d907a "TStorageFactoryFile", cversion=0, typeinfo=0x7f4b8c9e0790 <typeinfo for TStorageFactoryFile>, isa=0x7f4b8e2cb9c0, dfil=0x7f4b8c9d9046 "IOPool/TFileAdaptor/interface/TStorageFactoryFile.h", 
    ifil=0x7f4b8c9da1b0 "/build/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/edf168ccf7298d8c5f32702676caa57a/opt/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/src/IOPool/TFileAdaptor/src/TStor"..., dl=15, il=72, givenInfo=0x0, 
    silent=false) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:1446
#12 0x00007f4ba36dd8ad in TClass::TClass (this=0x7f4b8eb94600, name=0x7f4b8c9d907a "TStorageFactoryFile", cversion=0, info=..., isa=0x7f4b8e2cb9c0, dfil=0x7f4b8c9d9046 "IOPool/TFileAdaptor/interface/TStorageFactoryFile.h", 
    ifil=0x7f4b8c9da1b0 "/build/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/edf168ccf7298d8c5f32702676caa57a/opt/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/src/IOPool/TFileAdaptor/src/TStor"..., dl=15, il=72, silent=false)
    at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:1277
#13 0x00007f4ba36ec23a in ROOT::CreateClass (cname=0x7f4b8c9d907a "TStorageFactoryFile", id=0, info=..., isa=0x7f4b8e2cb9c0, dfil=0x7f4b8c9d9046 "IOPool/TFileAdaptor/interface/TStorageFactoryFile.h", 
    ifil=0x7f4b8c9da1b0 "/build/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/edf168ccf7298d8c5f32702676caa57a/opt/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/src/IOPool/TFileAdaptor/src/TStor"..., dl=15, il=72)
    at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:5672
#14 0x00007f4ba4317e5c in ROOT::Internal::TDefaultInitBehavior::CreateClass (this=0x7f4ba3952330 <ROOT::Internal::DefineBehavior(void*, void*)::theDefault>, cname=0x7f4b8c9d907a "TStorageFactoryFile", id=0, info=..., isa=0x7f4b8e2cb9c0, 
    dfil=0x7f4b8c9d9046 "IOPool/TFileAdaptor/interface/TStorageFactoryFile.h", 
    ifil=0x7f4b8c9da1b0 "/build/cmsbld/jenkins/workspace/build-any-ib/w/tmp/BUILDROOT/edf168ccf7298d8c5f32702676caa57a/opt/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/src/IOPool/TFileAdaptor/src/TStor"..., dl=15, il=72) at include/Rtypes.h:181
#15 0x00007f4ba3708a8b in ROOT::TGenericClassInfo::GetClass (this=0x7f4b8c9e2aa0 <ROOT::GenerateInitInstanceLocal(TStorageFactoryFile const*)::instance>) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TGenericClassInfo.cxx:250
#16 0x00007f4b8c9caffe in TStorageFactoryFile::Dictionary() () from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libIOPoolTFileAdaptor.so
#17 0x00007f4ba36ebedc in TClass::LoadClassDefault (requestedname=0x7f4b9df68bc0 "TStorageFactoryFile") at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:5580
#18 0x00007f4ba36ebe11 in TClass::LoadClass (requestedname=0x7f4b9df68bc0 "TStorageFactoryFile", silent=false) at /afs/cern.ch/user/p/pcanal/root_working/master/core/meta/src/TClass.cxx:5551
#19 0x00007f4ba361ce46 in TPluginHandler::LoadPlugin (this=0x7f4b8fda9740) at /afs/cern.ch/user/p/pcanal/root_working/master/core/base/src/TPluginManager.cxx:262
#20 0x00007f4ba3be050a in TFile::Open (url=0x7f4b8aca42b0 "file:single_neutrino_random.root", options=0x7f4b878fa808 "", ftitle=0x7f4b878fa808 "", compress=101, netopt=0) at /afs/cern.ch/user/p/pcanal/root_working/master/io/io/src/TFile.cxx:4082
#21 0x00007f4b878c9918 in edm::InputFile::InputFile(char const*, char const*, edm::InputType) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_0_ROOT6_X_2019-10-14-2300/lib/slc7_amd64_gcc820/pluginIOPoolInput.so
#22 0x00007f4b878b43d9 in edm::RootInputFileSequence::initTheFile(bool, bool, edm::InputSource*, char const*, edm::InputType) ()
   from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_0_ROOT6_X_2019-10-14-2300/lib/slc7_amd64_gcc820/pluginIOPoolInput.so
#23 0x00007f4b878f624c in edm::RootPrimaryFileSequence::RootPrimaryFileSequence(edm::ParameterSet const&, edm::PoolSource&, edm::InputFileCatalog const&) ()
   from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_0_ROOT6_X_2019-10-14-2300/lib/slc7_amd64_gcc820/pluginIOPoolInput.so
#24 0x00007f4b878b6f67 in edm::PoolSource::PoolSource(edm::ParameterSet const&, edm::InputSourceDescription const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_0_ROOT6_X_2019-10-14-2300/lib/slc7_amd64_gcc820/pluginIOPoolInput.so
#25 0x00007f4b878be0be in edmplugin::PluginFactory<edm::InputSource* (edm::ParameterSet const&, edm::InputSourceDescription const&)>::PMaker<edm::PoolSource>::create(edm::ParameterSet const&, edm::InputSourceDescription const&) const ()
   from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_0_ROOT6_X_2019-10-14-2300/lib/slc7_amd64_gcc820/pluginIOPoolInput.so
#26 0x00007f4ba4ae1206 in edm::InputSourceFactory::makeInputSource(edm::ParameterSet const&, edm::InputSourceDescription const&) const ()
   from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#27 0x00007f4ba4a7630b in edm::makeInput(edm::ParameterSet&, edm::CommonParams const&, std::shared_ptr<edm::ProductRegistry>, std::shared_ptr<edm::BranchIDListHelper>, std::shared_ptr<edm::ThinnedAssociationsHelper>, std::shared_ptr<edm::ActivityRegistry>, std::shared_ptr<edm::ProcessConfiguration const>, edm::PreallocationConfiguration const&) () from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#28 0x00007f4ba4a78242 in edm::EventProcessor::init(std::shared_ptr<edm::ProcessDesc>&, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) ()
   from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#29 0x00007f4ba4a7a275 in edm::EventProcessor::EventProcessor(std::shared_ptr<edm::ProcessDesc>, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) ()
   from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#30 0x0000000000410bd9 in main::{lambda()#1}::operator()() const ()
#31 0x000000000040f2e2 in main ()
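
The failure mode described at the top of this comment, an object that registers itself and then throws before its construction completes, can be illustrated in isolation. This is a toy Python model, not ROOT code: `REGISTRY`, `UnsafeInfo`, `SafeInfo`, `failing_loader` and `build` are all hypothetical stand-ins, and the sketch makes no claim about how TClass is actually implemented.

```python
REGISTRY = {}


def failing_loader(name):
    # Stands in for an error handler that turns an error report into an exception.
    raise RuntimeError("no interpreter information for class " + name)


class UnsafeInfo:
    def __init__(self, name, loader):
        REGISTRY[name] = self      # published before construction has finished
        self.info = loader(name)   # may raise, leaving a stale entry behind


class SafeInfo:
    def __init__(self, name, loader):
        self.info = loader(name)   # do the work that can fail first
        REGISTRY[name] = self      # publish only once fully built


def build(cls, name):
    """Try to construct `cls`; report whether a registry entry was left behind."""
    try:
        cls(name, failing_loader)
    except RuntimeError:
        pass
    return name in REGISTRY
```

In the unsafe variant the registry keeps a half-built entry after the exception, so any later cleanup that walks the registry operates on an object that was never completed.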

@pcanal

pcanal commented Oct 17, 2019

So next we need to understand why:

"no interpreter information for class TStorageFactoryFile is available even though it has a TClass initialization routine."

@pcanal

pcanal commented Oct 17, 2019

The problem seems to be that the python interpreter is shut down ('finalized') in the middle of the cmsRun process. That has the (new) consequence of marking TCling as being in shutdown mode, which prevents it from giving 'interpreter' information to TClass.

essential part of the stack trace

#24 Py_Finalize () at Python/pythonrun.c:411
#25 0x00007fa0595cf046 in pybind11::finalize_interpreter() () from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCorePythonParameterSet.so
#26 0x00007fa0595d194f in PyBind11ProcessDesc::~PyBind11ProcessDesc() () from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCorePythonParameterSet.so
#27 0x00007fa0595bfa1a in edm::cmspybind11::readConfig(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, char**) ()
   from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCorePythonParameterSet.so
#28 0x00007fa059628229 in edm::readConfig(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, char**) ()
   from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCoreParameterSetReader.so
#29 0x00000000004105ae in main::{lambda()#1}::operator()() const ()
#30 0x000000000040f2e2 in main ()

full stack trace:

#0  TCling::ShutDown (this=0x7fa05448a980) at /afs/cern.ch/user/p/pcanal/root_working/master/core/metacling/src/TCling.cxx:1457
#1  0x00007fa057e0a7e2 in TROOT::EndOfProcessCleanups (this=0x7fa058196d20 <ROOT::Internal::GetROOT1()::alloc>) at /afs/cern.ch/user/p/pcanal/root_working/master/core/base/src/TROOT.cxx:1274
#2  0x00007fa04299502a in ?? ()
#3  0x0000000000000000 in ?? ()
....
#0  0x00007fa051982c25 in FastCall (method=140326320471776, args_=0x7ffe9f73ad50, self=0x7fa058196d20 <ROOT::Internal::GetROOT1()::alloc>, result=0x0) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/Cppyy.cxx:430
#1  0x00007fa051982e1b in Cppyy::CallV (method=140326320471776, self=0x7fa058196d20 <ROOT::Internal::GetROOT1()::alloc>, args=0x7ffe9f73ad50) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/Cppyy.cxx:471
#2  0x00007fa05198eb85 in GILCallV (method=140326320471776, self=0x7fa058196d20 <ROOT::Internal::GetROOT1()::alloc>, ctxt=0x7ffe9f73ad50) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/Executors.cxx:63
#3  0x00007fa0519903cf in PyROOT::TVoidExecutor::Execute (this=0x7fa0424eaec8, method=140326320471776, self=0x7fa058196d20 <ROOT::Internal::GetROOT1()::alloc>, ctxt=0x7ffe9f73ad50) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/Executors.cxx:323
#4  0x00007fa0519c715e in PyROOT::TMethodHolder::CallFast (this=0x7fa044804440, self=0x7fa058196d20 <ROOT::Internal::GetROOT1()::alloc>, offset=0, ctxt=0x7ffe9f73ad50) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/TMethodHolder.cxx:69
#5  0x00007fa0519c7636 in PyROOT::TMethodHolder::CallSafe (this=0x7fa044804440, self=0x7fa058196d20 <ROOT::Internal::GetROOT1()::alloc>, offset=0, ctxt=0x7ffe9f73ad50) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/TMethodHolder.cxx:121
#6  0x00007fa0519c6b7c in PyROOT::TMethodHolder::Execute (this=0x7fa044804440, self=0x7fa058196d20 <ROOT::Internal::GetROOT1()::alloc>, offset=0, ctxt=0x7ffe9f73ad50) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/TMethodHolder.cxx:528
#7  0x00007fa0519c6e05 in PyROOT::TMethodHolder::Call (this=0x7fa044804440, self=@0x7fa051a46420: 0x7fa051a41a10, args=0x7fa05480a050, kwds=0x0, ctxt=0x7ffe9f73ad50) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/TMethodHolder.cxx:585
#8  0x00007fa051996d83 in PyROOT::(anonymous namespace)::mp_call (pymeth=0x7fa051a46410, args=0x7fa05480a050, kwds=0x0) at /afs/cern.ch/user/p/pcanal/root_working/master/bindings/pyroot/src/MethodProxy.cxx:597
#9  0x00007fa0574f42c3 in PyObject_Call (func=func@entry=0x7fa051a46410, arg=arg@entry=0x7fa05480a050, kw=kw@entry=0x0) at Objects/abstract.c:2547
#10 0x00007fa05759f767 in do_call (nk=<optimized out>, na=<optimized out>, pp_stack=0x7ffe9f73aee8, func=0x7fa051a46410) at Python/ceval.c:4589
#11 call_function (oparg=<optimized out>, pp_stack=0x7ffe9f73aee8) at Python/ceval.c:4394
#12 PyEval_EvalFrameEx (f=f@entry=0x7fa0424927a0, throwflag=throwflag@entry=0) at Python/ceval.c:3009
#13 0x00007fa0575a44b2 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x7fa05480a068, argcount=<optimized out>, kws=kws@entry=0x7fa05480a068, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3604
#14 0x00007fa05751dbd6 in function_call (func=0x7fa0429afd70, arg=0x7fa05480a050, kw=0x7fa051a28398) at Objects/funcobject.c:523
#15 0x00007fa0574f42c3 in PyObject_Call (func=func@entry=0x7fa0429afd70, arg=arg@entry=0x7fa05480a050, kw=kw@entry=0x7fa051a28398) at Objects/abstract.c:2547
#16 0x00007fa05759b8bd in ext_do_call (nk=<optimized out>, na=<optimized out>, flags=<optimized out>, pp_stack=0x7ffe9f73b160, func=0x7fa0429afd70) at Python/ceval.c:4686
#17 PyEval_EvalFrameEx (f=f@entry=0x7fa051b06de0, throwflag=throwflag@entry=0) at Python/ceval.c:3048
#18 0x00007fa0575a44b2 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x7fa05480a068, argcount=<optimized out>, kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3604
#19 0x00007fa05751db1c in function_call (func=0x7fa051d5fcf8, arg=0x7fa05480a050, kw=0x0) at Objects/funcobject.c:523
#20 0x00007fa0574f42c3 in PyObject_Call (func=func@entry=0x7fa051d5fcf8, arg=arg@entry=0x7fa05480a050, kw=0x0) at Objects/abstract.c:2547
#21 0x00007fa05759a313 in PyEval_CallObjectWithKeywords (func=func@entry=0x7fa051d5fcf8, arg=0x7fa05480a050, arg@entry=0x0, kw=kw@entry=0x0) at Python/ceval.c:4241
#22 0x00007fa0575c9ade in call_sys_exitfunc () at Python/pythonrun.c:1765
#23 Py_Finalize () at Python/pythonrun.c:430
#24 Py_Finalize () at Python/pythonrun.c:411
#25 0x00007fa0595cf046 in pybind11::finalize_interpreter() () from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCorePythonParameterSet.so
#26 0x00007fa0595d194f in PyBind11ProcessDesc::~PyBind11ProcessDesc() () from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCorePythonParameterSet.so
#27 0x00007fa0595bfa1a in edm::cmspybind11::readConfig(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, char**) ()
   from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCorePythonParameterSet.so
#28 0x00007fa059628229 in edm::readConfig(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, char**) ()
   from /cvmfs/cms-ib.cern.ch/nweek-02598/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_0_ROOT6_X_2019-10-13-0000/lib/slc7_amd64_gcc820/libFWCoreParameterSetReader.so
#29 0x00000000004105ae in main::{lambda()#1}::operator()() const ()
#30 0x000000000040f2e2 in main ()
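
The sequence in these traces (configuration is read, python is finalized, a file is opened afterwards) can be modelled with a small stand-in. This is only an illustrative sketch: `ToyInterpreter`, `read_config` and `open_file` are hypothetical names, and the shutdown flag is a simplification of the behaviour described above, not TCling's actual logic.

```python
class ToyInterpreter:
    """Stand-in for an interpreter that refuses lookups once shut down."""

    def __init__(self):
        self.shut_down = False
        self._known = {"TStorageFactoryFile": "class info"}

    def shutdown(self):
        self.shut_down = True

    def class_info(self, name):
        if self.shut_down:
            return None            # no information is handed out after shutdown
        return self._known.get(name)


def read_config(interp):
    # Models finalizing python after the configuration has been parsed:
    # the exit handler marks the interpreter as shut down mid-process.
    interp.shutdown()
    return {"input": "file:example.root"}


def open_file(interp, plugin_class):
    if interp.class_info(plugin_class) is None:
        raise RuntimeError(
            "no interpreter information for class %s is available" % plugin_class
        )
    return "opened with " + plugin_class
```

With this model, `open_file` succeeds on a fresh `ToyInterpreter` and raises once `read_config` has run, mirroring the order of events in the traces.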

@Axel-Naumann

Do I understand correctly that the CMS code in question tries to:

  • shut down the (python) interpreter
  • then open a TFile?

What's the intent behind that - is this a "temporary python interpreter"?

@davidlange6

davidlange6 commented Oct 17, 2019 via email

@pcanal

pcanal commented Oct 25, 2019

@etejedor any ideas?

@etejedor

If Python is being shut down, then the teardown handler that PyROOT registers will be executed and TROOT::EndOfProcessCleanups will run:
https://github.com/root-project/root/blob/master/bindings/pyroot/ROOT.py#L877
and that seems to end up calling TCling::ShutDown.

I understand this is a problem only now because, before, we did not call TCling::ShutDown from TROOT::EndOfProcessCleanups? Or what changed?
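
For context, handlers registered through Python's standard `atexit` module run when the interpreter finalizes. For an embedded interpreter that is whenever the host finalizes it, which need not be the end of the process. The sketch below only demonstrates the ordering in a plain child interpreter; `CHILD` and `run_child` are hypothetical names and nothing here touches PyROOT.

```python
import subprocess
import sys

# Script run in a child interpreter: register a handler, then finish normally.
CHILD = """
import atexit

def cleanup():
    print("cleanup ran")

atexit.register(cleanup)
print("main done")
"""


def run_child():
    """Run CHILD in a fresh interpreter and return its output lines."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.stdout.splitlines()
```

`run_child()` returns `["main done", "cleanup ran"]`: the handler runs after the script body, during interpreter finalization.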

@pcanal

pcanal commented Oct 28, 2019

Yes, that is the challenge. We added the call to TCling::ShutDown so that, for standalone python, the process teardown would not trigger a fresh initialization (which ends up using some objects that have already been torn down). But it must not be called if python is shut down before the end of the process, since, as is the case here, the process goes on to need ROOT for the rest of its duration (and has another point where the shutdown is called, i.e. the TApplication object teardown).

@vgvassilev

Should we call TROOT::EndOfProcessCleanups from within python? What is the motivation for doing that as opposed to leaving ROOT itself to decide when to call it?

@pcanal

pcanal commented Oct 28, 2019

@vgvassilev See 7a592f5:

[Exp PyROOT] Run cleanup to nonify Python proxies at teardown

This is related to:
https://sft.its.cern.ch/jira/browse/ROOT-10295
https://bitbucket.org/wlav/cppyy/issues/160

It was observed that, under some circumstances (reproduced in
Python3.6.5 using a TFile and a TTree created inside a function,
destroying the objects at teardown time), the Python proxy of
a TFile-owned object is not nonified when running RecursiveRemove
on it. The reason is that, when trying to get the proxy from its
weak reference, the result is Py_None even if the object has not
been destroyed yet. As a result, the proxy is not nonified and
later tries to double delete its internal C++ object, resulting
in a crash because the C++ TFile also deleted it before.

Running EndOfProcessCleanups via a Python exit handler forces the
execution of RecursiveRemove and the right nonification of Python
proxies. This has been ported from old PyROOT (was in ROOT.py).

@vgvassilev

I see. Then we should probably have a special flag for the end-of-process cleanups denoting that some entries have an 'externally controlled lifetime', meaning that they will be deleted externally before the end of the process. If they are not, ROOT will delete them at that point. Something like shared_ptr semantics.
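
One possible reading of this proposal, sketched as a toy model: entries flagged as externally controlled may be released early by their owner, and whatever is still registered is released at the real end of the process. `CleanupRegistry` and its methods are hypothetical names for illustration only; no such interface exists in ROOT.

```python
class CleanupRegistry:
    """Toy cleanup list with an 'externally controlled lifetime' flag."""

    def __init__(self):
        self._entries = {}    # name -> externally_controlled flag
        self.released = []    # names in the order they were released

    def register(self, name, externally_controlled=False):
        self._entries[name] = externally_controlled

    def release_early(self, name):
        """Mid-process release, allowed only for externally controlled entries."""
        if not self._entries.get(name, False):
            raise ValueError(name + " is not externally controlled")
        del self._entries[name]
        self.released.append(name)

    def end_of_process(self):
        """Release everything still registered, flagged or not."""
        for name in list(self._entries):
            del self._entries[name]
            self.released.append(name)
```

In this model a mid-process python teardown would call `release_early` only for the python-owned entries, leaving the rest usable until `end_of_process`.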

@smuzaffar

We now get this error with ROOT 6.18 too. We do not get it with root tag v6-18-04, but we do see it with the tip of the root 6-18-00-patches branch, i.e. one of these changes (root-project/root@v6-18-04...869553a) is causing it.

@Axel-Naumann

Axel-Naumann commented Nov 29, 2019

Thanks, that seems to contradict @pcanal 's diagnosis in #136 (comment) that this is related to 7a592f5

I've created https://sft.its.cern.ch/jira/browse/ROOT-10469 to track this on our side.

@pcanal

pcanal commented Dec 3, 2019

Indeed (but it does not really change the end result), the real cause is a 'tightening'/'increase' of the ROOT internal shutdown mechanism. I.e. now (in master and 6.18), the fact that the shutdown has started is recorded, and that information is used to prevent the use of internal resources (like global static objects used for caching) that are assumed, incorrectly in this use case, to have been deleted. This results in an (intentionally) non-functional state.

I.e. we still need to resolve the dichotomy between 'python is torn down mid-process' (CMS case) and 'python is torn down at the end of the process' (python prompt case).

@vgvassilev

@smuzaffar, @etejedor has pushed a fix to master. However, could you test whether root-project#4675 solves your issue? This would help us better understand a class of problems that show up from time to time in ROOT.

@smuzaffar

@vgvassilev, @Dr15Jones has already fixed this unit test in cms-sw/cmssw#28619, and the change is merged. So testing root-project#4675 will not tell us whether it fixes the issue.

@vgvassilev

Ah, too bad :(

@smuzaffar

this was fixed
