-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PyROOT in root 6.24 branch hangs while loading CMSSW library #7718
Comments
the root commit b802a6b was working fine. So the changes which we are testing are b802a6b...7c0cfac |
Would you be able to provide me with a valgrind report? |
This might be a dupe of #7657 - but I'd like to make progress with both of them independently to not serialize progress towards v6.24/00! (I.e. I'd still very much appreciate the valgrind report.) |
@Axel-Naumann , valgrind also hangs without printing nay usefull information. Under gdb I see this [a]. If I build
|
The call to |
Help! @pcanal maybe? What I'm after, given the backtrace of #7718 (comment) , is which frames cause the inf loop, i.e. which is the frame that doesn't return. @smuzaffar what we can also try, because of the "only happens in release builds", is #7752 |
@Axel-Naumann , do you have #7752 equivalent for v6.24 branch? |
@smuzaffar that's #7767 |
thanks, I am testing these now |
@Axel-Naumann , looks like some latest development in v6.24 branch has fixed the hanging issue. I have tested 126c9c8 (without #7767) and this time cmssw build was successful. We get runtime errors now, see the details here cms-sw/cmsdist#6777 (comment) . You can find the crash log https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a7de73/13897/runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step3_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log. With your changes in #7767 ( on top of 126c9c8 ) , things look in much better state. PR tests ( cms-sw#153 (comment) ) show no build or run time errors. But we do see some comparison differences for our reconstruction code. |
ROOT master based CMSSW releases also show the same resulsts:
|
Thank you, @smuzaffar ! Am I reading this correctly that it's currently unknown whether the comparison differences are introduced by #7752 or were pre-existing? |
These differences were not there when last time we updated to v6.24 branch commit b802a6b (see cms-sw/cmsdist#6730 (comment) and https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_11_3_X_2021-03-14-2300+963134/41710/validateJR.html ). It could be that some of latest cmssw changes might have caused these differences, I am re-running the tests based on latest cmssw release now |
@Axel-Naumann , based on latest cmssw ROOT624 IB, the comparison results look good now cms-sw#153 (comment) . |
Phew, thanks, @smuzaffar But I have bad news: I needed to revisit the PR #7752 / #7767 . I will have to ask for your favor to test another PR replacing those, sorry about that! (But at least we now know what the baseline is - the comparison is good!) I'll be back... |
no problem, just ping me when you have something to test @Axel-Naumann |
I now have the following:
Could you let me know whether that also solves the issues you saw? Any one of them is enough. Thank you, @smuzaffar ! |
I am testing these now |
@Axel-Naumann , v6.24 fix looks good for CMS . All tests passed cms-sw#155 (comment) |
See also #7754 |
@pcanal @Axel-Naumann this issue is open, but marked as fixed. Reading #7718 (comment), should this maybe also say 6.24/02? |
@alja and I saw exceptions when reading PFCandiadate vector, the size obtained by the auto-generated code was O(10^20). We have a relatively slim reproducer on top of FWLite, can dig it out / give access to the machine here at UCSD if relevant.
|
@pcanal, this is a bit of a pain for CMS as we have to put an "undo commit" on all our 6.24 and master branches, e.g.: We probably have several incarnation of this undo in various branches :( |
I know :( ... This is currently the bug I am working on next. What am I mostly missing (because I stepped away from it too long) is a reproduce of the failing case (ideally I will need a standalone reproducer to add to roottest). |
If it is any help I reference here a simple cmssw module [1]. When compile it with FWLite (built with root master) you can reproduce the crash. Below is the binary from the test module and the sample file: Crash is the line https://github.com/alja/OssTests/blob/root-test/BranchAddr/bin/test-bname-for.cc#L95 |
@pcanal |
@osschar , note that we are only applying the revert for root master branch ( https://github.com/cms-sw/root/commits/cms/master/03d7710 ) .root 6.24 branch of cmssw ( https://github.com/cms-sw/root/commits/cms/v6-24-00-patches/f4ad42e ) does not need the revert. |
@pcanal , I am testing cms root master branch ( https://github.com/cms-sw/root/commits/cms/master/03d7710 ) without the revert of ofending commit (cms-sw@f9834e3 ) , once it is available then hopefully I will be able to provide you the instructions for reproducer |
@pcanal , please use
to create cmssw dev and then run the commands in #7718 (comment) and #7718 (comment) to reproduce the errors. |
@smuzaffar Thanks. |
@pcanal note that |
@smuzaffar Thanks for the heads up. I made significant progress and will "hopefully" not need this build that long! :) |
@smuzaffar I pushed several related PR to v6.24 and v6.26 (and master). Can you verify that one of them now work properly for this case? Thanks. |
@pcanal @smuzaffar |
Thanks! This means v6-26-00-patches is now fine for CMS, @smuzaffar ? |
yes root 6.26, after @pcanal PR #9927 , looks good. It has fixed the issue and IBs look good. We still have random root file read error but I do not think those are related to root change. |
Fair enough. Let's close this issue and re-open a new one if needed be. |
Hi,
We are trying to update root 6.24 branch (commit 7c0cfac) in CMSSW special integration builds (https://github.com/cms-sw/cmsdist/pull/6746/files ) but looks like pyROOT fails/hangs for some special dictionaries.
While building cmssw , we use https://github.com/cms-sw/cmssw/blob/master/FWCore/Utilities/scripts/edmCheckClassVersion to check for root dictionaries class versions. This works for most of our dictionaries e.g following two run fine ( https://github.com/cms-sw/cmssw/blob/master/DataFormats/TauReco/src/classes_def_hlt.xml, https://github.com/cms-sw/cmssw/blob/master/DataFormats/TauReco/src/classes_def_1.xml )
but it fails/hangs for https://github.com/cms-sw/cmssw/blob/master/DataFormats/TauReco/src/classes_def_2.xml
Most of the times the above command just hangs with error https://muzaffar.web.cern.ch/root624/err1.log but once I was able to get this error https://muzaffar.web.cern.ch/root624/err.log . Can you please look in to it and see if this log helps?
In case you want to try it yourself then you go to cmsdev25 and do
FYI @mrodozov @makortel @Dr15Jones
The text was updated successfully, but these errors were encountered: