DQM bin-by-bin comparison missing after moving to ROOT 6.30 #43590

Closed
smuzaffar opened this issue Dec 18, 2023 · 49 comments · Fixed by #46551

Comments

@smuzaffar
Contributor

It looks like the DQM bin-by-bin comparison plots have not been available since the move to ROOT 6.30 on 13 Dec. ROOT 6.30 was first integrated in the CMSSW_14_0_X_2023-12-13-1100 IB, and all PR tests using this or a later IB have empty DQM plots, e.g. see

The log files for the DQM comparison do not show any error (https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_14_0_X_2023-12-17-0000+79a965/60327/dqmBinByBinLog.log).

@cms-sw/dqm-l2, could this be an issue with the DQM GUI not being able to process ROOT 6.30 based plots?
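
To check whether a given PR test ran on an affected IB, the "this or a later IB" comparison can be made on the timestamp embedded in the IB name. The following is a minimal Python sketch, assuming IB names follow the pattern seen in this thread (CMSSW_<major>_<minor>_X_<YYYY-MM-DD-HHMM>). The helper names are hypothetical and are not part of cms-bot.

```python
import re
from datetime import datetime

# Pattern inferred from the IB names quoted in this thread,
# e.g. CMSSW_14_0_X_2023-12-13-1100. Other naming variants are not handled.
IB_RE = re.compile(r"^(CMSSW_\d+_\d+)_X_(\d{4}-\d{2}-\d{2}-\d{4})$")

# First IB reported in this issue as containing ROOT 6.30.
FIRST_ROOT630_IB = "CMSSW_14_0_X_2023-12-13-1100"


def parse_ib(name):
    """Return (release_cycle, timestamp) for an IB name, or raise ValueError."""
    match = IB_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised IB name: {name!r}")
    stamp = datetime.strptime(match.group(2), "%Y-%m-%d-%H%M")
    return match.group(1), stamp


def is_at_or_after(ib_name, reference_ib=FIRST_ROOT630_IB):
    """True if ib_name is in the same release cycle as reference_ib
    and its timestamp is not older than the reference timestamp."""
    cycle, stamp = parse_ib(ib_name)
    ref_cycle, ref_stamp = parse_ib(reference_ib)
    return cycle == ref_cycle and stamp >= ref_stamp
```

The comparison is deliberately limited to one release cycle, so it says nothing about other cycles that also picked up ROOT 6.30. Baseline names with a suffix (such as the `+79a965` part in the log URL above) would need that suffix stripped first.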

@smuzaffar
Contributor Author

type root

@smuzaffar
Contributor Author

assign dqm

@cmsbuild
Contributor

New categories assigned: dqm

@rvenditti,@syuvivida,@tjavaid,@nothingface0,@antoniovagnerini you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Dec 18, 2023

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @smuzaffar Malik Shahzad Muzaffar.

@Dr15Jones, @smuzaffar, @makortel, @rappoccio, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@nothingface0
Contributor

Hello, this was already reported last week by another group as well (@cms-sw/pdmv-l2), and your suspicion seems to be correct: it is due to the GUI using ROOT 6.14, which produces error messages when opening ROOT files created by 6.30:

Error in <TList::Clear>: A list is accessing an object (0x7f53bc738600) already deleted (list name = TList)

Since there are no plans to update the ROOT package supported by the comp team (here) to newer versions, we will proceed to ignore those messages for now, until we overhaul our DQMGUI deployment procedure.

We are in the process of deploying a fix any day now.

I will post here if we discover anything else.

(on behalf of DQM-DC)
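
The ROOT message quoted above has a regular shape, so GUI-side logs could be scanned for it rather than relying on the comparison log, which showed no error. Below is a minimal Python sketch, assuming ROOT errors appear one per line in the form `Error in <Origin>: message` as in the quoted line. The function name is hypothetical.

```python
import re

# Matches ROOT messages of the form seen in this thread, e.g.
#   Error in <TList::Clear>: A list is accessing an object (0x...) already deleted ...
ROOT_ERROR_RE = re.compile(r"^Error in <(?P<origin>[^>]+)>: (?P<message>.*)$")


def find_root_errors(log_text):
    """Return a list of (line_number, origin, message) for ROOT error lines."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = ROOT_ERROR_RE.match(line.strip())
        if match:
            hits.append((number, match.group("origin"), match.group("message")))
    return hits
```

A caller could treat any hit whose origin is `TList::Clear` as a sign of the incompatibility discussed here, while still reporting other origins for inspection.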

@rovere
Contributor

rovere commented Dec 18, 2023

Do we understand the origin of this error?

@nothingface0
Contributor

Do we understand the origin of this error?

From my side, no. Any input is welcome.

@makortel
Contributor

Let's tag @pcanal here too.

@nothingface0
Contributor

Let's tag @pcanal here too.

No need, I already opened a thread on the ROOT forum :)

@makortel
Contributor

@cms-sw/pdmv-l2 Was this error not seen in the 14_0_0_pre0 + ROOT 6.30 RelVal production? Or are those RelVals not impacted?

@makortel
Contributor

@cms-sw/pdmv-l2 Was this error not seen in the 14_0_0_pre0 + ROOT 6.30 RelVal production? Or are those RelVals not impacted?

Reading again https://mattermost.web.cern.ch/cms-o-and-c/pl/ig7t5innq7b65mr9jns1y95gpo I see those RelVals were impacted.

@smuzaffar I guess we'd need to revert 6.30 for 14_0_0_pre2?

@AdrianoDee
Contributor

@makortel the production itself went smoothly. All the data-tiers have been created and are regularly available on DAS.

Then when we tried to build the RelMon we noticed that the DQM were not actually uploaded to the DQM GUI and this is what started the investigations mentioned by @nothingface0 here.

@makortel
Contributor

Then when we tried to build the RelMon we noticed that the DQM were not actually uploaded to the DQM GUI and this is what started the investigations mentioned by @nothingface0 here.

Thanks. But the DQM GUI is a critical part of the pipeline towards validators, right?

While the problem, in principle, can be addressed from DQM GUI side, I feel the CMSSW-side action should be on the table as well (even if I want ROOT 6.30 in production for 14_0_0).

@AdrianoDee
Contributor

A note, if it helps: when trying to run the HARVESTING step with 14_0_0_pre0 on top of a 14_0_0_pre0_ROOT630 file, I get a fatal ROOT error with the same message as the one reported for the visDQMReceiveDaemon.

----- Begin Fatal Exception 18-Dec-2023 16:06:23 CET-----------------------
An exception of category 'FileOpenError' occurred while
   [0] Calling InputSource::readFile_
   [1] Opening DQM Root file
Exception Message:

Input file file:step3_inDQM.root was not found, could not be opened, or is corrupted.
   Additional Info:
      [a] Fatal Root Error: @SUB=TList::Clear
A list is accessing an object (0x7f0738d930c0) already deleted (list name = TList)

----- End Fatal Exception -------------------------------------------------

In case somebody wants to reproduce it:

cmsDriver.py step4 --conditions auto:phase1_2023_realistic --era Run3_2023 --filein /store/relval/CMSSW_14_0_0_pre0_ROOT630/RelValBuMixing_14/DQMIO/PU_133X_mcRun3_2023_realistic_v2_el8_amd64_gcc12-v1/2590000/5892F3A7-1CC2-4989-B5A5-C72D9B038ECE.root --fileout file:step4.root --filetype DQM --geometry DB:Extended --mc --number 10 --python_filename step_4_cfg.py --scenario pp --step HARVESTING:@standardValidation+@standardDQM+@ExtraHLT+@miniAODValidation+@miniAODDQM+@nanoAODDQM
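
When many harvesting jobs are checked at once, the exception block above can be reduced to its two informative fields: the exception category and the ROOT routine named after `@SUB=`. The Python sketch below assumes the block layout shown in this comment (Begin/End marker lines, a `category '...'` line, and a `Fatal Root Error: @SUB=...` line). The function name is hypothetical and other exception layouts are not handled.

```python
import re

# Marker and field patterns taken from the exception text quoted above.
BEGIN_RE = re.compile(r"^-+ Begin Fatal Exception .*-+$")
END_RE = re.compile(r"^-+ End Fatal Exception -+$")
CATEGORY_RE = re.compile(r"An exception of category '([^']+)' occurred")
ROOT_SUB_RE = re.compile(r"Fatal Root Error: @SUB=(\S+)")


def parse_fatal_exceptions(text):
    """Return one dict per Begin/End block with its category and ROOT origin.

    Fields that are not present in a block are reported as None.
    """
    results = []
    block = None  # None while outside a Begin/End block
    for line in text.splitlines():
        stripped = line.strip()
        if BEGIN_RE.match(stripped):
            block = []
        elif block is not None and END_RE.match(stripped):
            body = "\n".join(block)
            category = CATEGORY_RE.search(body)
            origin = ROOT_SUB_RE.search(body)
            results.append({
                "category": category.group(1) if category else None,
                "root_origin": origin.group(1) if origin else None,
            })
            block = None
        elif block is not None:
            block.append(stripped)
    return results
```

For the block quoted in this comment, the result would carry the category `FileOpenError` and the ROOT origin `TList::Clear`, which matches the origin in the GUI-side message.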

@AdrianoDee
Contributor

Thanks. But the DQM GUI is a critical part of the pipeline towards validators, right?

Yes, indeed, it's critical and I agree with the fact that an action on CMSSW side could be on the table. Then, I think, having a solution (preferably not a workaround) on the DQM GUI side could allow us to have, in parallel, the physics validation in place with the samples already produced.

@makortel
Contributor

While the problem, in principle, can be addressed from DQM GUI side, I feel the CMSSW-side action should be on the table as well (even if I want ROOT 6.30 in production for 14_0_0).

Then, I think, having a solution (preferably not a workaround) on the DQM GUI side could allow us to have, in parallel, the physics validation in place with the samples already produced.

Just to be clear (especially because I won't be able to attend ORP tomorrow), I'm specifically thinking about what we should do for CMSSW_14_0_0_pre2.

@smuzaffar
Contributor Author

smuzaffar commented Dec 18, 2023

@cms-sw/orp-l2 @makortel, I am fine with reverting to ROOT 6.26 for 14.0.0.pre2. If we decide to go down this path, then we should do it at least 1 day (i.e. today) before we build 14.0.0.pre2.

By the way, we also saw this error, Error in <TList::Clear>: A list is accessing an object (0x7fffc1cafe80) already deleted (list name = TList), during the PowerPC validation a couple of years ago, but that might have been due to an exceeded disk quota.

@makortel
Contributor

From the ROOT forum, it seems like the fix on the file reading ROOT side might be straightforward
https://root-forum.cern.ch/t/error-in-tlist-clear-a-list-is-accessing-an-object-already-deleted-list-name-tlist-when-opening-a-file-created-by-root-6-30-using-root-6-14-09/57588/5

@nothingface0
Contributor

From the ROOT forum, it seems like the fix on the file reading ROOT side might be straightforward root-forum.cern.ch/t/error-in-tlist-clear-a-list-is-accessing-an-object-already-deleted-list-name-tlist-when-opening-a-file-created-by-root-6-30-using-root-6-14-09/57588/5

Does that "simply" mean applying the patch and recompiling?

@pcanal
Contributor

pcanal commented Dec 18, 2023

It should. I am checking.

@pcanal
Contributor

pcanal commented Dec 18, 2023

The patch applies cleanly, works and has been pushed to the v6-14-00-patches branch.

@nothingface0
Contributor

The patch applies cleanly, works and has been pushed to the v6-14-00-patches branch.

Much appreciated!

@smuzaffar
Contributor Author

@nothingface0, I can open a PR for the cmsdist comp_gcc630 branch to include this fix.

@nothingface0
Contributor

@nothingface0, I can open a PR for the cmsdist comp_gcc630 branch to include this fix.

@smuzaffar Already on it!

@smuzaffar
Contributor Author

Note that https://github.com/cms-sw/cmsdist/blob/comp_gcc630/root.spec is not using the tip of the ROOT 6.14 branch, so I would suggest applying only https://github.com/root-project/root/commit/65ed49a726bd293edd259f2ceccbd7dc8756808f.patch on top of what comp_gcc630 already uses.

@smuzaffar
Contributor Author

Any suggestions, @cms-sw/dqm-l2, on how to avoid this issue in the future?

@nothingface0
Contributor

@smuzaffar your suggestion sounds good:

It would help to have a unit test that uploads something to the DQM GUI and checks that the upload and indexing were successful.

I don't know where this unit test should be implemented, though.

@makortel
Contributor

It would help to have a unit test that uploads something to the DQM GUI and checks that the upload and indexing were successful.

I don't know where this unit test should be implemented, though.

Maybe in DQMServices/Components/test? Or would the setup need to be more complicated?

@nothingface0
Contributor

Maybe in DQMServices/Components/test? Or would the setup need to be more complicated?

I don't think that the setup would be too complicated to just upload a file to DQMGUI, as long as the test-running machine has access to the CERN Network. We could use the test instance of DQMGUI to receive a test ROOT file.

One question, though: how could one have a ROOT file from within the test that is at whatever ROOT version CMSSW is using? Are there any samples? Or should one be generated somehow from within the test?

@smuzaffar
Contributor Author

@nothingface0, thanks for looking into this. Yes, I think it is better to generate one from the test itself. This way we are sure that the file has been generated with the right ROOT version.

@smuzaffar
Contributor Author

Or you can use the output of one of the existing unit tests and upload it. For example, the test at https://github.com/cms-sw/cmssw/blob/master/DQMServices/Demo/test/BuildFile.xml#L8 generates some ROOT files, so you can add another test, e.g.

<test name="testDQMGUI" command="test-DQMGUI.sh">
  <flags PRE_TEST="TestDQMServicesDemo"/>
</test>

scram will run testDQMGUI after TestDQMServicesDemo
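
The script test-DQMGUI.sh is only named above, not shown. The logic it would need (upload a file, then confirm that it was indexed) can be sketched in Python as below. This is an illustration under assumptions: `upload` and `is_indexed` are hypothetical callables standing in for whatever upload and query mechanisms the DQMGUI actually provides, and the retry count and delay are arbitrary.

```python
import time


def upload_and_verify(path, upload, is_indexed, attempts=5, delay_s=1.0,
                      sleep=time.sleep):
    """Upload `path`, then poll until it is reported as indexed.

    `upload(path)` and `is_indexed(path)` are hypothetical callables supplied
    by the caller; both return True on success. Returns True if the upload
    is accepted and the file is seen as indexed within `attempts` polls,
    otherwise False.
    """
    if not upload(path):
        return False
    for attempt in range(attempts):
        if is_indexed(path):
            return True
        # Wait before the next poll, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Polling matters here because, as this issue showed, an upload can appear to succeed while the file never becomes visible in the GUI. A test that stops after the upload step would not have caught the problem.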

@smuzaffar
Contributor Author

It looks like this is happening again for 14.0.X and 14.1.X PRs (where we are using ROOT 6.30). See #44366 (comment) and https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_14_0_X_2024-03-11-1100+7504b9/61598/dqm-histo-comparison-summary.html

@smuzaffar
Contributor Author

@cms-sw/dqm-l2, do you know if you are still using the patched DQMGUI with ROOT 6.14?

@nothingface0
Contributor

nothingface0 commented Mar 13, 2024

@smuzaffar Thanks, it's a different problem this time (same as #38980); we will try to fix it ASAP.

@smuzaffar
Contributor Author

we really need a unit test to catch it as early as possible

@nothingface0
Contributor

we really need a unit test to catch it as early as possible

Just to make it clear, this is not related to the ROOT version being used, so it's not something that can be caught with unit tests from the CMSSW side. This is something that we need to understand how to tackle from our side.

@smuzaffar
Contributor Author

The simple test you described in #43590 (comment) should work. All it needs to do is upload the ROOT file and then make sure the DQM plots are available, right?

@nothingface0
Contributor

nothingface0 commented Mar 15, 2024

@smuzaffar We have a temporary solution in place, we will try and see if we can restore the files that failed to get imported the last few days.

Edit: we are currently importing the missing files of the last 7 days.

@makortel
Contributor

makortel commented Sep 5, 2024

Hi, is there any news on the unit test? Or is anything else missing from this issue?

@makortel
Contributor

Ping @cms-sw/dqm-l2

@smuzaffar
Contributor Author

@cms-sw/dqm-l2, is there any update on this?

nothingface0 added a commit to nothingface0/cmssw that referenced this issue Oct 30, 2024
It uses a ROOT file generated by the TestDQMServicesDemo test and
uploads it to the dev DQMGUI. This should cover the case where CMSSW
moves to a new ROOT release, which might be incompatible with the one
being used at the DQMGUI.

Resolves: cms-sw#43590
nothingface0 added a commit to nothingface0/cmssw that referenced this issue Oct 31, 2024
It uses a ROOT file generated by the TestDQMServicesDemo test and
uploads it to the dev DQMGUI. This should cover the case where CMSSW
moves to a new ROOT release, which might be incompatible with the one
being used at the DQMGUI.

Resolves: cms-sw#43590
@github-project-automation github-project-automation bot moved this from Work in CMS to Done in ROOT prioritization Nov 8, 2024

7 participants