Additional patch based on 13_0_3 for avoiding both crashes and memory issues in prompt reco #41489
A new Issue was created by @mandrenguyen Matthew Nguyen. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here.
assign reconstruction, operations
New categories assigned: operations,reconstruction @fabiocos,@davidlange6,@rappoccio,@mandrenguyen,@clacaputo,@perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks
One more PR that I think we should add is
While not specifically related to the memory issues, could we please add this PR needed for nanoDQMIO production?
We are currently discussing within L1T (@eyigitba, @elfontan, @aloeliger) whether it is really needed here.
We would like to include this PR, which introduces the new pp scenario with updated HB thresholds to be used for physics data-taking: PR #41368
#41402 by @cms-sw/l1-l2 is not an urgent fix, but I think it would be useful to include it in this upcoming patch. (I should have included it in

#41402 is related to the remaining HLT crashes currently happening online. It replaces one of the crashes [1] with a proper exception message. It will not change the number of failed jobs, but it will help in debugging the problem.
There is one more reason to add #41402 to this next patch release. @fwyzard explained to me that, since #41402 provides a proper exception for the current HLT crashes, we might be able to configure HLT to skip the problematic events (instead of crashing and losing all the events of that job, most of which do not crash). This could be done using […]. So, having #41402 in the next patch release could give us a way to reduce HLT crashes.
I have prepared PR #41497 on the branch CMSSW_13_0_3_patchHLT, which includes the PRs that I believe are most urgent to include in the proposed patch release:
The other PRs listed in this issue cannot, in my opinion, be considered that urgent, or they imply a larger set of updates than is appropriate for a quick fix that allows running at P5 during the (hopefully) few days needed to find a solution for the so-far blocking issue #41457. If any of the proposers of the other PRs listed in this issue believes that more should really and urgently be added, please comment here: we can always update that PR.
@perrotta If the replay turns out to be successful, we might consider switching to
Thank you @saumyaphor4252, that would be great!
@emanueleusai DQM PRs can be added by hand to the online DQM release: let's do so, if needed, so as not to add too many updates to this temporary patch.
@perrotta This is not a PR for online DQM; it is needed for NanoDQMIO production, which happens at T0. We need it in a release so that it can be used in the official production of NanoDQMIO.
The reason for the urgency of #41400 is that we need to start the official production of NanoDQMIO for data certification. |
+1
As just discussed in the ORP, and in light of the ongoing memory issue in prompt reconstruction, one possibility is to build a new release based on 13_0_3_patch1, including any bug fixes needed to avoid the crashes in prompt reco reported here:
#41397
#41442
I'm aware of the following two PRs:
#41473
#41454
Is anyone aware of anything else?