-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exception on HLT_DoubleMediumDeepTauPFTauHPS35_eta2p1 Phase-2 workflow #42862
Comments
A new Issue was created by @srimanob Phat Srimanobhas. @Dr15Jones, @rappoccio, @smuzaffar, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
FYI @mmusich @rovere @SohamBhattacharya |
assign upgrade,hlt |
New categories assigned: upgrade,hlt @AdrianoDee,@mmusich,@missirol,@srimanob,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Shouldn't one use cmssw/HLTrigger/Configuration/python/HLT_75e33/modules/hltHpsPFTauDeepTauProducer_cfi.py Line 14 in a288c8f
and also deepTau_2026v2p5_core.pb etc files in here? cmssw/HLTrigger/Configuration/python/HLT_75e33/modules/hltHpsPFTauDeepTauProducer_cfi.py Lines 10 to 12 in a288c8f
|
I see this is also there in the 13_1 backport. |
if this path fires seldom enough it might be randomly failing in a 9k events relval. |
No. We can't let this randomly fail relvals (used by everyone else). Please either fix or disable the path. |
We don't see the failure in 13_1 because the last relvals came with CMSSW_13_1_0_pre4. The backport PR was merged in Aug, so we don't see it. I agree with @mmusich if we can't fix at this moment, we should disable the path. |
Ok, how urgent it is? I was working on updating it due to fixes needed, then focused on performance improvement since it was suggested in the Phase2 meeting. It may take some days to fix it. If it is urgent, then we can disable it. |
To me, the next pre-release, CMSSW_13_3_0_pre4, is targeted on 2023/10/17. So if you can't fix it by, let's say, a week before (10 Oct), then please make a PR to disable it. Other may have different comments. |
Agreed, the path should be disabled for the time being if it crashes RelVals. |
before disabling the path, I'd just like to make sure that the fix proposed at #42862 (comment) is not enough. |
As shown in slide 18 of https://indico.cern.ch/event/1322372/#sc-2-5-taus, it gives some matrix incompatibility issue. I have another deadline this week, but I can focus on solving the issue next week if it is ok. If that is late, we can disable it. Is there anything that I should do for disabling the path? |
removing any reference from the menu (while keeping the configuration fragment) should be enough. EDIT: to be fully explicit, remove it from here:
|
Next week should be fine. I'll check back here in one week time. |
Confirm that it's enough to remove it from |
Once the performance of the Tau Paths is better understood, we can hook those in the |
@hsert please update on the status of the fix. Thank you. |
I have tried several things, but couldn't figure it out yet. Probably it would be safer to disable the path. Sorry for the inconvenience. |
I have created #42955, just in case. |
Hi @cms-sw/pdmv-l2 Thx. |
answering to myself, I found a
that enables the |
+1
|
@cmsbuild, ping? seems stuck |
@smuzaffar I signed this issue for HLT above #42862 (comment) but the bot doesn't recognize the signature. Can you please have a look? Thank you. |
@mmusich , there was a bug in bot due to which it was not processing the comments for issues properly. It is fixed now and your last comment has already fixed the labels for this issue |
This is happening due to a cmssw/RecoTauTag/RecoTau/plugins/DeepTauId.cc Line 1123 in 619b1aa
See discussion in #44333 (comment) |
assign ml (to get updates) |
New categories assigned: ml @valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks |
technically this is issue is solved by #43855. @cms-sw/upgrade-l2 do you have objections to close ? |
Hi @mmusich, I don't think #43855 is protecting us against empty tensors. I'm not a expert of the DeepTau code but the problem is related to evaluating the model over an empty CellGrid. I don't know why the grid used in the producer is empty in the events that are crashing the run.. maybe @hsert may have an idea?
|
as far as I could test, the exceptions reported here are gone with that PR, see #43855 (comment) |
Your test in #43855 (comment) is quite clear. I think this depends on the model version. #43855 is indeed using a different model than the other issue #44333. Maybe the new models is not giving empty cells? It would be interesting to understand what changes.. I think implementing a guard against empty inputs would at least make the issue clearer |
I agree, but I would not mix different issues (it is fine to link #44333 to this one, but this one is technically resolved), I would continue the discussion in the main one. |
+1 |
for record, these add an extra layer of protection: |
+Upgrade |
This issue is fully signed and ready to be closed. |
@cmsbuild, please close |
I just see relvals failure [1] from [2]. Not sure if this link to DeepTauId issue reported before in #40437 and #40733, as the error report looks different. I try to look on 13_1_0_preX relvals report, but I don't see this issue, so I am not clear what condition to make this exception happens.
[1]
[2]
https://cmsweb.cern.ch/couchdb/workloadsummary/_design/WorkloadSummary/_show/histogramByWorkflow/pdmvserv_RVCMSSW_13_3_0_pre2ZpToMM_m6000_14TeV__2026D98noPU_RV213_230913_122713_9881
https://cmsweb.cern.ch/couchdb/workloadsummary/_design/WorkloadSummary/_show/histogramByWorkflow/pdmvserv_RVCMSSW_13_3_0_pre2TenTau_15_500_Eta3p1__2026D98noPU_RV213_230913_122651_78
The text was updated successfully, but these errors were encountered: