Fatal Exception in Prompt Reco of Run 367232, dataset JetMET0 #1120

Open
aloeliger opened this issue May 12, 2023 · 3 comments
Labels
Phase-1 (Pertains to phase-1 development), Software Error (Bug causing segfaults, memory leaks, run time errors, or compilation problems), URGENT

Comments

@aloeliger

What:

Taken from cms-sw#41645

Dear all,
there is one job failing Prompt Reco for run 367232, dataset JetMET0, with a Fatal Exception, as described in https://cms-talk.web.cern.ch/t/fatal-exception-in-prompt-reco-of-run-367232-datatset-jetmet0/23996

The crash seems to originate from the module L1TObjectsTiming:

----- Begin Fatal Exception 12-May-2023 03:17:41 CEST-----------------------
An exception of category 'StdException' occurred while
   [0] Processing  Event run: 367232 lumi: 190 event: 378449946 stream: 6
   [1] Running path 'dqmoffline_1_step'
   [2] Calling method for module L1TObjectsTiming/'l1tObjectsTiming'
Exception Message:
A std::exception was thrown.
vector::_M_range_check: __n (which is 9) >= this->size() (which is 5)
----- End Fatal Exception -------------------------------------------------
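When triaging a report like this, it can help to pull the run, lumi, and event numbers out of the exception text so the failing event can be targeted directly. A minimal sketch, assuming the message follows the layout shown above; the function name `parse_failing_event` is hypothetical and not part of CMSSW:

```python
import re

# Pattern for the "Processing Event" line of the exception message.
# Assumes the layout shown in the log above.
EVENT_RE = re.compile(
    r"Processing\s+Event\s+run:\s*(\d+)\s+lumi:\s*(\d+)\s+event:\s*(\d+)"
)


def parse_failing_event(log_text):
    """Return (run, lumi, event) from an exception message, or None if absent."""
    match = EVENT_RE.search(log_text)
    if match is None:
        return None
    run, lumi, event = (int(group) for group in match.groups())
    return run, lumi, event


log = "   [0] Processing  Event run: 367232 lumi: 190 event: 378449946 stream: 6"
print(parse_failing_event(log))
```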

The exception is reproducible on a lxplus8 node under CMSSW_13_0_5_patch2 (el8_amd64_gcc11).
Full logs and PSet.py can be found at https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run367232_JetMET0/Reco/vocms014.cern.ch-415905-3-log.tar.gz
With this modified PSet.py file the crash occurs immediately:

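import FWCore.ParameterSet.Config as cms
import pickle

# Load the pickled process configuration shipped with the job logs
with open('PSet.pkl', 'rb') as handle:
    process = pickle.load(handle)

# Run single-threaded and skip ahead so that the crash occurs immediately
process.options.numberOfThreads = 1
process.source.skipEvents = cms.untracked.uint32(2683)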

It should be noted that the crash is preceded by these warnings (possibly related):

%MSG-w L1TStage2uGTTiming:   L1TStage2uGTTiming:l1tStage2uGTTiming@streamBeginRun 12-May-2023 08:33:34 CEST  Run: 367232 Stream: 0
Algo "L1_SingleJet60er2p5" not found in the trigger menu L1Menu_Collisions2023_v1_0_0. Could not retrieve algo bit number.
%MSG
%MSG-w L1TStage2uGTTiming:   L1TStage2uGTTiming:l1tStage2uGTTiming@streamBeginRun 12-May-2023 08:33:34 CEST  Run: 367232 Stream: 0
Algo "L1_SingleJet60_FWD3p0" not found in the trigger menu L1Menu_Collisions2023_v1_0_0. Could not retrieve algo bit number.
%MSG

Recipe to recreate:

https://eoscmsweb.cern.ch/eos/cms/store/logs/prod/recent/PromptReco/PromptReco_Run367232_JetMET0/Reco/vocms014.cern.ch-415905-3-log.tar.gz

aloeliger added the Phase-1 and Software Error labels May 12, 2023
@eyigitba

eyigitba commented May 12, 2023

Hi @aloeliger, I believe this is due to the changes in menu v1_0_0, which are described here: https://github.com/cms-l1-dpg/L1MenuRun3/tree/master/official/L1Menu_Collisions2023_v1_0_0

The relevant line for those warnings would be this:

Eta change for L1 FWD triggers (CMSLITDPG-1086)
Removed the following seeds:

  • L1_SingleJet60er2p5
  • L1_SingleJet90er2p5
  • L1_SingleJet60_FWD3p0
  • L1_SingleJet90_FWD3p0

Added four new seeds:

  • L1_SingleJet35_FWD2p5 (bit 320)
  • L1_SingleJet60_FWD2p5 (bit 321)
  • L1_SingleJet90_FWD2p5 (bit 322)
  • L1_SingleJet120_FWD2p5 (bit 323)

I'm not sure why l1tStage2uGTTiming is looking for these triggers if they are not in the menu.
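The warnings can be reproduced in miniature by checking requested seed names against the menu contents. A minimal sketch, assuming a plain dictionary stands in for the menu (only the four added seeds and their bits from the list above are filled in) and that the two names from the warnings are what the DQM module requests; `find_missing_algos` is a hypothetical helper, not a CMSSW function:

```python
# Stand-in for L1Menu_Collisions2023_v1_0_0: only the four seeds added by the
# eta change are listed, with the bit numbers quoted above.
menu_bits = {
    "L1_SingleJet35_FWD2p5": 320,
    "L1_SingleJet60_FWD2p5": 321,
    "L1_SingleJet90_FWD2p5": 322,
    "L1_SingleJet120_FWD2p5": 323,
}

# Seed names taken from the two warnings in the report.
requested = ["L1_SingleJet60er2p5", "L1_SingleJet60_FWD3p0"]


def find_missing_algos(requested_names, menu):
    """Return the requested algo names that have no bit in the menu."""
    return [name for name in requested_names if name not in menu]


print(find_missing_algos(requested, menu_bits))
```

Both requested names were removed in v1_0_0, so both are reported as missing, which matches the two warnings.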

@aloeliger

Following the investigation in cms-sw#41645 (comment), this is potentially a symptom of corrupted data.

@aloeliger

For documentation purposes, the preferred solution to these issues seems to be to change the unpacker function to check its output for corruption.
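As an illustration of that idea, the sketch below validates indices before they are used for a lookup, so that a corrupted value is reported instead of raising an exception downstream. It is written in Python for brevity and is not the CMSSW unpacker code; the function name and the flat list-of-indices layout are assumptions made for the example. The numbers 9 and 5 mirror the index and container size from the exception message.

```python
def split_valid_indices(indices, container_size):
    """Separate indices that fit the container from those that do not."""
    valid, corrupted = [], []
    for index in indices:
        # An index is usable only if it lies in [0, container_size).
        if 0 <= index < container_size:
            valid.append(index)
        else:
            corrupted.append(index)
    return valid, corrupted


# Index 9 does not fit a container of size 5, as in the reported crash.
valid, corrupted = split_valid_indices([0, 4, 9], container_size=5)
if corrupted:
    print(f"skipping out-of-range indices: {corrupted}")
```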
