DQM visualization-live and visualization-live-secondInstance crash during HI cosmics #46553

Closed
nothingface0 opened this issue Oct 30, 2024 · 8 comments · Fixed by #46563

@nothingface0

nothingface0 commented Oct 30, 2024

We noticed that the DQM visualization clients, visualization-live and visualization-live-secondInstance, crash during some of the latest cosmic runs, namely: 387579, 387557, 387556, 387555, 387552, 387548, 387546, 387544, 387541, 387539, 387531, 387338, 387240, 387235, 387212, 387209, 387207.

The exception is:

----- Begin Fatal Exception 29-Oct-2024 15:06:19 CET-----------------------
An exception of category 'NoProductResolverException' occurred while
   [0] Processing  Event run: 387552 lumi: 9 event: 19781392 stream: 3
   [1] Running path 'FEVToutput_step'
   [2] Prefetching for module JsonWritingTimeoutPoolOutputModule/'FEVToutput'
   [3] Calling method for module DeDxHitInfoProducer/'dedxHitInfoCosmicTF'
Exception Message:
No data of type "ClusterShapeHitFilter" with label "ClusterShapeHitFilter" in record "CkfComponentsRecord"
 Please add an ESSource or ESProducer to your job which can deliver this data.
----- End Fatal Exception -------------------------------------------------
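
For anyone triaging several runs at once, it may help to pull the missing EventSetup product out of the exception text automatically. The following is only a minimal sketch, assuming the message keeps the exact wording shown above. The function name `parse_missing_es_product` and the regular expression are illustrative and are not part of CMSSW.

```python
import re
from typing import NamedTuple, Optional


class MissingESProduct(NamedTuple):
    """Fields quoted in a 'No data of type ...' exception message."""
    data_type: str
    label: str
    record: str


# Assumption: the message wording matches the log excerpt in this issue.
_PATTERN = re.compile(
    r'No data of type "(?P<data_type>[^"]+)" '
    r'with label "(?P<label>[^"]*)" '
    r'in record "(?P<record>[^"]+)"'
)


def parse_missing_es_product(log_text: str) -> Optional[MissingESProduct]:
    """Return the first missing product found in log_text, or None."""
    match = _PATTERN.search(log_text)
    if match is None:
        return None
    return MissingESProduct(**match.groupdict())


# Excerpt copied from the exception reported above.
EXAMPLE = """Exception Message:
No data of type "ClusterShapeHitFilter" with label "ClusterShapeHitFilter" in record "CkfComponentsRecord"
 Please add an ESSource or ESProducer to your job which can deliver this data.
"""

if __name__ == "__main__":
    print(parse_missing_es_product(EXAMPLE))
```

Running the same function over the logs of each affected run would show whether all crashes point at the same product and record.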

More logs here.

The first instance was during run 387207 (24/10/2024). Not all cosmic runs lead to this behavior, however: there was no such crash during run 387559, for example.

We were using CMSSW_14_1_1 and CMSSW_14_1_4_patch1 at the time of the crashes, with Global Tag 141X_dataRun3_Express_v3. We have not yet tested whether this is reproducible with 14_0_X.

Any input is appreciated.

@cmsbuild

cmsbuild commented Oct 30, 2024

cms-bot internal usage

@cmsbuild

A new Issue was created by @nothingface0.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel

assign dqm

@makortel

@cms-sw/trk-dpg-l2

@cmsbuild

New categories assigned: dqm

@antoniovagnerini,@nothingface0,@rvenditti,@syuvivida,@tjavaid you have been requested to review this Pull request/Issue and eventually sign? Thanks

@nothingface0

This does not seem to be reproducible with 14_0_15_patch1 and the 140X_dataRun3_Express_v3 Global Tag.

We do get lots of the following however:

%MSG
Begin processing the 14th record. Run 387552, Event 48791634, LumiSection 20 on stream 7 at 30-Oct-2024 16:02:51.222 CET
%MSG-e TooManyClusters:  CosmicSeedGenerator:cosmicseedfinderP5  30-Oct-2024 16:02:51 CET Run: 387552 Event: 48791634
Found too many clusters (379), bailing out.

%MSG
%MSG-e TooManyClusters:  SimpleCosmicBONSeeder:simpleCosmicBONSeeds  30-Oct-2024 16:02:51 CET Run: 387552 Event: 48791634
Found too many clusters (379), bailing out.

%MSG
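
To put a number on "lots", the messages can be tallied per module. This is a rough sketch, assuming every message header has the `%MSG-e TooManyClusters:  <module>:<label>` layout seen in the excerpt above. The helper name `count_too_many_clusters` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: headers look like
# "%MSG-e TooManyClusters:  <module>:<label>  <timestamp> ..."
_HEADER = re.compile(
    r"^%MSG-e\s+TooManyClusters:\s+(?P<module>\S+)", re.MULTILINE
)


def count_too_many_clusters(log_text: str) -> Counter:
    """Count TooManyClusters messages per 'Module:label' string."""
    return Counter(m.group("module") for m in _HEADER.finditer(log_text))


# Excerpt copied from the log above.
SAMPLE = """%MSG
%MSG-e TooManyClusters:  CosmicSeedGenerator:cosmicseedfinderP5  30-Oct-2024 16:02:51 CET Run: 387552 Event: 48791634
Found too many clusters (379), bailing out.

%MSG
%MSG-e TooManyClusters:  SimpleCosmicBONSeeder:simpleCosmicBONSeeds  30-Oct-2024 16:02:51 CET Run: 387552 Event: 48791634
Found too many clusters (379), bailing out.

%MSG
"""

if __name__ == "__main__":
    for module, n in count_too_many_clusters(SAMPLE).items():
        print(module, n)
```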

@mmusich

mmusich commented Oct 31, 2024

> The exception is:

The problem is very likely due to #45016. @stahlleiton FYI.

> We do get lots of the following however:

Alas, that's normal; see #46283 for details.

@mmusich

mmusich commented Oct 31, 2024

#46563 offers a trivial fix.
