Segmentation fault in the TrackProducer for EphemeralHLTPhysics #38869
Comments
A new Issue was created by @ebrondol Erica Brondolin. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign trk-dpg,tracking-pog |
assign reconstruction |
New categories assigned: tracking-pog,reconstruction @jpata,@slava77,@mmusich,@clacaputo,@vmariani you have been requested to review this Pull request/Issue and eventually sign? Thanks |
No. |
For the sake of those who will debug this, please also post the stack trace of the failing thread; the message you provided is not very informative. |
@slava77 maybe?
Already been done. |
was somebody able to reproduce this with a "pointed" config? It would be quite useful/effective to be able to get to the problematic event directly. |
Here is the stack trace
|
I think this falls under the responsibilities of the ORM. BTW, do I read correctly this is the problematic thread?
? |
Yes, thread 10 is where the problem lies. |
So, the problem is actually with the 2D template interpolation and not with tracking per se. |
@tvami, please remove the tracking type and assignment. |
unassign tracking-pog |
Done
Yes, I'll have a look... |
The simple configuration skip_cfg.py:
import FWCore.ParameterSet.Config as cms
from PSet import process
#3087
process.source.skipEvents = cms.untracked.uint32(3086)
process.options.numberOfThreads = 1
If run in the same directory as PSet.py and PSet.pkl, it will fail in the first event to be processed. |
Two more paused jobs have been reported in the CMS Talk thread: https://cms-talk.web.cern.ch/t/paused-job-for-promptreco-ephemeralhltphysics-due-to-segfault/13318/3
I am testing these locally to see whether the segmentation fault is similar. |
reporting here also some private findings by @ferencek
It looks like the trajectory local parameters are not well defined which is in line with the warning that is emitted just before in the job:
clearly there is something fishy in the trajectory building, but can we patch protecting against ill defined trajectories?
diff --git a/CondFormats/SiPixelTransient/src/SiPixelTemplate2D.cc b/CondFormats/SiPixelTransient/src/SiPixelTemplate2D.cc
index d9e3441e357..9380f61f63e 100644
--- a/CondFormats/SiPixelTransient/src/SiPixelTemplate2D.cc
+++ b/CondFormats/SiPixelTransient/src/SiPixelTemplate2D.cc
@@ -626,6 +626,12 @@ bool SiPixelTemplate2D::getid(int id) {
bool SiPixelTemplate2D::interpolate(int id, float cotalpha, float cotbeta, float locBz, float locBx) {
// Interpolate for a new set of track angles
+ //check for nan's
+ if (!edm::isFinite(cotalpha) || !edm::isFinite(cotbeta)) {
+ success_ = false;
+ return success_;
+ }
+
// Local variables
float acotb, dcota, dcotb;
@@ -680,12 +686,6 @@ bool SiPixelTemplate2D::interpolate(int id, float cotalpha, float cotbeta, float
#ifndef SI_PIXEL_TEMPLATE_STANDALONE
throw cms::Exception("DataCorrupt")
<< "SiPixelTemplate2D::illegal subdetector ID = " << thePixelTemp_[index_id_].head.Dtype << std::endl;
-
- //check for nan's
- if (!edm::isFinite(cotalpha) || !edm::isFinite(cotbeta)) {
- success_ = false;
- return success_;
- }
#else
std::cout << "SiPixelTemplate:2D:illegal subdetector ID = " << thePixelTemp_[index_id_].head.Dtype << std::endl;
#endif
With this patch, the job reported in the initial comment: #38869 (comment) runs successfully. |
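The idea behind the patch above (rejecting non-finite track angles at the top of the interpolation, before anything is derived from them) can be illustrated with a minimal standalone sketch. Everything here is hypothetical: `toyLookup` and `kToyTable` are made-up names, the binning is invented, and `std::isfinite` stands in for `edm::isFinite`. It does not reproduce the actual `SiPixelTemplate2D` internals.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <optional>

// Hypothetical toy table; names and binning are illustrative only and do
// not correspond to the real SiPixelTemplate2D data layout.
constexpr std::array<float, 4> kToyTable{0.1f, 0.2f, 0.4f, 0.8f};

// Look up a table entry for a track angle. Returns std::nullopt when the
// angle is NaN or infinite, so no index is ever computed from such a value.
std::optional<float> toyLookup(float cotalpha) {
  // Guard first: every ordered comparison with NaN is false, so range
  // checks written with < or > cannot be relied on to reject it.
  if (!std::isfinite(cotalpha)) {
    return std::nullopt;  // early exit, analogous to "success_ = false; return"
  }
  // Clamp to the valid bin range [0, 3]; safe because cotalpha is finite here.
  const float clamped = std::fmin(std::fmax(cotalpha, 0.0f), 3.0f);
  const auto bin = static_cast<std::size_t>(clamped);
  return kToyTable[bin];
}
```

In this sketch a caller would test the returned `std::optional` and skip the hit when it is empty, much as a caller of `interpolate` would test its boolean return value. The guard has to come before the cast because converting a NaN to an integer type is undefined behaviour in C++, and an index obtained that way may point outside the table.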
I made a PR here: #38881 |
Looking at the logs, I see that the tracebacks leading to the segmentation fault are the same. |
Yes, the |
Hi all. We have another instance of this segmentation fault issue with EphemeralHLTPhysics. Tarball can be found here:
It is not clear to me if this was completely solved by #38881 |
@germanfgv please clarify if the Tier0 moved to a new release including #38881, which is impossible given it does not exist yet. Adjust your expectations accordingly |
+1
|
I see another crash that is probably related to this issue. The backport has already been merged and, as far as I can tell, the IB did not show any issues related to it. Could the release managers clarify the possible timescale for a (patch) release including the fix, or any other issues keeping it on hold? |
[as current shadow ORM]
A single job was paused while processing PromptReco for the EphemeralHLTPhysics PD due to a segmentation fault in the TrackProducer:
The error is locally reproducible and the job tarball (including PSet and log files) can be found here:
The full description of the issue can be found in CMS Talk.
@mmusich , could you please have a look?