RelVal 132.0: (PDF) Info file not found #43526
A new Issue was created by @iarspider. @smuzaffar, @makortel, @rappoccio, @antoniovilela, @Dr15Jones, @sextonkennedy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
assign GeneratorInterface/Pythia6Interface
New categories assigned: generators @alberto-sanchez,@bbilin,@GurpreetSinghChahal,@mkirsano,@menglu21,@SiewYan you have been requested to review this Pull request/Issue and eventually sign? Thanks
The problem is that LHAPDF uses thread-local variables to store its 'cached' values. In particular, there is this container:
`/// Collection of active sets`
`static thread_local map<int, PDFSetHandler> ACTIVESETS;`
In our code, this gets initialized at begin LuminosityBlock by the call to
In the job I was running, this happened on thread 5. Then later in the job, a call to
The problem is, the call to
Tagging a related LHAPDF update cms-sw/cmsdist#8852
This was most likely caused by cms-sw/cmsdist#8852
So it looks like the FORTRAN adapter layer for LHAPDF assumes all FORTRAN calls will occur on one and only one thread (or at least that all FORTRAN calls will happen independently on each thread).
I can confirm that the change from LHAPDF 6.4.0 to 6.5.4 is the reason. Between those two versions, `static map<int, PDFSetHandler> ACTIVESETS;` was changed to `static thread_local map<int, PDFSetHandler> ACTIVESETS;`.
Probably the only safe change would be to modify
Would it be feasible to give the LHAPDF authors feedback that this change didn't work for us?
I can take on communicating the problem to the authors.
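A minimal, self-contained sketch (not LHAPDF code) of why the switch from `static` to `static thread_local` matters: an entry stored in a `thread_local` map on one thread is invisible to every other thread, whereas a plain `static` map is shared. `SetHandler`, `initSet`, `lookup`, and `initHereLookUpElsewhere` are hypothetical names chosen for illustration; the two maps only imitate the 6.4.0 and 6.5.4 declarations of `ACTIVESETS`.

```cpp
#include <map>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for LHAPDF's PDFSetHandler; only the set name matters here.
struct SetHandler {
  std::string name;
};

// LHAPDF 6.4.0 style: a single map shared by every thread in the process.
static std::map<int, SetHandler> sharedSets;
// LHAPDF 6.5.4 style: each thread gets its own map, which starts out empty.
static thread_local std::map<int, SetHandler> perThreadSets;

// One-time initialization, done on whichever thread runs it (e.g. begin LuminosityBlock).
void initSet(int slot, const std::string& name) {
  sharedSets[slot] = SetHandler{name};
  perThreadSets[slot] = SetHandler{name};  // only visible on the calling thread
}

// Lookup by slot; returns an empty name if the slot is unknown to this map.
std::string lookup(const std::map<int, SetHandler>& sets, int slot) {
  auto it = sets.find(slot);
  return it == sets.end() ? std::string{} : it->second.name;
}

// Initialize on one thread, then look the set up from a different thread,
// mimicking a later call that the framework schedules on another worker.
// Returns {name seen via the shared map, name seen via the thread_local map}.
std::pair<std::string, std::string> initHereLookUpElsewhere(int slot) {
  std::thread initThread([slot] { initSet(slot, "ExampleSet"); });
  initThread.join();

  std::string viaShared, viaThreadLocal;
  std::thread worker([&] {
    viaShared = lookup(sharedSets, slot);          // finds "ExampleSet"
    viaThreadLocal = lookup(perThreadSets, slot);  // worker's own map is empty
  });
  worker.join();
  return {viaShared, viaThreadLocal};
}
```

Calling `initHereLookUpElsewhere(1)` returns `{"ExampleSet", ""}`: the shared map still finds the set from the second thread, while the thread-local map does not, which has the same shape as the failure described above.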
@mkirsano, should we revert the LHAPDF change?
I am checking whether we can revert only the LHAPDF library.
I checked that it is possible to use the new PDF sets with LHAPDF 6.4.0 (checked on Pythia8, Generator step, 1 thread).
cms-sw/cmsdist#8869 reverts LHAPDF to 6.4.0 while keeping the PDF sets at 6.5.1.
The problem has been reported to lhapdf-support. I will also write to the main developer.
This is the answer from the LHAPDF developers: "The initialization needs to be called explicitly on each thread: they cannot inherit their memory from the calling process, as shared memory is open to run conditions and cross-thread corruptions. So where you say that you only call pyinit once, you should change the calling code to do so thread-locally!"
Thanks @mkirsano, it sounds like the way forward is then what @Dr15Jones suggested in #43526 (comment). (For what it's worth, from our standpoint this is a terrible way to handle multithreading.)
Workflow 561.0 crashed in CMSSW_14_0_X_2023-12-13-2300 on el9_amd64_gcc12.
I'm not sure whether it is related to this issue or to the (much) earlier crashes #35082 and #35251.
Since CMSSW_14_0_X_2023-12-04-1100, RelVal 132.0 fails for all flavors/architectures:
Indeed, the code tries to open files like `/_0000.dat`, `/.info`, or even `""` (the empty string). This is the origin of one such call (extracted with `strace -f -k`): https://gist.github.com/iarspider/fdd17a07202e6ff4bb884f3751e37cd5
This seems to be a threading issue: running this RelVal with a single thread doesn't generate an exception, while running with 2+ threads does.
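The odd paths in the strace output are what one would expect if the PDF set name were empty by the time the lookup ran. Assuming the standard LHAPDF6 data layout, where set `<name>` lives in a directory `<name>/` containing `<name>.info` and member files `<name>_0000.dat`, `<name>_0001.dat`, and so on, the sketch below rebuilds those relative paths. `infoPath` and `memberPath` are hypothetical helpers, not LHAPDF functions.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helpers mirroring the LHAPDF6 on-disk layout (relative to a data dir):
//   <name>/<name>.info  and  <name>/<name>_<member as 4 digits>.dat
std::string infoPath(const std::string& setName) {
  return setName + "/" + setName + ".info";
}

std::string memberPath(const std::string& setName, int member) {
  char suffix[32];
  // Zero-pad the member index to four digits, e.g. 0 -> "_0000.dat".
  std::snprintf(suffix, sizeof suffix, "_%04d.dat", member);
  return setName + "/" + setName + suffix;
}
```

With an empty name these give `/.info` and `/_0000.dat`, matching the strace output; with a name such as `ExampleSet` they give `ExampleSet/ExampleSet.info` and `ExampleSet/ExampleSet_0000.dat`.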