Fix linking errors with CUDA+VecGeom caused by #847 #856
This allows the use of `CMAKE_LINK_DEPENDS_NO_SHARED=OFF` while still relinking the final library (i.e. executing `nvlink`) whenever any of the RDC CUDA files has changed (and thus the internal CUDA RDC infrastructure symbols have changed; they are seeded with a 'random' number that changes at each compilation).
This fixes #852.
LGTM. Thanks for figuring that out, @pcanal.
When @esseivaju verifies that it works for him, I'll merge.
If I leave
@esseivaju Can you run the following two commands on your failing build:
The first command tells us whether there is variability in the actual symbol.
is odd. It should no longer be the case with the patch. (i.e. Try
and rebuild, redo the above test, and try running the failing example.)
yeah,
Alright. Better, but not there yet :( (and I can reproduce the problem).
I pushed an additional commit that solves the latest issue (as far as I can tell). @esseivaju Can you confirm?
Fixed it for me as well!
I think this will also fix a similar error I've seen on Summit (static-only builds): every so often on a rebuild I'll see errors like:
where
$ nm lib64/libceleritas.a | grep 'RegisterLinked.*RelativisticBremModel' | cu++filt
U __cudaRegisterLinkedBinary_56_tmpxft_003fb530_00000000_7_RelativisticBremModel_cpp1_ii_597f8d12
$ nm lib64/libceleritas_final.a | grep 'RegisterLinked.*RelativisticBremModel' | cu++filt
0000000000000748 b __cudaRegisterLinkedBinary_56_tmpxft_001f0634_00000000_7_RelativisticBremModel_cpp1_ii_597f8d12::__p
0000000000000d60 T __cudaRegisterLinkedBinary_56_tmpxft_001f0634_00000000_7_RelativisticBremModel_cpp1_ii_597f8d12
so I guess it stemmed from the
It looks like it. If you ever see this again, please let me know.
@pcanal I see this on Summit (static libs) when not using VecGeom:
$ ninja bin/celer-sim
...
lib64/libceleritas.a(AlongStepUniformMscAction.cu.o): In function `__sti____cudaRegisterAll()':
tmpxft_00358c8a_00000000-6_AlongStepUniformMscAction.cudafe1.cpp:(.text.startup+0x34): undefined reference to `__cudaRegisterLinkedBinary_60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12'
lib64/libceleritas_final.a(cmake_device_link.o):(.toc+0x0): undefined reference to `__fatbinwrap_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12'
lib64/libceleritas_final.a(cmake_device_link.o):(.toc+0x8): undefined reference to `__fatbinwrap_63_tmpxft_00347192_00000000_7_AlongStepRZMapFieldMscAction_cpp1_ii_597f8d12'
/usr/bin/ld: link errors found, deleting executable `bin/celer-sim'
timestamps:
$ ls -al
-rw-rw-r-- 1 s3j csc404 11M Aug 4 11:56 libceleritas.a
-rw-rw-r-- 1 s3j csc404 6.7M Aug 4 09:25 libceleritas_final.a
symbols:
$ nm libceleritas.a | grep AlongStepUniformMscAction_cpp
0000000000000010 t _ZL327__device_stub__ZN9celeritas6detail84_GLOBAL__N__60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d1218launch_action_implINS_24ConditionalTrackExecutorINS0_22IsAlongStepActionEqualENS0_18PropagationApplierINS0_29UniformFieldPropagatorFactoryEvEEEES7_Lb1ELi256ELi1EEEvNS_5RangeINS_8OpaqueIdINS_6ThreadEjEEEET_RKN9celeritas5RangeINS_8OpaqueIdINS_6ThreadEjEEEERNS_24ConditionalTrackExecutorINS_6detail22IsAlongStepActionEqualENS8_18PropagationApplierINS8_29UniformFieldPropagatorFactoryEvEEEE
0000000000000000 W _ZN9celeritas6detail84_GLOBAL__N__60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d1218launch_action_implINS_24ConditionalTrackExecutorINS0_22IsAlongStepActionEqualENS0_18PropagationApplierINS0_29UniformFieldPropagatorFactoryEvEEEES7_Lb1ELi256ELi1EEEvNS_5RangeINS_8OpaqueIdINS_6ThreadEjEEEET_
U __cudaRegisterLinkedBinary_60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
0000000000000000 D __fatbinwrap_60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
$ nm libceleritas_final.a | grep AlongStepUniformMscAction_cpp
0000000000000422 r _ZL90def_module_id_str_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
0000000000000770 b _ZZ99__cudaRegisterLinkedBinary_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12E3__p
00000000000013f0 T __cudaRegisterLinkedBinary_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
U __fatbinwrap_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
Even removing all the
but removing all the
$ nm libceleritas* | grep '__cudaReg.*AlongStepUniformMscAction_cpp'
U __cudaRegisterLinkedBinary_60_tmpxft_0035a4e0_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
0000000000000770 b _ZZ99__cudaRegisterLinkedBinary_60_tmpxft_0035a4e0_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12E3__p
00000000000013f0 T __cudaRegisterLinkedBinary_60_tmpxft_0035a4e0_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
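The by-hand comparison above (an undefined `__cudaRegisterLinkedBinary_*` reference in the static archive versus the symbol defined in the `_final` archive) could be scripted roughly as follows. This is only a sketch: the function names `rdc_symbols` and `rdc_stale_symbols` are hypothetical and not part of Celeritas or CMake, and it assumes the default `nm` output layout shown in this thread (type letter followed by the symbol name, without `cu++filt` demangling).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Celeritas): find CUDA RDC registration
# symbols that the object archive references but the device-link archive
# does not define, i.e. the signature of a stale *_final library.

# Read `nm` output on stdin and print the sorted, unique names of
# __cudaRegisterLinkedBinary_* symbols with the given type letter
# ($1: "U" for undefined references, "T" for definitions).
rdc_symbols() {
    awk -v type="$1" '
        # Undefined symbols have no address column, so count from the end:
        # the last field is the name, the one before it is the type letter.
        NF >= 2 && $(NF-1) == type && $NF ~ /^__cudaRegisterLinkedBinary_/ {
            print $NF
        }
    ' | sort -u
}

# $1: file holding `nm` output of the object archive (e.g. libceleritas.a)
# $2: file holding `nm` output of the device-link archive (e.g. libceleritas_final.a)
# Prints every registration symbol needed by $1 that $2 does not define.
rdc_stale_symbols() {
    needed=$(mktemp) && provided=$(mktemp) || return 2
    rdc_symbols U < "$1" > "$needed"
    rdc_symbols T < "$2" > "$provided"
    # comm -23 keeps the lines that appear only in the first (sorted) file.
    comm -23 "$needed" "$provided"
    rm -f "$needed" "$provided"
}

# Example use (paths are illustrative):
#   nm lib64/libceleritas.a       > static.txt
#   nm lib64/libceleritas_final.a > final.txt
#   rdc_stale_symbols static.txt final.txt
```

Empty output would suggest the two archives agree on the RDC identifiers; any printed symbol carries a `tmpxft_…` identifier that the `_final` archive was not relinked against.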
Add explicit link dependency from `${target}_final` to `${target_static}`. This allows the use of `CMAKE_LINK_DEPENDS_NO_SHARED=OFF` while still relinking the final library (i.e. executing `nvlink`) whenever any of the RDC CUDA files has changed (and thus the internal CUDA RDC infrastructure symbols have changed; they are seeded with a 'random' number that changes at each compilation).