Fix linking errors with CUDA+VecGeom caused by #847 #856

Merged
@sethrj merged 2 commits into celeritas-project:develop from the FinalStaticDeps branch
Jul 15, 2023


pcanal
Contributor

@pcanal pcanal commented Jul 14, 2023

Add an explicit link dependency from ${target}_final to ${target_static}. This allows the use of `CMAKE_LINK_DEPENDS_NO_SHARED=OFF` while still relinking the final library (i.e. executing `nvlink`) whenever any of the RDC CUDA files has changed (and thus the internal CUDA RDC infrastructure symbols have changed; they are seeded with a 'random' number that changes at each compilation).

@pcanal pcanal self-assigned this Jul 14, 2023
@pcanal
Contributor Author

pcanal commented Jul 14, 2023

This fixes #852

Member

@sethrj sethrj left a comment

LGTM. Thanks for figuring that out @pcanal .

When @esseivaju verifies that it works for him, I'll merge.

@esseivaju
Contributor

If I leave CMAKE_LINK_DEPENDS_NO_SHARED at its default (=ON), I no longer get an error during linking, but the failure moves to symbol lookup at runtime:

/home/esseivaj/bld4/devel/celeritas/build-ndebug/bin/celer-g4: symbol lookup error: /home/esseivaj/bld4/devel/celeritas/build-ndebug/lib64/libaccel_final.so: undefined symbol: __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_227203

@sethrj sethrj added the bug (Something isn't working) and core (Software engineering infrastructure) labels Jul 14, 2023
@sethrj sethrj changed the title from "Add explicit link dependency from ${target}_final to ${target_static}" to "Fix linking errors with CUDA+VecGeom" Jul 14, 2023
@pcanal
Contributor Author

pcanal commented Jul 14, 2023

@esseivaju Can you run the following two commands on your failing build:

find . \( -name \*.o -o -name \*.a -o -name \*.so \) -a -exec nm -A {} \; | grep '__fatbinwrap_.*_15_EPlusGGModel_cu'
find . \( -name \*.o -o -name \*.a -o -name \*.so \) -a -exec nm -A {} \; | grep '__fatbinwrap_.*_15_EPlusGGModel_cu' | cut -d : -f 1 | xargs ls -l

The first command tells us whether the actual symbol varies between files.
The second tells us which files were not updated when they should have been.
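
For longer listings, the first question (is there variability in the symbol?) can also be answered with a small script. The following is only a sketch and not part of this PR: it assumes the `nm -A` output has been saved as text, and the name `group_by_symbol` is hypothetical.

```python
import re
from collections import defaultdict

# One line of `nm -A` output: "<file>:[<address>] <type> <symbol>".
# The file part may contain ':' itself (archive members), so it is matched
# greedily up to the last colon on the line.
NM_LINE = re.compile(
    r"^(?P<file>.+):(?:[0-9a-fA-F]+)?\s+(?P<kind>[A-Za-z])\s+(?P<symbol>\S+)$"
)


def group_by_symbol(nm_output):
    """Map each symbol name to the list of (type, file) pairs that mention it."""
    groups = defaultdict(list)
    for line in nm_output.splitlines():
        match = NM_LINE.match(line.strip())
        if match:  # lines that are not nm output (prompts, blanks) are ignored
            groups[match.group("symbol")].append(
                (match.group("kind"), match.group("file"))
            )
    return dict(groups)
```

If the result has more than one key for the same `.cu` file, at least one file in the build tree was produced from an older compilation of that source.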

@esseivaju
Contributor

[esseivaj@zeus build-ndebug]$ find . \( -name \*.o -o -name \*.a -o -name \*.so \) -a -exec nm -A {} \; | grep '__fatbinwrap_.*_15_EPlusGGModel_cu'
./lib64/libaccel_final.so:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./lib64/libceleritas_final.so:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./lib64/libceleritas.so:0000000000534ce8 D __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_249149
./lib64/libceleritas_static.a:EPlusGGModel.cu.o:0000000000000000 D __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_249149
./CMakeFiles/celeritas_random_CurandPerformance.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/celeritas_phys_Physics.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/celeritas_phys_Particle.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/accel_final.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/celeritas_ext_Vecgeom.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/celeritas_final.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/demo-rasterizer.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/celeritas_mat_Material.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/demo-geo-check.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/celeritas_track_TrackInit.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_249149
./CMakeFiles/demo-interactor.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/celeritas_geo_Geometry.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./CMakeFiles/celeritas_random_RngEngine.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_240531
./src/celeritas/CMakeFiles/celeritas_objects.dir/em/model/EPlusGGModel.cu.o:0000000000000000 D __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_7847f3ed_249149
[esseivaj@zeus build-ndebug]$ find . \( -name \*.o -o -name \*.a -o -name \*.so \) -a -exec nm -A {} \; | grep '__fatbinwrap_.*_15_EPlusGGModel_cu' | cut -d : -f 1 | xargs ls -l
-rw-rw-r--. 1 esseivaj esseivaj 6775960 Jul 14 09:03 ./CMakeFiles/accel_final.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6796152 Jul 14 09:03 ./CMakeFiles/celeritas_ext_Vecgeom.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6774512 Jul 14 09:03 ./CMakeFiles/celeritas_final.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6807664 Jul 14 09:03 ./CMakeFiles/celeritas_geo_Geometry.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6779832 Jul 14 09:03 ./CMakeFiles/celeritas_mat_Material.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6779784 Jul 14 09:03 ./CMakeFiles/celeritas_phys_Particle.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6782224 Jul 14 09:03 ./CMakeFiles/celeritas_phys_Physics.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6936632 Jul 14 09:03 ./CMakeFiles/celeritas_random_CurandPerformance.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6778528 Jul 14 09:03 ./CMakeFiles/celeritas_random_RngEngine.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6777032 Jul 14 09:06 ./CMakeFiles/celeritas_track_TrackInit.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6795824 Jul 14 09:03 ./CMakeFiles/demo-geo-check.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6811896 Jul 14 09:03 ./CMakeFiles/demo-interactor.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6801016 Jul 14 09:03 ./CMakeFiles/demo-rasterizer.dir/cmake_device_link.o
-rwxrwxr-x. 1 esseivaj esseivaj 6764128 Jul 14 09:03 ./lib64/libaccel_final.so
-rwxrwxr-x. 1 esseivaj esseivaj 6763776 Jul 14 09:03 ./lib64/libceleritas_final.so
-rwxrwxr-x. 1 esseivaj esseivaj 6208720 Jul 14 09:06 ./lib64/libceleritas.so
-rw-rw-r--. 1 esseivaj esseivaj 9044644 Jul 14 09:06 ./lib64/libceleritas_static.a
-rw-rw-r--. 1 esseivaj esseivaj   63104 Jul 14 09:06 ./src/celeritas/CMakeFiles/celeritas_objects.dir/em/model/EPlusGGModel.cu.o

@pcanal
Contributor Author

pcanal commented Jul 14, 2023

-rwxrwxr-x. 1 esseivaj esseivaj 6763776 Jul 14 09:03 ./lib64/libceleritas_final.so
-rw-rw-r--. 1 esseivaj esseivaj 9044644 Jul 14 09:06 ./lib64/libceleritas_static.a

is odd. It should no longer be the case with the patch (i.e. libceleritas_final should have been linked right after libceleritas_static).

Try

touch  where_celeritas_source_is/src/celeritas/em/model/EPlusGGModel.cu

and rebuild and redo the above test and try running the failing example.
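
The timestamp comparison above amounts to a make-style staleness check. As a rough illustration (a sketch under assumptions, not how CMake or Ninja actually track dependencies; `is_stale` is a hypothetical helper), it could be written as:

```python
import os


def is_stale(output_path, input_path):
    """Return True if the output is missing or older than the input."""
    if not os.path.exists(output_path):
        return True
    # Modification times are compared the way a make-like tool would.
    return os.path.getmtime(output_path) < os.path.getmtime(input_path)
```

For example, `is_stale("lib64/libceleritas_final.so", "lib64/libceleritas_static.a")` returning True would correspond to the odd 09:03 versus 09:06 pair quoted above. A False result does not prove that the symbols match; it only rules out the obvious ordering problem.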

@esseivaju
Contributor

Yeah, libceleritas_final does get linked just after libceleritas_static. I had switched to develop and rerun a build, so the timestamps got messed up. I redid a build from scratch on your branch and then triggered an incremental build, but I still get runtime errors similar to the ones mentioned previously:

[esseivaj@zeus build-ndebug]$ touch  ../src/celeritas/em/model/EPlusGGModel.cu
[esseivaj@zeus build-ndebug]$ ninja
[5/5] Linking CUDA shared library lib64/libceleritas_final.so
[esseivaj@zeus build-ndebug]$ ctest --rerun-failed
Test project /home/esseivaj/bld4/devel/celeritas/build-ndebug
      Start  74: celeritas/ext/Vecgeom:FourLevelsTest.*
 1/18 Test  #74: celeritas/ext/Vecgeom:FourLevelsTest.* .........***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.12 sec
      Start  75: celeritas/ext/Vecgeom:SolidsTest.*
 2/18 Test  #75: celeritas/ext/Vecgeom:SolidsTest.* .............***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.11 sec
      Start  76: celeritas/ext/Vecgeom:FourLevelsGeantTest.*
 3/18 Test  #76: celeritas/ext/Vecgeom:FourLevelsGeantTest.* ....***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.08 sec
      Start  77: celeritas/ext/Vecgeom:SolidsGeantTest.*
 4/18 Test  #77: celeritas/ext/Vecgeom:SolidsGeantTest.* ........***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.06 sec
      Start  93: celeritas/geo/Geometry
 5/18 Test  #93: celeritas/geo/Geometry .........................***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.05 sec
      Start 110: celeritas/global/Stepper:TestEm3MscNofluct.*
 6/18 Test #110: celeritas/global/Stepper:TestEm3MscNofluct.* ...***Failed  Error regular expression found in output. Regex=[tests FAILED]  0.69 sec
      Start 126: celeritas/mat/Material
 7/18 Test #126: celeritas/mat/Material .........................***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.12 sec
      Start 128: celeritas/phys/Particle
 8/18 Test #128: celeritas/phys/Particle ........................***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.12 sec
      Start 129: celeritas/phys/Physics
 9/18 Test #129: celeritas/phys/Physics .........................***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.06 sec
      Start 133: celeritas/random/RngEngine
10/18 Test #133: celeritas/random/RngEngine .....................***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.05 sec
      Start 146: celeritas/random/CurandPerformance
11/18 Test #146: celeritas/random/CurandPerformance .............***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.05 sec
      Start 154: accel/ExceptionConverter
12/18 Test #154: accel/ExceptionConverter .......................***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.05 sec
      Start 155: accel/HitManager
13/18 Test #155: accel/HitManager ...............................***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.05 sec
      Start 156: accel/HitProcessor
14/18 Test #156: accel/HitProcessor .............................***Failed  Required regular expression not found. Regex=[tests PASSED
]  0.05 sec
      Start 161: app/celer-g4
15/18 Test #161: app/celer-g4 ...................................***Failed    0.04 sec
      Start 163: app/demo-interactor
16/18 Test #163: app/demo-interactor ............................***Failed    0.16 sec
      Start 164: app/demo-geo-check
17/18 Test #164: app/demo-geo-check .............................***Failed    0.11 sec
      Start 165: app/demo-rasterizer
18/18 Test #165: app/demo-rasterizer ............................***Failed    0.19 sec

0% tests passed, 18 tests failed out of 18

Label Time Summary:
app           =   0.49 sec*proc (4 tests)
gpu           =   2.01 sec*proc (15 tests)
nomemcheck    =   0.35 sec*proc (2 tests)
unit          =   1.67 sec*proc (14 tests)

Total Test time (real) =   2.19 sec

The following tests FAILED:
         74 - celeritas/ext/Vecgeom:FourLevelsTest.* (Failed)
         75 - celeritas/ext/Vecgeom:SolidsTest.* (Failed)
         76 - celeritas/ext/Vecgeom:FourLevelsGeantTest.* (Failed)
         77 - celeritas/ext/Vecgeom:SolidsGeantTest.* (Failed)
         93 - celeritas/geo/Geometry (Failed)
        110 - celeritas/global/Stepper:TestEm3MscNofluct.* (Failed)
        126 - celeritas/mat/Material (Failed)
        128 - celeritas/phys/Particle (Failed)
        129 - celeritas/phys/Physics (Failed)
        133 - celeritas/random/RngEngine (Failed)
        146 - celeritas/random/CurandPerformance (Failed)
        154 - accel/ExceptionConverter (Failed)
        155 - accel/HitManager (Failed)
        156 - accel/HitProcessor (Failed)
        161 - app/celer-g4 (Failed)
        163 - app/demo-interactor (Failed)
        164 - app/demo-geo-check (Failed)
        165 - app/demo-rasterizer (Failed)
Errors while running CTest
Output from these tests are in: /home/esseivaj/bld4/devel/celeritas/build-ndebug/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
[esseivaj@zeus build-ndebug]$ find . \( -name \*.o -o -name \*.a -o -name \*.so \) -a -exec nm -A {} \; | grep '__fatbinwrap_.*_15_EPlusGGModel_cu'
./lib64/libaccel_final.so:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./lib64/libceleritas_final.so:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_5df7e3fc_295558
./lib64/libceleritas.so:000000000053dd88 D __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_5df7e3fc_295558
./lib64/libceleritas_static.a:EPlusGGModel.cu.o:0000000000000000 D __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_5df7e3fc_295558
./CMakeFiles/celeritas_random_CurandPerformance.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/celeritas_phys_Physics.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/celeritas_phys_Particle.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/accel_final.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/celeritas_ext_Vecgeom.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/celeritas_final.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_5df7e3fc_295558
./CMakeFiles/demo-rasterizer.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/celeritas_mat_Material.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/demo-geo-check.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/celeritas_track_TrackInit.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_291482
./CMakeFiles/demo-interactor.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/celeritas_geo_Geometry.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./CMakeFiles/celeritas_random_RngEngine.dir/cmake_device_link.o:                 U __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_9d44abf7_285471
./src/celeritas/CMakeFiles/celeritas_objects.dir/em/model/EPlusGGModel.cu.o:0000000000000000 D __fatbinwrap_85e8a02e_15_EPlusGGModel_cu_5df7e3fc_295558
[esseivaj@zeus build-ndebug]$ find . \( -name \*.o -o -name \*.a -o -name \*.so \) -a -exec nm -A {} \; | grep '__fatbinwrap_.*_15_EPlusGGModel_cu' | cut -d : -f 1 | xargs ls -l
-rw-rw-r--. 1 esseivaj esseivaj 6775984 Jul 14 11:26 ./CMakeFiles/accel_final.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6796176 Jul 14 11:27 ./CMakeFiles/celeritas_ext_Vecgeom.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6774528 Jul 14 11:34 ./CMakeFiles/celeritas_final.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6807688 Jul 14 11:27 ./CMakeFiles/celeritas_geo_Geometry.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6779856 Jul 14 11:27 ./CMakeFiles/celeritas_mat_Material.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6779808 Jul 14 11:27 ./CMakeFiles/celeritas_phys_Particle.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6782240 Jul 14 11:27 ./CMakeFiles/celeritas_phys_Physics.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6936672 Jul 14 11:27 ./CMakeFiles/celeritas_random_CurandPerformance.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6778552 Jul 14 11:27 ./CMakeFiles/celeritas_random_RngEngine.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6777064 Jul 14 11:29 ./CMakeFiles/celeritas_track_TrackInit.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6795856 Jul 14 11:28 ./CMakeFiles/demo-geo-check.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6811920 Jul 14 11:28 ./CMakeFiles/demo-interactor.dir/cmake_device_link.o
-rw-rw-r--. 1 esseivaj esseivaj 6801048 Jul 14 11:28 ./CMakeFiles/demo-rasterizer.dir/cmake_device_link.o
-rwxrwxr-x. 1 esseivaj esseivaj 6764128 Jul 14 11:26 ./lib64/libaccel_final.so
-rwxrwxr-x. 1 esseivaj esseivaj 6763776 Jul 14 11:34 ./lib64/libceleritas_final.so
-rwxrwxr-x. 1 esseivaj esseivaj 6255592 Jul 14 11:34 ./lib64/libceleritas.so
-rw-rw-r--. 1 esseivaj esseivaj 9044644 Jul 14 11:34 ./lib64/libceleritas_static.a
-rw-rw-r--. 1 esseivaj esseivaj   63104 Jul 14 11:34 ./src/celeritas/CMakeFiles/celeritas_objects.dir/em/model/EPlusGGModel.cu.o
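
One way to read a listing like the one above is to look for undefined (`U`) symbols that no scanned file defines. The sketch below does that on saved `nm -A` text; it is an illustration only, `dangling_references` is a hypothetical name, and the whitespace-based parsing assumes file paths without spaces.

```python
def dangling_references(nm_output):
    """Return {symbol: [files]} for undefined symbols that nothing defines."""
    defined = set()
    wanted = {}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip prompts, blanks, anything not "<file>: <type> <symbol>"
        location, kind, symbol = parts
        # Drop the trailing ":<address>" (or bare ":") to recover the file name.
        path = location.rsplit(":", 1)[0]
        if kind == "U":
            wanted.setdefault(symbol, []).append(path)
        else:
            defined.add(symbol)
    return {sym: files for sym, files in wanted.items() if sym not in defined}
```

Applied to the second listing above, this would flag the `_9d44abf7_…` variants, which are referenced by libaccel_final.so and most of the cmake_device_link.o files but defined nowhere, while the `_5df7e3fc_295558` variant is resolved by libceleritas.so.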

@pcanal
Contributor Author

pcanal commented Jul 14, 2023

Alright. Better but not there yet :(. (and I can reproduce the problem)

@pcanal
Contributor Author

pcanal commented Jul 14, 2023

I pushed an additional commit that solves the latest issue (as far as I can tell). @esseivaju Can you confirm?

@esseivaju
Contributor

fixed it for me as well!

@pcanal pcanal force-pushed the FinalStaticDeps branch from 133d6ed to b6728fa Compare July 14, 2023 20:02
@sethrj sethrj linked an issue Jul 15, 2023 that may be closed by this pull request
@sethrj sethrj merged commit a24078c into celeritas-project:develop Jul 15, 2023
@sethrj sethrj changed the title from "Fix linking errors with CUDA+VecGeom" to "Fix linking errors with CUDA+VecGeom caused by #847" Jul 15, 2023
@sethrj
Member

sethrj commented Jul 17, 2023

I think this will also fix a similar error I've seen on Summit (static only builds): every so often on a rebuild I'll see errors like:

lib64/libceleritas.a(RelativisticBremModel.cu.o): In function `__sti____cudaRegisterAll()':
tmpxft_003fb530_00000000-6_RelativisticBremModel.cudafe1.cpp:(.text.startup+0x34): undefined reference to `__cudaRegisterLinkedBinary_56_tmpxft_003fb530_00000000_7_RelativisticBremModel_cpp1_ii_597f8d12'

where celeritas and celeritas_final have different hashes:

$ nm lib64/libceleritas.a  | grep 'RegisterLinked.*RelativisticBremModel' | cu++filt
                 U __cudaRegisterLinkedBinary_56_tmpxft_003fb530_00000000_7_RelativisticBremModel_cpp1_ii_597f8d12
$ nm lib64/libceleritas_final.a  | grep 'RegisterLinked.*RelativisticBremModel' | cu++filt
0000000000000748 b __cudaRegisterLinkedBinary_56_tmpxft_001f0634_00000000_7_RelativisticBremModel_cpp1_ii_597f8d12::__p
0000000000000d60 T __cudaRegisterLinkedBinary_56_tmpxft_001f0634_00000000_7_RelativisticBremModel_cpp1_ii_597f8d12

so I guess it stemmed from the _final lib not being relinked:

-rw-rw-r--  1 s3j csc404  11M Jul 17 15:58 libceleritas.a
-rw-rw-r--  1 s3j csc404 6.7M Jun 14 08:51 libceleritas_final.a
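
The comparison of the two `nm` outputs can be automated. The following sketch extracts the `tmpxft_…` component (the part that differs between the two libraries above) from `__cudaRegisterLinkedBinary_…` symbols and reports sources whose values do not overlap. The function names are hypothetical, and the pattern is an assumption based only on the symbol names shown in this thread.

```python
import re

# Assumed layout, inferred from the symbols quoted in this thread:
# __cudaRegisterLinkedBinary_<n>_tmpxft_<id>_<hex>_<n>_<source>_cpp1_ii_<hex>
SYMBOL = re.compile(
    r"__cudaRegisterLinkedBinary_\d+_tmpxft_(?P<tmpid>[0-9a-f]+)_[0-9a-f]+"
    r"_\d+_(?P<source>\w+?)_cpp1_ii_[0-9a-f]+"
)


def tmpxft_ids(nm_output):
    """Map each source name to the set of tmpxft ids seen in the nm text."""
    ids = {}
    for match in SYMBOL.finditer(nm_output):
        ids.setdefault(match.group("source"), set()).add(match.group("tmpid"))
    return ids


def mismatched_sources(lib_output, final_output):
    """Return sources present in both outputs whose tmpxft ids are disjoint."""
    lib, final = tmpxft_ids(lib_output), tmpxft_ids(final_output)
    return sorted(
        src for src in lib.keys() & final.keys() if lib[src].isdisjoint(final[src])
    )
```

A non-empty result suggests that the `_final` library was device-linked against an older compilation of the listed sources.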

@pcanal
Contributor Author

pcanal commented Jul 17, 2023

> I think this will also fix a similar error ....

It looks like it. If you ever see this again, please let me know.

@sethrj
Member

sethrj commented Aug 4, 2023

> It looks like it. If you ever see this again, please let me know.

@pcanal I see this on Summit (static libs) when not using VecGeom:

$ ninja bin/celer-sim
...
lib64/libceleritas.a(AlongStepUniformMscAction.cu.o): In function `__sti____cudaRegisterAll()':
tmpxft_00358c8a_00000000-6_AlongStepUniformMscAction.cudafe1.cpp:(.text.startup+0x34): undefined reference to `__cudaRegisterLinkedBinary_60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12'
lib64/libceleritas_final.a(cmake_device_link.o):(.toc+0x0): undefined reference to `__fatbinwrap_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12'
lib64/libceleritas_final.a(cmake_device_link.o):(.toc+0x8): undefined reference to `__fatbinwrap_63_tmpxft_00347192_00000000_7_AlongStepRZMapFieldMscAction_cpp1_ii_597f8d12'
/usr/bin/ld: link errors found, deleting executable `bin/celer-sim'

timestamps:

$ ls -al
-rw-rw-r--  1 s3j csc404  11M Aug  4 11:56 libceleritas.a
-rw-rw-r--  1 s3j csc404 6.7M Aug  4 09:25 libceleritas_final.a

symbols:

$ nm libceleritas.a | grep AlongStepUniformMscAction_cpp
0000000000000010 t _ZL327__device_stub__ZN9celeritas6detail84_GLOBAL__N__60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d1218launch_action_implINS_24ConditionalTrackExecutorINS0_22IsAlongStepActionEqualENS0_18PropagationApplierINS0_29UniformFieldPropagatorFactoryEvEEEES7_Lb1ELi256ELi1EEEvNS_5RangeINS_8OpaqueIdINS_6ThreadEjEEEET_RKN9celeritas5RangeINS_8OpaqueIdINS_6ThreadEjEEEERNS_24ConditionalTrackExecutorINS_6detail22IsAlongStepActionEqualENS8_18PropagationApplierINS8_29UniformFieldPropagatorFactoryEvEEEE
0000000000000000 W _ZN9celeritas6detail84_GLOBAL__N__60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d1218launch_action_implINS_24ConditionalTrackExecutorINS0_22IsAlongStepActionEqualENS0_18PropagationApplierINS0_29UniformFieldPropagatorFactoryEvEEEES7_Lb1ELi256ELi1EEEvNS_5RangeINS_8OpaqueIdINS_6ThreadEjEEEET_
                 U __cudaRegisterLinkedBinary_60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
0000000000000000 D __fatbinwrap_60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
$ nm libceleritas_final.a | grep AlongStepUniformMscAction_cpp
0000000000000422 r _ZL90def_module_id_str_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
0000000000000770 b _ZZ99__cudaRegisterLinkedBinary_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12E3__p
00000000000013f0 T __cudaRegisterLinkedBinary_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
                 U __fatbinwrap_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12

Even removing all the libceler* and re-ninjaing doesn't help:

tmpxft_00358c8a_00000000-6_AlongStepUniformMscAction.cudafe1.cpp:(.text.startup+0x34): undefined reference to `__cudaRegisterLinkedBinary_60_tmpxft_00358c8a_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12'
lib64/libceleritas_final.a(cmake_device_link.o):(.toc+0x0): undefined reference to `__fatbinwrap_60_tmpxft_00347139_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12'
lib64/libceleritas_final.a(cmake_device_link.o):(.toc+0x8): undefined reference to `__fatbinwrap_63_tmpxft_00347192_00000000_7_AlongStepRZMapFieldMscAction_cpp1_ii_597f8d12'

but after removing all the *.cu.o and *.a files in the build directory, it works:

$ nm libceleritas* | grep '__cudaReg.*AlongStepUniformMscAction_cpp'
                 U __cudaRegisterLinkedBinary_60_tmpxft_0035a4e0_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
0000000000000770 b _ZZ99__cudaRegisterLinkedBinary_60_tmpxft_0035a4e0_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12E3__p
00000000000013f0 T __cudaRegisterLinkedBinary_60_tmpxft_0035a4e0_00000000_7_AlongStepUniformMscAction_cpp1_ii_597f8d12
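
For reference, the files targeted by that workaround can be collected with a short script before deleting anything. This is a sketch only: `cuda_build_artifacts` is a hypothetical helper, and it lists the files rather than removing them so the list can be reviewed first.

```python
from pathlib import Path


def cuda_build_artifacts(build_dir):
    """Return the sorted *.cu.o and *.a files found under build_dir."""
    root = Path(build_dir)
    found = set(root.rglob("*.cu.o")) | set(root.rglob("*.a"))
    return sorted(found)
```

Each returned path could then be removed with `path.unlink()` before rerunning the build.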

Labels
bug (Something isn't working), core (Software engineering infrastructure)
Development

Successfully merging this pull request may close these issues.

CUDA link issue with incremental build
3 participants