Build Profiling, main branch (2024.09.22.) #713

Merged
merged 3 commits into acts-project:main from BuildProfiling-main-20240922
Oct 22, 2024

krasznaa
Member

As we've been discussing between a few of us, the build of the project is getting a bit out of hand by now. 😦 Building generally takes a long time, and also a surprisingly large amount of memory.

To help with this, we first need to understand exactly which build steps take the longest and use the most memory. To do this, I hijacked the CTEST_USE_LAUNCHERS feature of ctest, which is the same technique we use in AtlasCMake for saving "package specific" build logs in ATLAS offline builds.

What happens is that if one specifies -DCTEST_USE_LAUNCHERS=TRUE in the CMake configuration command, the newly introduced traccc-ctest.sh script gets set up to intercept each and every build command, including all linking and any other technical commands. The script then runs the commands through GNU time to get detailed statistics about every build command. It saves the results into a file called traccc_build_performance.log in the build directory, which would have entries like:

        Command being timed: "/home/krasznaa/software/kitware/cmake-3.30.2/x86_64-ubuntu2204-gcc11-opt/bin/ctest --launch --target-name Vc --build-dir /data/ssd-1tb/projects/traccc/build/_deps/vc-build --output /data/ssd-1tb/projects/traccc/build/_deps/vc-build/trigonometric_SSSE3.cpp -- /home/krasznaa/software/kitware/cmake-3.30.2/x86_64-ubuntu2204-gcc11-opt/bin/cmake -E copy src/trigonometric.cpp /data/ssd-1tb/projects/traccc/build/_deps/vc-build/trigonometric_SSSE3.cpp"
        User time (seconds): 0.00
        System time (seconds): 0.01
        Percent of CPU this job got: 90%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 9728
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 963
        Voluntary context switches: 6
        Involuntary context switches: 4
        Swaps: 0
        File system inputs: 0
        File system outputs: 48
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Leading up to some of the really "heavy" commands, like:

        Command being timed: "/home/krasznaa/software/kitware/cmake-3.30.2/x86_64-ubuntu2204-gcc11-opt/bin/ctest --launch --target-name traccc_test_cuda --build-dir /data/ssd-1tb/projects/traccc/build/tests/cuda --output CMakeFiles/traccc_test_cuda.dir/test_ckf_combinatorics_telescope.cpp.o --source /data/ssd-1tb/projects/traccc/traccc/tests/cuda/test_ckf_combinatorics_telescope.cpp --language CXX -- /usr/bin/c++ -DACTS_CONCEPTS_SUPPORTED -DALGEBRA_PLUGINS_INCLUDE_ARRAY -DBOOST_ALL_NO_LIB -DCOVFIE_QUIET -DDETRAY_ALGEBRA_ARRAY -DDETRAY_ALGEBRA_EIGEN -DDETRAY_ALGEBRA_VC -DDETRAY_CUSTOM_SCALARTYPE=float -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTRACCC_CUSTOM_SCALARTYPE=float -DVECMEM_DEBUG_MSG_LVL=0 -DVECMEM_HAVE_PMR_MEMORY_RESOURCE -DVECMEM_SOURCE_DIR_LENGTH=37 -DVECMEM_SUPPORT_POSIX_ATOMIC_REF -I/data/ssd-1tb/projects/traccc/build/_deps/cccl-src/thrust/thrust/cmake/../.. -I/data/ssd-1tb/projects/traccc/build/_deps/cccl-src/libcudacxx/lib/cmake/libcudacxx/../../../include -I/data/ssd-1tb/projects/traccc/build/_deps/cccl-src/cub/cub/cmake/../.. 
-I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-build/cuda/CMakeFiles -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-src/cuda/include -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-build/core/CMakeFiles -I/data/ssd-1tb/projects/traccc/build/_deps/vecmem-src/core/include -I/data/ssd-1tb/projects/traccc/build/_deps/dfelibs-src -I/data/ssd-1tb/projects/traccc/traccc/core/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/array/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/array_cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/common/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/array/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/common/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/vecmem/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vecmem_cmath/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/vecmem/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/eigen_eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/eigen/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/eigen/include -I/data/ssd-1tb/projects/traccc/traccc/plugins/algebra/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vc_vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/storage/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/math/vc/include -I/data/ssd-1tb/projects/traccc/build/_deps/algebraplugins-src/frontend/vc_cmath/include -I/data/ssd-1tb/projects/traccc/traccc/device/common/include -I/data/ssd-1tb/projects/traccc/traccc/device/cuda/include 
-I/data/ssd-1tb/projects/traccc/traccc/performance/include -I/data/ssd-1tb/projects/traccc/traccc/io/include -I/data/ssd-1tb/projects/traccc/traccc/simulation/include -I/data/ssd-1tb/projects/traccc/traccc/tests/common -isystem /home/krasznaa/software/nvidia/cuda-12.6.1/x86_64/targets/x86_64-linux/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/googletest-src/googletest/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/googletest-src/googletest -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-src/core/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-build/core/CMakeFiles -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-src/io/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/nlohmann_json-src/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/covfie-src/lib/core -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-src/tests/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-src/detectors/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/svgtools/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/actsvg-src/core/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/actsvg-src/meta/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/eigen3-src -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/array/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/eigen/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/vc-src -isystem /data/ssd-1tb/projects/traccc/build/_deps/detray-src/plugins/algebra/vc/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/acts-src/Core/include -isystem /data/ssd-1tb/projects/traccc/build/_deps/acts-build/Core -Wall -Wextra -Wshadow -Wunused-local-typedefs -pedantic -Wold-style-cast -Wfloat-conversion -Werror -O3 -g -std=c++20 -MD -MT tests/cuda/CMakeFiles/traccc_test_cuda.dir/test_ckf_combinatorics_telescope.cpp.o -MF 
CMakeFiles/traccc_test_cuda.dir/test_ckf_combinatorics_telescope.cpp.o.d -o CMakeFiles/traccc_test_cuda.dir/test_ckf_combinatorics_telescope.cpp.o -c /data/ssd-1tb/projects/traccc/traccc/tests/cuda/test_ckf_combinatorics_telescope.cpp"
        User time (seconds): 51.10
        System time (seconds): 3.26
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:54.37
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2780640
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 973157
        Voluntary context switches: 11
        Involuntary context switches: 97
        Swaps: 0
        File system inputs: 0
        File system outputs: 478944
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
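
To compare entries like the two above, the elapsed time and the peak memory first need to be turned into plain numbers. The following is a minimal Python sketch, not part of this PR. The helper names `parse_elapsed` and `kbytes_to_gib` are made up for illustration, and the conversion assumes that the "kbytes" in the log are units of 1024 bytes.

```python
def parse_elapsed(text: str) -> float:
    """Convert an 'h:mm:ss' or 'm:ss' elapsed-time string into seconds."""
    seconds = 0.0
    # Each colon-separated field is worth 60 times the one to its right.
    for part in text.strip().split(":"):
        seconds = seconds * 60.0 + float(part)
    return seconds


def kbytes_to_gib(kbytes: int) -> float:
    """Convert a size in kbytes into GiB (assumes 1 kbyte = 1024 bytes)."""
    return kbytes / (1024.0 * 1024.0)


# Values copied from the "heavy" compilation step shown above.
elapsed_s = parse_elapsed("0:54.37")
peak_gib = kbytes_to_gib(2780640)
print(f"{elapsed_s:.2f} s, {peak_gib:.2f} GiB")
```

For the test_ckf_combinatorics_telescope.cpp compilation this gives 54.37 seconds of wall clock time and a peak resident set size of about 2.65 GiB, for a single translation unit.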

This PR just writes such a log file. We will still need to write a bit of scripting to extract useful information from it. I imagine it would be relatively doable to produce, let's say, a CSV file from this log with the info we're interested in, and then we could use something as simple as a spreadsheet application to understand where we need to concentrate our efforts. 🤔
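
As a rough idea of what such scripting could look like, here is a minimal Python sketch that turns the log text into CSV. It is not part of this PR. The function names, the CSV column names and the choice of fields are assumptions made for illustration, and the sample command is a shortened placeholder rather than a real log line. The sketch also assumes that every entry starts with a "Command being timed:" line, as in the examples above.

```python
import csv
import io

COMMAND_PREFIX = "Command being timed:"

# Labels printed by GNU time, mapped to (hypothetical) CSV column names.
FIELDS = {
    "User time (seconds)": "user_s",
    "System time (seconds)": "system_s",
    "Elapsed (wall clock) time (h:mm:ss or m:ss)": "elapsed",
    "Maximum resident set size (kbytes)": "max_rss_kb",
    "Exit status": "exit_status",
}


def parse_log(text):
    """Split the log text into one dictionary per timed command."""
    records, current, in_command = [], None, False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(COMMAND_PREFIX):
            current = {"command": stripped[len(COMMAND_PREFIX):].strip()}
            records.append(current)
            in_command = True
            continue
        if current is None:
            continue  # Ignore anything before the first entry.
        # The value follows the last ": " on the line.
        label, sep, value = stripped.rpartition(": ")
        if sep and label in FIELDS:
            current[FIELDS[label]] = value
            in_command = False
        elif in_command and stripped:
            # A command that wrapped over several lines.
            current["command"] += " " + stripped
    return records


def to_csv(records):
    """Render the records as CSV text, one row per command."""
    buffer = io.StringIO()
    columns = ["command"] + list(FIELDS.values())
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    for record in records:
        writer.writerow({**record, "command": record["command"].strip('"')})
    return buffer.getvalue()


# Statistics copied from the first entry above; the command is shortened.
SAMPLE = """\
        Command being timed: "cmake -E copy in.cpp out.cpp"
        User time (seconds): 0.00
        System time (seconds): 0.01
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
        Maximum resident set size (kbytes): 9728
        Exit status: 0
"""

print(to_csv(parse_log(SAMPLE)))
```

On a real build one would read traccc_build_performance.log from the build directory instead of `SAMPLE`, and sort the resulting table by `max_rss_kb` or `user_s` in a spreadsheet.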

@krasznaa krasznaa added the build This relates to the build system label Sep 22, 2024
@krasznaa krasznaa force-pushed the BuildProfiling-main-20240922 branch from e6f5acb to de1473c Compare September 22, 2024 10:09
@paulgessinger
Member

@krasznaa just to point it out again: I literally wrote a tool that does this exact thing for you given a compilation database. It does structured printout and gives you a nice CSV.

I would encourage you to at least give it a try before reimplementing it.

@krasznaa
Member Author

I did have a look, Paul. But that one still doesn't do any linking steps, does it? And unfortunately our build can be expensive during linking as well.

As you can see, this PR is literally just a couple of lines long. So collecting the information is really not the complicated part here. But if we can re-use your code's logic for organizing the information, that would be very useful indeed.
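
One possible way of organizing the collected information is to group it by build target. The Python sketch below is not part of this PR. It assumes that every logged command carries a `--target-name` option before the bare `--` separator, as the two entries shown in the description do. The function names and the record layout are hypothetical, and the commands are shortened placeholders. Only the target names and the numbers are copied from the description.

```python
import shlex
from collections import defaultdict


def launcher_options(command: str) -> dict:
    """Collect the '--name value' pairs given to 'ctest --launch'."""
    tokens = shlex.split(command)
    # Everything after the bare "--" belongs to the wrapped build command.
    if "--" in tokens:
        tokens = tokens[: tokens.index("--")]
    options = {}
    for i, token in enumerate(tokens[:-1]):
        if token.startswith("--") and not tokens[i + 1].startswith("--"):
            options[token[2:]] = tokens[i + 1]
    return options


def summarize(records):
    """Sum CPU time and track peak memory per target.

    Each record is a (command, user_s, system_s, max_rss_kb) tuple.
    """
    totals = defaultdict(lambda: {"cpu_s": 0.0, "peak_rss_kb": 0, "commands": 0})
    for command, user_s, system_s, max_rss_kb in records:
        target = launcher_options(command).get("target-name", "<unknown>")
        entry = totals[target]
        entry["cpu_s"] += user_s + system_s
        entry["peak_rss_kb"] = max(entry["peak_rss_kb"], max_rss_kb)
        entry["commands"] += 1
    return dict(totals)


# Shortened stand-ins for the two commands shown in the description.
RECORDS = [
    ("ctest --launch --target-name Vc --build-dir vc-build "
     "--output out.cpp -- cmake -E copy in.cpp out.cpp", 0.00, 0.01, 9728),
    ("ctest --launch --target-name traccc_test_cuda --build-dir tests/cuda "
     "--output test.cpp.o --source test.cpp --language CXX "
     "-- /usr/bin/c++ -c test.cpp", 51.10, 3.26, 2780640),
]

for target, stats in summarize(RECORDS).items():
    print(target, stats)
```

Because the grouping only looks at the launcher options, it treats compile, link and helper commands the same way, so linking costs would show up in the per-target totals as long as the launcher records them.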

When using CTEST_USE_LAUNCHERS=TRUE, the build now sends each
command through a specific script, which profiles those
commands using the "time" executable, saving all collected
output into a file called "traccc_build_performance.log" in
the build directory.
@krasznaa krasznaa force-pushed the BuildProfiling-main-20240922 branch from de1473c to 1d6f7b8 Compare October 22, 2024 07:45
Co-authored-by: Stephen Nicholas Swatman <stephenswat@gmail.com>
@stephenswat stephenswat enabled auto-merge (squash) October 22, 2024 08:05
@stephenswat stephenswat disabled auto-merge October 22, 2024 08:11
@stephenswat stephenswat enabled auto-merge (squash) October 22, 2024 08:46
Also added a warning in case somebody tries to use the build profiling
on something other than Linux or macOS.
@krasznaa krasznaa force-pushed the BuildProfiling-main-20240922 branch from 1c9f666 to 1ac8524 Compare October 22, 2024 08:49

@stephenswat stephenswat merged commit 27fe150 into acts-project:main Oct 22, 2024
23 checks passed
@krasznaa krasznaa deleted the BuildProfiling-main-20240922 branch October 22, 2024 09:32
Labels
build This relates to the build system
3 participants