Commit
Reduce compile times of distance specializations (#1307)
Following the findings in https://github.com/ahendriksen/raft/tree/investigate-compile-time-reduction-strategies#investigation-of-compile-times, this PR reduces the compile times of the pairwise distance specializations. This is achieved by:

1. Reducing the number of included files in the translation units where kernels are instantiated; specifically, `spdlog` and `rmm` are avoided.
2. Limiting loop unrolling in kernels with expensive operations in the inner loop.

Additional improvements geared towards iterative development:

1. The tests do not have to be recompiled when the internals of a pairwise distance kernel change. Before, a rebuild was triggered due to an include of `raft/distance/distance.cuh`.
2. Addition of a fine-tuning benchmark for the pairwise distance kernels that separates building the kernel from the benchmark code. This dramatically speeds up development: compiling an empty benchmark takes roughly 18 seconds on my machine, whereas recompiling a kernel takes ~3.8 seconds. Without this addition, a commit like 35a2ad4 would require substantially more time to make sure that performance is not degraded.
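One common way to keep tests from recompiling when kernel internals change is to expose only a declaration plus explicit instantiation declarations (`extern template`) and to compile the instantiations in their own translation unit. The sketch below illustrates that general pattern in plain C++. It is not taken from this PR: `l1_distance` is a hypothetical stand-in for a distance kernel, and the "header" and "source" parts are shown in a single file only so the example is self-contained.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// ---- "header" part (hypothetical): what tests and benchmarks would include ----
// Declaration only; the body is not visible to includers.
template <typename T>
T l1_distance(const std::vector<T>& x, const std::vector<T>& y);

// Explicit instantiation declarations: these instantiations are compiled
// elsewhere, so includers do not instantiate the template themselves.
extern template float l1_distance<float>(const std::vector<float>&,
                                         const std::vector<float>&);
extern template double l1_distance<double>(const std::vector<double>&,
                                           const std::vector<double>&);

// ---- "source" part (hypothetical): the only place that sees the body ----
// Editing this definition only requires rebuilding this translation unit.
template <typename T>
T l1_distance(const std::vector<T>& x, const std::vector<T>& y)
{
  T sum = 0;
  // Sum of absolute differences over the common prefix of x and y.
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    sum += std::abs(x[i] - y[i]);
  }
  return sum;
}

// Explicit instantiation definitions matching the declarations above.
template float l1_distance<float>(const std::vector<float>&,
                                  const std::vector<float>&);
template double l1_distance<double>(const std::vector<double>&,
                                    const std::vector<double>&);
```

In a real project the two parts would live in separate files, and the heavy includes needed by the implementation would appear only in the source file, so a change to the body triggers a relink of the tests rather than a recompile.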
![image](https://user-images.githubusercontent.com/4172822/225383120-5f8a82f9-0b46-4c39-bc1d-7b2a0551e881.png)

```
Parallel build time before: 270 seconds (6 cores, SMT, 12 jobs)
Parallel build time after: 147 seconds (6 cores, SMT, 12 jobs)

Sum of compile times before: 3022.6 seconds
Sum of compile times after: 1816.2 seconds

Comparison of compile times between headers and compiled:

path                                                                              before (s)   after (s)  change (s)  change (%)
pairwise_test                                                                           None       0.486        None        None
ance/distance/specializations/detail/lp_unexpanded_double_double_double_int.cu.o       101.1        10.3       -90.8      -89.8%
src/distance/distance/specializations/detail/canberra_float_float_float_int.cu.o        52.9         6.3       -46.6      -88.0%
/distance/distance/specializations/detail/canberra_double_double_double_int.cu.o        48.5         6.4       -42.1      -86.8%
stance/distance/specializations/detail/jensen_shannon_float_float_float_int.cu.o        65.3        10.4       -55.0      -84.1%
istance/distance/specializations/detail/kl_divergence_float_float_float_int.cu.o        70.2        12.6       -57.6      -82.0%
stance/distance/specializations/detail/correlation_double_double_double_int.cu.o        46.7         8.9       -37.8      -80.9%
distance/specializations/detail/hellinger_expanded_double_double_double_int.cu.o        41.6         8.1       -33.5      -80.6%
nce/distance/specializations/detail/jensen_shannon_double_double_double_int.cu.o        74.6        15.1       -59.5      -79.7%
ir/src/distance/distance/specializations/detail/l1_double_double_double_int.cu.o        40.9         8.4       -32.5      -79.4%
ance/distance/specializations/detail/l2_unexpanded_double_double_double_int.cu.o        40.7         8.6       -32.1      -78.8%
distance/specializations/detail/hamming_unexpanded_double_double_double_int.cu.o        40.8         9.0       -31.7      -77.8%
istance/distance/specializations/detail/lp_unexpanded_float_float_float_int.cu.o        45.9        10.2       -35.7      -77.8%
src/distance/distance/specializations/detail/l_inf_double_double_double_int.cu.o        41.2         9.5       -31.8      -77.0%
istance/distance/specializations/detail/russel_rao_double_double_double_int.cu.o        29.5         7.2       -22.3      -75.6%
t.dir/src/distance/distance/specializations/detail/l1_float_float_float_int.cu.o        47.3        13.2       -34.1      -72.2%
ce/distance/specializations/detail/hamming_unexpanded_float_float_float_int.cu.o        47.0        13.3       -33.7      -71.6%
/distance/distance/specializations/detail/correlation_float_float_float_int.cu.o        49.4        14.1       -35.3      -71.5%
ce/distance/specializations/detail/hellinger_expanded_float_float_float_int.cu.o        43.6        12.5       -31.1      -71.4%
c/distance/distance/specializations/detail/russel_rao_float_float_float_int.cu.o        28.5         8.2       -20.3      -71.2%
ance/distance/specializations/detail/kl_divergence_double_double_double_int.cu.o        75.8        21.9       -53.9      -71.1%
istance/distance/specializations/detail/l2_unexpanded_float_float_float_int.cu.o        46.2        13.5       -32.7      -70.7%
ir/src/distance/distance/specializations/detail/l_inf_float_float_float_int.cu.o        43.1        12.7       -30.4      -70.6%
stance/distance/specializations/detail/l2_expanded_double_double_double_int.cu.o        52.3        24.9       -27.3      -52.3%
/distance/distance/specializations/detail/l2_expanded_float_float_float_int.cu.o        75.8        40.3       -35.5      -46.8%
rc/distance/distance/specializations/detail/cosine_double_double_double_int.cu.o        53.5        28.7       -24.8      -46.4%
r/src/distance/distance/specializations/detail/cosine_float_float_float_int.cu.o        83.9        50.1       -33.8      -40.3%
CMakeFiles/pairwise_test.dir/test/distance/fused_l2_nn.cu.o                             85.1        64.1       -21.1      -24.7%
wise_test.dir/src/distance/distance/specializations/fused_l2_nn_float_int64.cu.o        56.2        42.9       -13.3      -23.6%
irwise_test.dir/src/distance/distance/specializations/fused_l2_nn_float_int.cu.o        52.5        40.2       -12.3      -23.5%
CMakeFiles/pairwise_test.dir/test/distance/dist_lp_unexp.cu.o                           56.3        43.3       -13.0      -23.1%
CMakeFiles/pairwise_test.dir/test/distance/dist_russell_rao.cu.o                        55.7        44.0       -11.7      -21.0%
rwise_test.dir/src/distance/distance/specializations/fused_l2_nn_double_int.cu.o        45.3        36.4        -9.0      -19.8%
CMakeFiles/pairwise_test.dir/test/distance/dist_l2_unexp.cu.o                           54.6        44.1       -10.6      -19.3%
CMakeFiles/pairwise_test.dir/test/distance/dist_canberra.cu.o                           51.6        42.1        -9.6      -18.6%
CMakeFiles/pairwise_test.dir/test/distance/dist_l2_exp.cu.o                             53.1        43.4        -9.6      -18.2%
CMakeFiles/pairwise_test.dir/test/distance/dist_l_inf.cu.o                              53.2        43.9        -9.3      -17.5%
CMakeFiles/pairwise_test.dir/test/distance/dist_hellinger.cu.o                          53.1        44.0        -9.0      -17.0%
CMakeFiles/pairwise_test.dir/test/distance/dist_hamming.cu.o                            52.3        43.4        -8.9      -17.0%
CMakeFiles/pairwise_test.dir/test/distance/dist_l2_sqrt_exp.cu.o                        54.0        45.6        -8.4      -15.6%
CMakeFiles/pairwise_test.dir/test/distance/dist_l1.cu.o                                 52.6        44.5        -8.1      -15.4%
CMakeFiles/pairwise_test.dir/test/distance/dist_kl_divergence.cu.o                      52.4        44.7        -7.7      -14.8%
ise_test.dir/src/distance/distance/specializations/fused_l2_nn_double_int64.cu.o        43.5        37.2        -6.4      -14.7%
CMakeFiles/pairwise_test.dir/test/distance/dist_cos.cu.o                                52.4        44.8        -7.6      -14.5%
CMakeFiles/pairwise_test.dir/test/distance/dist_jensen_shannon.cu.o                     53.2        45.7        -7.6      -14.2%
CMakeFiles/pairwise_test.dir/test/distance/dist_inner_product.cu.o                      51.1        44.8        -6.3      -12.4%
istance/distance/specializations/detail/inner_product_float_float_float_int.cu.o        39.5        35.1        -4.5      -11.3%
CMakeFiles/pairwise_test.dir/test/distance/dist_correlation.cu.o                        51.7        46.8        -4.9       -9.5%
ance/distance/specializations/detail/inner_product_double_double_double_int.cu.o        37.1        33.9        -3.1       -8.5%
src/distance/distance/specializations/detail/kernels/gram_matrix_base_float.cu.o        45.3        41.7        -3.6       -8.0%
rc/distance/distance/specializations/detail/kernels/gram_matrix_base_double.cu.o        42.5        39.6        -2.9       -6.8%
stance/distance/specializations/detail/kernels/polynomial_kernel_double_int.cu.o        40.4        38.5        -1.9       -4.8%
CMakeFiles/pairwise_test.dir/test/distance/dist_adj.cu.o                               123.3       117.8        -5.4       -4.4%
CMakeFiles/pairwise_test.dir/test/distance/gram.cu.o                                    55.3        53.4        -1.9       -3.5%
build.ninja                                                                              4.0         4.0        +0.0       +0.1%
istance/distance/specializations/detail/kernels/polynomial_kernel_float_int.cu.o        45.2        45.6        +0.4       +0.8%
.dir/src/distance/distance/specializations/detail/kernels/tanh_kernel_float.cu.o        45.2        46.0        +0.8       +1.7%
dir/src/distance/distance/specializations/detail/kernels/tanh_kernel_double.cu.o        39.0        39.8        +0.8       +2.1%
CMakeFiles/pairwise_test.dir/src/distance/distance/pairwise_distance.cu.o               39.6        50.1       +10.5      +26.6%
```

Authors:
- Allard Hendriksen (https://github.com/ahendriksen)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
- Divye Gala (https://github.com/divyegala)

URL: #1307