Remove specializations and split expensive headers #1415

Closed
wants to merge 91 commits into from

Contributor

@ahendriksen ahendriksen commented Apr 14, 2023

This PR:

  • Reduces single arch (SM70) build time of libraft.so from 19:30 to 13:00 minutes.
  • Reduces single arch (SM70) binary size of libraft.so from 232 MB to 163 MB and all-arch binary size from 617 MB to 428 MB.
  • Reduces single arch (SM70) rebuild time of libraft.so from 18:54 to 2:28 minutes, and the number of objects rebuilt from 124 to 32, after modification of a single file (touch cpp/include/raft/distance/detail/pairwise_matrix/kernel_sm60.cuh).

This is achieved by following the strategy outlined in issue #1416.

In addition, the following changes have been made:

  • Isolate raft/logger.hpp so that spdlog is not included during the compilation of most translation units.
  • Isolate raft::get_pool_memory_resource in anticipation of rmm#1241 ("Move RMM_LOGGING_ASSERT into separate header") to achieve an additional 10s reduction in compilation time per translation unit.
  • Reduce the veclen instances of the ivf flat scan_interleaved_kernel so that it only takes the maximum value for a datatype or 1. This halves the number of instantiations and should not reduce performance for sophisticated users (who already use powers of two for dataset dimensions). [Discussed with @tfeher]
  • Added ALL_BENCH and ALL_TESTS CMake targets.
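
The veclen reduction in the list above can be illustrated with a small sketch. It assumes that the widest vectorized load is 16 bytes, and the names `max_veclen` and `select_veclen` are hypothetical; this is not the dispatch code from this PR, only the "maximum value for a datatype or 1" rule written out.

```cpp
#include <cstdint>

// Hypothetical helper: number of elements of type T that fit in one
// vectorized load. The 16-byte load width is an assumption of this sketch.
template <typename T>
constexpr std::uint32_t max_veclen()
{
  return 16u / static_cast<std::uint32_t>(sizeof(T));
}

// Choose between only two kernel variants per datatype: the maximum veclen
// when the dataset dimension is a multiple of it, and 1 otherwise.
template <typename T>
constexpr std::uint32_t select_veclen(std::uint32_t dim)
{
  return (dim % max_veclen<T>() == 0) ? max_veclen<T>() : 1u;
}

// Compile-time checks of the two cases for float (4 elements per 16 bytes).
static_assert(select_veclen<float>(128) == 4, "aligned rows use the maximum veclen");
static_assert(select_veclen<float>(130) == 1, "unaligned rows fall back to veclen 1");
```

Under this rule a dimension such as 130 uses veclen 1 for float even though it is divisible by 2, which is where the halving of instantiations comes from.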

Breaking change:

  • Removal of raft::distance::MinAndDistanceReduceOp and raft::distance::KVPMinReduce from fused_l2_nn.cuh. They were not used externally (by cudf or cuml for instance), but were used in some tests and benchmarks. References in bench and tests have been replaced with raft::distance::detail::* variants.

@tfeher
Contributor

tfeher commented Apr 17, 2023

Thanks @ahendriksen for this PR! I have only had a high-level look so far. The savings in compile time and binary size are really great, and having an error message for an accidental instantiation is a significant improvement!

@ahendriksen ahendriksen added the breaking Breaking change label Apr 17, 2023
@ahendriksen
Contributor Author

Thanks @tfeher and apologies for the large PR.

I have tagged this PR as breaking. Officially, this is right, because of the removal of two symbols from fused_l2_nn.cuh. These symbols are not used by any downstream projects though (as confirmed with GitHub code search). Unofficially, this PR breaks downstream because the set of includes for many headers has shrunk. I am keeping track of downstream effects here:

So far, this has surfaced some missing includes. I intend to fix these issues upstream as soon as possible.

@ajschmidt8
Member

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

Contributor

@tfeher tfeher left a comment

Hi Allard, I have started to look into the details. Here is my first batch of comments.

cpp/include/raft/neighbors/ivf_flat-inl.cuh Outdated
cpp/include/raft/neighbors/ivf_flat-inl.cuh Outdated
*
* @return the constructed ivf-flat index
*/
template <typename T, typename IdxT>
Contributor

I see that you have changed the template parameters to be uniformly T and IdxT (earlier we had both value_t, idx_t and T, IdxT pairs). Is that the preferred naming convention, @cjnolet?

Contributor Author

I mainly did this for consistency. I am fine either way. The T, IdxT convention would be consistent with the naming convention used in the distance APIs.

cpp/include/raft/neighbors/ivf_flat-ext.cuh Outdated
cpp/src/distance/detail/kernels/gram_matrix_base_double.cu Outdated
cpp/src/distance/detail/pairwise_matrix/dispatch.cu Outdated
cpp/src/distance/distance.cu Outdated
@ahendriksen
Contributor Author

ahendriksen commented Apr 17, 2023

The downstream PRs are building alright now. I have filed two PRs to fix missing includes (one for cuml and one for cugraph).

Some of the test runs break due to runtime linking errors. In the case of cugraph, I believe that this does not indicate a bug. In the case of cuML, I am not sure what is happening.

rapidsai/cugraph#3490: The tests for cuda 11.2 and 11.4 are failing, but this seems to be due to them not picking up the libraft.so build from this PR. The tests for cuda 11.8 are passing, which is expected.

rapidsai/cuml#5363: Reverse situation. Tests for 11.2 and 11.4 are succeeding, but the tests for 11.8 are failing. The test for CUDA 11.8 appears to pick up libraft.so from this PR and libcuml from the integration PR, but the tests fail due to a missing symbol that has been removed in this PR (raft::distance::detail::pairwise_matrix_instantiation_point). I am not sure what is going on. This is the error:

/opt/conda/envs/test/bin/gtests/libcuml/SG_RIDGE_TEST: symbol lookup error: /opt/conda/envs/test/bin/gtests/libcuml/../../../lib/libcuml++.so: undefined symbol: _ZN4raft8distance6detail35pairwise_matrix_instantiation_pointINS1_3ops14l1_distance_opIffiEEiffNS_11identity_opEEEvT_NS1_22pairwise_matrix_paramsIT0_T1_T2_T3_EEP11CUstream_st

@cjnolet, @robertmaynard: do you have any suggestions as to what could cause a mismatch of build headers and binaries in the cuML CI?

EDIT: I think I found the issue. Due to a typo, libcuml was built twice: once with RAFT from this PR and once with the 23.06 branch.
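
For readers who want to check which function the loader was missing, the mangled name in the error above can be decoded programmatically. The sketch below uses `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and Clang; the wrapper `demangle` and the constant `kMissingSymbol` are names made up for this example.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangle an Itanium C++ ABI symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char* mangled)
{
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with
  // malloc, so the caller has to release it with free.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) { return std::string{}; }
  std::string result(out);
  std::free(out);
  return result;
}

// The undefined symbol, copied verbatim from the error message in this thread.
const char* const kMissingSymbol =
  "_ZN4raft8distance6detail35pairwise_matrix_instantiation_pointINS1_3ops14l1_"
  "distance_opIffiEEiffNS_11identity_opEEEvT_NS1_22pairwise_matrix_paramsIT0_T1_"
  "T2_T3_EEP11CUstream_st";
```

Calling `demangle(kMissingSymbol)` yields a readable signature in the `raft::distance::detail` namespace. The `c++filt` command-line tool does the same job without writing any code.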

rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Apr 18, 2023
Some files use `thrust::nullopt_t` but do not include `thrust/optional.h`. This PR fixes that.

The missing include was surfaced by #3490

This PR is necessary to prevent breakage when rapidsai/raft#1415 is merged.

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #3493
benfred and others added 6 commits April 18, 2023 14:25
Correlation and Cosine distance both return (1 - similarity) in the pairwise distances apis, meaning that is_min_close is returning the wrong sort order for them. Fix.

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1419
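
The sort-order issue described in the commit message above can be sketched as follows. The `Metric` enum is a simplified stand-in for the library's metric enum and `is_closer` is a hypothetical helper; only the rule itself (cosine and correlation are reported as 1 - similarity, so smaller means closer) comes from the commit message.

```cpp
// Simplified stand-in for the library's metric enum (an assumption of this sketch).
enum class Metric { L2, InnerProduct, Cosine, Correlation };

// Returns true when a smaller value means "closer" for the given metric.
constexpr bool is_min_close(Metric m)
{
  // A raw inner product is a similarity, so larger is closer. Cosine and
  // correlation are returned as (1 - similarity), so smaller is closer,
  // just like for L2.
  return m != Metric::InnerProduct;
}

// Hypothetical helper: true if value a is closer than value b under metric m.
constexpr bool is_closer(float a, float b, Metric m)
{
  return is_min_close(m) ? (a < b) : (a > b);
}
```

With this rule, a cosine distance of 0.1 ranks ahead of 0.9, while an inner product of 0.9 ranks ahead of 0.1.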
Add public functions for reading and writing into individual ivf-pq lists (clusters), in the input space (reconstructed data) and in flat PQ codes.

Partially solves (IVF-PQ) rapidsai#1205

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1298
@ahendriksen
Contributor Author

ahendriksen commented Apr 18, 2023

Status update:

Compiling test/distance/gram.cu would fail otherwise, as it did not
explicitly instantiate the raft::distance::distance.
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Apr 18, 2023
Some files use `rmm::mr::get_current_device_resource()` but do not include `rmm/mr/device/per_device_resource.hpp`. This PR fixes that.

The missing include was surfaced by #5363. 

This PR is necessary to prevent breakage when rapidsai/raft#1415 is merged.

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #5369
Contributor

@tfeher tfeher left a comment

Hi Allard, I went through all the changes. On one hand this looks great: the bulk of the changes are just trivial refactoring, and it is great to have everything in one place to see the combined benefit on compilation time, and to test its effects downstream.

On the other hand, the size of the PR makes it tedious even to scroll through the files. That is not an issue in itself, but I fear it might slow down its acceptance. I think the points under "additional changes" in the PR description deserve their own PRs, to allow a focused discussion. Additionally, if we needed to revert any of the changes, we could do that in a more targeted way. Since I have already gone through the PR, I do not insist on separating it, but it could still be beneficial to do so.

cpp/include/raft/core/logger-ext.hpp
cpp/include/raft/distance/distance-ext.cuh
cpp/include/raft/distance/fused_l2_nn-inl.cuh Outdated
cpp/include/raft/distance/fused_l2_nn-inl.cuh Outdated
epilogue_op distance_epilogue);

instantiate_raft_neighbors_brute_force_knn(
int64_t, float, uint32_t, raft::row_major, raft::row_major, raft::identity_op);
Contributor

Similar question as for the distance prims: does it make sense to have instantiations for both uint32_t and int64_t as matrix_idx?

Contributor Author

I don't think it makes sense. The instantiations reflect the current state of things. We need these to make sure libraft.so compiles and the tests compile. I could create an issue to look into the instantiations of brute_force_knn?

Contributor

Creating an issue to track these things sounds good. Indeed, we want to keep the current state.

Contributor Author

Filed issue #1449

// XXX: the uint32_t instance is not compiled in libraft.so. So we allow
// instantiating the template here.
//
// TODO: consider removing this test or consider adding an instantiation to the
Contributor

While I understand the benefit of testing an additional idx type, I do not like the fact that this test does not use precompiled instances. This leads to either a slower development workflow or forces the user to comment out this test for quicker iteration (e.g. while adding or modifying a test).

Contributor Author

Filed issue #1448

@tfeher
Contributor

tfeher commented Apr 19, 2023

For new developers, it might not be obvious when we use the header.cuh / header-inl.cuh / header-ext.cuh triplet instead of just header.cuh. It would be good to add a section to the developer guide (around https://github.com/rapidsai/raft/blob/branch-23.06/docs/source/developer_guide.md#common-design-considerations), something like the header organization section in #1416.

@ahendriksen
Contributor Author

For new developers, it might not be obvious when we use the header.cuh / header-inl.cuh / header-ext.cuh triplet instead of just header.cuh. It would be good to add a section to the developer guide

Agreed. I have added a section there.
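
For readers unfamiliar with the triplet, the C++ mechanism behind it can be sketched in a single file. The function `scaled_sum` and the file names in the comments are hypothetical, and the sketch is plain C++ rather than CUDA; it shows only the `extern template` pattern, not the exact layout used in this PR or described in #1416.

```cpp
#include <cstddef>

// --- would live in example-inl.cuh (hypothetical): the template definition ---
template <typename T>
T scaled_sum(const T* data, std::size_t n, T scale)
{
  T acc = T{0};
  for (std::size_t i = 0; i < n; ++i) { acc += data[i]; }
  return acc * scale;
}

// --- would live in example-ext.cuh (hypothetical) ---
// Explicit instantiation declaration: translation units that see this line do
// not instantiate scaled_sum<float> implicitly and instead expect the linker
// to find it elsewhere.
extern template float scaled_sum<float>(const float*, std::size_t, float);

// --- would live in one source file compiled into the shared library ---
// Explicit instantiation definition: the single place where the code for
// scaled_sum<float> is actually generated.
template float scaled_sum<float>(const float*, std::size_t, float);
```

A caller that writes `scaled_sum<float>(values, 3, 2.0f)` then links against the one precompiled instance instead of recompiling the template body in every translation unit, which is the kind of saving the build-time numbers in the PR description reflect.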

Labels
breaking (Breaking change), Build Time Improvement, CMake, cpp, improvement (Improvement / enhancement to an existing function), python