Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] All primitives should be hiding implementation details and exposing only public API #330

Closed
cjnolet opened this issue Sep 17, 2021 · 19 comments
Assignees
Labels
EPIC These might span releases feature request New feature or request

Comments

@cjnolet
Copy link
Member

cjnolet commented Sep 17, 2021

While the primitives in cuml only needed to worry about cuml algorithms as consumers, hiding implementation details wasn't particularly important. However, now that the primitives in raft are beginning to find more consumers downstream, we should start doing this across the board for all primitives. There are several benefits to doing this an the only drawback that I can think of is that it's going to require an initial refactor to clean up the current primitives (which I propose we do incrementally)

Benefits:

  1. The contract between RAFT and the consumers will be more concrete so we can change implementations with more agility so long as it doesn't create large changes to the contract (or expectations of the behavior).
  2. The public APIs become easier for users to discover while digging around the code
  3. Documentation of the public APIs becomes more manageable and it becomes much more straightford to build and maintain doxygen API docs that can be published with the rest of the RAPIDS docs.
  4. It becomes much more straightforward which prims belong to which packages and, reiterating from no.1, it's much more straightforward for users which functions are being used.
  5. The include files for the public API can be consistently hpp, which puts less burden on consumers to know which one to use. I think we can maintain backwards compatibility with this for awhile, maybe with a warning, and then cut out the cuh files from the public API altogether at some point.

Tagging the usual suspects for thoughts. @teju85 @divyegala @ChuckHastings @tfeher @wphicks My proposal would be that we start doing this for all new prims as of 21.12 and then iteratively move the implementation details for existing prims (note that we should be able to do this initially without breaking the public APIs at all, so long as the consumers are all actually using public APIs downstream).

@cjnolet cjnolet added the feature request New feature or request label Sep 17, 2021
@divyegala
Copy link
Member

@cjnolet thanks for bringing this up. I agree with each and every proposed point here! The only thing that I would like to do differently is that I would like to specify a standard contract, as you mention, for every RAFT public primitive that we have right now. This would involve things like (and likely more):

  1. Getting this PR Iterator Traits to be used by RAFT APIs #294 merged for type safety, multiple input types, etc
  2. Removing extra stream argument
  3. Replacing unreadable booleans on the API with clear enum classes

Once we have a contract ready for all existing prims, we can enforce it for new prims. But going backwards would mean having to refactor for all prims that come in at the same time. Thus, I propose that we implement all your proposals for existing primitives first, and let that be the standard that needs to be met for any new primitive that comes into RAFT.

@cjnolet
Copy link
Member Author

cjnolet commented Sep 17, 2021

@divyegala,

Just to provide some background on this issue, I started working on #314 and realized as I was going through the files in the codebase that in many instances I was having trouble determining which functions were actually prims and which were internal, such as helper functions. In fact, I'm familiar with most of these prims and I know that if I'm having this trouble then consumers who are unfamiliar with them are going to be having an even tougher time.

I definitely agree we should do the things you are proposing as well, but I think starting by establishing which things are actually public prims will make it even easier to do the things you are proposing. In just about all cases, I've managed to be able to do this for the prims being consumed by cuml, and since we've already been using the detail namespace, hiding the implementaiton details should be as straightforward as adding a new directory and moving internal code out of the current headers.

With risk of the scope creeping up further than it needs to for this, what I'm proposing here can also be done right now within raft itself as a first step, without breaking any existing APIs or causing any updates to consumers downstream. This should make updating public APIs much easier, since it would actually establish which things are, in fact, part of the public APIs. From my perspective, this doesn't remove anything nor break anything, and it only adds clarity to what's already here.

Do you mind elaborating on your number 3?

@divyegala
Copy link
Member

@cjnolet you are right, we should definitely not extend the scope of this issue. I would also love to just separate our public and detail APIs first as that makes it easier to change in the future. I am only worried about not having a consistent public API before exposing public primitives, and then going ahead and breaking that API.

As for number 3, many of our primitives have raft_prim(..., true, false) like usage. It's not a big issue but I would like public API to be clean of such booleans as well

@ChuckHastings
Copy link
Contributor

I wholeheartedly agree with the concept here.

I know this is not related to just this issue but we need to start treating the public APIs across all of RAPIDS as if they are actually used by others and change them with great care (e.g. adding new methods and deprecating old methods rather than arbitrarily changing method signatures in a destructive way). Our own internal usage has gotten complex enough to need this, and we're starting to see others using our components.

The approach is very reasonable, although I would say that we should try and put some urgency on actually making these changes. Perhaps something like:

  • All new prims doing this in 21.12
  • Separation of public from detail APIs in 21.12
  • Move 50% of prims to this construct in 22.02 (or whatever the next number is)
  • Move remaining prims to this construct in 22.04 (ditto)

@cjnolet
Copy link
Member Author

cjnolet commented Sep 20, 2021

@ChuckHastings,

That timeline looks very reasonable to me and I agree that we should use a deprecation strategy to update the public APIs, especially now that RAFT is being used by multiple different libraries. Again, not wanting to stretch the scope of the public API discussion too far, but I'm seeing the last two items in your list as additions to the public API, rather updates, for the purposes of maintaining backwards compatibility. Perhaps we could choose a version (maybe 22.06?) to remove the original public API functions. So maybe something like:

  • All new prims doing this in 21.12
  • Separation of public from detail APIs in 21.12
  • Add new methods which move 50% of prims to this construct in 22.02 (or whatever the next number is)
  • Add new methods which move remaining prims to this construct in 22.04 (ditto)
  • Remove original methods 22.06

(you might have been implying this in your original list but I'm just hoping to make it more explicit)

@divyegala
Copy link
Member

@cjnolet @ChuckHastings just to follow for clarity, what do you both mean by all new prims doing this in 21.12?

@cjnolet
Copy link
Member Author

cjnolet commented Sep 20, 2021

@divyegala,

Since hiding the private bits is straightforward and doesn't break any current API comptaibility, I was taking it to mean that we would be making sure new prims do this. Ideally, they could also do some of the API and consistency cleanup, like removing the explicit stream argument, if not too much trouble. @ChuckHastings, please correct me if I misunderstood.

What I like about the proposed deprecation strategy is that it can be done as an intermediate step and gives consumers an opportunity to work the API updates into their timelines. That way we're not having to orchestrate multiple projects to do everything all at once.

@ChuckHastings
Copy link
Contributor

@divyegala You corrections are what I implied as was (in retrospect) ambiguous about.

What I mean by "all new prims doing this in 21.12" is that any new primitives created starting in version 21.12 or anytime after that would use the new paradigm for hiding implementation details. We should make checking for hiding the implementation details part of the criteria of approving a new primitive.

@cjnolet cjnolet self-assigned this Sep 30, 2021
@cjnolet
Copy link
Member Author

cjnolet commented Oct 12, 2021

Adding a checklist so that we can tackle this in several smaller PRs rather than one monstrous PR:

The goal here is for any new primitives to expose a header file with the public API which is invoking a the private api in the detail namespace. The public API should all be accessible through hpp extension for consistency.

@github-actions
Copy link

github-actions bot commented Dec 1, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@cjnolet cjnolet added EPIC These might span releases and removed inactive-30d labels Dec 6, 2021
@github-actions
Copy link

github-actions bot commented Jan 6, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@cjnolet
Copy link
Member Author

cjnolet commented Feb 2, 2022

Just updating this thread as it's now been a couple releases. PRs have been opened for the remaining bits and it looks like this issue should be able to be closed by 22.04.

rapids-bot bot pushed a commit that referenced this issue Feb 8, 2022
@MatthiasKohl
Copy link
Contributor

@cjnolet @divyegala I haven't come across this issue and the associated PRs before, but now that PR #383 has been merged, it has hit me.

I'll explain the issue a bit more generally:
Basically, in cugraph-ops, we have introduced RAFT to replace CUDA utilities and a few other header-only primitives.
The README states:

At its core, RAFT is a header-only C++ library, [...]

I can understand the need to separate some internal implementation details within RAFT from the public API, but to me, a header-only library is precisely about the implementation details, so it's hard to separate things perfectly.

What has happened specifically: with PR #383 cublas_wrappers.hpp was moved into a detail folder.
That file contains macros for calling cublas functions (RAFT_CUBLAS_TRY).
Unless you cover the cublas API entirely in RAFT (and I don't think this is the case, since I don't believe batched GEMM is available), I would argue that these macros are useful for downstream projects, but they are now considered implementation details.

So what would be nicer is to make it a bit clearer what RAFT actually provides / what each part of RAFT provides:

  • Does it provide device-level primitives for writing device functions?
  • Does it provide CUDA / CUDA library utilities for simplifying interaction between custom host and device functions?
  • Does it provide host-level primitives for writing more high-level algorithms on the host side?

Another example where this is problematic:
If I include raft/linalg/gemm.hpp then ultimately raft/cuda_utils.cuh gets included (in raft/linalg/detail/gemm.hpp), even though cublas is a host-level API and the higher-level headers have the hpp extension suggesting that I can use a non-CUDA compiler.
This particular include can probably be easily removed, but there are plenty of similar examples, which I think are quite confusing.

It would be nice to somehow clarify what RAFT or its parts actually provide within the README or doxygen docs, so that these issues can be avoided in the future.

@teju85
Copy link
Member

teju85 commented Feb 9, 2022

To echo further on @MatthiasKohl's second half of the comment... In cuML, we try to follow this naming convention of headers to clearly tell the users whether a header file is include-able from a .cpp or should it be a .cu.

For eg: after the recent refactor, the rng.hpp now violates the cuML way of naming header files. There's no way one can include rng.hpp from a .cpp file! But personally, to me, just looking at the filename seems to suggest that I can include this from a .cpp file. At the same time, one could potentially include raft/linalg/detail/cublas_wrappers.hpp from a .cpp file without any issues. As it stands today, there's no distinction being made between these types of include's!

I'm NOT saying that we shouldn't hide implementation details inside a detail namespace. I believe that's a good way to tell users which ones they can reliably use without the fear of breakages. I'm also NOT saying that what cuML is doing is correct. All I'm saying is that cuML's approach atleast makes it very apparent in the header file extension which ones can be included from which type of source files. This IMO helps avoid such confusions, without any need for extra documentation.

@cjnolet
Copy link
Member Author

cjnolet commented Feb 9, 2022

@MatthiasKohl,

I can understand the need to separate some internal implementation details within RAFT from the public API, but to me, a header-only library is precisely about the implementation details, so it's hard to separate things perfectly.

TLDR; RAFT is supposed to be all three things you listed, however it never occured to me that folks would want to compile it without a CUDA-enabled compiler.

The purpose of hiding implementation details here is to make it very explicit which APIs can be invoked by users so that we can keep them lightweight, flexible and most importantly, stable. When we had originally moved the primitives over from cuml, there were device functions in files w/ host functions, some of the device functions were being invoked externally to the files and it was very hard for users to determine which functions were intended to be invoked just by looking at the code itself. By exposing the public API as thin wrappers around the implementations, we more clearly separate the two and make very obvious which functions we intend to keep stable.

With that stability also comes the opportunity for more consistency across the public APIs, and we're planning to make heavy use of the mdspan (#437) to round out the integration and consolidate function arguments into a development workflow that should hopefully be cleaner and easier to use. That said, the cublas_wrappers file was really intended to centralize the entry points to the underlying ctk libraries and the calls being made to those libraries (at least from cuml) are largely already covered by a public facade, such as you've seen in raft/linalg/gemm.hpp. Absolutely, cuda_utils.cuh should not be getting included there if you are intending to use that from a non-cuda-enabled compiler.

I recently pulled the cusparse macros into a separate file called cusparse_macros.h and we can do this for cublas / cusolver and pull the macros out into the public namespace if that's useful. I would really like to avoid exposing the underlying cublas wrappers and would prefer to see a consistent API used across RAFT because this also allows us to change the underlying implementations as needed.

@cjnolet
Copy link
Member Author

cjnolet commented Feb 9, 2022

@teju85 @MatthiasKohl The more I've thought about this, the more I've realized that I've been operating under the assumption a CUDA-enabled compiler will always be used to build RAFT's primitives. Since there's so many custom kernels and things like Thrust and CUB in our code, it never even occurred to me that we would want to compile some of the code in cpp files. For this reason, I've been in favor of using the same extension everywhere for consistency- and I chose to use hpp mostly because it signifies it's a C++ API (Thrust might be a reasonable example for why I've done this, though it might be a bit of a stretch since thrust does also have a back-end that can be compiled w/o CUDA).

If we want to provide the ability for some of RAFT's APIs to be built without a CUDA-enabled compiler then it could make sense to use the filename extensions to separate the two. We might want to think about it a little more, though- my real hesitation here is that when I'm in the process of building an algorithm, I'm going to constantly have to be looking at the docs to see which extension to use for different primitives. Since we know all the files can be compiled w/ nvcc but only some of them can't be compiled w/o it, I'm actually more in favor of making everything .cuh and then adding additional files for the functions that can be compiled w/o hpp. In additionk, I think we should use doxygen group tags to help make the distinction more clear.

@harrism, I recall there might be a performance implication here but I'm having trouble with the details. Does nvcc end up being slower to parse or preprocess hpp headers that contain device code (or the other way around)?

@MatthiasKohl
Copy link
Contributor

@cjnolet Thank you for the detailed answers here!

To add a bit more details about the non-CUDA-enabled compiler:
For me personally, RAFT really has a few core headers outside of the algorithms provided in random / sparse / linalg / distance etc.:

  1. raft/cuda_utils.cuh : utilities for device function development
  2. raft/cudart_utils.h : utilities for host-side function development
  3. raft/error.hpp : underlying error macros and exception structures for error handling w.r.t. CUDA runtime
  4. raft/handle.hpp : declaration of the raft handle holding state useful for interacting with CUDA runtime (streams, other library handles etc.)

The latter 3 of those core components can all be compiled with a non-CUDA-enabled compiler and I think that this is a good thing, since it simply allows more flexibility and it allows to use RAFT for projects that only interact with the CUDA runtime from host, without any device functions.

I believe that GCC on its own is quite a bit faster than nvcc with a GCC backend, although I have to admit that I don't have numbers for this right now.
I am sure that clang is faster for host-only vs. host-and-device, which makes sense since it has to parse at the very least the intrinsic device functions declared globally in CUDA before parsing anything else (https://github.com/llvm-mirror/clang/blob/master/lib/Headers/__clang_cuda_device_functions.h).
I can imagine something similar happening with nvcc and GCC.

More importantly, I think that it makes sense to clearly define the scope of each header or section/folder in RAFT because it hints users as to what they can expect from that header or folder, and this is where the file extension can help additionally.

For example, regardless of the compiler issue, I assumed that the cublas macros are useful in any project using cublas, since they are similar to raft/error.hpp w.r.t. CUDA runtime and so for me, this should clearly be part of the public interface.
If the implementations of cublasaxpy / cublasCopy etc. should not be directly part of the public API, then they should probably be separated into a different file, one for the macros (public API) and one for implementations of functions (cublasaxpy / cublasCopy etc.).

By separating the scope clearly, being able to use a non-CUDA-enabled compiler for large parts of RAFT will be a side-effect rather than a stated goal, and I'm not asking for RAFT to be compiled by a non-CUDA-enabled compiler as much as possible, but to clearly state and separate the scope of headers / folders.

@harrism
Copy link
Member

harrism commented Feb 15, 2022

@harrism, I recall there might be a performance implication here but I'm having trouble with the details. Does nvcc end up being slower to parse or preprocess hpp headers that contain device code (or the other way around)?

I don't think so. I think what you are thinking of is that NVCC is slow, period. Therefore, in libcudf, for example, we avoid .cu files when we don't need them. For example if there are no __device__ functions in a source file, and it doesn't include any headers that contain __device__ code, we name it .cpp, so it gets compiled with GCC and not NVCC to save compile time.

This is one of the advantages of using rmm::device_uvector, for example. device_uvector.hpp contains only CUDART API calls, no __device__ code (because it does not initialize in the ctor). Contrast to device_vector.hpp which is initializes in the ctor using a kernel launch (via a call to thrust::uninitialized_fill). Therefore device_vector.hpp can only be included from .cu TUs, while device_uvector.hpp can be included from either .cu or .cpp TUs.

We aim for most libcudf tests and benchmarks to be in .cpp files (since the libcudf APIs they test are not header-only).

@cjnolet
Copy link
Member Author

cjnolet commented Feb 24, 2022

I've opened #524 to start addressing the issues w/ the header file extensions.

@cjnolet cjnolet closed this as completed Feb 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EPIC These might span releases feature request New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants