Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSSION] A robust solution to precompiling function templates #1416

Closed
ahendriksen opened this issue Apr 14, 2023 · 13 comments · Fixed by #1469
Closed

[DISCUSSION] A robust solution to precompiling function templates #1416

ahendriksen opened this issue Apr 14, 2023 · 13 comments · Fixed by #1469
Assignees
Labels

Comments

@ahendriksen
Copy link
Contributor

This issue describes problems we currently have with using specialization headers to reduce compile times and proposes a fix.

This issue accompanies #1415

I have added this issue to motivate the current design of the PR and solicit feedback on certain points.

Problem

RAFT is a heavily templated library. Several core functions are very expensive to compile and we want to prevent duplicate compilation of this functionality. To limit build time, RAFT provides a precompiled library (libraft.so) where expensive function templates are instantiated for the most commonly used template parameters. This is not a complete solution, however, due to two problems (1) accidental reinstantiations and (2) unnecessary dependencies:

  1. In practice, it often happens that a function template is accidentally reinstantiated, even when it is already provided by the library. This can have a big negative impact on compile times. It happens in consuming libraries, like cuML, but is also prevalent within libraft.so itself [#1358, #1360].

  2. A minor change in the core implementation of a function often causes the recompilation of dependent translation units. A particularly bad example is neighbors/specializations.cuh, which provides the extern template instantiations for many expensive kernels and is also included over 100 times in the RAFT project. Any change to any of these kernels triggers a rebuild of all 100 files that include the specializations header.

Requirements

A solution to these two problems should satisfy the following requirements:

  1. An accidental reinstantiation must be impossible when compiling libraft.so, i.e., it must result in an error.

  2. In case of an error, the solution must result in useful and actionable error messages. Specifically, it should not introduce linker errors during development (they only fire at the very end of compilation and do not point to the source location that introduced the error).

  3. It must be possible for downstream libraries to continue to use RAFT as a header-only library, as well as a precompiled library.

In addition, there are some nice-to-haves. That is, we obviously want to write as little boilerplate code as possible. Also, it would be nice if downstream code would not have to bother with the following if else preprocessor logic when including a header:

    #if defined RAFT_COMPILED
    #include <raft_runtime/distance/pairwise_distance.hpp>
    #else
    #include <raft/distance/distance.cuh>
    #endif

Background: C++ templates

I include some background on C++ function templates here, feel free to skip and refer back in case of questions.

A function template in C++ can be declared, defined, and instantiated. In addition, there is something called template specialization but this is not used in RAFT (the “specializations” directory currently contains template instantiations; a case of unfortunate naming).

A template instantiation can be explicit, extern explicit, or implicit. An explicit instantiation looks very similar to a function template declaration. Instead of having the angle brackets in front of the function name, however, the angle brackets are placed behind the function name. When an explicit template instantiation is encountered, the compiler will generate code for the template instance. To prevent generating code, an extern template instantiation can be used, which tells the compiler that some other translation unit already contains the code. An implicit instantiation is an instantiation of a function by use, e.g., a function call, taking a reference, etc.

Examples of declaration, definition, and both types of instantiation are shown below:

    template <typename T> void foo(T arg);                   // template declaration (no body)
    template <typename T> void foo(T arg) { printf("hi"); }  // template definition (with body)
    template foo<int>(int);                                  // template instantiation (explicit)
    extern template foo<int>(int);                           // template instantiation (extern explicit)
    
    void bar() {
      foo<int>(1);                                           // template instantiation (implicit)
    }

An important difference between implicit and explicit instantiation is that explicit instantiation requires that the template has been defined (the function body is needed to generate code). An implicit instance only requires that the template has been declared: instead of generating code the compiler will insert a linker directive. If the template is not instantiated anywhere else, this will result in linker errors.

If the compiler encounters an extern template instantiation and an explicit instantiation, the explicit instantiation “wins”: the code for the template is generated and ends up in the translation unit.

The key to the proposed solution consists of (1) listing allowed instantiations using explicit template instantiation and (2) controlling exactly when and where implicit template instantiation is allowed (i.e., a non-listed instantiation). Part (1) of this solution is already implemented in the specializations headers in RAFT, part (2) would allow implicit instantiations in the following situations:

  • When compiling libraft.so: expensive function templates are explicitly instantiated and their implicit instantiation results in a compiler error.

  • When compiling an external package that depends on libraft.so: implicit instantiation is allowed and when an extern template instantiation is available it is used (code generation is skipped).

  • When compiling with header-only RAFT: implicit instantiation is allowed and extern template instantiations are unavailable.

Solution

What

This section describes (1) macros that are used to control template instantiation, (2) a structure for organizing header files, and (3) a mechanism for meaningful error messages in case of accidental template instantiation.

Macros. We define the macros RAFT_COMPILED and RAFT_EXPLICIT_INSTANTIATE during compilation of libraft.so and the tests and benchmarks. The RAFT_COMPILED is already used in RAFT and indicates if a translation unit will be linked with libraft.so. It is currently marked in CMakeLists.txt as INTERFACE, meaning it is defined only for downstream libraries using RAFT. In this proposal, it is marked PUBLIC meaning it will be defined both for compiling RAFT as well as downstream libraries.

The RAFT_EXPLICIT_INSTANTIATE macro is new and is only defined during compilation of libraft.so itself (i.e. PRIVATE). When defined, it indicates that implicit instantiations of expensive function templates are forbidden (they should result in a compiler error).

    # RAFT_COMPILED is set during compilation of libraft.so as well as downstream
    # libraries (due to "PUBLIC")
    target_compile_definitions(raft_lib PUBLIC "RAFT_COMPILED")
    # RAFT_EXPLICIT_INSTANTIATE is set during compilation of libraft.so (due to
    # "PRIVATE")
    target_compile_definitions(raft_lib PRIVATE "RAFT_EXPLICIT_INSTANTIATE")

Header organization. Any header file that defines an expensive function template (say expensive.cuh) should be split in three parts: expensive.cuh, expensive-inl.cuh, and expensive-ext.cuh. The file expensive-inl.cuh (“inl” for “inline”) contains the original template definitions and remains largely unchanged. The file expensive.cuh inludes one or both of the other two files, depending on the values of the RAFT_COMPILED and RAFT_EXPLICIT_INSTANTIATE macros. The file expensive-ext.cuh contains the external template instantiations. In addition, if RAFT_EXPLICIT_INSTANTIATE is set, it contains template defitions that ensure that a compiler error is raised when a function template is implicitly instantiated.

The dispatching by expensive.cuh is performed as follows:

    #if !defined(RAFT_EXPLICIT_INSTANTIATE)
    // If implicit instantiation is allowed, include template definitions.
    #include "expensive-inl.cuh"
    #endif
    
    #ifdef RAFT_COMPILED
    // Include extern template instantiations when RAFT is compiled.
    #include "expensive-ext.cuh"
    #endif

The file expensive-in.cuh is unchanged:

    namespace raft {
    template <typename T>
    void expensive(T arg) {
      // .. function body
    }
    } // namespace raft

The file expensive-ext.cuh contains the following:

    #include <raft/util/raft_explicit.cuh> // RAFT_EXPLICIT
    
    #ifdef RAFT_EXPLICIT_INSTANTIATE
    namespace raft {
    template <typename T> void expensive(T arg) RAFT_EXPLICIT;
    } // namespace raft
    #endif //RAFT_EXPLICIT_INSTANTIATE
    
    extern template void raft::expensive<int>(int);
    extern template void raft::expensive<float>(float);

First, if RAFT_EXPLICIT_INSTANTIATE is set, expensive is defined. This is necessary, because the definition in expensive-inl.cuh was skipped. The macro RAFT_EXPLICIT defines the function body: this macro ensures that an informative error message is generated when an implicit instantiation erroneously occurs and is described below. Finally, the extern template instantiations are listed.

To actually generate the code for the template instances, the file expensive.cu contains the following. Note that the only difference between the extern template instantiations in expensive-ext.cuh and these lines are the removal of the word extern:

    #include <raft/expensive-inl.cuh>
    
    template void raft::expensive<int>(int);
    template void raft::expensive<float>(float);

Error messages. Function templates that should be explicitly instantiated are tagged with the RAFT_EXPLICIT macro(link). This macro defines a function body that triggers an error when the template is accidentally instantiated. The error message looks like this:

static assertion failed with "ACCIDENTAL_IMPLICIT_INSTANTIATION

If you see this error, then you have implicitly instantiated a function
template. To keep compile times in check, libraft has the policy of
explicitly instantiating templates. To fix the compilation error, follow
these steps.

If you scroll up or down a bit in your error message, you probably saw a line
like the following:

detected during instantiation of "void raft::foo(T) [with T=float]" at line [..]

Simplest temporary solution:

    Add '#undef RAFT_EXPLICIT_INSTANTIATE' at the top of your .cpp/.cu file.

Best solution:

    1. Add the following line to the file include/raft/foo.hpp:

        extern template void raft::foo<double>(double);

    2. Add the following line to the file src/raft/foo.cpp:

        template void raft::foo<double>(double)
"
detected during instantiation of "void raft::foo(T) [with T=int]" at line 115

This should help a user who just wants the error to go away while prototyping some new functionality (they will have more than enough time to circle back to this issue while the code is compiling). In addition, it tells the user how to fix the issue the “right way” by adding explicit instantiations to the correct files.

Why/how does this solve the problem

I address why this solution prevents accidental reinstantiation and unnecessary dependencies in three situations: incremental compilation with tests and/or benchmarks, compilation in RAFT CI, compilation of downstream libraries.

Incremental compilation. In this situation, libraft.so, the tests, and benchmarks are compiled with RAFT_COMPILED and RAFT_EXPLICIT_INSTANTIATE defined. Therefore, when a template is instantiated that is not in the list of explicit extern instantiations, it will raise a compiler error.

When the internals of a kernel in expensive-inl.cuh are changed, the test file will not have to be recompiled since it only depends on expensive.cuh and expensive-ext.cuh. Only the file expensive.cu will be recompiled. This prevents chains of recompilations from forming as a result of a simple change.

Compilation in CI. Here, we can also catch accidental implicit instantiations. In addition, as a result of reducing unnecessary dependencies, the probability that sccache has a cache hit is drastically increased, since changes to the internals of expensive function templates do not lead to a cascade of recompilation anymore.

Compilation of downstream libraries. Downstream libraries can either use header-only RAFT, in which case nothing changes (both macros are undefined and the *-inl.cuh headers are included), or they can use libraft.so in which case RAFT_COMPILED is defined. This way, when the library uses a template instance that has already been compiled in libraft.so, code generation is skipped and the code in libraft.so is used. When the library instantiates a function template that has not been instantiated in libraft.so, the code is generated like it was before without any errors. The library might want to opt-in to implicit instantiation prevention by defining RAFT_EXPLICIT_INSTANTIATE, but I expect this will be rare.

Advantages / disadvantages

First of all, this proposal solves the problems of accidental reinstantiation and unnecessary dependencies. It therefore substantially cuts down on compilation times and we can count on this being the case in the future.

In terms of code organization, an additional advantage is that things are easier to find. The current directory structure of specializations does not follow the directory structure of include/ and it can be a bit difficult to find an extern template instantiation. The proposed organization always has extern template instantiations in the *-ext.cuh header and the actual instantiations in the src/ directory in the same relative directory.

A disadvantage is that the proposed structure requires more boilerplate, as we have four repetitions of the function template declaration instead of three. This can be bit annoying to set up, but it only has to be done for new functionality or when the interface of a function changes. This will happen more rarely than changing the internals of a function, which should now be considerably faster.

Nuances

Some kernels are instantiated many times, but it may not make sense to actually prevent implicit instantiation. One case is raft::linalg::reduce that is used in many places. To cover all instances with an extern template instantation would be prohibitive, but covering none would result in some instances being defined in over 80 translation units. In this case, it would make sense to allow implicit instantiations and also have extern template instantiations for commonly used types.

The RAFT_EXPLICIT macro only works reliably for functions. In the detail headers, there were some cases where a struct was used to dispatch to different kernels. In these cases, I had to replace a member function by a free function.

Alternatives considered

One alternative that would require less boilerplate is to declare instead of define function templates. A downside is that it would result in linker errors but not raise any compiler errors on implicit instantiation (and thus would not tell you where the instantiation occurred).

Another alternative is to continue with the current strategy. A downside is that it will require constant firefighting to ensure that (1) extern template instantiations are used when they are available, (2) index types and data types at usage site do not start to diverge from the extern template instantiations. All of this would have to be done based on the build time reports (an increase in build time being an indication that something is wrong) and copious use of cuobjdump to hunt for duplicate template instantiations. So far, experience has shown that this is a brittle strategy.

Open questions

  • Would it be worth developing a strategy for forcing explicit instantiation of structs as well? They are sometimes used internally and if we can it to work it would save quite some typing, but I fear the solution would look complicated (even compared to the current proposal).

  • Where to keep the docstrings? In Remove specializations and split expensive headers #1415, I have mostly opted to keep docstrings in both `-ext` and `-inl` headers. I would lean towards having the docstrings in the `-ext` headers, as those headers are more dense and do not show the function bodies, which can give a nicer overview.

@wphicks
Copy link
Contributor

wphicks commented Apr 17, 2023

I quite like this proposal! I especially like that this encourages other good coding practices while solving the central problem. By encouraging that specific organization of header files, we should do a better job of ensuring that we have discrete, well-defined headers that we #include for specifically what we need rather than falling into situations like we have with neighbors/specializations.cuh.

+1 all the way around from me. The only thing that it would be interesting to see if we could improve on is the generality of the RAFT_EXPLICIT macro. Being able to throw it on as needed with as little thought as possible will help ensure that folks use it consistently.

@tfeher
Copy link
Contributor

tfeher commented Apr 17, 2023

Having a useful error message is great! I do not mind paying an extra template declaration for it (and we only do this for expensive functions anyways).

Where to keep the docstring?

To me keeping them in '-inl' would feel more natural, that is the file I am looking at when I am interested in details. For a condensed view we have the rendered API doc.

@vyasr
Copy link
Contributor

vyasr commented Apr 19, 2023

This is a very well thought-out proposal! Some suggestions:

  • I prefer very explicit macro names, in this case RAFT_EXPLICIT_INSTANTIATE_ONLY so that it's very clear that we're prohibiting implicit instantiations.
  • The exact interaction between the RAFT_COMPILED and RAFT_EXPLICIT_INSTANTIATE macros depends on the different compilation cases, specifically whether or not the RAFT_COMPILE_LIBRARY CMake variable is set. It would be helpful to include a table in CMakeLists.txt or in docs describing the different cases are and what template definitions and instantiations will end up existing in each case.
  • Since this strategy will have thoroughly different behaviors for the header-only and compiled lib cases, I expect that it will also require some reworking of the conda recipes since the header-only version of the library will need to export a meaningfully different CMake config than the compiled version to ensure that the write macros are defined for consuming libraries. I don't recall off the top of my head how much that changes relative to the current behavior, but I suspect it will involve nonzero modifications.
  • I would prefer to keep docstrings in the inl rather than the ext file. I see your argument about the ext files being less dense, but the tradeoff is that the ext files in your layout are effectively just error files. The closest analogy that I can think of would be using SFINAE to dispatch to one "real" case and one error case and only putting documentation on the error case. That seems backwards to me.
  • In the "Nuances" section you mention cases where you may want to allow implicit template instantiation while still providing some set of explicitly instantiated templates for common use cases. In those cases you would simply drop the -ext file as well as the corresponding block from expensive.cuh, right? Is there more that needs to be done for those cases?

@ahendriksen
Copy link
Contributor Author

ahendriksen commented Apr 19, 2023

Thank you all for your feedback.

The only thing that it would be interesting to see if we could improve on is the generality of the RAFT_EXPLICIT macro.

@wphicks I agree that would be nice, especially as structs are used in the most complex parts of RAFT. I already spent some hours in compiler explorer and I could not figure it out. One problem is that an extern template struct ... statement instantiates the struct methods. If we label the struct methods with RAFT_EXPLICIT, we immediately get a compilation error. I also couldn't achieve reliable raising of errors by putting a macro / static assert in the struct body. If there is interest, I could put together a minimal example in godbolt.org so that others can make an attempt?

I would prefer to keep docstrings in the inl rather than the ext file.

@tfeher , @vyasr I think there is consensus that keeping the docstring in the -inl.cuh header is better. I will address this in the PR. Another argument in favor of the -inl header is that it keeps documentation and code closer together.

I prefer very explicit macro names, in this case RAFT_EXPLICIT_INSTANTIATE_ONLY

@vyasr Good idea. I will adress this in the PR.

In the "Nuances" section you mention cases where you may want to allow implicit template instantiation while still providing some set of explicitly instantiated templates for common use cases. In those cases you would simply drop the -ext file as well as the corresponding block from expensive.cuh, right? Is there more that needs to be done for those cases?

I would advise against keeping everything in one file in this case. For two reasons: Removing the -ext file would remove a prominent signal which template have instantiations, and this single file would become unwieldy. A simple example still looks reasonable:

namespace raft{
template <typename T>
void expensive(T arg) { /* body */ }
} // namespace raft

#ifdef RAFT_COMPILED
extern template void raft::expensive<float>(float);
extern template void raft::expensive<double>(double);
// etc..
#endif 

In practice, however, the extern template instantiations are very visually noisy and would become unwieldy especially if wrapped in an ifdef RAFT_COMPILED.

The exact interaction between the RAFT_COMPILED and RAFT_EXPLICIT_INSTANTIATE macros [...] It would be helpful to include a table

I will try to find a place to these tables:

RAFT_COMPILED RAFT_EXPLICIT_INSTANTIATE_ONLY Effect
defined defined Templates are precompiled. Compiler error on accidental instantiation of expensive function template.
defined Templates are precompiled. Implicit instantiation allowed.
Nothing precompiled. Implicit instantiation allowed.
defined Avoid this: nothing precompiled. Compiler error on any instantiation of expensive function template.
RAFT_COMPILED RAFT_EXPLICIT_INSTANTIATE_ONLY Which targets
defined defined raft::compiled, RAFT tests, RAFT benchmarks
defined Downstream libraries depending of libraft like cuML, cuGraph
Downstream libraries depending on libraft-headers like cugraph-ops.

I expect that it will also require some reworking of the conda recipes

I have not noticed that the conda recipes should change while testing the PR with cuML and cuGraph. Documenting my understanding of how downstream libraries consume RAFT below:

The RAFT CMake configuration always builds the header-only part, called raft. When RAFT_COMPILE_LIBRARY is set, then a precompiled library target is also created, going by a variety of aliases: raft_lib, raft::compiled, raft_compiled. I don't know why we have so many aliases of the same library, but raft::compiled seems to be the one that is most used.

We create three conda packages for the library: libraft-headers-only, libraft-headers, libraft. The first two both package only the headers (and the libraft-headers-only package has fewer runtime dependencies to make life easier for cugraph-ops). The libraft package contains the precompiled libraft.so and depends on libraft-headers.

Downstream packages like cuml depend on the libraft conda package. In their CMake config, an option like CUML_RAFT_COMPILED determines whether they "link" (in the CMake kind of way) to raft or also raft::compiled.

When cuML links to raft, then RAFT_COMPILED and RAFT_EXPLICIT_INSTANTIATE_ONLY remain undefined. When cuML links to raft::compiled then RAFT_COMPILED is automatically defined and RAFT_EXPLICIT_INSTANTIATE_ONLY is undefined.

@wphicks
Copy link
Contributor

wphicks commented Apr 19, 2023

If there is interest, I could put together a minimal example in godbolt.org so that others can make an attempt?

If it's not too much trouble to do so, I'd love to take a look. I don't know if I'll have any luck, but it would be good for me to at least get a better understanding of the challenges.

@divyegala
Copy link
Member

divyegala commented Apr 19, 2023

@ahendriksen thank you very much for the write-up. I fully agree with your entire proposal, and it's a very clean way to break and sort headers, as well as disallow huge chains of rebuilds.

I would advise against keeping everything in one file in this case. For two reasons: Removing the -ext file would remove a prominent signal which template have instantiations, and this single file would become unwieldy

I agree with this. We should always have -inl and -ext for any files that are providing pre-compiled functions. For certain functions, where implicit instantiations are acceptable so that the compiler doesn't have to wait for a definition and instantiation to be generated, I think the -ext file should just not define the macro RAFT_EXPLICIT. I imagine this happens for functions like raft::linalg::reduce because they are heavily used internally. (Or perhaps, to keep a similar structure we define a macro called RAFT_IMPLICIT)

I would lean towards having the docstrings in the -ext headers, as those headers are more dense and do not show the function bodies, which can give a nicer overview.

This is acceptable, as long as the HTML generated for the docs still show the include to be #include <expensive.cuh>

If there is interest, I could put together a minimal example in godbolt.org so that others can make an attempt?

I would be interested in this as well!

@vyasr
Copy link
Contributor

vyasr commented Apr 19, 2023

In the "Nuances" section you mention cases where you may want to allow implicit template instantiation while still providing some set of explicitly instantiated templates for common use cases. In those cases you would simply drop the -ext file as well as the corresponding block from expensive.cuh, right? Is there more that needs to be done for those cases?

I would advise against keeping everything in one file in this case. For two reasons: Removing the -ext file would remove a prominent signal which template have instantiations, and this single file would become unwieldy. A simple example still looks reasonable:

I think I worded my question poorly, and also perhaps made some invalid statements when writing this. What I was getting at was that, in order to special-case some templates to allow instantiation even when RAFT_EXPLICIT_INSTANTIATE_ONLY is defined, those templates need the template definition available. I was trying to understand how those special cases should be handled. On a second look, I guess all that really needs to happen is that if I have some template expensive_but_always_allow_implicit, then expensive_but_always_allow_implicit.cuh would always include the inl file even if RAFT_EXPLICIT_INSTANTIATE_ONLY is defined. Is that correct?

Regarding the four cases, we can ensure that the bad case (RAFT_COMPILED not defined but RAFT_EXPLICIT_INSTANTIATE_ONLY defined) never happens as long as raft is being compiled via its CMake. If we want to be extra safe I suppose we could include a single small detail TU somewhere that contains a static_assert if that particular combination is ever encountered.

The RAFT CMake configuration always builds the header-only part, called raft. When RAFT_COMPILE_LIBRARY is set, then a precompiled library target is also created, going by a variety of aliases: raft_lib, raft::compiled, raft_compiled. I don't know why we have so many aliases of the same library, but raft::compiled seems to be the one that is most used.

The short answer here is that raft_lib is the actual compiled library, raft_compiled is a target that is always made available so that it can be linked to in CMake even if using a header-only raft (in which case it will silently fall back to using raft in header-only mode), and raft::compiled is an alias that is what should be used almost all the time by consumers.

The headers and headers-only conda package are equivalent from a build perspective and only different from a conda package perspective in terms of the recipe dependencies to allow consumers to not install all of the possible dependencies of raft if they are only using a subset of its functionality that doesn't use certain library dependencies.

Thinking about it further, I think you're right that we may not need to do additional work. The existing targets pretty much handle everything the same. The one possible difference is that this may uncover cases where a dependent library (cuml/cugraph) is instantiating a template that it shouldn't be when using a compiled libraft.so, but that would be exactly the kind of situation that the new layout helps to solve anyway so that's a good thing.

@ahendriksen
Copy link
Contributor Author

If it's not too much trouble to do so, I'd love to take a look.

@wphicks , @vyasr , Cool! I have prepared an example in compiler explorer. Here is also a non-working attempt for inspiration (link). The example consists of one piece of code and four compilers which have a different set of macros defined. Currently, not all compilers give the expected output.

I guess all that really needs to happen is that if I have some template expensive_but_always_allow_implicit, then expensive_but_always_allow_implicit.cuh would always include the inl file even if RAFT_EXPLICIT_INSTANTIATE_ONLY is defined. Is that correct?

This is indeed what I do. An example can be found in coalesced_reduction-ext.cuh, coalesced_reduction-inl.cuh, coalesced_reduction.cuh.

Regarding the four cases, we can ensure that the bad case (RAFT_COMPILED not defined but RAFT_EXPLICIT_INSTANTIATE_ONLY defined) never happens as long as raft is being compiled via its CMake. If we want to be extra safe I suppose we could include a single small detail TU somewhere that contains a static_assert if that particular combination is ever encountered.

I have considered this as well. I fear,however, that any static assert that checks for this situation will be drowned out by all the RAFT_EXPLICIT static_asserts and error messages. So I think the benefit is marginal.

The one possible difference is that this may uncover cases where a dependent library (cuml/cugraph) is instantiating a template that it shouldn't be when using a compiled libraft.so, but that would be exactly the kind of situation that the new layout helps to solve anyway so that's a good thing.

To clear up any possible confusion: this proposal does not advocate defining RAFT_EXPLICIT_INSTANTIATE_ONLY for downstream libraries. At least, not initially. Therefore, these libraries should not have compilation failures due to instantiating the wrong templates. At some point in the future, some libraries may want to consider defining RAFT_EXPLICIT_INSTANTIATE_ONLY, but I expect that it will require some work to align all the template types.

@tfeher
Copy link
Contributor

tfeher commented Apr 20, 2023

The error message we get during accidental instantiation is really useful! I could already make use of it while working on CAGRA instantiations. The message lists all the template types that was used during the instantiation, and I could cross check it with the explicit instantiations. This helped me to catch the missing const from the default_accessor of an mdspan that caused the mismatch.

Here is how the actual error looked in my case:

error: static assertion failed with "ACCIDENTAL_IMPLICIT_INSTANTIATION

detected during:
instantiation of "raft::neighbors::experimental::cagra::index<DataT, IdxT> raft::neighbors::experimental::cagra::build(
    const raft::device_resources &, const raft::neighbors::experimental::cagra::index_params &, 
    raft::mdspan<const DataT, raft::matrix_extent<IdxT>, raft::row_major, Accessor>)
[with DataT=float, IdxT=uint32_t, Accessor=raft::host_device_accessor<std::experimental::default_accessor<const float>, raft::memory_type::device>]" 

@vyasr
Copy link
Contributor

vyasr commented Apr 20, 2023

Sorry I think I'm a bit confused about the godbolt example now. You're showing the four possible combinations of values of the two macros and what should happen in each case. In this particular example I think you could trigger the desired error in the one problematic case by inserting a static_assert(false, "This should fail"); as the contents of the expensive::run definition that is in the ifdef EXPLICIT_ONLY block, but I get the feeling that's not the question that you're asking. Could you clarify what we're trying to make happen here?

@ahendriksen
Copy link
Contributor Author

you could trigger the desired error in the one problematic case by inserting a static_assert(false, "This should fail");

Could you clarify what we're trying to make happen here?

It is indeed a bit tricky. As you indicate, you can trigger an error using the static_assert. This will cause errors in too many situations though. The static assert fires whenever EXPLICIT_ONLY is defined, which causes problems when you use an instance which has been explicitly extern template instantiated (second row int table below).

Macros defined We want static_assert solution
USE_EXPLICIT_EXTERN Linker error (use external symbol) ✔ Linker error
USE_EXPLICIT_EXTERN + EXPLICIT_ONLY Linker error (use external symbol) ❌ Compiler error
EXPLICIT_ONLY Compiler error ✔ Compiler error
None Compiles fine ✔ Compiles fine

note: of course we don't want a linker error in RAFT, but for the purpose of the compiler explorer example we want a linker error (otherwise we would have to create multiple translation units and link them which would become too complicated).

@vyasr
Copy link
Contributor

vyasr commented Apr 24, 2023

Yup the linker error makes sense in the context of godbolt. I don't think I was clear enough, doesn't inserting:

#ifndef USE_EXPLICIT_EXTERN
        static_assert(false, "Can't compile");
#endif

solve the problem?

@ahendriksen
Copy link
Contributor Author

That would solve the problem by "cheating". I think I see where the confusion is coming from and I realize I should have been clearer. I am using the USE_EXPLICIT_EXTERN in the context of the godbolt example to drive the example code. There is no analogous RAFT macro. The EXPLICIT_ONLY macro is an analogue to the RAFT_INSTANTIATE_EXPLICIT_ONLY macro. So RAFT can't make use of the USE_EXPLICIT_EXTERN macro to drive behavior.

The table below summarizes the situation without putting both macros on the same pile.

Example uses Macro defined We want static_assert solution
expensive<int> Linker error (use external symbol) ✔ Linker error
expensive<int> EXPLICIT_ONLY Linker error (use external symbol) ❌ Compiler error
expensive<float> EXPLICIT_ONLY Compiler error ✔ Compiler error
expensive<float> Compiles fine ✔ Compiles Example

Here, expensive<int> has the extern template instantiation.

rapids-bot bot pushed a commit that referenced this issue Apr 28, 2023
This is a rebase of all the commits in PRs:
- #1437 
- #1438 
- #1439 
- #1440 
- #1441 

The original PRs have not been rebased to preserve review comments. This PR is up to date with branch 23.06.

Closes #1416

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1469
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging a pull request may close this issue.

5 participants