ETI System and file structure #31
I have thought long and hard about a sane, sustainable way to organize this into files. Here is what I came up with:
Let's talk about what you need to touch to do specific things:
Add a new function:
Modify the implementation of a function
Add a new ETI type
Add a new TPL variant
|
Let's look at the code and what those things do. Public API in
|
Next up: The Specialization Layer
This layer not only serves as the focal point for the unified instantiation of the things the public layer requires, it is also the layer which allows for specialization for third-party libraries (such as MKL and CUBLAS) and explicit template instantiation (ETI). Generally this layer is very thin again and basically just passes through arguments. The basic mechanism for ETI is the To enable both TPL specialization and ETI specialization, additional bool template parameters are added to the specialization layer, which are defaulted to values based on whether said specializations are available: From
template<class ViewType>
struct foo_eti_spec_avail {
enum : bool { value = false };
};
template<class ViewType, bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
static void foo(const ViewType& a);
};
In order to declare a specialization available a full specialization of The next part in the specialization layer is the definition of the specialization layer for when no TPL is used. This calls the actual implementation provided in
template<class ViewType>
struct Foo<ViewType,false,KOKKOSKERNELS_IMPL_COMPILE_LIBRARY> {
static void foo(const ViewType& a) {
execute_foo(a);
}
};
In this file we also need to define the macros which are later used in the auto-generated files:
// Availability Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template<> \
struct foo_eti_spec_avail<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE> > > { \
enum : bool { value = true }; \
};
// Declaration Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_DECL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
extern template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;
// Instantiation Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_INST( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;
// Include the actual declarations for tpls and eti
#if !KOKKOSKERNELS_IMPL_COMPILE_LIBRARY
#include<impl/tpls/foo_tpl_spec_decl.hpp>
#include<impl/generated_specializations_hpp/foo_eti_spec_decl.hpp>
#endif
Note how the actual declarations of those classes are only included when we are NOT compiling the library. I'll post the whole file later after discussing some more macro stuff. |
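To make the mechanism above easier to play with, here is a minimal, Kokkos-free sketch. Everything prefixed with `Demo`/`DEMO_` is a hypothetical stand-in that I made up for illustration (`DemoView` replaces `Kokkos::View`, a serial loop replaces `parallel_for`), and `foo` takes a non-const reference because the stand-in has value semantics. It only shows how the defaulted bool parameters pick up the availability traits and how the macros flip them:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-1 Kokkos::View; not part of Kokkos.
template <class Scalar>
struct DemoView {
  typedef Scalar value_type;
  std::vector<Scalar> data;
};

// Availability traits default to "no specialization available".
template <class ViewType>
struct foo_tpl_spec_avail { enum : bool { value = false }; };
template <class ViewType>
struct foo_eti_spec_avail { enum : bool { value = false }; };

// Stand-in for the implementation layer (serial loop instead of parallel_for).
template <class ViewType>
void execute_foo(ViewType& a) {
  typedef typename ViewType::value_type value_type;
  for (std::size_t i = 0; i < a.data.size(); ++i)
    a.data[i] = static_cast<value_type>(i);
}

// Specialization layer: the bool parameters default to the trait values.
template <class ViewType,
          bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
          bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static void foo(ViewType& a) { execute_foo(a); }
};

// Availability macro (hypothetical name): marks an ETI instantiation as present.
#define DEMO_FOO_ETI_SPEC_AVAIL(SCALAR) \
  template <>                           \
  struct foo_eti_spec_avail<DemoView<SCALAR> > { enum : bool { value = true }; };

// Instantiation macro (hypothetical name): explicitly instantiates the struct.
#define DEMO_FOO_ETI_SPEC_INST(SCALAR) \
  template struct Foo<DemoView<SCALAR>, false, true>;

DEMO_FOO_ETI_SPEC_AVAIL(double)
DEMO_FOO_ETI_SPEC_INST(double)
```

With this in place, `Foo<DemoView<double> >` resolves to the explicitly instantiated `<false, true>` variant, while `Foo<DemoView<float> >` falls back to `<false, false>` and is instantiated implicitly. In the real layout the instantiation macro would live in a generated .cpp file and headers would only see the `extern template` declaration.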
The implementation layer in
template<class ViewType>
void execute_foo(const ViewType& a) {
Kokkos::parallel_for("KokkosBlas1::foo",a.extent(0), KOKKOS_LAMBDA (const int& i) {
a(i) = i;
});
}
If we want to distinguish between the multi-vector and the single-vector case, the implementation layer may be one of the places to do that. |
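One way the vector / multi-vector distinction could be made inside the implementation layer is tag dispatch on the rank. The following is only a sketch under that assumption; `DemoVector`, `DemoMultiVector` and `execute_foo_impl` are hypothetical names, and plain loops stand in for Kokkos parallel dispatch:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical rank-1 and rank-2 containers standing in for Kokkos Views.
struct DemoVector {
  static constexpr int rank = 1;
  std::vector<double> data;
};
struct DemoMultiVector {
  static constexpr int rank = 2;
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;  // row-major storage, rows * cols entries
};

// Rank-1 overload: entry i gets the value i.
template <class V>
void execute_foo_impl(V& a, std::integral_constant<int, 1>) {
  for (std::size_t i = 0; i < a.data.size(); ++i)
    a.data[i] = static_cast<double>(i);
}

// Rank-2 overload: every column of row i gets the value i.
template <class V>
void execute_foo_impl(V& a, std::integral_constant<int, 2>) {
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < a.cols; ++j)
      a.data[i * a.cols + j] = static_cast<double>(i);
}

// Single entry point; the rank selects the overload at compile time.
template <class V>
void execute_foo(V& a) {
  execute_foo_impl(a, std::integral_constant<int, V::rank>());
}
```

The specialization layer above it would then stay unchanged: it keeps calling `execute_foo(a)` and never needs to know which rank it is passing through.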
The TPL layer consists of two files: the one which declares the availability of a specialization and the one which provides the specialization. The first one is
template<class ViewType>
struct foo_tpl_spec_avail {
enum : bool { value = false };
};
#ifdef KOKKOSKERNELS_ENABLE_MKL
template<>
struct foo_tpl_spec_avail<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>> {
enum : bool { value = true };
};
#endif
Basically, for every new TPL which we want to support we drop in another full specialization of this struct. The implementation is the counterpart to it. Note that we can use the implementation to decide, based on input parameters, whether to call our own code or the TPL code. We also need to have two full specializations here, based on whether ETI for the same type combination would be available or not.
#ifdef KOKKOSKERNELS_ENABLE_MKL
#include<mkl_foo.hpp>
namespace KokkosBlas1 {
namespace Impl {
// Only a TPL specialization is available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,false> {
typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;
static void foo(const ViewType& a) {
#if (KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION)
printf("Calling MKL Specialization\n");
#endif
mkl_foo(a.data(),a.extent(0));
}
};
// Both a TPL specialization and an ETI instantiation are available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,true> {
typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;
static void foo(const ViewType& a) {
// Our code is better for large number of entries, so only use TPL for small lengths
if(a.extent(0) < 100000)
Foo<ViewType,true,false>::foo(a);
else
Foo<ViewType,false,true>::foo(a);
}
};
}
}
#endif |
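The size-based choice between TPL and native code can be tried out without MKL or Kokkos. In the sketch below `DemoView`, `vendor_foo`, `DemoPath` and `last_path` are hypothetical stand-ins that exist only so the chosen branch can be observed; the 100000 threshold is copied from the example above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> DemoView;  // hypothetical stand-in for a View

// Records which branch ran last, purely so the dispatch can be checked.
enum class DemoPath { None, Tpl, Native };
static DemoPath last_path = DemoPath::None;

// Hypothetical vendor routine standing in for the TPL call.
inline void vendor_foo(double* x, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) x[i] = static_cast<double>(i);
}

template <class ViewType, bool tpl_spec_avail, bool eti_spec_avail>
struct Foo;

// Native (ETI) code path.
template <>
struct Foo<DemoView, false, true> {
  static void foo(DemoView& a) {
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<double>(i);
    last_path = DemoPath::Native;
  }
};

// Only a TPL specialization is available.
template <>
struct Foo<DemoView, true, false> {
  static void foo(DemoView& a) {
    vendor_foo(a.data(), a.size());
    last_path = DemoPath::Tpl;
  }
};

// Both are available: pick at run time based on the length.
template <>
struct Foo<DemoView, true, true> {
  static void foo(DemoView& a) {
    if (a.size() < 100000)
      Foo<DemoView, true, false>::foo(a);
    else
      Foo<DemoView, false, true>::foo(a);
  }
};
```

Calling `Foo<DemoView, true, true>::foo` on a short vector takes the TPL branch, and on a vector of 100000 or more entries it takes the native branch. Both branches produce the same values here, which is what lets the `<true, true>` specialization forward freely.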
Last but not least there are three auto-generated files which are kind of like the TPL files: declare an ETI specialization available, provide the There is one more detail using two additional macros:
Also one more word to |
I will check in the actual full example code soon. |
Some more thoughts: while this is a lot of different files, we are trying to serve a pretty complex use-case scenario. Most of this stuff is pretty boilerplate and doesn't really use much advanced C++ stuff. It basically comes down to a bunch of full specializations. The particularly nice thing this scheme does for us is that it decouples the actual implementation from providing specializations for TPLs and from providing ETI specializations. All three things can be modified independently. Furthermore, this scheme clearly separates which files are responsible for which part of the hierarchy. |
@mhoemmen @dsunder @hcedwar @srajama1 |
You can Thought (tbd): Should we have something in Kokkos core to canonicalize a
template< class ViewType >
using canonical_view_of_const =
View< typename ViewType::const_data_type
, typename ViewType::layout
, typename ViewType::device_type
, typename ViewType::memory_traits > ;
The |
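To see what such an alias would buy us, here is a small sketch with a hypothetical `DemoView` that merely mimics the nested typedefs the alias relies on (`const_data_type`, `layout`, `device_type`, `memory_traits`). It is not the Kokkos View, and `demo_add_const` is a helper invented for this example. The point is that a view of `double*` and a view of `const double*` map to the same canonical type, so a single specialization or instantiation could serve both:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag types; stand-ins for a layout, a device and memory traits.
struct DemoLayout {};
struct DemoDevice {};
struct DemoMemoryTraits {};

// Helper (hypothetical): pushes const down to the scalar of a T* data type.
template <class T>
struct demo_add_const { typedef const T type; };
template <class T>
struct demo_add_const<T*> { typedef typename demo_add_const<T>::type* type; };

// Hypothetical view exposing the nested typedefs used by the alias.
template <class DataType, class Layout, class Device, class MemoryTraits>
struct DemoView {
  typedef DataType data_type;
  typedef typename demo_add_const<DataType>::type const_data_type;
  typedef Layout layout;
  typedef Device device_type;
  typedef MemoryTraits memory_traits;
};

// Same shape as the alias proposed above, written against DemoView.
template <class ViewType>
using canonical_view_of_const =
    DemoView<typename ViewType::const_data_type,
             typename ViewType::layout,
             typename ViewType::device_type,
             typename ViewType::memory_traits>;

typedef DemoView<double*, DemoLayout, DemoDevice, DemoMemoryTraits> MutableView;
typedef DemoView<const double*, DemoLayout, DemoDevice, DemoMemoryTraits> ConstView;

static_assert(std::is_same<canonical_view_of_const<MutableView>,
                           canonical_view_of_const<ConstView> >::value,
              "both views canonicalize to the same type");
```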
@crtrott I like @hcedwar's idea of adding some "canonicalize the View" type functions.
I think the design makes sense, especially its ETI / TPL aspects. In particular, I think it's enough for us to specialize on whether some TPL is available. Very few users in practice want to swap different TPLs in and out at compile or run time. (They just want to know what's the fastest TPL to use on each platform.) I don't think it's worth complicating the design for this use case, which may only be of interest to the occasional computer science publication. We're a national lab; that should be at best a tertiary interest for us.
This design is good for "node-global" kernels. What about single-team or single-thread kernels? Are we worried about potential inlining overhead at those lower levels?
Also, what about asynchronous dispatch? This is relevant to the design of the implementation layer's interface, because Views may need to stay managed as they enter the implementation layer. |
Regarding asynchronous dispatch: the internal view types are function-specific. So for asynchronous ones the internal views must be managed. |
By the way, it might be better to move this issue and #28 to the wiki. |
I mean, this issue was a nice guideline for me. It would be nice to save it in the KokkosKernels wiki so that it can be easily found, rather than having to search for it in the issue history. |
@mndevec wrote:
That's a good idea. I think it would be best, then, to close this issue, but copy its contents into the wiki. How about that? |
Okay, I moved this topic to here: |
OK, I think I am finally close to making this ETI stuff work properly. There is some funky compiler behavior with regard to using extern template instantiations for classes, in particular if you want to allow instantiations of other types, but I believe my solution is now foolproof.
Furthermore, I believe the file structure, naming, etc. need some cleanup. In particular, the focus on MultiVector, which historically comes from Tpetra, is confusing for standalone users.
Let's start with some requirements for what we need to be able to do:
In order to do all this we came up with a design which has 3 functionality layers (I will go into details later):
Now I want to go through a couple of design aspects in the next posts.