ETI System and file structure #31

crtrott · 2017-05-21T23:42:39Z

Ok, I think I am finally close to making this ETI stuff work properly. There is some funky compiler behavior with regard to using extern template instantiations for classes, in particular if you want to allow instantiations of other types, but I believe my solution is now foolproof.

Furthermore I believe the file structure and naming etc needs some cleanup. In particular this focus on MultiVector which historically comes from Tpetra is confusing for standalone users.

Let's start with the requirements. We need to be able to:

pre-compile functions, and prevent them from being implicitly instantiated (ETI)
Even with ETI on, allow other input types (say for example extended precision, or nonstandard data layouts)
call TPLs (MKL, CUBLAS etc.) for input types which allow it
disallow anything other than ETI types if requested
check what type of instantiation gets hit in apps (ETI, Non-ETI, TPL)

In order to do all this we came up with a design which has 3 functionality layers (I will go into details later):

User Interface: void foo(ViewType a, Scalar alpha): takes views, accepts all kinds of combinations; calls the specialization layer
Specialization Layer: struct Foo { static void foo(ViewInternalType a, Scalar alpha); }; makes sure that only the minimally necessary number of instantiations exists, serves as ETI specialization layer, serves as TPL specialization layer
Implementation Layer: This is called by the specialization layer, and has the actual functors etc.

Now I want to go through a couple of design aspects in the next posts.

crtrott · 2017-05-22T03:55:49Z

I have now thought long and hard about a sane, sustainable way of organizing this into files. Here is what I came up with:

src:
  KokkosBlas.hpp: includes all the KokkosBlas function header files
      KokkosBlas1_foo.hpp (contains user interface functions for foo)
src/impl:
  KokkosBlas1_foo_impl.hpp: The actual implementation of the functions (Functors etc.)
  KokkosBlas1_foo_spec.hpp: The specialization layer
src/impl/tpls:
  KokkosBlas1_foo_tpl_spec_avail.hpp: Availability of TPLs for particular types
  KokkosBlas1_foo_tpl_spec_decl.hpp: The specialization declarations for using TPLs
src/impl/generated_specializations_hpp:
  KokkosBlas1_foo_eti_spec_avail.hpp: Availability declarations for ETI types
  KokkosBlas1_foo_eti_spec_decl.hpp: Specialization declarations for ETI types
src/impl/generated_specializations_cpp/foo:
  KokkosBlas1_foo_eti_spec_inst_double_LayoutRight_Cuda_CudaSpace.cpp: one instantiation for an extern template

Let's talk about what you need to touch to do specific things:

Add a new function:

Add all those files based on the template provided later
Modify the scripts which generate the auto generated files

Modify the implementation of a function

Only src/impl/KokkosBlas1_foo_impl.hpp needs to be modified

Add a new ETI type

modify the scripts which generate the auto generated files

Add a new TPL variant

Modify the files in impl/tpls/ to add the new TPL (declare its availability, and provide the implementation of how to call it)

crtrott · 2017-05-22T04:04:48Z

Let's look at the code and what those things do.

Public API in `src/KokkosBlas1_foo.hpp`

This file provides the public API for the function foo. The function internally calls the specialization layer after explicitly filling in all the necessary template arguments for the ViewTypes etc. For example, for a dot(a,b) product, const modifiers should be added to the scalar type if they are not already there. Otherwise the code would potentially have to be compiled four times:

dot(View<double*>, View<double*>);
dot(View<double*>, View<const double*>);
dot(View<const double*>, View<double*>);
dot(View<const double*>, View<const double*>);
If you then factor in explicit vs implicit specification of Layout, Memory Space, and MemoryTraits we end up with over 100 possible instantiations for something which is technically the exact same thing!

Furthermore, this function should also do static asserts on things which are not allowed (for example, a View of the wrong rank) in order to give users an early exit in a function which they can directly associate with the code they have written.

Here is an example:

// Include the specialization layer which defines the Impl::Foo struct
#include<impl/KokkosBlas1_foo_spec.hpp>

namespace KokkosBlas1 {
// User facing function accepts any ViewType
template<class ViewType>
void foo(const ViewType& a) {

  // Static assert on prohibited types
  static_assert(ViewType::rank==1, "Trying to call foo with View of rank other than 1");

  // Convert ViewType to internal ViewType to reduce instantiations
  // Without this, whether you explicitly specify a Layout or not would be
  // two different instantiations since Views have variadic template parameters
  // Furthermore this is the place to add missing const etc.
  typedef Kokkos::View<typename ViewType::data_type,
                       typename ViewType::array_layout,
                       typename ViewType::device_type>
          ViewTypeInternal;

  // Call the actual implementation
  Impl::Foo<ViewTypeInternal>::foo(a);
}
}
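
To make the instantiation-count argument concrete, here is a minimal sketch that does not use Kokkos at all. `MockView`, `dot_instantiations` and `count_after_four_calls` are hypothetical names invented for this illustration; the only point is that adding const in the public layer funnels all four const/non-const call combinations into a single instantiation of the specialization layer.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a rank-1 Kokkos::View (not the real class).
template <class T>
struct MockView {
  using value_type = T;
  T* ptr;
  std::size_t n;
};

// Counts how many distinct Impl::Dot instantiations have been executed.
inline int& dot_instantiations() {
  static int count = 0;
  return count;
}

namespace Impl {
template <class AView, class BView>
struct Dot {
  static double dot(const AView& a, const BView& b) {
    // The initializer runs once per distinct <AView, BView> instantiation.
    static const bool counted = (++dot_instantiations(), true);
    (void)counted;
    double sum = 0.0;
    for (std::size_t i = 0; i < a.n; ++i) sum += a.ptr[i] * b.ptr[i];
    return sum;
  }
};
}  // namespace Impl

// Public layer: accepts any const/non-const combination and forwards
// exactly one canonical (const, const) combination.
template <class AView, class BView>
double dot(const AView& a, const BView& b) {
  using A = MockView<typename std::add_const<typename AView::value_type>::type>;
  using B = MockView<typename std::add_const<typename BView::value_type>::type>;
  return Impl::Dot<A, B>::dot(A{a.ptr, a.n}, B{b.ptr, b.n});
}

// Calls dot() with all four const/non-const combinations and reports
// how many Impl::Dot instantiations were hit.
inline int count_after_four_calls() {
  double x[3] = {1.0, 2.0, 3.0};
  MockView<double> m{x, 3};
  MockView<const double> c{x, 3};
  dot(m, m);
  dot(m, c);
  dot(c, m);
  dot(c, c);
  return dot_instantiations();
}
```

Without the `add_const` step in the public function, the same four calls would each land in a different `Impl::Dot` instantiation.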

crtrott · 2017-05-22T04:22:23Z

Next up:

The Specialization Layer

This layer is the one which not only serves as the focal point for the unified instantiation of the things the public layer requires, it is also the layer which allows for specialization for third party libraries (such as MKL and CUBLAS) and explicit template instantiation (ETI).

Generally this layer is very thin again and basically just passes through arguments.

The basic mechanism for ETI is the extern template mechanism of C++11. Unfortunately that thing has some funky semantics with respect to classes. In particular, it looks like the compiler can still choose to inline the implementation of the class if it is visible in the same compilation unit, instead of calling the externally available instantiation. This might also be compiler dependent.

To enable both TPL specialization and ETI specialization additional bool template parameters are added to the specialization layer which are defaulted to values based on whether said specializations are available:

From impl/KokkosBlas1_foo_spec.hpp:

template<class ViewType>
struct foo_eti_spec_avail {
  enum : bool { value = false };
};

template<class ViewType, bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
                         bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static void foo(const ViewType& a);
};

In order to declare a specialization available, a full specialization of foo_tpl_spec_avail or foo_eti_spec_avail must be provided. Those structs live in impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp and impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_avail.hpp respectively, with the latter auto generated. We will come back to those files in a bit.

The next part in the specialization layer is the definition of the specialization layer for when no TPL is used. This calls the actual implementation provided in impl/KokkosBlas1_foo_impl.hpp
Note that the TPL bool is set to false, while the other one is set to KOKKOSKERNELS_IMPL_COMPILE_LIBRARY. The latter one is only going to be true while compiling the KokkosKernels library with its explicit template instantiations.

template<class ViewType>
struct Foo<ViewType,false,KOKKOSKERNELS_IMPL_COMPILE_LIBRARY> {
  static void foo(const ViewType& a) {
    execute_foo(a);
  }
};
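
As a rough, Kokkos-free sketch of how the defaulted bool parameters select a specialization: the availability traits below mirror the names used above, but `std::vector` stands in for the View types, `static constexpr bool` is used instead of the enum, and the `which()` member is a hypothetical helper added only so the selected path can be observed. Unlike the real layer, the primary template here has a body so the "generic" case is visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Availability traits: default to "not available".
template <class V>
struct foo_tpl_spec_avail { static constexpr bool value = false; };
template <class V>
struct foo_eti_spec_avail { static constexpr bool value = false; };

// Assumption for this demo: pretend vector<double> was pre-instantiated (ETI)
// and vector<float> is covered by some third party library (TPL).
template <>
struct foo_eti_spec_avail<std::vector<double>> { static constexpr bool value = true; };
template <>
struct foo_tpl_spec_avail<std::vector<float>> { static constexpr bool value = true; };

// Specialization layer: the bools default to whatever the traits report.
template <class V,
          bool tpl_spec_avail = foo_tpl_spec_avail<V>::value,
          bool eti_spec_avail = foo_eti_spec_avail<V>::value>
struct Foo {
  static std::string which() { return "generic"; }
};

// No TPL, but an ETI instantiation exists.
template <class V>
struct Foo<V, false, true> {
  static std::string which() { return "eti"; }
};

// A TPL exists (regardless of ETI availability in this simplified demo).
template <class V, bool eti_spec_avail>
struct Foo<V, true, eti_spec_avail> {
  static std::string which() { return "tpl"; }
};
```

Callers only ever write `Foo<SomeType>`; which of the three bodies they get is decided entirely by the full specializations of the two traits.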

In this file we also need to define the macros which are later used in the auto generated files:

// Availability Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template<> \
struct foo_eti_spec_avail<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE> > > { \
  enum : bool { value = true }; \
}; 

// Declaration Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_DECL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
extern template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;

// Instantiation Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_INST( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;

// Include the actual declarations for tpls and eti
#if !KOKKOSKERNELS_IMPL_COMPILE_LIBRARY
#include<impl/tpls/KokkosBlas1_foo_tpl_spec_decl.hpp>
#include<impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_decl.hpp>
#endif

Note how the actual declarations of those classes are only included when we are NOT compiling the library.
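
The declaration/instantiation macro pair can be tried out in isolation. The following sketch is an illustration under simplifying assumptions: `std::vector` replaces the View, the `DEMO_` macros are hypothetical and take only a scalar type, and both macros are expanded in one translation unit (the standard permits an explicit instantiation definition to follow the declaration in the same unit), whereas in the scheme above the declaration is seen by user code and the instantiation lives in a generated cpp file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified specialization layer with an inline implementation.
template <class V, bool tpl_spec_avail, bool eti_spec_avail>
struct Foo {
  static void foo(V& a) {
    for (std::size_t i = 0; i < a.size(); ++i)
      a[i] = static_cast<typename V::value_type>(i);
  }
};

// Declaration macro: tells the compiler not to instantiate implicitly.
#define DEMO_FOO_ETI_SPEC_DECL(SCALAR) \
  extern template struct Foo<std::vector<SCALAR>, false, true>;

// Instantiation macro: emits the one real instantiation.
#define DEMO_FOO_ETI_SPEC_INST(SCALAR) \
  template struct Foo<std::vector<SCALAR>, false, true>;

DEMO_FOO_ETI_SPEC_DECL(double)  // would normally sit in a generated header
DEMO_FOO_ETI_SPEC_INST(double)  // would normally sit in a generated cpp file

// Fills a vector of length n through the pre-instantiated struct and
// returns the sum of its entries.
inline double sum_after_foo(std::size_t n) {
  std::vector<double> v(n, 0.0);
  Foo<std::vector<double>, false, true>::foo(v);
  double sum = 0.0;
  for (double x : v) sum += x;
  return sum;
}
```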

I'll post the whole file later after discussing some more Macro stuff.

crtrott · 2017-05-22T04:25:11Z

The implementation layer in impl/KokkosBlas1_foo_impl.hpp is pretty much whatever we need it to be. In this case it's just a simple function:

  template<class ViewType>
  void execute_foo(const ViewType& a) {
    Kokkos::parallel_for("KokkosBlas1::foo",a.extent(0), KOKKOS_LAMBDA (const int& i) {
      a(i) = i;
    });
  }

If we want to distinguish between multivectors and plain vectors, the implementation layer may be one of the places to put that logic.

crtrott · 2017-05-22T04:30:30Z

The TPL layer consists of two files: the one which declares the availability of a specialization and the one which provides the specialization. The first one is impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp:

template<class ViewType>
struct foo_tpl_spec_avail {
  enum : bool { value = false };
};

#ifdef KOKKOSKERNELS_ENABLE_MKL
template<>
struct foo_tpl_spec_avail<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>> {
  enum : bool { value = true };
};
#endif

Basically for every new TPL which we want to support we drop another full specialization of this stuff in.

The implementation is the counterpart to it. Note that we can use the implementation to decide, based on input parameters, whether to call our own code or the TPL code. We also need two full specializations here, depending on whether ETI for the same type combination is available or not.

#ifdef KOKKOSKERNELS_ENABLE_MKL
#include<cstdio> // for printf
#include<mkl_foo.hpp>
namespace KokkosBlas1 {
namespace Impl {

// Only a TPL specialization is available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,false> {
  typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;

  static void foo(const ViewType& a) {
    #if (KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION)
    printf("Calling MKL Specialization\n");
    #endif
    mkl_foo(a.data(),a.extent(0));
  }
};

// Both a TPL specialization and an ETI instantiation are available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,true> {
  typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;

  static void foo(const ViewType& a) {
    // Our code is better for large number of entries, so only use TPL for small lengths
    if(a.extent(0) < 100000)
      Foo<ViewType,true,false>::foo(a);
    else
      Foo<ViewType,false,true>::foo(a);
  }
};
}
}
#endif
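
The length-based choice between the TPL and our own code can be exercised without MKL. In the sketch below everything is a stand-in: `hypothetical_tpl_foo` plays the role of the vendor routine, `std::vector<double>` replaces the View, `last_path` is a helper that records which branch ran, and `demo_threshold` is deliberately tiny compared to the 100000 used above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Vec = std::vector<double>;

// Records which code path ran last (demo helper only).
inline std::string& last_path() {
  static std::string path;
  return path;
}

// Hypothetical stand-in for a vendor library routine.
inline void hypothetical_tpl_foo(double* data, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<double>(i);
}

template <class V, bool tpl_spec_avail, bool eti_spec_avail>
struct Foo;

// Our own implementation (no TPL, ETI available).
template <class V>
struct Foo<V, false, true> {
  static void foo(V& a) {
    last_path() = "native";
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<double>(i);
  }
};

// Only a TPL specialization is available.
template <>
struct Foo<Vec, true, false> {
  static void foo(Vec& a) {
    last_path() = "tpl";
    hypothetical_tpl_foo(a.data(), a.size());
  }
};

// Demo value; the text above uses 100000.
constexpr std::size_t demo_threshold = 8;

// Both are available: pick at run time based on the length.
template <>
struct Foo<Vec, true, true> {
  static void foo(Vec& a) {
    if (a.size() < demo_threshold)
      Foo<Vec, true, false>::foo(a);
    else
      Foo<Vec, false, true>::foo(a);
  }
};

// Runs foo on a vector of length n and reports the path taken.
inline std::string path_for_length(std::size_t n) {
  Vec v(n, 0.0);
  Foo<Vec, true, true>::foo(v);
  return last_path();
}
```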

crtrott · 2017-05-22T04:34:51Z

Last but not least, there are three auto generated files which are kind of like the TPL files: they declare an ETI specialization available, provide the extern template declaration of those ETI specializations, and instantiate them in cpp files. Those simply use the previously defined macros with the right type combinations.

There is one more detail using two additional macros:

KOKKOSKERNELS_ENABLE_ETI_ONLY: is used to prevent instantiations of Non-ETI or Non-TPL types. This is used to hide the actual definition of the specialization layer when not compiling the library cpp files.
KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION: this is more of a debug option which enables print statements stating which specialization (ETI, Non-ETI, TPL) was called. This is useful to make sure we don't instantiate stuff in cases where we can't turn on full ETI_ONLY.

One more word on KOKKOSKERNELS_IMPL_COMPILE_LIBRARY: this macro is always defined as false, except inside the auto generated ETI cpp files.

crtrott · 2017-05-22T04:41:51Z

I will check in the actual full example code soon.

crtrott · 2017-05-22T04:47:05Z

Some more thoughts: while this is a lot of different files, we are trying to serve a pretty complex use case. Most of this stuff is boilerplate and doesn't use much advanced C++. It basically comes down to a bunch of full specializations. The particularly nice thing about this scheme is that it decouples the actual implementation from providing specializations for TPLs and from providing ETI specializations. All three things can be modified independently. Furthermore, this scheme clearly separates which files are responsible for which part of the hierarchy.

crtrott · 2017-05-22T04:49:42Z

@mhoemmen @dsunder @hcedwar @srajama1
Most folks on KokkosKernels are not that interested in software engineering as long as what they have to work with works. But maybe you guys want to take a look and tell me what you think (and also whether the explanation of why I came up with this design makes sense).

hcedwar · 2017-05-22T16:13:24Z

You can static_assert( is_view<T>::value , ... as well

Thought (tbd): Should we have something in Kokkos core to canonicalize a View?

template< class ViewType >
using canonical_view_of_const = 
  View< typename ViewType::const_data_type 
          , typename ViewType::array_layout 
          , typename ViewType::device_type 
          , typename ViewType::memory_traits > ;
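
A Kokkos-free sketch of what such a canonicalizing alias buys: `MockView` and `pick_layout` are hypothetical stand-ins that imitate a View with variadic properties, and the default of `LayoutRight` is an assumption of this demo (in Kokkos the default layout depends on the execution space). Two spellings that are different types before canonicalization become the same type afterwards.

```cpp
#include <cassert>
#include <type_traits>

struct LayoutLeft {};
struct LayoutRight {};

// Picks the layout out of the trailing properties; assumed default: LayoutRight.
template <class... Props>
struct pick_layout { using type = LayoutRight; };
template <class... Rest>
struct pick_layout<LayoutLeft, Rest...> { using type = LayoutLeft; };
template <class... Rest>
struct pick_layout<LayoutRight, Rest...> { using type = LayoutRight; };

// Hypothetical stand-in for a View with variadic template parameters.
template <class DataType, class... Props>
struct MockView {
  using data_type = DataType;
  using const_data_type =
      typename std::add_const<typename std::remove_pointer<DataType>::type>::type*;
  using array_layout = typename pick_layout<Props...>::type;
};

// Canonical form: const data type plus an explicitly spelled layout.
template <class ViewType>
using canonical_view_of_const =
    MockView<typename ViewType::const_data_type,
             typename ViewType::array_layout>;

// Implicit and explicit layout are distinct types before canonicalization...
static_assert(!std::is_same<MockView<double*>,
                            MockView<double*, LayoutRight>>::value,
              "spellings differ as raw types");

// ...and identical afterwards, together with the const variant.
static_assert(std::is_same<canonical_view_of_const<MockView<double*>>,
                           canonical_view_of_const<MockView<const double*, LayoutRight>>>::value,
              "canonical forms agree");
```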

The foo_eti_spec_avail and foo_tpl_spec_avail traits are an unfortunate need and, at first glance, a good minimalist approach.

mhoemmen · 2017-05-22T17:04:51Z

@crtrott I like @hcedwar 's idea of adding some "canonicalize the View" type functions.

I think the design makes sense, especially its ETI / TPL aspects. In particular, I think it's enough for us to specialize on whether some TPL is available. Very few users in practice want to swap different TPLs in and out at compile or run time. (They just want to know what's the fastest TPL to use on each platform.) I don't think it's worth complicating the design for this use case, which may only be of interest to the occasional computer science publication. We're a national lab; that should be at best a tertiary interest for us.

This design is good for "node-global" kernels. What about single-team or single-thread kernels? Are we worried about potential inlining overhead at those lower levels?

Also, what about asynchronous dispatch? This is relevant to design of the implementation layer's interface, because Views may need to stay managed as they enter the implementation layer.

crtrott · 2017-05-22T17:49:48Z

Regarding asynchronous dispatch: the internal view types are function specific. So for asynchronous functions the internal views must be managed.

mndevec · 2017-08-14T23:50:43Z

By the way, it might be better to move this issue and #28 to Wiki.

mhoemmen · 2017-08-15T04:07:36Z

@mndevec I would say, @crtrott finished implementing the first-pass (more accurately, second-pass, or third-pass if you count Chris Baker's Tpetra kernels) design. Thus, it is my view that it would be proper to close this issue. We can always open new issues for new things to do.

mndevec · 2017-08-15T16:33:15Z

I mean, this issue was a nice guideline for me. It would be nice to save it in wiki of Kokkoskernels so that it can be easily found, rather than searching it in the issue history.

mhoemmen · 2017-08-15T16:55:21Z

@mndevec wrote:

It would be nice to save it in wiki of Kokkoskernels so that it can be easily found, rather than searching it in the issue history.

That's a good idea. I think it would be best, then, to close this issue, but copy its contents into the wiki. How about that?

mndevec · 2017-08-17T18:43:06Z

Okay, I moved this topic to here:
https://github.com/kokkos/kokkos-kernels/wiki/ETI-System-and-file-structure

mndevec mentioned this issue Aug 1, 2017

KokkosSparse Updates #40

Merged

mndevec closed this as completed Aug 17, 2017
