Preamble
I am moving here the discussion about SIMD that started in #716, and extending it to hybrid parallelization.
The two topics go hand in hand, since both (SPMD and SIMD) consist of processing multiple data (MD) elements simultaneously, either by a single program (SP) run by multiple threads (generally with a shared view of memory), or by a single instruction (SI) executed by a single core.
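To make the distinction concrete, here is a minimal sketch (a hypothetical kernel, not code from this project) of a loop that uses both levels at once. The `parallel for` part of the OpenMP pragma makes every thread run the same loop body on its own chunk of the range (SPMD), and the `simd` part lets each thread process several consecutive elements per instruction (SIMD). It assumes OpenMP 4.0 or later and compilation with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel: y[i] += a * x[i] (an "axpy").
// SPMD: with OpenMP enabled, "parallel for" splits the iterations among
// threads that all execute this same loop body on different chunks of data.
// SIMD: "simd" additionally lets each thread process several consecutive
// i with one vector instruction.
// Without OpenMP the pragma is ignored and the loop runs serially.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
  const long n = static_cast<long>(x.size());
#pragma omp parallel for simd
  for (long i = 0; i < n; ++i) {
    y[i] += a * x[i];
  }
}
```

Each element is written by exactly one iteration, so this particular loop has no race condition; that is not the case for every loop, as discussed next.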
The reason SIMD came up in #716 is that, as I will demonstrate, vectorization needs to be supported by data structures. SPMD, on the other hand, needs to be supported by algorithms designed to avoid race conditions, i.e. two or more threads modifying the same memory location concurrently.
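A minimal sketch of both points, under illustrative assumptions (all type and function names below are hypothetical, not taken from the code base). A structure-of-arrays (SoA) layout lets a loop over nodes read each variable with unit stride, which is what SIMD lanes need, whereas an array-of-structures (AoS) layout interleaves the variables. For an edge loop, where each edge updates both of its nodes, two threads could update the same node at once; a greedy edge coloring groups edges so that no two edges of the same color share a node, so the edges of one color can be split among threads without a race.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS (array of structures): the variables of one node are contiguous, so the
// densities of consecutive nodes are 3 doubles apart (strided access).
struct NodeAoS {
  double rho, rhoU, rhoE;
};

// SoA (structure of arrays): each variable is a contiguous array, so a loop
// over nodes reads consecutive addresses that map naturally onto SIMD lanes.
struct NodesSoA {
  std::vector<double> rho, rhoU, rhoE;
};

// Node loop on SoA data (hypothetical 1D kinetic energy per unit volume).
void kineticEnergy(const NodesSoA& nodes, std::vector<double>& ek) {
  const std::size_t n = nodes.rho.size();
  ek.resize(n);
#pragma omp simd
  for (std::size_t i = 0; i < n; ++i)
    ek[i] = 0.5 * nodes.rhoU[i] * nodes.rhoU[i] / nodes.rho[i];
}

struct Edge {
  std::size_t i, j;  // the two nodes connected by the edge
};

// Greedy edge coloring: an edge gets the first color that does not yet touch
// either of its nodes, so no two edges of the same color share a node.
std::vector<std::vector<Edge>> colorEdges(const std::vector<Edge>& edges,
                                          std::size_t nNode) {
  std::vector<std::vector<Edge>> colors;
  std::vector<std::vector<bool>> touched;  // touched[c][node]
  for (const Edge& edge : edges) {
    std::size_t c = 0;
    while (c < colors.size() && (touched[c][edge.i] || touched[c][edge.j])) ++c;
    if (c == colors.size()) {
      colors.emplace_back();
      touched.emplace_back(nNode, false);
    }
    colors[c].push_back(edge);
    touched[c][edge.i] = true;
    touched[c][edge.j] = true;
  }
  return colors;
}

// Edge loop that scatters a hypothetical flux to both nodes. Within one color
// no node is updated twice, so threads cannot race on residual[]; colors are
// processed one after the other (implicit barrier after each parallel loop).
void edgeLoopColored(const std::vector<std::vector<Edge>>& colors,
                     const std::vector<double>& u,
                     std::vector<double>& residual) {
  assert(residual.size() == u.size());
  for (const auto& color : colors) {
    const long nEdge = static_cast<long>(color.size());
#pragma omp parallel for
    for (long e = 0; e < nEdge; ++e) {
      const Edge& edge = color[static_cast<std::size_t>(e)];
      const double flux = 0.5 * (u[edge.i] - u[edge.j]);
      residual[edge.i] += flux;
      residual[edge.j] -= flux;
    }
  }
}
```

Coloring is only one way to avoid the race; atomic updates or per-thread partial residuals are common alternatives, each with different costs in memory traffic and synchronization.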
Instead of continuing #716, I think it is better to let that issue become documentation for #753.
I did not add "without loss of readability" to the title because it is already long, but that requirement applies nonetheless.
I open these issues in the hope that people will participate (I am not a fast writer, so this is actually a lot of work), and so far great comments and insights have come from those with experience in these topics (kudos to @economon and @vdweide). But please participate even if you have never heard of these topics: your opinion about the readability and "developability" of the code is important! I think the code style should be accessible to people starting a PhD (after they have read a bit about C++...).
My (ambitious) goal with this work is to lay down an architecture for performance, i.e. not just to improve the performance of a few key numerical schemes, but to create mechanisms applicable to all existing and future ones. Moreover, I want this to be possible with minimal changes to the way those bits of code are currently written.
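One way such a mechanism is often approached, shown here only as a hypothetical sketch and not as the design this issue will necessarily propose: numerical kernels are written once as templates on their scalar type, then instantiated either with `double` or with a fixed-width pack type whose operators act element-wise, so the compiler can map them to SIMD instructions. `Pack` and `centralAverage` are illustrative names, not existing classes or functions in the code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width pack of N values. Every operator acts element-wise
// over a short fixed-length loop, which compilers can map onto SIMD instructions.
template <class T, std::size_t N>
struct Pack {
  std::array<T, N> v{};
  Pack() = default;
  explicit Pack(T x) { v.fill(x); }  // broadcast a scalar to all lanes
  T& operator[](std::size_t i) { assert(i < N); return v[i]; }
  const T& operator[](std::size_t i) const { assert(i < N); return v[i]; }
};

template <class T, std::size_t N>
Pack<T, N> operator+(const Pack<T, N>& a, const Pack<T, N>& b) {
  Pack<T, N> r;
  for (std::size_t i = 0; i < N; ++i) r[i] = a[i] + b[i];
  return r;
}

template <class T, std::size_t N>
Pack<T, N> operator*(const Pack<T, N>& a, const Pack<T, N>& b) {
  Pack<T, N> r;
  for (std::size_t i = 0; i < N; ++i) r[i] = a[i] * b[i];
  return r;
}

// Hypothetical numerical kernel, written once and templated on its scalar type:
// Scalar = double processes one value, Scalar = Pack<double, 4> processes four.
template <class Scalar>
Scalar centralAverage(const Scalar& uL, const Scalar& uR) {
  return Scalar(0.5) * (uL + uR);
}
```

The kernel body is identical for the scalar and the vectorized versions; the price is that the surrounding data structures must be able to supply N values at a time, which is why vectorization has to be supported by the data layout.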