NUMA-Aware 2d-stencil #5134

topkanoguzhan · 2021-01-28T19:08:45Z

topkanoguzhan
Jan 28, 2021

Hello everyone,

My name is Oguzhan and I wanted to ask for some advice on a little project I am working on.
It is similar in structure to the 4th stage of the 1d-stencil heat simulation from:
https://hpx-docs.stellar-group.org/latest/html/examples/1d_stencil.html

The space is 2 dimensional but imagine there are 3 grids. A, B, C, where A & B are
simulated over multiple iterations while C remains static and is involved in the
computations of A and B.

In every time step it:
A[it], B[it] and C are used to compute B[it+1]
B[it+1, A[it], and C are used to compute A[it+1]
For each point in the grid there are also neighboring elements required.
The grids are disaggregated similarly to the example in the documentions,
but the space is now a vector of partition_data futures that represent 2
dimensional partitions.

The for_each parallel algorithm is used to initialize the grids in parallel.
It iterates over each row of blocks, each iteration initializing that row's partitions of A, B, C.
And the tasks are generated in the order described above inside a loop iterating over each timestep.
This version is 10% faster than a naive OpenMP solution but a solution from another
task-parallel runtime system against which I try to compete achieves a
~25-30% faster runtime than OpenMP.

The question I had was how I could better achieve NUMA awareness in my solution.
My aim is to ensure on a 2-numa node setup to initialize the upper half of grids to
one node and the lower half to the other node so that I set up the tasks in a way
to limit the expensive memory loads from one numa node to the other
to the 2 middle rows of partitions of the grids where only vertical neighbours
need to be loaded.

Are there any tools or mechanisms available to achieve this?
I tried to look for something inside the GitHub Guide to HPX
but many examples were deleted.

Thank you very much in advance

hkaiser · 2021-01-28T20:17:57Z

hkaiser
Jan 28, 2021
Maintainer

We have 2d-stencil examples here: https://github.com/STEllAR-GROUP/tutorials. I also know, that @NK-Nikunj has worked on porting something similar to ARM architectures with wide vectorization. He might be able to elaborate.

8 replies

hkaiser Jan 28, 2021
Maintainer

This should be https://github.com/STEllAR-GROUP/tutorials/blob/master/examples/03_stencil/stencil_parallel_1.cpp instead, I believe.

topkanoguzhan Jan 28, 2021
Author

Thank you, this link works

hkaiser Jan 28, 2021
Maintainer

Would you be able to create a PR that fixes the broken links?

topkanoguzhan Jan 28, 2021
Author

You mean go to the files, fork and edit them with the working link? I can do that

NK-Nikunj Jan 28, 2021
Collaborator

Yes. It'll be nice to have a correction PR :)

NK-Nikunj · 2021-01-28T20:51:42Z

NK-Nikunj
Jan 28, 2021
Collaborator

@topkanoguzhan from what I understand, your partion_data stores a collection of data points in the form of std::vector (hence the requirement of NUMA aware allocators). If that's the case, I suggest you use HPX's inbuilt vector with the NUMA aware allocator.

The performance gap between non-NUMA aware and NUMA aware containers is significant (30-40%, sometimes even more) and you should be able to make up for the difference between HPX and the other runtime you mention. You can find NUMA aware implementation of 2D stencil codes here. Also, from the pattern of updates, I would not suggest explicitly vectorizing the codes (see this) as it won't give you more than 10-15% difference. If you still want to squeeze out that extra performance, you can look into my implementation of 2D-stencil with explicit vectorization. Initializing a different data layout followed by maintaining the halo regions can get tricky. And for the hours invested, the extra performance boost isn't really worth it.

4 replies

topkanoguzhan Jan 28, 2021
Author

Thank you for the thorough answer. Yes the partition_data objects hold the data elements. The partition_data objects themselves are organized in a vector of partition_data objects that represent the overall grid. Do I have to use the hpx vector for the vector of partition data or the elements of partition data?

And how do I ensure that the A[i,j], and B[i,j] partitions of the grids A and B are aligned?

NK-Nikunj Jan 28, 2021
Collaborator

You can use hpx vector for both. Since hpx vector is C++ standards conforming, you wouldn't need any other changes than to replace std:: with hpx::compute:: and adding an allocator as the 2nd template parameter.

I'm not sure what you mean by aligning grid. Could you elaborate?

Btw, you could also replace the partion_data based approach with a lightweight container that can provide you a 2nd space view. You could somewhat call it a less powerful mdspan or std::span over 2d arrays. For reference, you can check my grid implementation. It isn't the best implementation (no iterators :/), but it worked fairly well for my case.

topkanoguzhan Jan 28, 2021
Author

What I meant was that when allocating A and B the partitions that have the same "coordinates" are allocated to the same NUMA nodes because they require each other for computation.

NK-Nikunj Jan 28, 2021
Collaborator

Use an hpx for_all loop to initialize the data just like you used the for_all loop to update the values. The NUMA aware allocator is based on the first_touch policy. So as long as both your for_all loops use the same policy, things should work out of the box.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NUMA-Aware 2d-stencil #5134

{{title}}

Replies: 2 comments 12 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

NUMA-Aware 2d-stencil #5134

topkanoguzhan Jan 28, 2021

Replies: 2 comments · 12 replies

hkaiser Jan 28, 2021 Maintainer

hkaiser Jan 28, 2021 Maintainer

topkanoguzhan Jan 28, 2021 Author

hkaiser Jan 28, 2021 Maintainer

topkanoguzhan Jan 28, 2021 Author

NK-Nikunj Jan 28, 2021 Collaborator

NK-Nikunj Jan 28, 2021 Collaborator

topkanoguzhan Jan 28, 2021 Author

NK-Nikunj Jan 28, 2021 Collaborator

topkanoguzhan Jan 28, 2021 Author

NK-Nikunj Jan 28, 2021 Collaborator

topkanoguzhan
Jan 28, 2021

Replies: 2 comments 12 replies

hkaiser
Jan 28, 2021
Maintainer

hkaiser Jan 28, 2021
Maintainer

topkanoguzhan Jan 28, 2021
Author

hkaiser Jan 28, 2021
Maintainer

topkanoguzhan Jan 28, 2021
Author

NK-Nikunj Jan 28, 2021
Collaborator

NK-Nikunj
Jan 28, 2021
Collaborator

topkanoguzhan Jan 28, 2021
Author

NK-Nikunj Jan 28, 2021
Collaborator

topkanoguzhan Jan 28, 2021
Author

NK-Nikunj Jan 28, 2021
Collaborator