Decrease time for world.gop.reduce #547

EricaCMitchell · 2024-09-12T20:33:40Z

Decreases the time spent in world.gop.reduce by shortening each buffer to the minimum number of elements needed and by allocating a buffer only if the corresponding child process exists. This significantly speeds up sum reduction for std::complex types.
@evaleev
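As a rough illustration of the sizing rule described above, the following sketch computes how many elements a receive buffer needs. It is not the MADNESS implementation: the function name `reduce_buffer_length` is hypothetical, and only the `-1` "no child" sentinel and the `std::min(nelem_per_maxmsg, nelem)` bound are taken from the diff in this PR.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: number of elements to allocate for one child's buffer.
// - child == -1 means "no such child", so no buffer is needed at all.
// - otherwise the buffer never needs more than nelem elements, even when the
//   per-message limit nelem_per_maxmsg is much larger.
inline std::size_t reduce_buffer_length(std::size_t nelem_per_maxmsg,
                                        std::size_t nelem, int child) {
  if (child == -1) return 0;
  return std::min(nelem_per_maxmsg, nelem);
}
```

For a short reduction (small `nelem`) this avoids allocating a full maximum-message-sized buffer, and leaf processes allocate nothing.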

JonathonMisiewicz · 2024-09-13T13:31:23Z

src/madness/world/worldgop.h

-          auto buf0 = std::unique_ptr<T[]>(new T[nelem_per_maxmsg]);
-          auto buf1 = std::unique_ptr<T[]>(new T[nelem_per_maxmsg]);
+          std::unique_ptr<T[]> buf0 = (child0 != -1)
+                ? std::unique_ptr<T[]>(new T[std::min(nelem_per_maxmsg,nelem)])


Does std::make_unique not work here? The less new, the better.

I'll test the timing with make_unique. When I tried it earlier by simply switching the original code over, it resulted in longer test times in MPQC.

That's annoying. For future reference, if there is a slowdown, there's a good chance this is the cause. Not that we can do anything about that as long as we maintain C++17 support.

Haven't seen a difference in the timings between the two in the MPQC test cases. I'll push the std::make_unique implementation.

I would avoid std::make_unique due to the issues discussed in Jonathon's link ... no reason to pay for value initialization of primitive types.
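To make the initialization cost concrete: `std::make_unique<T[]>(n)` value-initializes the array (equivalent to `new T[n]()`), while `new T[n]` default-initializes it. A minimal sketch of the two forms, with hypothetical helper names that are not part of MADNESS:

```cpp
#include <cstddef>
#include <memory>

// Value-initialization: every element is zeroed for arithmetic T,
// which costs a full pass over the buffer before it is ever used.
template <typename T>
std::unique_ptr<T[]> make_zeroed_buffer(std::size_t n) {
  return std::make_unique<T[]>(n);  // same as new T[n]()
}

// Default-initialization: for arithmetic T the elements are left
// indeterminate, so nothing is written until the caller fills the buffer.
template <typename T>
std::unique_ptr<T[]> make_uninitialized_buffer(std::size_t n) {
  return std::unique_ptr<T[]>(new T[n]);
}
```

C++20 adds `std::make_unique_for_overwrite`, which default-initializes like the second form, but it is unavailable under C++17. Note also that `std::complex`'s default constructor sets both parts to zero, so even `new T[n]` writes to every element when `T` is a `std::complex` type.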

In fact, this function only makes sense when T has a trivial default ctor and a trivial copy ctor (otherwise just copying bytes from MPI will not produce valid T objects), so we should switch to aligned malloc ... This will avoid having to pay for initialization in the case of T=std::complex.
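One way to express this idea is sketched below, under stated assumptions: the names `FreeDeleter`, `raw_buffer`, and `allocate_raw_buffer` are hypothetical, and the sketch checks `std::is_trivially_copyable` as a stand-in for the requirement described above. It relies on `std::malloc` returning storage aligned for `std::max_align_t`, so it rejects over-aligned types at compile time.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <type_traits>

// Deleter for memory obtained from std::malloc.
struct FreeDeleter {
  void operator()(void* p) const noexcept { std::free(p); }
};

template <typename T>
using raw_buffer = std::unique_ptr<T[], FreeDeleter>;

// Allocate n elements of T without running any constructor.
template <typename T>
raw_buffer<T> allocate_raw_buffer(std::size_t n) {
  static_assert(std::is_trivially_copyable<T>::value,
                "buffer is filled by copying raw bytes");
  static_assert(alignof(T) <= alignof(std::max_align_t),
                "std::malloc only guarantees fundamental alignment");
  if (n == 0) return raw_buffer<T>(nullptr);
  void* p = std::malloc(n * sizeof(T));
  if (p == nullptr) throw std::bad_alloc();
  return raw_buffer<T>(static_cast<T*>(p));
}
```

The elements hold indeterminate values until they are overwritten, for example by a receive into the buffer, so they must be written before they are read.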

This should now be accomplished with std::malloc. I tried std::aligned_alloc, but using alignof(T) as the alignment constraint resulted in a nullptr return when a child process was present.

> I tried std::aligned_alloc but the alignment constraint using alignof(T) resulted in a nullptr return when child process was present.

Possibly due to asking for a size that is not a multiple of the alignment? See: https://en.cppreference.com/w/cpp/memory/c/aligned_alloc#Notes

Double-checked, and that doesn't seem to be the issue. I added the line below and ran h2-sci-x2c-1root-np2.json:

std::cout << "Alignment: " << alignof(T) << " \n" << "Test: " << (sizeof(T) * std::min(nelem_per_maxmsg, nelem)) % alignof(T) << std::endl;

The output shows that the size is always a multiple of the alignment.
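For reference, a conservative way to call `std::aligned_alloc` is sketched below. It is only an illustration, not a diagnosis of the nullptr return reported above and not the fix that was eventually pushed. The helper names are hypothetical. The sketch rounds the byte count up to a multiple of the alignment and, as an assumption, raises the alignment to at least `sizeof(void*)`, which is what POSIX `posix_memalign` requires; whether a given `aligned_alloc` implementation is equally strict is not established here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>

// Round bytes up to the next multiple of alignment.
inline std::size_t round_up(std::size_t bytes, std::size_t alignment) {
  return (bytes + alignment - 1) / alignment * alignment;
}

// Allocate storage for n elements of T; release it with std::free.
template <typename T>
T* aligned_buffer(std::size_t n) {
  // sizeof(T) is always a multiple of alignof(T), so checking
  // (sizeof(T) * n) % alignof(T) can never report a mismatch.
  static_assert(sizeof(T) % alignof(T) == 0, "holds for every object type");
  // Assumption: use at least pointer alignment, as posix_memalign requires.
  const std::size_t alignment = std::max(alignof(T), sizeof(void*));
  // Request at least one element so the size is never zero.
  const std::size_t bytes =
      round_up(std::max<std::size_t>(n, 1) * sizeof(T), alignment);
  return static_cast<T*>(std::aligned_alloc(alignment, bytes));
}
```

Because the alignment used here can exceed `alignof(T)`, the rounding step matters even though `sizeof(T) * n` is already a multiple of `alignof(T)`.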

The problem with the current solution is that the alignment of T is not guaranteed ... I pushed changes to reduce and concat0 that hopefully address it, but they are untested. Please test and report back.

With the most recent changes, the runtime of the sci-x2c test set in MPQC decreases from 27.2 s to 13.9 s compared to the unaligned malloc version.


JonathonMisiewicz reviewed Sep 13, 2024

JonathonMisiewicz requested a review from evaleev September 13, 2024 13:32

EricaCMitchell force-pushed the pr-world_reduce branch from 92aea42 to 30ce5ea Compare September 13, 2024 16:56

evaleev force-pushed the pr-world_reduce branch from 1bf0405 to 025f80c Compare September 16, 2024 19:33

EricaCMitchell and others added 5 commits September 16, 2024 17:18

decrease buffer length and only form if child processes are present

2e8e080

Use std:malloc to avoid initialization

bfe28e3

handle alignment requirements correctly (?) in worldgop::{reduce,conc…

c7e2673

…at0}

Remove assertion and error msg, give type for ptr in concat0

0386b7f

[ci skip] semicolon and linewidth

430e153

EricaCMitchell force-pushed the pr-world_reduce branch from c4112e5 to 430e153 Compare September 16, 2024 21:19

Spacing

ef73ffd

evaleev approved these changes Sep 16, 2024

evaleev merged commit 0d87255 into m-a-d-n-e-s-s:master Sep 16, 2024
8 of 12 checks passed

evaleev added a commit to ValeevGroup/tiledarray that referenced this pull request Sep 17, 2024

bump MAD tag to pull in m-a-d-n-e-s-s/madness#547

f85c206

evaleev added a commit to ValeevGroup/tiledarray that referenced this pull request Sep 17, 2024

bump MAD tag to pull in m-a-d-n-e-s-s/madness#547

62a32c0

evaleev added a commit to ValeevGroup/tiledarray that referenced this pull request Sep 17, 2024

bump MAD tag to pull in m-a-d-n-e-s-s/madness#547

932fc68

evaleev added a commit to ValeevGroup/tiledarray that referenced this pull request Sep 17, 2024

bump MAD tag to pull in m-a-d-n-e-s-s/madness#547

d622383

bimalgaudel pushed a commit to ValeevGroup/tiledarray that referenced this pull request Sep 25, 2024

bump MAD tag to pull in m-a-d-n-e-s-s/madness#547

6429585

EricaCMitchell deleted the pr-world_reduce branch October 22, 2024 21:18


Review comments, in page order: JonathonMisiewicz (Sep 13, 2024); EricaCMitchell (Sep 13, 2024); JonathonMisiewicz (Sep 13, 2024); EricaCMitchell (Sep 13, 2024); evaleev (Sep 13, 2024); EricaCMitchell (Sep 14, 2024); evaleev (Sep 14, 2024); EricaCMitchell (Sep 16, 2024); evaleev (Sep 16, 2024); EricaCMitchell (Sep 16, 2024)
