Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relocation algorithms implementation #2

Closed
wants to merge 133 commits into from

Conversation

isidorostsa
Copy link
Owner

@isidorostsa isidorostsa commented Jun 8, 2023

This is a starting point for the implementation of the P1144 relocate_at and uninitialized_relocate algorithms for HPX. They will later be used as base primitives for parallelized versions.

dimitraka and others added 30 commits May 11, 2023 18:33
Tests were disabled in STEllAR-GROUP#5796
Kepler support is removed in CUDA 12 so it's time to remove it here as
well.
This reverts commit 9d5a4a8.
Still not working unfortunately (tested with gcc/12 and cuda 12.1)
As all rocm/5 modules are currently broken on the machine
…ization for the MPI parcelport.

Default is OFF, since it will slow down the MPI parcelport.
- flyby: resolve winsock2.h warnings on windows
- flyby: add CI builder for gcc/mingw on Windows
6256: Fix nvcc/gcc-10 (Octo-Tiger) compilation issue  r=hkaiser a=G-071

Using HPX 1.9.0, the Octo-Tiger compilation (using just CUDA, not Kokkos) with gcc+nvcc fails due to an internal nvcc error pointing to loop.hpp within HPX:

```
build/hpx/include/hpx/parallel/util/loop.hpp(216): internal error: assertion failed at: "cp_gen_be.c", line 22617 in gen_variable_decl                                                                                                                 

1 catastrophic error detected in the compilation of "/src/octotiger/src/unitiger/hydro_impl/flux_cuda_kernel.cpp".                                                                                                                                      
Compilation aborted.
nvcc error   : 'cudafe++' died due to signal 6
nvcc error   : 'cudafe++' core dumped
```

That is with using g++10 and CUDA 11.5.
While using newer versions of g++/CUDA fix this, it would be nice to not lose g++-10 support over this.

Interestingly, simply providing nvcc with more information about the type seems to help as well, hence this PR to resolve the issue without losing support for the older compiler.





Co-authored-by: Gregor Daiss <Gregor.Daiss+git@gmail.com>
6239: Fix `Optimizing HPX applications` page of Manual r=hkaiser a=dimitraka

The content of the tables used to overlap with the side bar of the documentation making it impossible to actually read the information. These extended tables were split to smaller tables which are now clear and readable. 

With this change there is no need to include anchors to the table rows. For each counter there is a separate table, so one can just use the reference to the table in order to refer to a specific counter.


Co-authored-by: dimitraka <kadimitra@ece.auth.gr>
- Clang fails to recognize the global delete operator which has 2 arguments the second one being std::size_t

Signed-off-by: Shreyas Atre <shreyasatre16@gmail.com>
Co-authored-by: Hartmut Kaiser <hartmut.kaiser@gmail.com>
6258: Rewriting wait_some to circumvent data races causing hangs r=hkaiser a=hkaiser

- flyby: fixing possible race in latch

`@m-diers` Could you please try this patch to see if it fixes your issue?

Co-authored-by: Hartmut Kaiser <hartmut.kaiser@gmail.com>
Signed-off-by: Shreyas Atre <shreyasatre16@gmail.com>
Signed-off-by: Shreyas Atre <shreyasatre16@gmail.com>
6257: Cmake Tests: Delete operator check for size_t arg r=hkaiser a=SAtacker

- Clang fails to recognize the global delete operator which has 2 arguments the second one being std::size_t
- Reference - https://godbolt.org/z/q1hMrfKjv
- Fixes STEllAR-GROUP#6246

Signed-off-by: Shreyas Atre <shreyasatre16@gmail.com>




Co-authored-by: Shreyas Atre <shreyasatre16@gmail.com>
Co-authored-by: Shreyas Atre <61797109+SAtacker@users.noreply.github.com>
6249: Implement the send immediate optimization for the MPI parcelport. r=hkaiser a=JiakunYan

Users can use the option `hpx.parcel.mpi.sendimm` to enable and disable it.

When HPX wants to send a parcel, the default behavior is to enqueue this parcel into the parcel queue of the target locality, dequeue all the parcels from that parcel queue, aggregate them, and pass them in one `parcel_buffer` structure. It will also try to get an existing connection from the connection cache. These two data structures are protected by locks and can be a source of thread contention.

With the send immediate optimization, the parcelport will bypass the connection cache and parcel queues and directly send the parcel.

This option is mainly for research purposes. The default is off since it will slow down the MPI parcelport in most cases.

Co-authored-by: Jiakun Yan <jiakunyan1998@gmail.com>
…options

- flyby: fixing application options in configuration files
G-071 and others added 15 commits June 27, 2023 13:00
6290: Adding user supplied on_finalize r=hkaiser a=hkaiser

This is needed for the hpxMP project

Co-authored-by: Hartmut Kaiser <hartmut.kaiser@gmail.com>
…tion threshold from 128 to 8192

- Replace the lock-based tag provider with an atomic variable
- Make the header message size dynamic
6274: Improve the MPI parcelport and change the zero-copy threshold to 8192 r=hkaiser a=JiakunYan

Improve the MPI parcelport:
- Replace the lock-based tag provider with an atomic variable
- Make the header message size dynamic

Since the maximum size of the dynamic header message is set to be the zero-copy serialization threshold, I also changed it from 128B to 8192B. (The message size that MPI switches from eager protocol to rendezvous protocol is usually around 8K - 64K).

Experiments: Octo-Tiger, max_level=6, SDSC Expanse, 32 nodes, 128 threads per node.
With `hpx.parcel.mpi.sendimm=0`, this PR improves the total execution time from 11.77s to 10.68s.
With `hpx.parcel.mpi.sendimm=1`, this PR improves the total execution time from ~60s to 11.72s.

Co-authored-by: Jiakun Yan <jiakunyan1998@gmail.com>
@isidorostsa isidorostsa changed the title hpx::relocate_at algorithm implementation Relocation algorithms implementation Jul 5, 2023
namespace hpx {

#if defined(HPX_HAVE_P1144_STD_RELOCATE_AT)
using std::uninitialized_relocate; // what would be the include for this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In P1144, this algorithm appears in <memory>, alongside std::uninitialized_fill and std::destroy and so on.
P1144R8 does not specify any feature-test macro, e.g. #define __cpp_lib_relocate 202307L; but if that would help you, I'm open to suggestions on such a macro's name, and can add it to my libc++ implementation on Godbolt.


auto n_bytes = std::distance(first_byte, last_byte);

memmove(dest, first, n_bytes);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is conceivable that we have InIter=int*, FwdIter=const int*. In that case, dest won't be convertible to void*; you'll need const_cast<void*>(static_cast<const void*>(dest)) or simply (void*)dest. (Remember, we're not modifying *dest; we're bringing it into existence. So it's sane for *dest to be a const type.)

I'm fixing the same bug in my libc++ code as we speak. :)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, hadn't thought of that.

So it makes sense to modify relocate_at to accept different input/output types?

template <typename SourceType, typename DestType>
T* relocate_at(SourceType* src, DestType* dst)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do that in libc++ internally: implement relocate_at(T*, T*) in terms of __libcpp_relocate_at2(T*, U*). But I have never proposed that in P1144R0–R8, because relocate_at(T*, U*) can't ever take advantage of trivial relocatability unless T==U. (I mean, in theory, a sufficiently smart implementation could expose __is_trivially_relocatable_from(T, U); but nobody does that, and it's not at all clear what would be the syntax via which the user could "warrant" such a property themselves. See https://quuxplusone.github.io/blog/2018/07/03/trivially-constructible-from/ for more thoughts on the topic.)

Note that the "relocatable-from" relationship, if it existed, wouldn't be symmetrical:

    int *a;
    std::unique_ptr<int> b;
    std::construct_at(&b, std::move(a)); std::destroy_at(&a); // OK
    std::construct_at(&a, std::move(b)); std::destroy_at(&b); // Ill-formed

    static_assert(std::is_relocatable_from_v<std::unique_ptr<int>, int*>);
    static_assert(!std::is_relocatable_from_v<int*, std::unique_ptr<int>>);

...However, maybe you just meant "shouldn't it be possible to relocate from const into non-const?"—

    const int j = 42;
    alignas(int) char ibuf[sizeof(int)];
    int& i = *(int*)ibuf;
    std::relocate_at(&j, &i);  // P1144R8: Ill-formed, deduction failed: `const int` versus `int`

I'm much more sympathetic to that idea, but it's never yet come up in WG21 discussions. Until someone claims it's a real problem, I'd weakly prefer not to try to address it. I think addressing it will require either completely uncoupling SourceType and DestType as you suggest (which makes the API seem to promise triviality in cases where the implementation can't actually guarantee it), or doing some sort of awful constrained template like requires is_same_v<remove_cvref_t<SourceType>, remove_cvref_t<DestType>> (which makes the API harder to understand).

Comment on lines 28 to 29
using out_type =
std::decay_t<typename std::iterator_traits<FwdIter>::reference>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicks:

  • Perhaps object_type should be in_type, for symmetry.
  • The use of decay_t suggests to me that you should test what happens when InIter and/or FwdIter is a pointer-to-array, e.g. int(*)[5]. P1144R8 says that uninitialized_relocate should refuse to compile when FwdIter is a pointer-to-array (because the as-if-by expression ::new (result) T(std::move(*first)) would be ill-formed), but should work fine when InIter is a pointer-to-array (because the behavior is defined as-if-by std::destroy_at(std::addressof(*first)), not first->~T()). I'm fixing the same bug in my libc++ code as we speak. :)

Copy link
Owner Author

@isidorostsa isidorostsa Jul 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you mention it I am not sure decay is needed at all.

Would the following be sufficient instead?

using in_type = typename std::remove_reference_t<decltype(*std::declval<InIter>())>;
using out_type = typename std::remove_reference_t<decltype(*std::declval<FwdIter>())>;

That way pointer-to-array types become in_type = int[N] which we can test for inside uninitialized_relocate:

template <typename InIter, typename FwdIter>
    FwdIter uninitialized_relocate(InIter first, InIter last, FwdIter dest)
    {
        static_assert(
            hpx::is_relocatable_v<
                typename detail::choose_uninitialized_relocate_impl<InIter,
                    FwdIter>::in_type>,
            "Relocated objects must be move-constructible");
        return detail::uninitialized_relocate_helper(first, last, dest);
    }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the following be sufficient instead?

Yes, I think so.

But the second half of your comment is coupled to the question above about relocatable_from: In this algorithm we do support first and dest having different value_types (again because it makes the API simpler), so you would need to static-assert on is_relocatable_from_v<out_type, in_type>.

Copy link
Owner Author

@isidorostsa isidorostsa Jul 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's nice, could is_relocatable_from be as simple as

  constexpr bool is_relocatable_from_v =
      std::is_constructible_v<std::remove_cv_t<out_type>,
                              std::add_rvalue_reference_t<in_type>> &&
      !std::is_const_v<in_type>;

where in_type = decltype(*first), out_type = decltype(*dest) ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say just:

constexpr bool is_relocatable_from_v =
    std::is_constructible_v<std::remove_cv_t<out_type>, in_type>;

The add_rvalue_reference_t isn't needed because all type-traits that talk about invocability ("can I call X with Y arguments?") treat Y in the same way as declval<Y>(), i.e., apply a layer of rvalue-ref-ness automatically for no reason. (This is a priori awful, but it's been this way since C++11 and will never change.)

I would expect to have is_relocatable_from_v<unique_ptr<int>, const unique_ptr<int>> == false, but is_relocatable_from_v<shared_ptr<int>, const shared_ptr<int>> == true.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first part, does this weird convention apply to constructibility as well? I think this is a counterexample: https://godbolt.org/z/G36nYfeWv

For the second part, I agree with the first example, but I do not understand why is_relocatable_from_v<shared_ptr<int>, const shared_ptr<int>> == true would be correct, isn't relocating from a const object UB?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first part, does this weird convention apply to constructibility as well? I think this is a counterexample

You must have misinterpreted my meaning, because that's not a counterexample to what I meant; see https://godbolt.org/z/jrx9MMh8T . Basically the intuition is, any time a type-trait might plausibly be implemented as SFINAE on something-of(declval<Args>()...), then it literally is specified as-if-by SFINAE on something-of(declval<Args>()...), i.e., Args is interpreted as Args&&. Example: https://godbolt.org/z/43rPWTe6T

For the second part, [...] isn't relocating from a const object UB?

No, I don't see why it should be. Relocation is as-if-by construct_at(dst, std::move(*src)) (which is OK as long as you've got a copy constructor) followed by std::destroy_at(src) (which works fine on const objects). You're not allowed to change the value of a const object, but of course you're allowed to end its lifetime. If you couldn't end the lifetime of a const object then every const object would be immortal. :)

Copy link
Owner Author

@isidorostsa isidorostsa Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must have misinterpreted ....

Yes, you are completely right, thank you for the clarification

you're allowed to end its lifetime. If you couldn't end the lifetime of a const object then every const object would be immortal

I had never thought of it that way, thanks for this too. TIL

libs/core/type_support/tests/unit/relocate_at.cpp Outdated Show resolved Hide resolved
Comment on lines 76 to 97
/* ABI HACKY PART
template <class T,
typename std::enable_if_t<!hpx::is_trivially_relocatable_v<T> ||
std::is_trivially_destructible_v<T>,
int> = 0>
T relocate(T* source)
{
auto g = hpx::detail::destroy_guard(*source);
return HPX_MOVE(*source);
}

template <class T,
typename std::enable_if_t<hpx::is_trivially_relocatable_v<T> &&
!std::is_trivially_destructible_v<T>,
int> = 0>
T relocate(T* source)
{
auto magic = (T(*)(void*, size_t)) memcpy;
return magic(source, sizeof(T));
}
// more info this https://quuxplusone.github.io/blog/2022/05/18/std-relocate/
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering, do you think there's a standard-friendly way to achieve the same thing?

The spec's reference implementation doesn't get you the memmove codegen, but it's perfectly conforming:

std::remove_cv_t<T> t = std::move(source);
std::destroy_at(source);
return t;

There is no way to get the memmove codegen without bending some rule or other: either use my hack, or, here's an alternative direction that's been discussed:

    __attribute__((do_not_construct)) T t;
    std::relocate_at(source, std::address_of(t));
    return t;  // NRVO

But no compiler vendor has ever implemented such an attribute. (I don't even think anyone's implemented P1247 __attribute__((no_destroy)).) So unless you own your own compiler, that direction is currently a dead end. :)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sad to hear, does P1144 require compilers to implement a way to get the memmove codegen, or is not required as per the as-if rule?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does P1144 require compilers to implement a way to get the memmove codegen

Nope. This is a deliberate design decision, both

  • pragmatically — I don't want vendors to have to jump through every single hoop on the first try; let them deal with performance as a quality-of-implementation issue at their leisure
  • philosophically — the implementation may skip observable side effects associated with trivial relocation; but if you're relying on the implementation to skip a side effect of relocation because you know that that side effect would change the semantics of your program, well then, it sounds like your type isn't trivially relocatable after all!

JiakunYan and others added 7 commits July 7, 2023 22:32
6292: update(lci pp): update the default version of LCI autofetch to v1.7.5 r=hkaiser a=JiakunYan

LCI v1.7.5 improves its flexibility and usability. It also fixes various minor compilation warnings and errors.

Co-authored-by: Jiakun Yan <jiakunyan1998@gmail.com>
Add TBB to HPX documentation in Migration Guide
6296: Fixing error message about use_guard_pages r=hkaiser a=hkaiser

Fixes STEllAR-GROUP#6288


Co-authored-by: Hartmut Kaiser <hartmut.kaiser@gmail.com>
6264: hpx::is_trivially_relocatable trait implementation r=hkaiser a=isidorostsa

This PR deals with the implementation of the `is_trivially_relocatable` trait. It offers the user an interface to declare their custom class as `trivially_relocatable` using the macro:

`HPX_DECLARE_TRIVIALLY_RELOCATABLE(<class name>)`

This assumes that trivially_copyable types are also trivially relocatable, following Folly's convention: https://github.com/facebook/folly/blob/ec297b748575e8ab86333899295715e6e85f909d/folly/Traits.h#L542

Co-authored-by: isidorostsa <tsa.isidoros@gmail.com>
Co-authored-by: Isidoros <tsa.isidoros@gmail.com>
isidorostsa pushed a commit that referenced this pull request Jul 2, 2024
Compatibility of hwloc with the install command
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants