-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relocation algorithms implementation #2
Conversation
Tests were disabled in STEllAR-GROUP#5796
Kepler support is removed in CUDA 12 so it's time to remove it here as well.
This reverts commit 9d5a4a8. Still not working unfortunately (tested with gcc/12 and cuda 12.1)
As all rocm/5 modules are currently broken on the machine
…ization for the MPI parcelport. Default is OFF, since it will slow down the MPI parcelport.
- flyby: resolve winsock2.h warnings on windows - flyby: add CI builder for gcc/mingw on Windows
This reverts commit 71c8ac3.
6256: Fix nvcc/gcc-10 (Octo-Tiger) compilation issue r=hkaiser a=G-071 Using HPX 1.9.0, the Octo-Tiger compilation (using just CUDA, not Kokkos) with gcc+nvcc fails due to an internal nvcc error pointing to loop.hpp within HPX: ``` build/hpx/include/hpx/parallel/util/loop.hpp(216): internal error: assertion failed at: "cp_gen_be.c", line 22617 in gen_variable_decl 1 catastrophic error detected in the compilation of "/src/octotiger/src/unitiger/hydro_impl/flux_cuda_kernel.cpp". Compilation aborted. nvcc error : 'cudafe++' died due to signal 6 nvcc error : 'cudafe++' core dumped ``` That is with using g++10 and CUDA 11.5. While using newer versions of g++/CUDA fix this, it would be nice to not lose g++-10 support over this. Interestingly, simply providing nvcc with more information about the type seems to help as well, hence this PR to resolve the issue without losing support for the older compiler. Co-authored-by: Gregor Daiss <Gregor.Daiss+git@gmail.com>
6239: Fix `Optimizing HPX applications` page of Manual r=hkaiser a=dimitraka The content of the tables used to overlap with the side bar of the documentation making it impossible to actually read the information. These extended tables were split to smaller tables which are now clear and readable. With this change there is no need to include anchors to the table rows. For each counter there is a separate table, so one can just use the reference to the table in order to refer to a specific counter. Co-authored-by: dimitraka <kadimitra@ece.auth.gr>
- Clang fails to recognize the global delete operator which has 2 arguments the second one being std::size_t Signed-off-by: Shreyas Atre <shreyasatre16@gmail.com>
Co-authored-by: Hartmut Kaiser <hartmut.kaiser@gmail.com>
6258: Rewriting wait_some to circumvent data races causing hangs r=hkaiser a=hkaiser - flyby: fixing possible race in latch `@m-diers` Could you please try this patch to see if it fixes your issue? Co-authored-by: Hartmut Kaiser <hartmut.kaiser@gmail.com>
Signed-off-by: Shreyas Atre <shreyasatre16@gmail.com>
Signed-off-by: Shreyas Atre <shreyasatre16@gmail.com>
6257: Cmake Tests: Delete operator check for size_t arg r=hkaiser a=SAtacker - Clang fails to recognize the global delete operator which has 2 arguments the second one being std::size_t - Reference - https://godbolt.org/z/q1hMrfKjv - Fixes STEllAR-GROUP#6246 Signed-off-by: Shreyas Atre <shreyasatre16@gmail.com> Co-authored-by: Shreyas Atre <shreyasatre16@gmail.com> Co-authored-by: Shreyas Atre <61797109+SAtacker@users.noreply.github.com>
This reverts commit 087074a.
6249: Implement the send immediate optimization for the MPI parcelport. r=hkaiser a=JiakunYan Users can use the option `hpx.parcel.mpi.sendimm` to enable and disable it. When HPX wants to send a parcel, the default behavior is to enqueue this parcel into the parcel queue of the target locality, dequeue all the parcels from that parcel queue, aggregate them, and pass them in one `parcel_buffer` structure. It will also try to get an existing connection from the connection cache. These two data structures are protected by locks and can be a source of thread contention. With the send immediate optimization, the parcelport will bypass the connection cache and parcel queues and directly send the parcel. This option is mainly for research purposes. The default is off since it will slow down the MPI parcelport in most cases. Co-authored-by: Jiakun Yan <jiakunyan1998@gmail.com>
…options - flyby: fixing application options in configuration files
6290: Adding user supplied on_finalize r=hkaiser a=hkaiser This is needed for the hpxMP project Co-authored-by: Hartmut Kaiser <hartmut.kaiser@gmail.com>
…tion threshold from 128 to 8192 - Replace the lock-based tag provider with an atomic variable - Make the header message size dynamic
6274: Improve the MPI parcelport and change the zero-copy threshold to 8192 r=hkaiser a=JiakunYan Improve the MPI parcelport: - Replace the lock-based tag provider with an atomic variable - Make the header message size dynamic Since the maximum size of the dynamic header message is set to be the zero-copy serialization threshold, I also changed it from 128B to 8192B. (The message size that MPI switches from eager protocol to rendezvous protocol is usually around 8K - 64K). Experiments: Octo-Tiger, max_level=6, SDSC Expanse, 32 nodes, 128 threads per node. With `hpx.parcel.mpi.sendimm=0`, this PR improves the total execution time from 11.77s to 10.68s. With `hpx.parcel.mpi.sendimm=1`, this PR improves the total execution time from ~60s to 11.72s. Co-authored-by: Jiakun Yan <jiakunyan1998@gmail.com>
ee54325
to
daf4737
Compare
namespace hpx { | ||
|
||
#if defined(HPX_HAVE_P1144_STD_RELOCATE_AT) | ||
using std::uninitialized_relocate; // what would be the include for this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In P1144, this algorithm appears in <memory>
, alongside std::uninitialized_fill
and std::destroy
and so on.
P1144R8 does not specify any feature-test macro, e.g. #define __cpp_lib_relocate 202307L
; but if that would help you, I'm open to suggestions on such a macro's name, and can add it to my libc++ implementation on Godbolt.
|
||
auto n_bytes = std::distance(first_byte, last_byte); | ||
|
||
memmove(dest, first, n_bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is conceivable that we have InIter=int*, FwdIter=const int*
. In that case, dest
won't be convertible to void*
; you'll need const_cast<void*>(static_cast<const void*>(dest))
or simply (void*)dest
. (Remember, we're not modifying *dest
; we're bringing it into existence. So it's sane for *dest
to be a const type.)
I'm fixing the same bug in my libc++ code as we speak. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, hadn't thought of that.
So it makes sense to modify relocate_at
to accept different input/output types?
template <typename SourceType, typename DestType>
T* relocate_at(SourceType* src, DestType* dst)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do that in libc++ internally: implement relocate_at(T*, T*)
in terms of __libcpp_relocate_at2(T*, U*)
. But I have never proposed that in P1144R0–R8, because relocate_at(T*, U*)
can't ever take advantage of trivial relocatability unless T==U
. (I mean, in theory, a sufficiently smart implementation could expose __is_trivially_relocatable_from(T, U)
; but nobody does that, and it's not at all clear what would be the syntax via which the user could "warrant" such a property themselves. See https://quuxplusone.github.io/blog/2018/07/03/trivially-constructible-from/ for more thoughts on the topic.)
Note that the "relocatable-from" relationship, if it existed, wouldn't be symmetrical:
int *a;
std::unique_ptr<int> b;
std::construct_at(&b, std::move(a)); std::destroy_at(&a); // OK
std::construct_at(&a, std::move(b)); std::destroy_at(&b); // Ill-formed
static_assert(std::is_relocatable_from_v<std::unique_ptr<int>, int*>);
static_assert(!std::is_relocatable_from_v<int*, std::unique_ptr<int>>);
...However, maybe you just meant "shouldn't it be possible to relocate from const into non-const?"—
const int j = 42;
alignas(int) char ibuf[sizeof(int)];
int& i = *(int*)ibuf;
std::relocate_at(&j, &i); // P1144R8: Ill-formed, deduction failed: `const int` versus `int`
I'm much more sympathetic to that idea, but it's never yet come up in WG21 discussions. Until someone claims it's a real problem, I'd weakly prefer not to try to address it. I think addressing it will require either completely uncoupling SourceType
and DestType
as you suggest (which makes the API seem to promise triviality in cases where the implementation can't actually guarantee it), or doing some sort of awful constrained template like requires is_same_v<remove_cvref_t<SourceType>, remove_cvref_t<DestType>>
(which makes the API harder to understand).
using out_type = | ||
std::decay_t<typename std::iterator_traits<FwdIter>::reference>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicks:
- Perhaps
object_type
should bein_type
, for symmetry. - The use of
decay_t
suggests to me that you should test what happens whenInIter
and/orFwdIter
is a pointer-to-array, e.g.int(*)[5]
. P1144R8 says thatuninitialized_relocate
should refuse to compile whenFwdIter
is a pointer-to-array (because the as-if-by expression::new (result) T(std::move(*first))
would be ill-formed), but should work fine whenInIter
is a pointer-to-array (because the behavior is defined as-if-bystd::destroy_at(std::addressof(*first))
, notfirst->~T()
). I'm fixing the same bug in my libc++ code as we speak. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you mention it I am not sure decay is needed at all.
Would the following be sufficient instead?
using in_type = typename std::remove_reference_t<decltype(*std::declval<InIter>())>;
using out_type = typename std::remove_reference_t<decltype(*std::declval<FwdIter>())>;
That way pointer-to-array types become in_type = int[N]
which we can test for inside uninitialized_relocate
:
template <typename InIter, typename FwdIter>
FwdIter uninitialized_relocate(InIter first, InIter last, FwdIter dest)
{
static_assert(
hpx::is_relocatable_v<
typename detail::choose_uninitialized_relocate_impl<InIter,
FwdIter>::in_type>,
"Relocated objects must be move-constructible");
return detail::uninitialized_relocate_helper(first, last, dest);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the following be sufficient instead?
Yes, I think so.
But the second half of your comment is coupled to the question above about relocatable_from
: In this algorithm we do support first
and dest
having different value_type
s (again because it makes the API simpler), so you would need to static-assert on is_relocatable_from_v<out_type, in_type>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's nice, could is_relocatable_from be as simple as
constexpr bool is_relocatable_from_v =
std::is_constructible_v<std::remove_cv_t<out_type>,
std::add_rvalue_reference_t<in_type>> &&
!std::is_const_v<in_type>;
where in_type = decltype(*first)
, out_type = decltype(*dest)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say just:
constexpr bool is_relocatable_from_v =
std::is_constructible_v<std::remove_cv_t<out_type>, in_type>;
The add_rvalue_reference_t
isn't needed because all type-traits that talk about invocability ("can I call X with Y arguments?") treat Y in the same way as declval<Y>()
, i.e., apply a layer of rvalue-ref-ness automatically for no reason. (This is a priori awful, but it's been this way since C++11 and will never change.)
I would expect to have is_relocatable_from_v<unique_ptr<int>, const unique_ptr<int>> == false
, but is_relocatable_from_v<shared_ptr<int>, const shared_ptr<int>> == true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first part, does this weird convention apply to constructibility as well? I think this is a counterexample: https://godbolt.org/z/G36nYfeWv
For the second part, I agree with the first example, but I do not understand why is_relocatable_from_v<shared_ptr<int>, const shared_ptr<int>> == true
would be correct, isn't relocating from a const object UB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first part, does this weird convention apply to constructibility as well? I think this is a counterexample
You must have misinterpreted my meaning, because that's not a counterexample to what I meant; see https://godbolt.org/z/jrx9MMh8T . Basically the intuition is, any time a type-trait might plausibly be implemented as SFINAE on something-of(declval<Args>()...)
, then it literally is specified as-if-by SFINAE on something-of(declval<Args>()...)
, i.e., Args
is interpreted as Args&&
. Example: https://godbolt.org/z/43rPWTe6T
For the second part, [...] isn't relocating from a const object UB?
No, I don't see why it should be. Relocation is as-if-by construct_at(dst, std::move(*src))
(which is OK as long as you've got a copy constructor) followed by std::destroy_at(src)
(which works fine on const objects). You're not allowed to change the value of a const object, but of course you're allowed to end its lifetime. If you couldn't end the lifetime of a const object then every const object would be immortal. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You must have misinterpreted ....
Yes, you are completely right, thank you for the clarification
you're allowed to end its lifetime. If you couldn't end the lifetime of a const object then every const object would be immortal
I had never thought of it that way, thanks for this too. TIL
libs/core/type_support/include/hpx/type_support/relocate_at.hpp
Outdated
Show resolved
Hide resolved
/* ABI HACKY PART | ||
template <class T, | ||
typename std::enable_if_t<!hpx::is_trivially_relocatable_v<T> || | ||
std::is_trivially_destructible_v<T>, | ||
int> = 0> | ||
T relocate(T* source) | ||
{ | ||
auto g = hpx::detail::destroy_guard(*source); | ||
return HPX_MOVE(*source); | ||
} | ||
|
||
template <class T, | ||
typename std::enable_if_t<hpx::is_trivially_relocatable_v<T> && | ||
!std::is_trivially_destructible_v<T>, | ||
int> = 0> | ||
T relocate(T* source) | ||
{ | ||
auto magic = (T(*)(void*, size_t)) memcpy; | ||
return magic(source, sizeof(T)); | ||
} | ||
// more info this https://quuxplusone.github.io/blog/2022/05/18/std-relocate/ | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering, do you think there's a standard-friendly way to achieve the same thing?
The spec's reference implementation doesn't get you the memmove codegen, but it's perfectly conforming:
std::remove_cv_t<T> t = std::move(source);
std::destroy_at(source);
return t;
There is no way to get the memmove codegen without bending some rule or other: either use my hack, or, here's an alternative direction that's been discussed:
__attribute__((do_not_construct)) T t;
std::relocate_at(source, std::address_of(t));
return t; // NRVO
But no compiler vendor has ever implemented such an attribute. (I don't even think anyone's implemented P1247 __attribute__((no_destroy))
.) So unless you own your own compiler, that direction is currently a dead end. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sad to hear, does P1144 require compilers to implement a way to get the memmove codegen, or is not required as per the as-if rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does P1144 require compilers to implement a way to get the memmove codegen
Nope. This is a deliberate design decision, both
- pragmatically — I don't want vendors to have to jump through every single hoop on the first try; let them deal with performance as a quality-of-implementation issue at their leisure
- philosophically — the implementation may skip observable side effects associated with trivial relocation; but if you're relying on the implementation to skip a side effect of relocation because you know that that side effect would change the semantics of your program, well then, it sounds like your type isn't trivially relocatable after all!
6292: update(lci pp): update the default version of LCI autofetch to v1.7.5 r=hkaiser a=JiakunYan LCI v1.7.5 improves its flexibility and usability. It also fixes various minor compilation warnings and errors. Co-authored-by: Jiakun Yan <jiakunyan1998@gmail.com>
Add CUDA/HIP MultiGPU Event Polling
Add TBB to HPX documentation in Migration Guide
6296: Fixing error message about use_guard_pages r=hkaiser a=hkaiser Fixes STEllAR-GROUP#6288 Co-authored-by: Hartmut Kaiser <hartmut.kaiser@gmail.com>
6264: hpx::is_trivially_relocatable trait implementation r=hkaiser a=isidorostsa This PR deals with the implementation of the `is_trivially_relocatable` trait. It offers the user an interface to declare their custom class as `trivially_relocatable` using the macro: `HPX_DECLARE_TRIVIALLY_RELOCATABLE(<class name>)` This assumes that trivially_copyable types are also trivially relocatable, following Folly's convention: https://github.com/facebook/folly/blob/ec297b748575e8ab86333899295715e6e85f909d/folly/Traits.h#L542 Co-authored-by: isidorostsa <tsa.isidoros@gmail.com> Co-authored-by: Isidoros <tsa.isidoros@gmail.com>
23ca4d2
to
8b4f11d
Compare
Compatibility of hwloc with the install command
This is a starting point for the implementation of the P1144
relocate_at
anduninitialized_relocate
algorithms for HPX. They will later be used as base primitives for parallelized versions.