records and move initialization across locales #15676

mppf · 2020-05-14T12:41:11Z

Sometimes it seems useful to allow move initialization of a record across locales.

This in particular comes up in distributed sorting. In distributed sorting, moving elements around in bulk is really important for performance. Additionally, since the sort just permutes elements in the array, it can conceptually just "move" the elements around (rather than calling =). But, types like string would probably want to reallocate their buffers on the destination locale by the end of the sort, if not during it.

It could also come up when initializating a distributed array from another distributed array. For example:

var MyBlockArray: [MyBlockDomain] R = returnCyclicArray();

This is related to:

records and copy initialization across locales records and copy initialization across locales #15675
unifying special move initialization functions unifying special move initialization functions #15674
Support copy-elision and split-init across on this blocks Support copy-elision and split-init across on this blocks #15808

I can see several options available to us here:

Distributed sort of non-POD records needs to assign or copy-initialize the elements when moving across locales. Bulk transfer isn't available.
Allow bulk transfer of elements in this use case and fix up after the transfer:

1a. After moving a record across locales, sorting/array implementations should run the copy-initializer and then deinitialize the source record on the original locale. This could allow calling a init= with a from: locale argument as discussed in records and copy initialization across locales #15675.

1b. The record can provide an optional fix-up method to run after a record is moved from another locale. This could accept the source locale as an argument. For example, proc postmove(from: Locale).
The record can provide some sort of serialize/deserialize function to allow bulk transfer. The serialize/deserialize method could include an argument to differentiate move initialization and copy initialization.

The text was updated successfully, but these errors were encountered:

@benharsh

Adjust array copy initialization to avoid default-init/assign For #14924 and #9421. Closes #10374. Closes #11498. Closes #13234. Closes #14925. Previous to this PR, there were several problems with array initialization where the compiler emitted a default-initialization and assignment pattern instead of copy initialization. This behavior is observable for records because the `=` or `init=` for the record could have side effects. Here are some of the problematic cases, assuming A is an array: * untyped copy initialization, e.g. `var B = A` * typed copy initialization, e.g. `var B:A.type = A` * typed or copy initialization for `in` intent arguments * initialization from loop expressions with shape, e.g. `var B = for i in 1..10 do new R(i)` * typed initialization from loop expressions, e.g. `var B:[1..10] int = for i in 1..10 do new R(i)` This PR addresses these problems with a focus on getting the correct behavior for record elements in arrays. Improving array allocation - so in some cases a move can "steal" the buffer from an existing array - is future work (see #9421). So is improving typed copy-initialization for associative domains. At a high level, the approach taken in this PR is to add `chpl__coerceCopy` and `chpl__coerceMove` as siblings for `chpl__initCopy` / `chpl__autoCopy`. Unlike the previously existing copy functions, `chpl__coerceCopy` and `chpl__coerceMove` accept a `type t` argument that stores the runtime type of the destination. These new functions are only appropriate or useful for types with runtime types (namely arrays and domains). The compiler changes are focused on calling `chpl__coerceCopy` / `chpl__coerceMove` in the right places. The `chpl__coerceCopy` can turn into `chpl__coerceMove` in the copy elision pass. The PR makes significant changes to `wrappers.cpp` in order to get`in` intent combined with default arguments to work correctly. Since default args and runtime types used in `in` intent can both use previous argument, this PR changes the approach in `wrappers.cpp` to handle these elements one argument at a time (instead of handling all default args, then all in intents, etc). I think this simplifies the code but the more important thing is that it produces more correct behavior. In order to support `chpl__coerceMove` and `chpl__coerceCopy`, adjusted `_domain.buildArray` to add a `param initElts`. This`param initElts` is threaded through various DSI calls and in particular `dsiBuildArray`. This allows the compiler to know at compile-time whether the array elements need to be default initialized (which will not be allowed for all types). Refactored `init_elts` into `init_elts_method` and added `_ddata_allocate_noinit` to help with this. Similarly, when move initializing one array from another, the elements are transferred to the new array, and so the old array needs to avoid deinitializing the elements. For this reason, added `param deinitElts` to `dsiDestroyArr`. Made dsiDestroyArr a required function since forgetting it would lead to memory leaks which might be less obvious to debug. Created some helper routines (`_deinitElementsIsParallel` / `_deinitElements`) to help with these. Generally speaking this PR adjusts the existing distributions to have the correct record copy behavior. However it does not address making it easy to write such distributions (and in fact currently distributions need to use pragmas to get reasonable behavior). Issue #15673 discusses the future direction to address this problem. Since the elements in an array might now be initialized separately from the allocation, the PR also takes steps to ensure that the post-alloc function needed by the runtime in some cases (e.g. `chpl_mem_array_postAlloc`) is called after the initialization is complete. In order to enable this behavior, it adds `dsiElementInitializationComplete` which domain maps can use to eventually call a function like `chpl_mem_array_postAlloc`. Made dsiElementInitializationComplete a required function since forgetting it would lead to first-touch errors which might be harder to find. Created some alternative ddata functions to allocate without initing and separately run the post-allocate. Added a `callPostAlloc` field to DefaultRectangular to indicate if the post alloc callback was requested by the runtime so it can be run in that event in dsiElementInitializationComplete. This PR needed to make significant changes to DefaultAssociative to get tests working because otherwise arrays had a mix of initialized and uninitialized elements and that was not being tracked appropriately. This PR takes the strategy of adjusting DefaultAssociative to consider `chpl__tableEntry`s that are empty/deleted to have uninitialized `idx`/ values. The process of updating DefaultAssociative led to the discovery and fix of several bugs (for example the bug exercised by this test where deleted slots were hiding existing elements when adding, leading to duplicates -- test/associative/ferguson/check-removing-and-adding.chpl ). One of the bug fixes has the unfortunate effect of un-doing PR #14065 - this is addressed in future work described below involving refactoring DefaultAssociative. While there, reduced the number of DSI methods that associative relies on existing for all arrays and adjusted the ones that do exist to have clearer behavior. Created an `_allSlots` iterator for iterating over pieces of the hashtable since the parallel iteration pattern is independent of the data stored. This helps to do the right first-touch during associative domain allocation. (It can't be done after index initialization since that might happen at an arbitrary future point for some slots). This PR ran into problems with replaceRecordWrappedRefs and the new array initialization pattern. The reference to the element in `chpl__transferArray` that was being initialized (say, `aa`) was becoming a variable instead of a reference, which completely prevented initialization. So, the PR applies a workaround to `chpl__transferArray` to avoid the problem for susceptible types by iterating over the LHS domain instead of array, and then doing array access in the loop. Fixed a bug where copy elision was not removing a call to record postinit. In order to do so, changed initializerRules to add the flag `FLAG_HAS_POSTINIT` to types that have a postinit so that `AggregateType::hasPostInitializer` can be called later in resolution. Improved the wording for error messages to do with setting a non-nilable variable to `nil` to follow the pattern discussed in #11118. Improved isConstInitializable to work better with arrays so that it can be called within array initCopy e.g. Removed `PRIM_DEFAULT_INIT_FIELD` and `_createFieldDefault` (no longer needed) and added `PRIM_STEAL` (to mark a source variable with `FLAG_NO_AUTO_DESTROY`) to help write some of the move initialization patterns. Adjusted formal temps (those that go within function bodies for say `inout` intent) to be destroyed at the end of the function rather than after last use. This better matches the rule for when variables become dead but #15645 discusses more work needed in this area. Removed `chpl_checkCopyInit` because the check is not handled by initializers. Adjusted the unordered forall optimization to work with `PRIM_ASSIGN` (because it comes up now in `chpl__transferArray`) and to get unordered forall compilation report tests working adjusted findLocationIgnoringInternalInlining to consider functions marked `FIND_USER_LINE` similarly to inline functions in internal/standard modules; while there applied this also to functions beginning with `chpl__`. Adjusted Sort.quickSort to call a new insertionSortMoveElts that moves elements (because insertionSort is also called by sorts that assign elements) to avoid problems when sorting arrays. Reviewed by @benharsh - thanks! - [x] primers pass valgrind and don't leak - [x] full local testing - [x] gasnet testing - [x] multilocale perf testing - (EP, Global, Promoted) streams and miniMD are OK Comm count changes: * test/optimizations/remoteValueForwarding/serialization/domains.chpl has 1 additional GET for the domain size due to using it in dsiElementInitializationComplete during an array initialization with a remote domain. DefaultRectangularArr could cache the number of elements allocated in its own structure to remove this GET and many others for cases like this where the domain is remote. * many Cyclic distribution tests in test/distributions/robust/arithmetic/performance/multilocale/ had 1 more GET that seems to have to do with dsiDoms. But dsiDoms is doing GETs many many times for Cyclic already which makes me think that Cyclic needs some optimization (and this problem isn't made significantly worse by this PR). * array-assign-block-get.chpl was showing additional on statements that are probably related to dsiElementInitializationComplete, so I changed it to always use assignment instead of split-init/assign since that is what it has historically done. Issue Cray/chapel-private#964 tracks the TODO of improving dsiElementInitializationComplete. Future work: * Check and tidy up other array initialization patterns - check for default-init/assign patterns in the compiler insertEpilogueTemps maybe insertFinalGenerate insertDestructureStatements fieldToArg and make array literal creation "move" the values into the array literal Cray/chapel-private#955 * improve workaround for replaceRecordWrappedRefs Cray/chapel-private#950 * initialization/deinitialization order of formal temps #15645 * untyped array formal arguments with in intent and defaults #15628 * fix partial reductions sketch tests Cray/chapel-private#945 * #15215 - explore parallel array element deinit (e.g. updating `_deinitElementsIsParallel`) * #6689 - parallel array initialization for all types * Factor a DefaultHashtable out of DefaultAssociative. Add a `value` to `chpl_TableEntry` which defaults to `nothing`. Adjust the Map / Set implementations to use it. Cray/chapel-private#952 * optimize allocations for array move initialization from another array - for same-type arrays it's likely it will be able to steal buffers Cray/chapel-private#953 and #9421 * stop inlining transferArray and use `pragma "find user line"` instead Cray/chapel-private#954 * improve upon the default-init + assign behavior for copying associative domains Cray/chapel-private#962 * make it possible to write domain maps using only user-facing features #15673 * unify `chpl__coerceMove` and `chpl__unref` and `chpl__unalias` #15674 * records and copy initialization across locales #15675 * records and move initialization across locales #15676 * better optimize post-allocation calls for BlockDist Cray/chapel-private#964

mppf · 2020-10-30T19:14:51Z

The record can provide some sort of serialize/deserialize function to allow bulk transfer. The serialize/deserialize method could include an argument to differentiate move initialization and copy initialization.

As I understand it, we are leaning towards this option, where the record can opt in to being notified with serialize/deserialize. The serialize/deserialize can build upon what we already have as a performance optimization.

e-kayrakli · 2021-05-07T17:30:38Z

I am trying to think how a serializers/deserializers might differ for copy vs move.

How do you imagine a move-serializer works for, say, a string? It differs from current serializer in a way that it deallocates the original string's buffer if it is owned, for example? But then, wouldn't it be already handled by virtue of the compiler calling deinit on the original strings somehow ("this array of strings were moved to another locale, so I'll deinit the array and its elements on the original locale")

How do you imagine a move-deserializer works? Is there anything specific that needs to be done on the deserializing end w.r.t move vs copy?

mppf · 2021-05-08T11:11:46Z

@e-kayrakli - when move initialization occurs, say we have var lhs = rhsExpr(), then the compiler doesn't call deinit on the temporary storing rhsExpr at all. Instead it just moves the bits to lhs and leaves deiniting to that variable. We have this working for arrays of strings - if you do var A = somethingReturningArray() then there is only one array created. Some array patterns actually create separate storage for the elements but then use a "move" operation on the individual elements and then don't deinit them in the RHS array.

Between move-serialize and move-deserialize, the key detail is that the composition of these should act as if the RHS is their responsibility to free, if necessary; or to steal the important bits from, if not. E.g. for a string, we allocate a new buffer on the new locale; copy the data over; and then free the original data. But if we didn't want to tie the string buffer to the locale storing the variable, we would just copy the buffer pointer (and the field indicating what locale it is on) and leave it to the resulting LHS value to free these (and in this case we wouldn't want to free them for the RHS since we "stole" them).

I think our move-serialize/deserialize strategy should allow a record author to choose between the two approaches above (at the very least).

mppf added type: Design area: Language labels May 14, 2020

This was referenced May 14, 2020

records and copy initialization across locales #15675

Open

unifying special move initialization functions #15674

Open

Adjust array copy initialization to avoid default-init/assign #15239

Merged

mppf mentioned this issue May 28, 2020

how should initialization of a Replicated array behave? #15731

Open

This was referenced Jun 9, 2020

Refactor set to use chpl__hashtable #15772

Merged

Support copy-elision and split-init across on this blocks #15808

Open

init= and = with different-type rhs #15838

Closed

mppf mentioned this issue Aug 4, 2020

module providing low-level move operations #16172

Closed

e-kayrakli mentioned this issue May 7, 2021

[Proposal] Add a low-level Communication module #17657

Closed

mppf mentioned this issue May 26, 2022

Unexpected behavior of task intents with distributed forall loops #19864

Open

mppf mentioned this issue Nov 20, 2023

proposal to remove indirection in array element access #23890

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

records and move initialization across locales #15676

records and move initialization across locales #15676

mppf commented May 14, 2020 •

edited

Loading

mppf commented Oct 30, 2020 •

edited

Loading

e-kayrakli commented May 7, 2021

mppf commented May 8, 2021

records and move initialization across locales #15676

records and move initialization across locales #15676

Comments

mppf commented May 14, 2020 • edited Loading

mppf commented Oct 30, 2020 • edited Loading

e-kayrakli commented May 7, 2021

mppf commented May 8, 2021

mppf commented May 14, 2020 •

edited

Loading

mppf commented Oct 30, 2020 •

edited

Loading