-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust array copy initialization to avoid default-init/assign #15239
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
1f24165
to
b6c48d6
Compare
Interface changes to |
This was referenced May 4, 2020
2 tasks
This was referenced May 14, 2020
primers pass with valgrind+verify and do not leak
implementations still need adjustment add pragam unsafe as well now hellos pass
TODO: fix array-initialization-patterns good / fix bugs
Removes PRIM_DEFAULT_INIT_FIELD.
Merged
mppf
added a commit
that referenced
this pull request
May 28, 2020
Fix use of uninitialized memory in ReplicatedDist arrays See also issue #15731. After PR #15239, a ReplicatedDist array that is not default-initialized would only initialize some of the replicands but then deinitialize all of them. This PR makes temporary adjustments ReplicatedDist. It adjusts the ReplicatedArr code to always initialize the elements. It makes it a compilation error to initialize a ReplicatedDist array with non-POD elements (since without other work this would introduce a memory leak). It adds additional testing that would have revealed the original problem (at least when run under valgrind). Reviewed by @benharsh - thanks! - [x] full local testing
2 tasks
dlongnecke-cray
added a commit
that referenced
this pull request
May 30, 2020
…tures (#15735) Allow sets of `shared C` and `(int, shared C)` and promote related futures (#15735) This PR removes compiler errors for sets of `shared C` and sets of `(int, shared C)`. Sets of `shared C` were enabled by #15239. Sets of `(int, shared C)` were enabled by #15733. The futures for both cases have been promoted to regular tests. Reviewed by @daviditen.
2 tasks
e-kayrakli
added a commit
that referenced
this pull request
Jun 1, 2020
Add perf annotations This week chapcs performance tests haven't run due to system issues, so we didn't get a good picture of things. But at the same time, number of PRs that went in was also limited. #15239 seems to be the likely candidate for the following performance changes: - Stencil performance - Distributed array-domain init-deinit - Small array get performance - Comm count changes [Reviewed by @ronawho]
mppf
added a commit
that referenced
this pull request
Jun 1, 2020
Address performance issues with stencil and block-cyclic array init Follow-up to PR #15239. Related to Cray/chapel-private#992 The array initialization improvements (PR #15239) introduced performance regressions for StencilDist and BlockCyclicDist arrays. Some of these were due to increased communication. The reason that Stencil and BlockCyclic arrays are affected is that they both allocate elements that are not initialized by array initialization (in the case of initializing from another same-distribution array, say). For Stencil these are fluff/ghost cell elements. For BlockCyclic they are padding. See the test performance/array/distCreate.chpl run for example with ``` $ ./distCreate --size=arraySize.small --dist=distType.blockCyc -nl 2 ``` This PR resolves these problems by: * initializing the padding/fluff elements along with the original data when default initializing * initializing the padding/fluff elements always (because they won't be initialized by an initialization expression e.g.) and deinitializing them always (because they won't be moved out of when move-initializing arrays either) * using more efficient means to identify elements that are fluff/padding - [x] full local gasnet testing - [x] full local testing Reviewed by @e-kayrakli - thanks!
mppf
added a commit
that referenced
this pull request
Jun 2, 2020
Factor hashtable used by DefaultAssociative out of it This PR factors the hashtable implementation out of DefaultAssociative and puts it into new ChapelHashtable and ChapelHashing modules. These more clearly separate concerns in the implementation of DefaultAssociative and enables an optimization removed in PR #15239 to be added back in (the optimization was from PR #14065). (However this PR does not yet add back in the optimization - issue Cray/chapel-private#1000 tracks that TODO). It migrates `chpl_defaultHash` and associated routines out of DefaultAssociative and into a new ChapelHashing module. It adjusts some of the routines in ChapelHashing that move elements to be factored more reasonably and to use split-init/copy-elision language features in order to reduce the use of primitives and pragmas. Additionally it fixes a bug where dsiClear was not deinitializing slots that it was emptying, leading to a memory leak. This PR future-izes test/associative/domain/types/testTuple.chpl and test/library/standard/Set/types/testTuple.chpl due to a compiler bug exposed by ChapelHashing relying on split-init and copy elision for moves. Issue #15746 tracks this compiler bug. - [x] primers pass with valgrind and do not leak - [x] full local futures testing Reviewed by @daviditen - thanks!
3 tasks
mppf
added a commit
that referenced
this pull request
Jun 4, 2020
Remove unnecessary array temporaries for arrays-of-arrays For Cray/chapel-private#1002 PR #15239 inadvertently added array temporaries for arrays-of-arrays in the `chpl__eltTypeFromArrayRuntimeType` function. This lead to out-of-memory failures with performance testing of isx on some systems. This PR resolves the problem by adjusting `chpl__eltTypeFromArrayRuntimeType` to use a similar strategy to `chpl__domainFromArrayRuntimeType` to avoid allocating a temporary array in case the element type has a runtime type (as with arrays of arrays). This required extending `PRIM_GET_RUNTIME_TYPE_FIELD` to allow for the case that the result should be treated as a type. (For history in this area, see PR #11140). In investigating this problem, I noticed that `chpl_incRefCountsForDomainsInArrayEltTypes` / `chpl_decRefCountsForDomainsInArrayEltTypes` also were allocating temporary arrays, which is no longer necessary. I adjusted these to call `chpl__domainFromArrayRuntimeType`. Once I did that, `chpl_decRefCountsForDomainsInArrayEltTypes` stopped working correctly and so I went further in adjusting it and related code to better match array deinitialization. (For history in this area, see also commits 6f435f1 and 65a73e2). Lastly, this PR adds a test that uses the `--memMax` functionality in memory tracking to detect future cases where array-of-array initialization/deinitialization creates sub-array sized temporaries. Reviewed by @vasslitvinov - thanks! - [x] isx out-of-memory failures are resolved - [x] primers pass with valgrind and do not leak - [x] full local testing
This was referenced Jun 4, 2020
e-kayrakli
added a commit
that referenced
this pull request
Jun 6, 2020
Don't GET domain's dimensions unless needed Follow on to #15239 As an optimization, that PR started storing dimensions of the domain in a local variable in `DefaultRectangular.isDataContiguous`, but that's not needed unless the domain is multidimensional. This is the cause of the comm count increase in https://chapel-lang.org/perf/chapcs.comm-counts/?startdate=2020/03/02&enddate=2020/06/04&graphs=arrayassignget This may also impact other perf tests, but I don't expect a huge change in performance, but maybe only reduced communication in some. [Reviewed by @mppf] Test: - [x] standard - [x] gasnet
2 tasks
mppf
added a commit
that referenced
this pull request
Jul 13, 2020
Improvements to runtime type support to prepare for array noinit This PR makes improvements to runtime type support to prepare for implementing `noinit` support for arrays in support of issue #15673. When implementing `noinit` on an array type, it is important that the compiler be able to create a different initialization pattern for an array that is no-inited. However the way that arrays are default-initialized is tied to the runtime type support in the compiler in a way that makes that difficult. The following comment describes the previous situation: ``` // NOTE: the bodies of functions marked with runtime type init fn such as // chpl__buildDomainRuntimeType and chpl__buildArrayRuntimeType are replaced // by the compiler to just create a record storing the arguments. The body // is moved by the compiler to chpl__convertRuntimeTypeToValue. // The return type of chpl__build...RuntimeType is what tells the // compiler which runtime type it is creating. ``` To implement noinit support for arrays, we would like to add a `param initElts` argument to the function that default-initializes an array (`chpl__convertRuntimeTypeToValue`). However, because `chpl__buildArrayRuntimeType` both constructs the runtime type (when called `chpl__buildArrayRuntimeType`) and implements default initialization (when called `chpl__convertRuntimeTypeToValue`), this was not possible in a straightforward manner. In particular, the new `param initElts` argument should not apply to constructing the runtime type; only to creating a value of that type. In order to overcome that obstacle, this PR changes the strategy taken by the compiler to handle these runtime type support functions. Now `chpl__convertRuntimeTypeToValue` functions are implemented directly in module code (instead of being constructed by the compiler by cloning `chpl__buildArrayRuntimeType` etc). This required some adjustment to several functions handling runtime types in functionResolution.cpp. This PR adjusts function resolution to populate the fields of the runtime type right away once the relevant array type is created by `chpl__buildArrayRuntimeType` etc. It does this by calling a new function, `adjustRuntimeTypeInitFn`, once a function marked with `pragma "runtime type init fn"` is resolved. This new function generates the runtime type record and adds it to the runtime types map. Additionally, it populates the runtime type record with type and param fields that were previously skipped in order to keep these details preserved. (They will be removed in late resolution just like with any other record). As a result, the primitive `PRIM_GET_RUNTIME_TYPE_FIELD` no longer needs an argument indicating the field type. The argument indicating the field type was problematic because it was most naturally computed by default initialization (in several functions added in #11140 and #15239) but that could have performance impact (see #15767). Now that `PRIM_GET_RUNTIME_TYPE_FIELD` no longer needs the field type (since the compiler knows it during resolution), there is no need for the problematic code. Since the runtime type record is created earlier, in `adjustRuntimeTypeInitFn`, now `buildRuntimeTypeInitFns` just adjusts the runtime type init functions to return the runtime type. It no longer generates `chpl__convertRuntimeTypeToValue` by cloning e.g. `chpl__buildArrayRuntimeType` as `chpl__convertRuntimeTypeToValue` is now written in module code. In order to get arrays-of-arrays to work correctly with these changes, it was also necessary to change how default initialization for variables with runtime types was handled. Now, default initialization for these types is handled in `lowerPrimInit` which calls `lowerRuntimeTypeInit`. At that time, the runtime type records exist but are not yet used. So, the lowering makes use of `PRIM_GET_RUNTIME_TYPE_FIELD` to gather the arguments to pass to `chpl__convertRuntimeTypeToValue`. (That, in turn, relies upon the changes to the creation of runtime type records described above; including in particular having the runtime type record exist and have `param` fields at this point). `insertRuntimeTypeTemps` is renamed to `populateRuntimeTypeMap` to better reflect what it does. In addition, it now includes a check to skip work in the event the relevant map entry is already present. `replaceValuesWithRuntimeTypes` is renamed to `replaceTypeFormalsWithRuntimeTypes` to better reflect what it does. So too is `replaceReturnedValuesWithRuntimeTypes` renamed to `replaceReturnedTypesWithRuntimeTypes` `replaceRuntimeTypePrims`, now limited to just handling `PRIM_GET_RUNTIME_TYPE_FIELD`, is moved to `removeRandomPrimitive` in order to prevent an additional traversal over all calls in the AST. To reduce the potential for confusion, this PR also renames the fields in the runtime type records. For the runtime type record for a domain, this PR renames the distribution field from`d` to `dist`. For the runtime type record for a sparse domain, this PR renames the parent domain field from `dom` to `parentDom`. In order to resolve a problem encountered during development, this PR adds `chpl_createBytesWithLiteral`, makes it a well-known function, and calls it in `new_BytesSymbol`. Here is a summary of the way runtime types are handled in the compiler after this PR: * `adjustRuntimeTypeInitFn` is called in `resolveFunction` for `pragma "runtime type init fn"` functions such as `chpl__buildArrayRuntimeType`. It generates the runtime type record, complete with param and type fields, and stores a link to that record in `runtimeTypeMap`. * `lowerPrimInit` calls `lowerRuntimeTypeInit` to change default initialization of variables with runtime type into calls to `chpl__convertRuntimeTypeToValue` * `populateRuntimeTypeMap` populates a map from types with runtime type to `chpl__convertValueToRuntimeType` functions. These functions are implemented in ChapelArray. * `handleRuntimeTypes` runs late in resolution and takes these steps: * run `populateRuntimeTypeMap` a second time * `buildRuntimeTypeInitFns` converts `chpl__buildArrayRuntimeType` etc. into functions returning a runtime type record * `replaceTypeFormalsWithRuntimeTypes` changes `type` formals with runtime type into runtime type records * `replaceReturnedValuesWithRuntimeTypes` changes returned `type` values into runtime type records * `replaceRuntimeTypeVariableTypes` changes `type` variables into runtime type records * `removeRandomPrimitive` (called in `pruneResolvedTree`) lowers `PRIM_GET_RUNTIME_TYPE_FIELD`. Reviewed by @vasslitvinov - thanks! - [x] primers pass with valgrind/verify and do not leak - [x] full local testing
mppf
added a commit
to mppf/chapel
that referenced
this pull request
Jul 21, 2020
It was accidentally added in PR chapel-lang#15239.
1 task
mppf
added a commit
to mppf/chapel
that referenced
this pull request
Jul 22, 2020
It was accidentally added in PR chapel-lang#15239.
mppf
added a commit
that referenced
this pull request
Jul 22, 2020
Remove pragma "local field" from PrivateArr It was accidentally added in PR #15239. Resolves Cray/chapel-private#1175 Trivial and not reviewed. - [x] full local testing
gbtitus
added a commit
to gbtitus/chapel
that referenced
this pull request
Jul 23, 2020
Following the lead of chapel-lang#15239 (01fa4e7), update comm counts for comm layers that support network atomics (ofi, ugni).
mppf
added a commit
to mppf/chapel
that referenced
this pull request
Feb 26, 2021
It was accidentally removed in PR chapel-lang#15239. --- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
2 tasks
mppf
added a commit
to mppf/chapel
that referenced
this pull request
Sep 22, 2021
It was accidentally removed in PR chapel-lang#15239. --- Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
mppf
added a commit
that referenced
this pull request
Oct 4, 2021
Put the minlen condition back in quicksort It was accidentally removed in PR #15239. Putting it back caused problems with some `mason search` testing, but that was because the code sorting the results for `mason search` assumed that the sort was stable. It is not, so this PR updates that `mason search` implementation to work with unstable sort and the sort documentation to make it clear that it is not a stable sort. Reviewed by @arezaii - thanks! - [x] full local testing - [x] check sort test directory with valgrind and memleaks
aconsroe-hpe
added a commit
that referenced
this pull request
Oct 18, 2021
Replace uses of "chpl__autoDestroy" with astr_autoDestroy reviewed by @mppf This finishes off a remaining case from PR #15239 in an attempt to close issue #15011. I don't think this is particularly important, but the string literal being passed to isNamedAstr was suspect and this brings it in line with other uses. Paratest pass
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
For #14924 and #9421.
Closes #10374.
Closes #11498.
Closes #13234.
Closes #14925.
Previous to this PR, there were several problems with array
initialization where the compiler emitted a default-initialization and
assignment pattern instead of copy initialization. This behavior is
observable for records because the
=
orinit=
for the record couldhave side effects.
Here are some of the problematic cases, assuming A is an array:
var B = A
var B:A.type = A
in
intent argumentsvar B = for i in 1..10 do new R(i)
var B:[1..10] int = for i in 1..10 do new R(i)
This PR addresses these problems with a focus on getting the correct
behavior for record elements in arrays. Improving array allocation - so
in some cases a move can "steal" the buffer from an existing array - is
future work (see #9421). So is improving typed copy-initialization for
associative domains.
At a high level, the approach taken in this PR is to add
chpl__coerceCopy
andchpl__coerceMove
as siblings forchpl__initCopy
/chpl__autoCopy
. Unlike the previously existing copyfunctions,
chpl__coerceCopy
andchpl__coerceMove
accept atype t
argument that stores the runtime type of the destination. These new
functions are only appropriate or useful for types with runtime types
(namely arrays and domains).
The compiler changes are focused on calling
chpl__coerceCopy
/chpl__coerceMove
in the right places. Thechpl__coerceCopy
can turninto
chpl__coerceMove
in the copy elision pass.The PR makes significant changes to
wrappers.cpp
in order to getin
intent combined with default arguments to work correctly. Since default
args and runtime types used in
in
intent can both use previousargument, this PR changes the approach in
wrappers.cpp
to handle theseelements one argument at a time (instead of handling all default args,
then all in intents, etc). I think this simplifies the code but the more
important thing is that it produces more correct behavior.
In order to support
chpl__coerceMove
andchpl__coerceCopy
, adjusted_domain.buildArray
to add aparam initElts
. Thisparam initElts
isthreaded through various DSI calls and in particular
dsiBuildArray
.This allows the compiler to know at compile-time whether the array
elements need to be default initialized (which will not be allowed for
all types). Refactored
init_elts
intoinit_elts_method
and added_ddata_allocate_noinit
to help with this.Similarly, when move initializing one array from another, the elements
are transferred to the new array, and so the old array needs to avoid
deinitializing the elements. For this reason, added
param deinitElts
todsiDestroyArr
. Made dsiDestroyArr a required function since forgettingit would lead to memory leaks which might be less obvious to debug.
Created some helper routines (
_deinitElementsIsParallel
/_deinitElements
) to help with these.Generally speaking this PR adjusts the existing distributions to have the
correct record copy behavior. However it does not address making it easy
to write such distributions (and in fact currently distributions need to
use pragmas to get reasonable behavior). Issue #15673 discusses the
future direction to address this problem.
Since the elements in an array might now be initialized separately from
the allocation, the PR also takes steps to ensure that the post-alloc
function needed by the runtime in some cases (e.g.
chpl_mem_array_postAlloc
) is called after the initialization iscomplete. In order to enable this behavior, it adds
dsiElementInitializationComplete
which domain maps can use toeventually call a function like
chpl_mem_array_postAlloc
. MadedsiElementInitializationComplete a required function since forgetting it
would lead to first-touch errors which might be harder to find. Created
some alternative ddata functions to allocate without initing and
separately run the post-allocate. Added a
callPostAlloc
field toDefaultRectangular to indicate if the post alloc callback was requested
by the runtime so it can be run in that event in
dsiElementInitializationComplete.
This PR needed to make significant changes to DefaultAssociative to get
tests working because otherwise arrays had a mix of initialized and
uninitialized elements and that was not being tracked appropriately. This
PR takes the strategy of adjusting DefaultAssociative to consider
chpl__tableEntry
s that are empty/deleted to have uninitializedidx
/values. The process of updating DefaultAssociative led to the discovery
and fix of several bugs (for example the bug exercised by this test where
deleted slots were hiding existing elements when adding, leading to
duplicates -- test/associative/ferguson/check-removing-and-adding.chpl ).
One of the bug fixes has the unfortunate effect of un-doing PR #14065 -
this is addressed in future work described below involving refactoring
DefaultAssociative. While there, reduced the number of DSI methods that
associative relies on existing for all arrays and adjusted the ones that
do exist to have clearer behavior. Created an
_allSlots
iterator foriterating over pieces of the hashtable since the parallel iteration
pattern is independent of the data stored. This helps to do the right
first-touch during associative domain allocation. (It can't be done after
index initialization since that might happen at an arbitrary future point
for some slots).
This PR ran into problems with replaceRecordWrappedRefs and the new array
initialization pattern. The reference to the element in
chpl__transferArray
that was being initialized (say,aa
) was becominga variable instead of a reference, which completely prevented
initialization. So, the PR applies a workaround to
chpl__transferArray
toavoid the problem for susceptible types by iterating over the LHS domain
instead of array, and then doing array access in the loop.
Fixed a bug where copy elision was not removing a call to record
postinit. In order to do so, changed initializerRules to add the flag
FLAG_HAS_POSTINIT
to types that have a postinit so thatAggregateType::hasPostInitializer
can be called later in resolution.Improved the wording for error messages to do with setting a non-nilable
variable to
nil
to follow the pattern discussed in #11118.Improved isConstInitializable to work better with arrays so that it can
be called within array initCopy e.g.
Removed
PRIM_DEFAULT_INIT_FIELD
and_createFieldDefault
(no longerneeded) and added
PRIM_STEAL
(to mark a source variable withFLAG_NO_AUTO_DESTROY
) to help write some of the move initializationpatterns.
Adjusted formal temps (those that go within function bodies for say
inout
intent) to be destroyed at the end of the function rather thanafter last use. This better matches the rule for when variables become
dead but #15645 discusses more work needed in this area.
Removed
chpl_checkCopyInit
because the check is not handled byinitializers.
Adjusted the unordered forall optimization to work with
PRIM_ASSIGN
(because it comes up now in
chpl__transferArray
) and to get unorderedforall compilation report tests working adjusted
findLocationIgnoringInternalInlining to consider functions marked
FIND_USER_LINE
similarly to inline functions in internal/standardmodules; while there applied this also to functions beginning with
chpl__
.Adjusted Sort.quickSort to call a new insertionSortMoveElts that moves
elements (because insertionSort is also called by sorts that assign
elements) to avoid problems when sorting arrays.
Reviewed by @benharsh - thanks!
miniMD are OK
Comm count changes:
has 1 additional GET for the domain size due to using it in
dsiElementInitializationComplete during an array initialization with a
remote domain. DefaultRectangularArr could cache the number of
elements allocated in its own structure to remove this GET and many
others for cases like this where the domain is remote.
test/distributions/robust/arithmetic/performance/multilocale/ had 1
more GET that seems to have to do with dsiDoms. But dsiDoms is doing
GETs many many times for Cyclic already which makes me think that
Cyclic needs some optimization (and this problem isn't made
significantly worse by this PR).
are probably related to dsiElementInitializationComplete, so I changed
it to always use assignment instead of split-init/assign since that is
what it has historically done. Issue
https://github.com/Cray/chapel-private/issues/964 tracks the TODO of
improving dsiElementInitializationComplete.
Future work:
default-init/assign patterns in the compiler insertEpilogueTemps maybe
insertFinalGenerate insertDestructureStatements fieldToArg and make
array literal creation "move" the values into the array literal
https://github.com/Cray/chapel-private/issues/955
https://github.com/Cray/chapel-private/issues/950
https://github.com/Cray/chapel-private/issues/945
_deinitElementsIsParallel
)value
tochpl_TableEntry
which defaults tonothing
. Adjust the Map / Setimplementations to use it.
https://github.com/Cray/chapel-private/issues/952
https://github.com/Cray/chapel-private/issues/953 and How can we optimize initialization of arrays with explicit runtime type from expiring values? #9421
pragma "find user line"
insteadhttps://github.com/Cray/chapel-private/issues/954
associative domains https://github.com/Cray/chapel-private/issues/962
low-level array support for domain map implementation #15673
chpl__coerceMove
andchpl__unref
andchpl__unalias
unifying special move initialization functions #15674https://github.com/Cray/chapel-private/issues/964