Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust array copy initialization to avoid default-init/assign #15239

Merged
merged 124 commits into from
May 25, 2020

Conversation

mppf
Copy link
Member

@mppf mppf commented Mar 17, 2020

For #14924 and #9421.

Closes #10374.
Closes #11498.
Closes #13234.
Closes #14925.

Previous to this PR, there were several problems with array
initialization where the compiler emitted a default-initialization and
assignment pattern instead of copy initialization. This behavior is
observable for records because the = or init= for the record could
have side effects.

Here are some of the problematic cases, assuming A is an array:

  • untyped copy initialization, e.g. var B = A
  • typed copy initialization, e.g. var B:A.type = A
  • typed or copy initialization for in intent arguments
  • initialization from loop expressions with shape, e.g. var B = for i in 1..10 do new R(i)
  • typed initialization from loop expressions, e.g. var B:[1..10] int = for i in 1..10 do new R(i)

This PR addresses these problems with a focus on getting the correct
behavior for record elements in arrays. Improving array allocation - so
in some cases a move can "steal" the buffer from an existing array - is
future work (see #9421). So is improving typed copy-initialization for
associative domains.

At a high level, the approach taken in this PR is to add
chpl__coerceCopy and chpl__coerceMove as siblings for
chpl__initCopy / chpl__autoCopy. Unlike the previously existing copy
functions, chpl__coerceCopy and chpl__coerceMove accept a type t
argument that stores the runtime type of the destination. These new
functions are only appropriate or useful for types with runtime types
(namely arrays and domains).

The compiler changes are focused on calling chpl__coerceCopy /
chpl__coerceMove in the right places. The chpl__coerceCopy can turn
into chpl__coerceMove in the copy elision pass.

The PR makes significant changes to wrappers.cpp in order to getin
intent combined with default arguments to work correctly. Since default
args and runtime types used in in intent can both use previous
argument, this PR changes the approach in wrappers.cpp to handle these
elements one argument at a time (instead of handling all default args,
then all in intents, etc). I think this simplifies the code but the more
important thing is that it produces more correct behavior.

In order to support chpl__coerceMove and chpl__coerceCopy, adjusted
_domain.buildArray to add a param initElts. Thisparam initElts is
threaded through various DSI calls and in particular dsiBuildArray.
This allows the compiler to know at compile-time whether the array
elements need to be default initialized (which will not be allowed for
all types). Refactored init_elts into init_elts_method and added
_ddata_allocate_noinit to help with this.

Similarly, when move initializing one array from another, the elements
are transferred to the new array, and so the old array needs to avoid
deinitializing the elements. For this reason, added param deinitElts to
dsiDestroyArr. Made dsiDestroyArr a required function since forgetting
it would lead to memory leaks which might be less obvious to debug.
Created some helper routines (_deinitElementsIsParallel /
_deinitElements) to help with these.

Generally speaking this PR adjusts the existing distributions to have the
correct record copy behavior. However it does not address making it easy
to write such distributions (and in fact currently distributions need to
use pragmas to get reasonable behavior). Issue #15673 discusses the
future direction to address this problem.

Since the elements in an array might now be initialized separately from
the allocation, the PR also takes steps to ensure that the post-alloc
function needed by the runtime in some cases (e.g.
chpl_mem_array_postAlloc) is called after the initialization is
complete. In order to enable this behavior, it adds
dsiElementInitializationComplete which domain maps can use to
eventually call a function like chpl_mem_array_postAlloc. Made
dsiElementInitializationComplete a required function since forgetting it
would lead to first-touch errors which might be harder to find. Created
some alternative ddata functions to allocate without initing and
separately run the post-allocate. Added a callPostAlloc field to
DefaultRectangular to indicate if the post alloc callback was requested
by the runtime so it can be run in that event in
dsiElementInitializationComplete.

This PR needed to make significant changes to DefaultAssociative to get
tests working because otherwise arrays had a mix of initialized and
uninitialized elements and that was not being tracked appropriately. This
PR takes the strategy of adjusting DefaultAssociative to consider
chpl__tableEntrys that are empty/deleted to have uninitialized idx/
values. The process of updating DefaultAssociative led to the discovery
and fix of several bugs (for example the bug exercised by this test where
deleted slots were hiding existing elements when adding, leading to
duplicates -- test/associative/ferguson/check-removing-and-adding.chpl ).
One of the bug fixes has the unfortunate effect of un-doing PR #14065 -
this is addressed in future work described below involving refactoring
DefaultAssociative. While there, reduced the number of DSI methods that
associative relies on existing for all arrays and adjusted the ones that
do exist to have clearer behavior. Created an _allSlots iterator for
iterating over pieces of the hashtable since the parallel iteration
pattern is independent of the data stored. This helps to do the right
first-touch during associative domain allocation. (It can't be done after
index initialization since that might happen at an arbitrary future point
for some slots).

This PR ran into problems with replaceRecordWrappedRefs and the new array
initialization pattern. The reference to the element in
chpl__transferArray that was being initialized (say, aa) was becoming
a variable instead of a reference, which completely prevented
initialization. So, the PR applies a workaround to chpl__transferArray to
avoid the problem for susceptible types by iterating over the LHS domain
instead of array, and then doing array access in the loop.

Fixed a bug where copy elision was not removing a call to record
postinit. In order to do so, changed initializerRules to add the flag
FLAG_HAS_POSTINIT to types that have a postinit so that
AggregateType::hasPostInitializer can be called later in resolution.

Improved the wording for error messages to do with setting a non-nilable
variable to nil to follow the pattern discussed in #11118.

Improved isConstInitializable to work better with arrays so that it can
be called within array initCopy e.g.

Removed PRIM_DEFAULT_INIT_FIELD and _createFieldDefault (no longer
needed) and added PRIM_STEAL (to mark a source variable with
FLAG_NO_AUTO_DESTROY) to help write some of the move initialization
patterns.

Adjusted formal temps (those that go within function bodies for say
inout intent) to be destroyed at the end of the function rather than
after last use. This better matches the rule for when variables become
dead but #15645 discusses more work needed in this area.

Removed chpl_checkCopyInit because the check is not handled by
initializers.

Adjusted the unordered forall optimization to work with PRIM_ASSIGN
(because it comes up now in chpl__transferArray) and to get unordered
forall compilation report tests working adjusted
findLocationIgnoringInternalInlining to consider functions marked
FIND_USER_LINE similarly to inline functions in internal/standard
modules; while there applied this also to functions beginning with
chpl__.

Adjusted Sort.quickSort to call a new insertionSortMoveElts that moves
elements (because insertionSort is also called by sorts that assign
elements) to avoid problems when sorting arrays.

Reviewed by @benharsh - thanks!

  • primers pass valgrind and don't leak
  • full local testing
  • gasnet testing
  • multilocale perf testing - (EP, Global, Promoted) streams and
    miniMD are OK

Comm count changes:

  • test/optimizations/remoteValueForwarding/serialization/domains.chpl
    has 1 additional GET for the domain size due to using it in
    dsiElementInitializationComplete during an array initialization with a
    remote domain. DefaultRectangularArr could cache the number of
    elements allocated in its own structure to remove this GET and many
    others for cases like this where the domain is remote.
  • many Cyclic distribution tests in
    test/distributions/robust/arithmetic/performance/multilocale/ had 1
    more GET that seems to have to do with dsiDoms. But dsiDoms is doing
    GETs many many times for Cyclic already which makes me think that
    Cyclic needs some optimization (and this problem isn't made
    significantly worse by this PR).
  • array-assign-block-get.chpl was showing additional on statements that
    are probably related to dsiElementInitializationComplete, so I changed
    it to always use assignment instead of split-init/assign since that is
    what it has historically done. Issue
    https://github.com/Cray/chapel-private/issues/964 tracks the TODO of
    improving dsiElementInitializationComplete.

Future work:

@mppf mppf force-pushed the array-coerce branch 5 times, most recently from 1f24165 to b6c48d6 Compare April 29, 2020 22:32
@e-kayrakli
Copy link
Contributor

Interface changes to dsiBuildArray and dsiDestroyArray seems reasonable to me

mppf added a commit that referenced this pull request May 28, 2020
Fix use of uninitialized memory in ReplicatedDist arrays

See also issue #15731.

After PR #15239, a ReplicatedDist array that is not default-initialized 
would only initialize some of the replicands but then deinitialize all of
them. This PR makes temporary adjustments ReplicatedDist. It adjusts the 
ReplicatedArr code to always initialize the elements. It makes it a 
compilation error to initialize a ReplicatedDist array with non-POD
elements (since without other work this would introduce a memory leak).
It adds additional testing that would have revealed the original problem
(at least when run under valgrind).

Reviewed by @benharsh - thanks!

- [x] full local testing
dlongnecke-cray added a commit that referenced this pull request May 30, 2020
…tures (#15735)

Allow sets of `shared C` and `(int, shared C)` and promote related
futures (#15735)

This PR removes compiler errors for sets of `shared C` and sets of
`(int, shared C)`. Sets of `shared C` were enabled by #15239. Sets
of `(int, shared C)` were enabled by #15733. The futures for both
cases have been promoted to regular tests.

Reviewed by @daviditen.
e-kayrakli added a commit that referenced this pull request Jun 1, 2020
Add perf annotations

This week chapcs performance tests haven't run due to system issues, so we
didn't get a good picture of things. But at the same time, number of PRs that
went in was also limited.

#15239 seems to be the likely candidate for the following performance
changes:

- Stencil performance
- Distributed array-domain init-deinit
- Small array get performance
- Comm count changes

[Reviewed by @ronawho]
mppf added a commit that referenced this pull request Jun 1, 2020
Address performance issues with stencil and block-cyclic array init

Follow-up to PR #15239.
Related to Cray/chapel-private#992

The array initialization improvements (PR #15239) introduced performance
regressions for StencilDist and BlockCyclicDist arrays. Some of these
were due to increased communication. The reason that Stencil and
BlockCyclic arrays are affected is that they both allocate elements that
are not initialized by array initialization (in the case of initializing
from another same-distribution array, say). For Stencil these are
fluff/ghost cell elements. For BlockCyclic they are padding.

See the test performance/array/distCreate.chpl run for example with
    
```
$ ./distCreate --size=arraySize.small --dist=distType.blockCyc -nl 2
```

This PR resolves these problems by:
 * initializing the padding/fluff elements along with the original data
   when default initializing
 * initializing the padding/fluff elements always (because they won't be
   initialized by an initialization expression e.g.) and deinitializing
   them always (because they won't be moved out of when move-initializing
   arrays either)
 * using more efficient means to identify elements that are fluff/padding 

- [x] full local gasnet testing
- [x] full local testing

Reviewed by @e-kayrakli - thanks!
mppf added a commit that referenced this pull request Jun 2, 2020
Factor hashtable used by DefaultAssociative out of it

This PR factors the hashtable implementation out of DefaultAssociative
and puts it into new ChapelHashtable and ChapelHashing modules. These
more clearly separate concerns in the implementation of
DefaultAssociative and enables an optimization removed in PR #15239 to be
added back in (the optimization was from PR #14065). (However this PR
does not yet add back in the optimization - issue
Cray/chapel-private#1000 tracks that TODO).

It migrates `chpl_defaultHash` and associated routines out of
DefaultAssociative and into a new ChapelHashing module.

It adjusts some of the routines in ChapelHashing that move elements to be
factored more reasonably and to use split-init/copy-elision language
features in order to reduce the use of primitives and pragmas.

Additionally it fixes a bug where dsiClear was not deinitializing slots
that it was emptying, leading to a memory leak.

This PR future-izes test/associative/domain/types/testTuple.chpl and
test/library/standard/Set/types/testTuple.chpl due to a compiler bug
exposed by ChapelHashing relying on split-init and copy elision for
moves. Issue #15746 tracks this compiler bug.

- [x] primers pass with valgrind and do not leak 
- [x] full local futures testing

Reviewed by @daviditen - thanks!
mppf added a commit that referenced this pull request Jun 4, 2020
Remove unnecessary array temporaries for arrays-of-arrays

For Cray/chapel-private#1002

PR #15239 inadvertently added array temporaries for arrays-of-arrays in
the `chpl__eltTypeFromArrayRuntimeType` function.

This lead to out-of-memory failures with performance testing of isx on
some systems.

This PR resolves the problem by adjusting
`chpl__eltTypeFromArrayRuntimeType` to use a similar strategy to
`chpl__domainFromArrayRuntimeType` to avoid allocating a temporary array
in case the element type has a runtime type (as with arrays of arrays).
This required extending `PRIM_GET_RUNTIME_TYPE_FIELD` to allow for the
case that the result should be treated as a type. (For history in this
area, see PR #11140).

In investigating this problem, I noticed that
`chpl_incRefCountsForDomainsInArrayEltTypes` /
`chpl_decRefCountsForDomainsInArrayEltTypes` also were allocating
temporary arrays, which is no longer necessary. I adjusted these to call
`chpl__domainFromArrayRuntimeType`. Once I did that,
`chpl_decRefCountsForDomainsInArrayEltTypes` stopped working correctly
and so I went further in adjusting it and related code to better match
array deinitialization. (For history in this area, see also commits
6f435f1 and
65a73e2).

Lastly, this PR adds a test that uses the `--memMax` functionality in
memory tracking to detect future cases where array-of-array
initialization/deinitialization creates sub-array sized temporaries.

Reviewed by @vasslitvinov - thanks!

- [x] isx out-of-memory failures are resolved
- [x] primers pass with valgrind and do not leak
- [x] full local testing
e-kayrakli added a commit that referenced this pull request Jun 6, 2020
Don't GET domain's dimensions unless needed

Follow on to #15239

As an optimization, that PR started storing dimensions of the domain in a local
variable in `DefaultRectangular.isDataContiguous`, but that's not needed unless
the domain is multidimensional.

This is the cause of the comm count increase in

https://chapel-lang.org/perf/chapcs.comm-counts/?startdate=2020/03/02&enddate=2020/06/04&graphs=arrayassignget

This may also impact other perf tests, but I don't expect a huge change in
performance, but maybe only reduced communication in some.

[Reviewed by @mppf]

Test:
- [x] standard
- [x] gasnet
mppf added a commit that referenced this pull request Jul 13, 2020
Improvements to runtime type support to prepare for array noinit

This PR makes improvements to runtime type support to prepare for
implementing `noinit` support for arrays in support of issue #15673.

When implementing `noinit` on an array type, it is important that the
compiler be able to create a different initialization pattern for an
array that is no-inited. However the way that arrays are
default-initialized is tied to the runtime type support in the compiler
in a way that makes that difficult. The following comment describes the
previous situation:

```
  // NOTE: the bodies of functions marked with runtime type init fn such as
  // chpl__buildDomainRuntimeType and chpl__buildArrayRuntimeType are replaced
  // by the compiler to just create a record storing the arguments. The body
  // is moved by the compiler to chpl__convertRuntimeTypeToValue.
  // The return type of chpl__build...RuntimeType is what tells the
  // compiler which runtime type it is creating.
```

To implement noinit support for arrays, we would like to add a `param
initElts` argument to the function that default-initializes an array
(`chpl__convertRuntimeTypeToValue`). However, because
`chpl__buildArrayRuntimeType` both constructs the runtime type (when
called `chpl__buildArrayRuntimeType`) and implements default
initialization (when called `chpl__convertRuntimeTypeToValue`), this was
not possible in a straightforward manner. In particular, the new `param
initElts` argument should not apply to constructing the runtime type;
only to creating a value of that type.

In order to overcome that obstacle, this PR changes the strategy taken by
the compiler to handle these runtime type support functions. Now
`chpl__convertRuntimeTypeToValue` functions are implemented directly in
module code (instead of being constructed by the compiler by cloning
`chpl__buildArrayRuntimeType` etc). This required some adjustment to
several functions handling runtime types in functionResolution.cpp. 

This PR adjusts function resolution to populate the fields of the runtime
type right away once the relevant array type is created by
`chpl__buildArrayRuntimeType` etc. It does this by calling a new
function, `adjustRuntimeTypeInitFn`, once a function marked with `pragma
"runtime type init fn"` is resolved. This new function generates the
runtime type record and adds it to the runtime types map. Additionally,
it populates the runtime type record with type and param fields that were
previously skipped in order to keep these details preserved. (They will
be removed in late resolution just like with any other record). As a
result, the primitive `PRIM_GET_RUNTIME_TYPE_FIELD` no longer needs an
argument indicating the field type. The argument indicating the field
type was problematic because it was most naturally computed by default
initialization (in several functions added in #11140 and #15239) but that
could have performance impact (see #15767). Now that
`PRIM_GET_RUNTIME_TYPE_FIELD` no longer needs the field type (since the
compiler knows it during resolution), there is no need for the
problematic code.

Since the runtime type record is created earlier, in
`adjustRuntimeTypeInitFn`, now `buildRuntimeTypeInitFns` just adjusts the
runtime type init functions to return the runtime type. It no longer
generates `chpl__convertRuntimeTypeToValue` by cloning e.g.
`chpl__buildArrayRuntimeType` as `chpl__convertRuntimeTypeToValue` is now
written in module code.

In order to get arrays-of-arrays to work correctly with these changes, it
was also necessary to change how default initialization for variables
with runtime types was handled. Now, default initialization for these
types is handled in `lowerPrimInit` which calls `lowerRuntimeTypeInit`.
At that time, the runtime type records exist but are not yet used. So,
the lowering makes use of `PRIM_GET_RUNTIME_TYPE_FIELD` to gather the
arguments to pass to `chpl__convertRuntimeTypeToValue`. (That, in turn,
relies upon the changes to the creation of runtime type records described
above; including in particular having the runtime type record exist and
have `param` fields at this point).

`insertRuntimeTypeTemps` is renamed to `populateRuntimeTypeMap` to better
reflect what it does. In addition, it now includes a check to skip work
in the event the relevant map entry is already present.

`replaceValuesWithRuntimeTypes` is renamed to
`replaceTypeFormalsWithRuntimeTypes` to better reflect what it does. So
too is `replaceReturnedValuesWithRuntimeTypes` renamed to
`replaceReturnedTypesWithRuntimeTypes`

`replaceRuntimeTypePrims`, now limited to just handling
`PRIM_GET_RUNTIME_TYPE_FIELD`, is moved to `removeRandomPrimitive` in
order to prevent an additional traversal over all calls in the AST.

To reduce the potential for confusion, this PR also renames the fields in
the runtime type records. For the runtime type record for a domain, this
PR renames the distribution field from`d` to `dist`. For the runtime type
record for a sparse domain, this PR renames the parent domain field from
`dom` to `parentDom`.

In order to resolve a problem encountered during development, this PR
adds `chpl_createBytesWithLiteral`, makes it a well-known function, and
calls it in `new_BytesSymbol`.

Here is a summary of the way runtime types are handled in the compiler
after this PR:
* `adjustRuntimeTypeInitFn` is called in `resolveFunction` for `pragma
  "runtime type init fn"` functions such as
  `chpl__buildArrayRuntimeType`. It generates the runtime type record,
  complete with param and type fields, and stores a link to that record
  in `runtimeTypeMap`.
* `lowerPrimInit` calls `lowerRuntimeTypeInit` to change default
  initialization of variables with runtime type into calls to
  `chpl__convertRuntimeTypeToValue`
* `populateRuntimeTypeMap` populates a map from types with runtime type
  to `chpl__convertValueToRuntimeType` functions. These functions are
  implemented in ChapelArray.
* `handleRuntimeTypes` runs late in resolution and takes these steps:
   * run `populateRuntimeTypeMap` a second time
   * `buildRuntimeTypeInitFns` converts `chpl__buildArrayRuntimeType`
     etc. into functions returning a runtime type record
   * `replaceTypeFormalsWithRuntimeTypes` changes `type` formals with
     runtime type into runtime type records
   * `replaceReturnedValuesWithRuntimeTypes` changes returned `type`
     values into runtime type records
   * `replaceRuntimeTypeVariableTypes` changes `type` variables into
     runtime type records
* `removeRandomPrimitive` (called in `pruneResolvedTree`) lowers
  `PRIM_GET_RUNTIME_TYPE_FIELD`.
 
Reviewed by @vasslitvinov - thanks!

- [x] primers pass with valgrind/verify and do not leak
- [x] full local testing
mppf added a commit to mppf/chapel that referenced this pull request Jul 21, 2020
mppf added a commit to mppf/chapel that referenced this pull request Jul 22, 2020
mppf added a commit that referenced this pull request Jul 22, 2020
Remove pragma "local field" from PrivateArr

It was accidentally added in PR #15239.

Resolves Cray/chapel-private#1175

Trivial and not reviewed.

- [x] full local testing
gbtitus added a commit to gbtitus/chapel that referenced this pull request Jul 23, 2020
Following the lead of chapel-lang#15239 (01fa4e7), update comm counts for comm
layers that support network atomics (ofi, ugni).
gbtitus added a commit that referenced this pull request Jul 23, 2020
Update networkAtomics!=none comm counts for sliceBlockWithRanges.

(Not reviewed.)

Following the lead of #15239 (01fa4e7), update comm counts for comm
layers that support network atomics (ofi, ugni).
mppf added a commit to mppf/chapel that referenced this pull request Feb 26, 2021
It was accidentally removed in PR chapel-lang#15239.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
mppf added a commit to mppf/chapel that referenced this pull request Sep 22, 2021
It was accidentally removed in PR chapel-lang#15239.

---
Signed-off-by: Michael Ferguson <mppf@users.noreply.github.com>
mppf added a commit that referenced this pull request Oct 4, 2021
Put the minlen condition back in quicksort

It was accidentally removed in PR #15239. Putting it back caused problems
with some `mason search` testing, but that was because the code sorting
the results for `mason search` assumed that the sort was stable. It is
not, so this PR updates that `mason search` implementation to work with
unstable sort and the sort documentation to make it clear that it is not
a stable sort.

Reviewed by @arezaii - thanks!

- [x] full local testing
- [x] check sort test directory with valgrind and memleaks
aconsroe-hpe added a commit that referenced this pull request Oct 18, 2021
Replace uses of "chpl__autoDestroy" with astr_autoDestroy

reviewed by @mppf 

This finishes off a remaining case from PR #15239 in an attempt to
close issue #15011. I don't think this is particularly important, but
the string literal being passed to isNamedAstr was suspect and this
brings it in line with other uses.

Paratest pass
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants