permutation method in Random module overwrites the input array with permuted indices of the domain #16400

CaptainSharf · 2020-09-16T08:38:15Z

Summary of Problem

Invoking the permutation method on an array overwrites the array with permuted indices of its domain. Shouldn't the method return a new permuted version of the input array? Would it be a good idea to modify the current permutation API to take an extra parameter(like inPlace=true) to determine if the input array is to be permuted inplace (like shuffle)?

Steps to Reproduce

Compile and run the below source code
Source Code:

use Random;
var randStream = new RandomStream(real);
var randomArr : [1..5] real;
randStream.fillRandom(randomArr);
writeln("Array before permuting is ",randomArr);
permutation(randomArr,seed=10);
writeln("Array after permuting is ",randomArr);

Compile command:
chpl permutations.chpl --fast

Execution command:
./permutations.chpl

Output
Array before permuting is 0.16559 0.754886 0.615159 0.171141 0.539561
Array after permuting is 1.0 5.0 4.0 3.0 2.0

Configuration Information

chpl --version: chpl version 1.22.1
gcc --version: Apple LLVM version 10.0.1 (clang-1001.0.46.4)

The text was updated successfully, but these errors were encountered:

bradcray · 2020-09-16T20:01:54Z

I would tend to agree that, to me, permutation(A); intuitively seems like it would permute the elements of A rather than overwriting the existing elements with a permutation of A's indices. I might expect permutation(Dom); to return an array storing a permutation of D's indices or permutation(Dom, Arr); to store a permutation of Dom into Arr. But I also don't recall whether there was any rationale or precedent here that I'm currently forgetting. I believe these were added in #2912 by @mppf, so am curious what Michael has to say.

mppf · 2020-09-16T20:10:56Z

We already have shuffle(A) to rearrange the elements of A. https://chapel-lang.org/docs/modules/standard/Random.html#Random.shuffle

I think it would be fine to change permutation to permutation(Dom, Arr). One thing to keep in mind is that the domain being permuted isn't necessarily the same as the destination domain for the array. E.g. you might want a permutation of 100..<200 stored in an array that has indices 0..100.

CaptainSharf · 2020-09-18T16:58:02Z

Hi, @mppf , I have raisedthis PR #16417 for this issue, can you please review it when you're free?

CaptainSharf · 2020-09-20T07:05:30Z

Hi @bradcray,
Thanks for your feedback for my PR #16417. Here are my thoughts regarding the questions you raised

should we preserve the original implementation of permutation() and add this new version as an overload to avoid breaking existing code?
This is a good idea. I think that newer implementations of this method should be overloaded
what are the advantages of the new version taking in an array as input rather than returning a newly-created array declared over the domain argument? Or should we support both versions as overloads?
I favour the version of the method that takes a domain as input and returns a random permutation over the indices of the domain as an array. This version is consistent with the behaviour of the permutations method in other frameworks like numpy. Based on performance and other considerations do you think one version is better than the other?
in the version that takes a domain and an array, should the two be required to have the same size? Or perhaps the array should be strictly smaller than the domain?
In my current implementation, I require the size of the array to be at least as big as the size of the domain. The extra elements in the array are set to zero.

CaptainSharf · 2020-09-20T13:15:51Z

Here, are some of the methods for generating random permutations in a few other frameworks that I could find:

Numpy

np.random.permutation(arr) returns a random permutation of the given array without modifying arr. Support for numpy arrays with rank>1.
np.random.permutation(n) returns an array of a random permutation of numbers in [1,n]

Matlab

p = randperm(n) returns an array containing a random permutation of the integers from 1 to n without repeating elements.
p = randperm(n,k) returns an array containing k unique integers selected randomly from 1 to n.
p = randperm(s,___) generates a random permutation of integers from specified random number stream s.

Go
In rand.Perm module func Perm(n int) []int returns an array of random permutation of all numbers in [0,n)

ben-albrecht · 2020-09-21T14:36:40Z

@CaptainSharf - Thanks for bringing this up and providing a review of some other languages.

I think just changing the behavior of our permutation to always return a new array makes the most sense. permutation could take an array to do an out-of-place shuffle of the elements, or take a domain to generate a permutation of the domain indices. I don't think in-place index shuffling (the current permutation behavior) is that important to continue supporting.

FWIW, this would follow the convention from numpy, where permutation is the out-of-place shuffle, and shuffle is the in-place shuffle.

ben-albrecht · 2020-09-21T14:43:40Z

If we were to pursue an inPlace argument that was suggested in the OP, we could potentially merge both shuffle and permutation into a single function, e.g.

/* Return an array containing the shuffled elements of ``arr``. \
   If ``inPlace == true``, shuffle the elements in place and return nothing. */
proc shuffle(arr: [], param inPlace=false, ...) { }

/* Return an array containing the shuffled indices of ``dom``. */
proc shuffle(dom: domain, ...) { }

@mppf has already expressed opposition to this proposal offline on the basis of not liking arguments impacting the function signature. It's probably not worth discussing inPlace arguments further unless it resonates with others.

CaptainSharf · 2020-09-21T15:11:21Z

@ben-albrecht , Thanks for the feedback!!
Just for the sake of summarization, we think it would be a good idea to replace the existing implementation of permutations with two overloaded methods

proc permutation(arr:[]) that returns a new permuted version of the input array
proc permutation(dom:domain) that returns a new array with permuted indices of the domain?
Am I correct in assuming this?

ben-albrecht · 2020-09-21T15:20:42Z

@CaptainSharf - Yes, that is correct.

Another design element: How can we properly deprecate the existing permutation behavior? This is challenging because we are only changing the return type (from nothing to array), so it will be hard to warn only users using it incorrectly.

Whatever we decide on, we ought to update the deprecation guidelines to cover this special case for future deprecations.

CaptainSharf · 2020-09-21T16:32:16Z

@ben-albrecht , yes it seems to me we won't be able to deprecate the existing behaviour easily when an array is passed. There would not be an issue in introducing an overloaded version of permutations which takes domain as argument. Since shuffle randomly permutes the array, one can still accomplish the intended behaviour we're looking for permute by saving the state of the array before shuffling and restoring it later.

mppf · 2020-09-21T17:10:54Z

Right, for the deprecation in this case, we can just provide proc permutation(dom:domain) for a release and add proc permutation(arr:[]) in a later release.

bradcray · 2020-09-21T18:44:31Z

To deal with the deprecation issue, I'd be inclined to name this routine permute() rather than permutation() just because permute({1..n}) or permute(A) reads much more cleanly than permutation({1..n}) or permutation(A) to me (i.e., they seem "more like an English sentence"). Given that NumPy, Matlab, and Go don't share a name for this capability, it doesn't feel like we'd be breaking with a deeply-engrained traditional routine name (and if the documentation used the phrase "permutation" someone would probably still find the routine when searching for it. I'd still deprecate permutation(), but this would permit us to support permute(A) with the new meaning immediately rather than having to delay.

Reading some of the examples and notes made me wonder whether these routines can/do work with multidimensional domains / arrays (and whether they do their permutations in a dimension-independent way, or whether a permutation would permute rows and columns without completely scrambling everything; for arrays, I'd expect the former, but would want to know what other multidimensional permutations do).

The extra elements in the array are set to zero.

This seemed surprising to me. Zero might be a valid index of a given array (e.g., permute({0..#n});), so filling entries with zero doesn't seem well-motivated and could be confusing. Of course, with the new definition, I think there's no more need for sizes to match, so this concern goes away.

Based on performance and other considerations do you think one version is better than the other?

If the user happened to have an unused array that was the right size and distribution, passing in the array would save some time, but I'm not sure that use case is common enough to warrant complicating the interface at this point. We could always add an overload that took a domain and array later on if such cases arose in practice and were causing performance issues. This allows us to kick concerns like size mismatches, domain mismatches, different distributions down the road as well.

CaptainSharf · 2020-09-22T13:44:54Z

Hi, @bradcray,
The permutations() routine does not work for multi dimensional arrays in Matlab and Go. However in the case of numpy, the array is shuffled across the first dimension. For ex. in numpy

>>> x = np.arange(9).reshape(3,3)
>>> x
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> y = np.random.permutation(x)
>>> y
array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])

CaptainSharf · 2020-09-22T13:55:26Z

However given that, I think it is possible to shuffle the contents of the array independent of the dimension. We could use the Fischer-Yates algorithm over the multi dimensional array by mapping an N-D array index to a 1-D index.

/* shuffles the array independent of the dimension */
proc shuffleDeep(arr:[?D]){
    for i in 0..arr.size by -1{
      var flatRandIdx = randlc_bounded(D.idxType, PCGRandomStreamPrivate_rngs,seed, PCGRandomStreamPrivate_count,0, i);
      var randIdx = getNdIndices(arr,flatRandIdx);
      var currIdx = getNdIndices(arr,i);
      arr[randIdx] <=> arr[currIdx];
    }
}

/* Transforms a flat index to an array index */
proc getNdIndices(arr:[], in flatInd){
    var idx : [0..arr.domain.rank-1] int = -1;
    var div = arr.size;
    for i in 0..arr.domain.rank-1 by -1{
        div/=arr.domain.dim(i).size;
        
        const lo = arr.domain.dim(i).low;
        const stride = abs(arr.domain.dim(i).stride);
        const zeroInd = flatInd/div;
        const currInd = lo+(zeroInd*stride);

        idx[i] = currInd;
        flatInd = flatInd%div;
    }
    return idx;
}

bradcray · 2020-09-22T14:48:14Z

However in the case of numpy, the array is shuffled across the first dimension.

I would say that this behavior isn't appropriate for Chapel or languages with true 2D arrays, and that it makes more sense for languages in which a 2D array is really a 1D array of 1D arrays (which, I would argue, the formatting of your Python example suggests it has).

And I agree with the approach of thinking of the domain's indices as being in a 1D / ordinal space, permuting there, then transforming back to the multidimensional space. Some existing routines (like, I think, indexToOrder and orderToIndex, though I may not have those names right, and don't have time to check right now) should help with this.

ben-albrecht · 2020-09-22T15:11:57Z

To deal with the deprecation issue, I'd be inclined to name this routine permute() rather than permutation() just because permute({1..n}) or permute(A) reads much more cleanly than permutation({1..n}) or permutation(A) to me (i.e., they seem "more like an English sentence"). Given that NumPy, Matlab, and Go don't share a name for this capability, it doesn't feel like we'd be breaking with a deeply-engrained traditional routine name (and if the documentation used the phrase "permutation" someone would probably still find the routine when searching for it. I'd still deprecate permutation(), but this would permit us to support permute(A) with the new meaning immediately rather than having to delay.

I'd be OK with permute() as the new out-of-place shuffle function. The switch from noun to verb raises broader naming convention questions for me, which I've brought up over in #6698 (comment).

@bradcray - I agree with your take on multi-dimensional shuffling. Further discussion on this might benefit from a separate issue so that we can keep this issue/task scoped to deprecating permutation() and introducing permute() for 1D arrays initially.

@CaptainSharf - FYI you can add start your code blocks with ```chpl to enable syntax highlighting.

CaptainSharf · 2020-09-23T13:55:37Z

Hi,
The methods orderToIndex and indexOrder seem to be defined for a range rather than a domain. Are there built-in methods that define a mapping from a domain index to an integer and an inverse from an integer to a domain index? Do similar built-in methods exist for associative domains?

bradcray · 2020-09-23T14:02:28Z

indexOrder seems to be defined on rectangular domains at least, though it is not documented: #16400 (comment)

I thought we had the reverse operation as well, but am not finding it with a quick look... (I think we should).

As far as I know, these are not supported on associative domains at present.

CaptainSharf · 2020-09-23T15:11:49Z

I found the method indexOrder defined in the documentation for domains. There's a method dsiIndexOrder in the DefaultRectangular.chpl module, that computes the order of an index. I'll keep searching for orderToIndex... I need orderToIndex for implementing the shuffle

CaptainSharf · 2020-10-01T16:16:53Z

Hi @ben-albrecht @bradcray ,
This issue is dependent on #14885. Would it be a good workaround if we had a custom implementation for orderToIndex in the Random module which is then used by the methods shuffle and permute ?

bradcray · 2020-10-01T16:41:49Z

If you have working copies of those routines, why not add them to the ChapelArray.chpl module to resolve #14885 rather than as a Random-local workaround?

CaptainSharf · 2021-01-29T02:21:13Z

Hi @bradcray , @ben-albrecht , @mppf ,
Now that we have the custom method orderToIndex which gets the index in multi-dimensional arrays for an order, I believe it would now be possible to shuffle multi-dimensional arrays as well. Would it be OK if I made this change in both the methods permute and shuffle?

mppf · 2021-01-29T15:06:53Z

I don't see any problem with adding that.

ben-albrecht · 2021-01-29T15:46:01Z

Sounds great, thanks @CaptainSharf

CaptainSharf · 2021-02-04T17:10:23Z

H!
I opened a PR #17053, for this issue

@jabraham17

Deprecates `randomStream.permutation` and the top level `permutation` procedures in favor of a new `permute` interface. The new methods have the following signatures: ```chapel proc permute(a: [?d] ?t]): [d] t { ... } proc permute(d: domain(1,?)): [] d.idxType { ... } proc permute(r: range(bounds=boundKind.both, ?)): [] r.idxType { ... } ``` In each case, an array with a random permutation of the argument is returned. This is different than the old interface, which overwrote the array argument with a permutation of the array's domain. Resolves: #16400 - [x] paratest [ reviewed by @jabraham17 ] - thanks!

bradcray · 2024-02-12T18:40:06Z

@jeremiah-corrado : Just noticing that this is closed now with your recent work. Would you check whether open PR #17053 which looks like it was filed for this issue has additional value or could/should be closed now?

jeremiah-corrado · 2024-02-13T14:50:53Z

The multidimensional shuffle/permute implementation in that PR came from a somewhat tangential discussion in this issue. The original topic was only about changing the permutation api to return an array, so I'll open a separate issue to discuss whether we want to expand some of the random procedures to support multidimensional arrays.

The PR itself is pretty out of date wrt what we have on main at this point. I think it can be closed but referenced from the issue.

bradcray added area: Libraries / Modules type: Design user issue labels Sep 16, 2020

CaptainSharf mentioned this issue Sep 18, 2020

Modifying permutation method in Random module #16417

Closed

ben-albrecht mentioned this issue Sep 22, 2020

Chapel standard module style guide (collecting input) #6698

Open

CaptainSharf mentioned this issue Apr 9, 2021

Adding support for multi-dimensional arrays to the procedures permutation and shuffle #17053

Closed

jeremiah-corrado mentioned this issue Nov 1, 2023

Random module deprecations and stabilization improvements #23752

Merged

4 tasks

jeremiah-corrado mentioned this issue Jan 24, 2024

Deprecate Random permutation in favor of permute #24268

Merged

1 task

jeremiah-corrado closed this as completed in #24268 Jan 30, 2024

jeremiah-corrado mentioned this issue Feb 13, 2024

Should all Random routines support multidimensional arrays? #24406

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

permutation method in Random module overwrites the input array with permuted indices of the domain #16400

permutation method in Random module overwrites the input array with permuted indices of the domain #16400

CaptainSharf commented Sep 16, 2020

bradcray commented Sep 16, 2020

mppf commented Sep 16, 2020

CaptainSharf commented Sep 18, 2020

CaptainSharf commented Sep 20, 2020 •

edited

Loading

CaptainSharf commented Sep 20, 2020 •

edited

Loading

ben-albrecht commented Sep 21, 2020 •

edited

Loading

ben-albrecht commented Sep 21, 2020

CaptainSharf commented Sep 21, 2020

ben-albrecht commented Sep 21, 2020

CaptainSharf commented Sep 21, 2020

mppf commented Sep 21, 2020

bradcray commented Sep 21, 2020

CaptainSharf commented Sep 22, 2020

CaptainSharf commented Sep 22, 2020 •

edited

Loading

bradcray commented Sep 22, 2020

ben-albrecht commented Sep 22, 2020 •

edited

Loading

CaptainSharf commented Sep 23, 2020

bradcray commented Sep 23, 2020

CaptainSharf commented Sep 23, 2020

CaptainSharf commented Oct 1, 2020

bradcray commented Oct 1, 2020

CaptainSharf commented Jan 29, 2021

mppf commented Jan 29, 2021

ben-albrecht commented Jan 29, 2021

CaptainSharf commented Feb 4, 2021

bradcray commented Feb 12, 2024

jeremiah-corrado commented Feb 13, 2024

permutation method in Random module overwrites the input array with permuted indices of the domain #16400

permutation method in Random module overwrites the input array with permuted indices of the domain #16400

Comments

CaptainSharf commented Sep 16, 2020

Summary of Problem

Steps to Reproduce

Configuration Information

bradcray commented Sep 16, 2020

mppf commented Sep 16, 2020

CaptainSharf commented Sep 18, 2020

CaptainSharf commented Sep 20, 2020 • edited Loading

CaptainSharf commented Sep 20, 2020 • edited Loading

ben-albrecht commented Sep 21, 2020 • edited Loading

ben-albrecht commented Sep 21, 2020

CaptainSharf commented Sep 21, 2020

ben-albrecht commented Sep 21, 2020

CaptainSharf commented Sep 21, 2020

mppf commented Sep 21, 2020

bradcray commented Sep 21, 2020

CaptainSharf commented Sep 22, 2020

CaptainSharf commented Sep 22, 2020 • edited Loading

bradcray commented Sep 22, 2020

ben-albrecht commented Sep 22, 2020 • edited Loading

CaptainSharf commented Sep 23, 2020

bradcray commented Sep 23, 2020

CaptainSharf commented Sep 23, 2020

CaptainSharf commented Oct 1, 2020

bradcray commented Oct 1, 2020

CaptainSharf commented Jan 29, 2021

mppf commented Jan 29, 2021

ben-albrecht commented Jan 29, 2021

CaptainSharf commented Feb 4, 2021

bradcray commented Feb 12, 2024

jeremiah-corrado commented Feb 13, 2024

CaptainSharf commented Sep 20, 2020 •

edited

Loading

CaptainSharf commented Sep 20, 2020 •

edited

Loading

ben-albrecht commented Sep 21, 2020 •

edited

Loading

CaptainSharf commented Sep 22, 2020 •

edited

Loading

ben-albrecht commented Sep 22, 2020 •

edited

Loading