Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[1.4.0] unravel_index no longer works with magic '-1' in shape parameter as in 1.3.1 #13862

Closed
fhieber opened this issue Jan 12, 2019 · 7 comments · Fixed by #14356
Closed

[1.4.0] unravel_index no longer works with magic '-1' in shape parameter as in 1.3.1 #13862

fhieber opened this issue Jan 12, 2019 · 7 comments · Fixed by #14356

Comments

@fhieber
Copy link
Contributor

fhieber commented Jan 12, 2019

Description

The unravel_index op seems to no longer correctly work with 'magic' shape values, such as '-1's. The following example still works with mxnet 1.3.1, but does not on latest master (it returns all zeros in the result without throwing an error) or 1.4.0.
We have a use case for this in Sockeye.

Environment info (Required)

macOs

MXNet commit hash:
pip-installed:
mxnet 1.5.0b20190111

Minimum reproducible example

Input data taken from Sockeye unit tests.

x = mx.nd.array([335, 620, 593, 219, 36], dtype='int32')
mx.nd.unravel_index(x, shape=(-1, 200))

With mxnet==1.5.0b20190111, the result is incorrect:

[[  0   0   0   0   0]
 [135  20 193  19  36]]
<NDArray 2x5 @cpu(0)>

With mxnet==1.3.1, the result is correct:

[[  1   3   2   1   0]
 [135  20 193  19  36]]
<NDArray 2x5 @cpu(0)>

However, if the shape parameter is fully specified (shape=(5,200)), mxnet==1.5.0b20190111 returns the correct values.

@piyushghai
Copy link
Contributor

Thank you for submitting the issue! I'm labeling it so the MXNet community members can help resolve it. @mxnet-label-bot add [Python, Bug, NDArray]

@fhieber
Copy link
Contributor Author

fhieber commented Jan 15, 2019

Could someone test this also with the 1.4 release candidate? If this is present there as well, I'd appreciate a last-minute fix very much! :)

@fhieber
Copy link
Contributor Author

fhieber commented Jan 27, 2019

Hoping to get some activity on this.
I bisected the nightly builds to figure out, when this bug was introduced.

  • mxnet==1.3.1b20181005 is fine
  • mxnet==1.3.1b20181010 contains the bug
    Assuming that builds are roughly reflecting merge dates of PRs, the bug must be introduced some time between October 5th and October 10th, and is hence also part of 1.4.0.

There is no code change in the ravel.* files or the unravel_index op itself since its addition, so I would guess that some other type of change caused this unwanted regression; maybe unravel_index uses some outdated interface?

The only commits where I could guess that they could have some effect on this:

Unfortunately I don't have a way to bisect commits directly through source compilation right now.

I would appreciate some support on this. Thanks!

@fhieber fhieber changed the title [Bug] unravel_index no longer works with magic '-1' in shape parameter after 1.3.1 1.4.0 unravel_index no longer works with magic '-1' in shape parameter after 1.3.1 Mar 4, 2019
@fhieber fhieber changed the title 1.4.0 unravel_index no longer works with magic '-1' in shape parameter after 1.3.1 [1.4.0] unravel_index no longer works with magic '-1' in shape parameter as in 1.3.1 Mar 4, 2019
@fhieber
Copy link
Contributor Author

fhieber commented Mar 4, 2019

I can confirm that the problem exists in 1.4.0

@asmushetzel
Copy link
Contributor

Pretty easy to diagnose.

The type for mshadow::index_t got changed from "unsigned" to "int64_t" between both versions. This type is used to encode dimensions of shapes internally.
In 1.3.1, a "-1" therefore was interpreted as unsigned(-1) which is basically max_int. In 1.4.0 and later it is interpreted as "-1" and that changed the behaviour of the operator.

It is debatable whether we ever wanted to explicitly allow the use of "-1" in the shape argument of the operator. At least I as the original author had this not in mind and also the documentation does not say anything about magic numbers. In fact, the only case where the operator did work (and could ever work) with magic numbers is when "-1" is specified as first coordinate. Nothing else is possible.
Two solutions:

  • We document that it is o.k. to use "-1" as first shape coordinate in ravel/unravel_index. And enhance the operator to support this. That does not require any change for "ravel" and a simple change in the code shown below.
  • We leave it as is (the code simply does not support this). A potential workaround for customers would be to specify a huge number except "-1" for the first dimension which has essentially the same effect.
struct unravel_index {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, index_t N, index_t ndim, index_t *shape,
                                  DType *unravelled, DType *ravelled) {
    index_t idx(ravelled[i]);
    #pragma unroll
    for (int j = ndim; j--; ) {
      index_t tmp = idx / shape[j];
      unravelled[i+j*N] = idx - tmp*shape[j];
      idx = tmp;
    }
  }
};

@fhieber
Copy link
Contributor Author

fhieber commented Mar 6, 2019

Oh thank you so much! This is quite hilarious - we relied on exploiting this undocumented behavior without knowing :)
For Sockeye this means that we can update to 1.4.0 immediately and make this exploit of range overflow explicit by using shape=(sys.maxsize, y) in our code instead of shape=(-1, y). We rely on this functionality because we cannot infer the first shape value (batch size) in a HybridBlock in our case. As such, I would vote for going with option 1, explicitly supporting it in the future.

@asmushetzel
Copy link
Contributor

PR #14356 will fix this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants