Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataArray creation prone to errors when data shares dimension shapes #727

Closed
choldgraf opened this issue Jan 26, 2016 · 10 comments · Fixed by #993
Closed

DataArray creation prone to errors when data shares dimension shapes #727

choldgraf opened this issue Jan 26, 2016 · 10 comments · Fixed by #993

Comments

@choldgraf
Copy link

It seems like there would be unexpected behavior whenever someone creates a DataArray using data with some subset of dimensions that have the same shape, and when supplying coords as a dictionary of coord_name: coords pairs. Since the dictionary isn't ordered, won't it be unclear which dimension it is referring to?

Maybe that's not such a good description, here's an example:

data = np.random.randn(20, 5, 5)
da = xarray.DataArray(data, {'1': np.arange(20),
                             '2': np.arange(15, 20),
                             '3': np.arange(5, 10)})
print(da.coords)

da = xarray.DataArray(data, {'10': np.arange(20),
                             '11': np.arange(15, 20),
                             '12': np.arange(5, 10)})
print(da.coords)

For the first dimension it's no problem, since it's the only one with length 20. For the 2nd and 3rd dimensions since they are the same shape, the coordinates will be assigned depending on which order python takes the keys, which means they could sometimes be flipped depending on something arbitrary like what you named a dimension.

It seems like this could be addressed by doing a quick check (when supplying coordinates in this fashion) to see if any of the dimensions of the coordinates are the same, and throwing a warning that behavior might be unstable. Or maybe it throws an error and says "if data coordinates are same length, please supply coordinate values in a tuple or list instead"?

@shoyer
Copy link
Member

shoyer commented Jan 27, 2016

I agree, this can be ambiguous. It makes sense for ordered mappings, but not for normal dicts. And unfortunately there is no Pythonic way to tell whether a mapping is ordered or not.

I originally included it because I wanted to make sure operations like xr.DataArray(2 * x.data, x.coords) preserve all original dimensions in order. But, given that we define DataArray.coords.dims and use that to fill in an empty dims argument, we don't really need this anymore. If someone really likes supplying coordinates as an OrderedDict, it's simple enough for them to write coords.items() instead.

So I would support deprecating supplying dict-like coordinates without explicitly provided dims. It's more magic than we really need. This line is in the relevant code path.

I don't want to simply error when coordinates have ambiguous lengths compared to the size of the data, because adding errors in special cases makes things harder to predict.

@jhamman
Copy link
Member

jhamman commented Jan 27, 2016

So I would support deprecating supplying dict-like coordinates without explicitly provided dims. It's more magic than we really need. This line is in the relevant code path.

I've also run into this and I'm 👍 on this solution.

@choldgraf
Copy link
Author

That sounds like a fair plan to me - throwing errors etc definitely
complicates things more than just deprecating the option :)

On Tue, Jan 26, 2016 at 8:58 PM, Joe Hamman notifications@github.com
wrote:

So I would support deprecating supplying dict-like coordinates without
explicitly provided dims. It's more magic than we really need. This line is
in the relevant code path.

I've also run into this and I'm [image: 👍] on this solution.


Reply to this email directly or view it on GitHub
#727 (comment).

@shoyer
Copy link
Member

shoyer commented Jan 27, 2016

Any interest in putting together a PR? :)

On Tue, Jan 26, 2016 at 9:46 PM, Chris Holdgraf notifications@github.com
wrote:

That sounds like a fair plan to me - throwing errors etc definitely
complicates things more than just deprecating the option :)

On Tue, Jan 26, 2016 at 8:58 PM, Joe Hamman notifications@github.com
wrote:

So I would support deprecating supplying dict-like coordinates without
explicitly provided dims. It's more magic than we really need. This line
is
in the relevant code path.

I've also run into this and I'm [image: 👍] on this solution.


Reply to this email directly or view it on GitHub
#727 (comment).


Reply to this email directly or view it on GitHub
#727 (comment).

@choldgraf
Copy link
Author

Lemme take a look at the code tomorrow. I'm pretty new to xarray but I've
been meaning to learn it a bit more :-)
On Jan 26, 2016 9:51 PM, "Stephan Hoyer" notifications@github.com wrote:

Any interest in putting together a PR? :)

On Tue, Jan 26, 2016 at 9:46 PM, Chris Holdgraf notifications@github.com
wrote:

That sounds like a fair plan to me - throwing errors etc definitely
complicates things more than just deprecating the option :)

On Tue, Jan 26, 2016 at 8:58 PM, Joe Hamman notifications@github.com
wrote:

So I would support deprecating supplying dict-like coordinates without
explicitly provided dims. It's more magic than we really need. This
line
is
in the relevant code path.

I've also run into this and I'm [image: 👍] on this solution.


Reply to this email directly or view it on GitHub
#727 (comment).


Reply to this email directly or view it on GitHub
#727 (comment).


Reply to this email directly or view it on GitHub
#727 (comment).

@choldgraf
Copy link
Author

So it looks like the relevant code is somewhere around here, no? Is "auto-coords" done in many other places in the codebase?

And as far as future behavior, do you imagine just changing the utils.is_dict_like bits so that instead of doing magical stuff under the hood, they just throw an error from now on?

@shoyer
Copy link
Member

shoyer commented Jan 27, 2016

@choldgraf Yes, that looks right to me -- this code path in particular should trigger a DeprecationWarning:

            # try to infer dimensions from coords
            if utils.is_dict_like(coords):
                dims = list(coords.keys())

@choldgraf
Copy link
Author

Cool - I'll make a PR in the next week or two...I know this isn't a big change but have a few deadlines in the coming days.

@choldgraf
Copy link
Author

whoops - that's what I get for not putting this on my to-do list...totally forgot about this issue. My bad, but thanks for fixing!

@shoyer
Copy link
Member

shoyer commented Sep 1, 2016

@choldgraf no worries!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants