Why are da.chunks and ds.chunks properties inconsistent? #5843

Closed
@TomNicholas


Basically the title, but what I'm referring to is this:

In [2]: da = xr.DataArray([[0, 1], [2, 3]], name='foo').chunk(1)

In [3]: ds = da.to_dataset()

In [4]: da.chunks
Out[4]: ((1, 1), (1, 1))

In [5]: ds.chunks
Out[5]: Frozen({'dim_0': (1, 1), 'dim_1': (1, 1)})

Why does DataArray.chunks return a tuple and Dataset.chunks return a frozen dictionary?

This seems a bit silly, for a few reasons:

  1. it means that some perfectly reasonable code might fail unnecessarily if passed a DataArray instead of a Dataset or vice versa, such as

    def is_core_dim_chunked(obj, core_dim):
        return len(obj.chunks[core_dim]) > 1

    which works as intended for a Dataset but raises a TypeError for a DataArray, because a tuple can't be indexed by a dimension name.

  2. it breaks the pattern we use for .sizes, where both objects return a frozen mapping keyed by dimension name:

    In [14]: da.sizes
    Out[14]: Frozen({'dim_0': 2, 'dim_1': 2})
    
    In [15]: ds.sizes
    Out[15]: Frozen({'dim_0': 2, 'dim_1': 2})

  3. if you want the chunks as a tuple they are always accessible via da.data.chunks, which is a more sensible place to look to find the chunks without dimension names.

  4. It's an undocumented difference, as the docstrings for ds.chunks and da.chunks both only say

    """Block dimensions for this dataset’s data or None if it’s not a dask array."""

    which doesn't tell me anything about the return type, or warn me that the return types are different.

    EDIT: In fact DataArray.chunks doesn't even appear to be listed on the API docs page at all.
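To make point 1 concrete, here is a rough sketch of a helper that would let is_core_dim_chunked accept either object. It is not part of xarray: chunks_by_dim is a hypothetical name, and the two SimpleNamespace objects are stand-ins that only copy the .dims and .chunks values shown above, so the snippet runs without xarray or dask installed. I'm assuming that the Dataset form is a Mapping and the DataArray form is a tuple aligned with .dims (or None when the data isn't a dask array).

```python
from collections.abc import Mapping
from types import SimpleNamespace


def chunks_by_dim(obj):
    """Return {dim_name: chunk_sizes} for anything exposing .chunks and .dims.

    Hypothetical helper, not an xarray API.
    """
    chunks = obj.chunks
    if chunks is None:
        # Assumed DataArray behaviour when the data is not chunked.
        return {}
    if isinstance(chunks, Mapping):
        # Dataset-style: already keyed by dimension name.
        return dict(chunks)
    # DataArray-style: a tuple with one entry per dimension, in .dims order.
    return dict(zip(obj.dims, chunks))


def is_core_dim_chunked(obj, core_dim):
    # More than one block along core_dim means the dimension is chunked.
    return len(chunks_by_dim(obj).get(core_dim, ())) > 1


# Stand-ins mirroring the da / ds outputs from the session above.
da_like = SimpleNamespace(dims=("dim_0", "dim_1"), chunks=((1, 1), (1, 1)))
ds_like = SimpleNamespace(
    dims={"dim_0": 2, "dim_1": 2},
    chunks={"dim_0": (1, 1), "dim_1": (1, 1)},
)

print(is_core_dim_chunked(da_like, "dim_0"))
print(is_core_dim_chunked(ds_like, "dim_0"))
```

Having to write this kind of shim in user code is exactly the friction I mean; if both properties returned the mapping form it wouldn't be needed.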

In our codebase this difference is mostly washed out by us using ._to_temp_dataset() all the time, and also by the way that the .chunk() method accepts both the tuple and dict form, so both of these invariants hold (but in different ways):

ds == ds.chunk(ds.chunks)
da == da.chunk(da.chunks)

I'm not sure whether making this consistent is worth the effort of a significant breaking change though 😕

(Sort of related to #2103)
