
[Bug]: cannot chunk a DataArray that originated as a coordinate #6204

Open · spencerkclark opened this issue Jan 28, 2022 · 2 comments

@spencerkclark (Member)

What happened?

If I construct the following DataArray, and try to chunk its "x" coordinate, I get back a NumPy-backed DataArray:

In [2]: a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

In [3]: a.x.chunk()
Out[3]:
<xarray.DataArray 'x' (x: 3)>
array([4, 5, 6])
Coordinates:
  * x        (x) int64 4 5 6

If I construct a copy of the "x" coordinate, things work as I would expect:

In [4]: x = xr.DataArray(a.x, dims=a.x.dims, coords=a.x.coords, name="x")

In [5]: x.chunk()
Out[5]:
<xarray.DataArray 'x' (x: 3)>
dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 4 5 6

What did you expect to happen?

I would expect the following to happen:

In [2]: a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

In [3]: a.x.chunk()
Out[3]:
<xarray.DataArray 'x' (x: 3)>
dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 4 5 6
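
A quick way to verify this programmatically (a minimal check on the reported versions) is to inspect `.chunks`, which is `None` for a NumPy-backed DataArray and a tuple of chunk sizes for a dask-backed one:

import xarray as xr

a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

# On affected versions this prints None, because chunk() silently returns a
# NumPy-backed copy; the expected, dask-backed result would give ((3,),).
print(a.x.chunk().chunks)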

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12)
[Clang 11.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3

xarray: 0.20.1
pandas: 1.3.5
numpy: 1.19.4
scipy: 1.5.4
netCDF4: 1.5.5
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.7.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.22.0
distributed: None
matplotlib: 3.2.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: 2021.06.0
cupy: None
pint: 0.15
sparse: None
setuptools: 49.6.0.post20210108
pip: 20.2.4
conda: 4.10.1
pytest: 6.0.1
IPython: 7.27.0
sphinx: 3.2.1

@spencerkclark added the bug and needs triage labels on Jan 28, 2022
@dcherian removed the needs triage label on Mar 16, 2022
@dcherian (Contributor)

I've run into this before. The underlying variable object is an IndexVariable, which has a dummy chunk method:

xarray/xarray/core/variable.py, lines 2707-2709 (at 95bb9ae):

def chunk(self, chunks={}, name=None, lock=False):
    # Dummy - do not chunk. This method is invoked e.g. by Dataset.chunk()
    return self.copy(deep=False)
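
To illustrate the no-op (a minimal sketch, not part of the original comment): the coordinate's `.variable` is an IndexVariable, and one possible workaround, assuming `to_base_variable()` is available, is to rebuild the DataArray on a plain Variable before chunking:

import xarray as xr

a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

# The "x" coordinate is backed by an IndexVariable, whose chunk() is the
# dummy method quoted above, so chunking it has no effect.
print(type(a.x.variable))  # xarray.core.variable.IndexVariable

# Sketch of a workaround: drop down to a plain Variable, then chunk.
x_plain = xr.DataArray(a.x.variable.to_base_variable(), coords=a.x.coords, name="x")
print(x_plain.chunk())  # dask-backed, matching the reporter's workaround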

aulemahal added a commit to Ouranosinc/xclim that referenced this issue Nov 30, 2023
…#1542)

### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #1536
- [x] Tests for the changes have been added (for bug fixes / features)
- [ ] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [x] CHANGES.rst has been updated (with summary of main changes)
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?

* New function `xc.core.utils._chunk_like` to chunk a list of inputs
according to one chunk dictionary. It also circumvents
pydata/xarray#6204 by recreating DataArrays
that were obtained from dimension coordinates.
* Generalization of `uses_dask` so it can accept a list of objects.
* Usage of `_chunk_like` to ensure the inputs of
`cosine_of_solar_zenith_angle` are chunked when needed, in
`mean_radiant_temperature` and `potential_evapotranspiration`.

The effect of this is simply that `cosine_of_solar_zenith_angle`
will be computed on blocks of the same size as in the original data,
even though its inputs (the dimension coordinates) did not carry that
information. Before this PR, the calculation was done as a single block
of the same size as the full array.

### Does this PR introduce a breaking change?
No.

### Other information:
Dask might warn something like `PerformanceWarning: Increasing number of
chunks by factor of NN`, where NN should be the number of chunks along
the `lat` dimension, if present. That is exactly what we want, so it's OK.
@sanghyukmoon

I encountered a similar issue when I tried to create a new array from a set of index coordinates, e.g.,

# ds.z, ds.y, ds.x are fully loaded into memory, and so is ds.coords.r.
ds.coords['r'] = np.sqrt((ds.z - z0)**2 + (ds.y - y0)**2 + (ds.x - x0)**2)

I'm currently circumventing the problem by using _chunk_like introduced in Ouranosinc/xclim#1542.

x, y, z = _chunk_like(ds.x, ds.y, ds.z, chunks=ds.chunksizes)
ds.coords['r'] = np.sqrt((z - z0)**2 + (y - y0)**2 + (x - x0)**2)
# Now, ds.coords.r carries a dask array!

As @dcherian noted, the underlying cause of the issue seems to be that an IndexVariable always has to be fully loaded into memory.
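
For reference, a minimal sketch of a helper in the same spirit (hypothetical name `chunk_like`; the actual `_chunk_like` in xclim may differ) could look like this:

import xarray as xr

def chunk_like(*inputs, chunks=None):
    """Chunk every input with the same chunk mapping, rebuilding dimension
    coordinates first so that .chunk() actually takes effect on them."""
    chunks = chunks or {}
    out = []
    for da in inputs:
        if da.name in da.dims:
            # Dimension coordinate: backed by an IndexVariable whose chunk()
            # is a no-op, so wrap its values in a fresh DataArray first.
            da = xr.DataArray(da.values, dims=da.dims, coords=da.coords, name=da.name)
        out.append(da.chunk({d: c for d, c in chunks.items() if d in da.dims}))
    return tuple(out)

With such a helper, the snippet above would read x, y, z = chunk_like(ds.x, ds.y, ds.z, chunks=ds.chunksizes).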
