Improved inference of names when concatenating arrays #2775

Zac-HD · 2019-02-19T04:01:03Z

Problem description

Using the name of the first element to concatenate as the name of the concatenated array is only correct if all names are identical. When names vary, using a clear placeholder name or the name of the new dimension would avoid misleading data users.

This came up for me recently when stacking several bands of a satellite image to produce a faceted plot - the resulting colorbar was labelled "blue", even though that was clearly incorrect.

A similar process is probably also desirable for aggregation of units across concatenated arrays - use first if identical, otherwise discard or error depending on the compat argument.
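The "use first if identical, otherwise discard or error" rule for units could be sketched roughly as below. This is only an illustration: `combine_units` is a hypothetical helper rather than an xarray function, and treating `compat="identical"` as the strict mode is an assumption about how the existing argument might be reused.

```python
from typing import Optional, Sequence


def combine_units(units: Sequence[Optional[str]], compat: str = "equals") -> Optional[str]:
    """Hypothetical helper (not part of xarray) for merging units on concat."""
    if not units:
        return None
    distinct = set(units)
    if len(distinct) == 1:
        # All inputs agree, so the shared value is safe to keep.
        return units[0]
    if compat == "identical":
        # Assumed strict mode: conflicting units are an error.
        raise ValueError(f"units differ across arrays: {sorted(map(str, distinct))}")
    # Assumed lenient mode: drop the units rather than keep a misleading value.
    return None
```

With this sketch, `combine_units(["m", "m"])` keeps `"m"`, while `combine_units(["m", "km"])` discards the units unless `compat="identical"` is passed, in which case it raises.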

Code Sample, a copy-pastable example if possible

import numpy as np
import xarray as xr

ds = xr.Dataset({
    k: xr.DataArray(np.random.random((2, 2)), dims="x y".split(), name=k)
    for k in "blue green red".split()
})
# arr.name == "blue", could be "band" or "concat_dim"
arr = xr.concat([ds.blue, ds.green, ds.red], dim="band")
# label of colorbar is "blue", which is meaningless
arr.plot.imshow(col="band")

One implementation that would certainly be nice for this use case (though perhaps not in general): when DataArrays with unique names are concatenated along an entirely new dimension and dim is passed as a string, concat could also create a new index for that dimension, as pd.Index([a.name for a in objs], name=dim).
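The condition under which such an index could be created might look something like the following sketch. The function name `labels_for_new_dim` is hypothetical, and requiring every array to have a name is an assumption on top of the uniqueness requirement stated above.

```python
from typing import Hashable, List, Optional, Sequence


def labels_for_new_dim(names: Sequence[Optional[Hashable]]) -> Optional[List[Hashable]]:
    """Hypothetical check: return labels for the new dimension, or None."""
    if not names or any(name is None for name in names):
        # Assumption: an unnamed array means no sensible labels can be inferred.
        return None
    if len(set(names)) != len(names):
        # Duplicate names would give a non-unique index, so fall back to no labels.
        return None
    return list(names)
```

For the example above, `labels_for_new_dim(["blue", "green", "red"])` returns the three band names, which is the list that would be passed to `pd.Index(..., name=dim)`. Passing such a `pd.Index` explicitly as `dim` to `xr.concat` is already possible, so this proposal would only change the default.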

INSTALLED VERSIONS

commit: None
python: 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.3
libnetcdf: 4.4.1.1

xarray: 0.11.2
pandas: 0.23.1
numpy: 1.14.5
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 3.0.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.6.2
pip: 10.0.1
conda: None
pytest: 4.2.0
IPython: 6.4.0
sphinx: 1.8.0

I'd be happy to write a PR for this if it would be accepted.


shoyer · 2019-02-19T04:20:09Z

Indeed, this seems broken to me. I think we should use the same heuristic we use for naming the result of operations with apply_ufunc:

xarray/xarray/core/computation.py

Lines 119 to 128 in cd8e370

def result_name(objects: list) -> Any:
    # use the same naming heuristics as pandas:
    # https://github.com/blaze/blaze/issues/458#issuecomment-51936356
    names = {getattr(obj, 'name', _DEFAULT_NAME) for obj in objects}
    names.discard(_DEFAULT_NAME)
    if len(names) == 1:
        name, = names
    else:
        name = None
    return name
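To see what this heuristic would do for the concatenation case, here is a standalone sketch that runs without xarray. The `Named` class and the local `_DEFAULT_NAME` sentinel are stand-ins introduced only for illustration; they are not the actual xarray objects.

```python
from typing import Any, Iterable

# Stand-in sentinel; xarray defines its own private marker for this.
_DEFAULT_NAME = object()


class Named:
    """Hypothetical minimal object with a .name, standing in for a DataArray."""

    def __init__(self, name: Any = None) -> None:
        self.name = name


def result_name(objects: Iterable[Any]) -> Any:
    # Collect the distinct names; objects without a .name attribute are ignored.
    names = {getattr(obj, "name", _DEFAULT_NAME) for obj in objects}
    names.discard(_DEFAULT_NAME)
    if len(names) == 1:
        (name,) = names  # exactly one distinct name: keep it
    else:
        name = None  # conflicting (or no) names: leave the result unnamed
    return name


same = result_name([Named("blue"), Named("blue")])
mixed = result_name([Named("blue"), Named("green"), Named("red")])
```

Here `same` is `"blue"` and `mixed` is `None`, so under this rule the concatenated bands from the original example would no longer be labelled "blue".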

Zac-HD mentioned this issue Feb 19, 2019

Improved default behavior when concatenating DataArrays #2777

Closed

3 tasks

TomNicholas mentioned this issue Feb 27, 2019

Improve name concat #2792

Merged

3 tasks

shoyer closed this as completed in #2792 Mar 4, 2019
