Confusing terminologies and some errors in the official documentation #6866
Comments
Hi @v-liuwei, thanks for the report. The issues that you are pointing out are part of #6293. There have been many internal changes (plus some subtle public-facing changes) regarding indexes in the last release, but there is still some work to do to reflect them in the documentation.
I agree, this has always been a source of confusion IMO. Xarray's data model has been updated in the last release such that these two concepts are now different and independent (i.e., it allows a non-dimension coordinate to have an index).
This is because multi-index levels now each have their own, real coordinate (the documentation is not yet up-to-date). However, I agree that using the same symbol for multi-coordinate indexes may not be ideal, as it is hard to distinguish which coordinate is associated with which index. On the other hand, using two different symbols wouldn't be an elegant solution either if we later deprecate the multi-index dimension coordinate (i.e.,
Thanks for your explanations. You said that "it allows a non-dimension coordinate to have an index", which confuses me even more. I want to confirm: should we always (or is it only possible to) use the index coordinates to index the DataArray/Dataset in a label-based fashion?
Yes, performing selection using coordinate labels (i.e., Before v2022.6.0, only 1-dimensional coordinates with a name matching the dimension name could have a pandas index or multi-index. Hence the distinction between a "dimension coordinate", which most often implicitly wrapped a pandas index, and a "non-dimension" coordinate, for which label-based selection was impossible. Starting from v2022.6.0, this constraint is relaxed. Although it is not yet fully operational, any coordinate or any group of coordinates (with arbitrary dimensions) may now have an index (either pandas-based or any xarray-compatible custom index) and may therefore be used for label-based selection (if the index supports it).
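To make the point about indexed non-dimension coordinates concrete, here is a minimal sketch. It assumes a more recent xarray release than 2022.6.0, one that provides `DataArray.set_xindex` (this method is not mentioned in the discussion above and is used here only as an illustration). The array, the coordinate name `label`, and its values are made up for the example.

```python
import xarray as xr

# "x" is a dimension coordinate; "label" is a non-dimension coordinate on "x".
da = xr.DataArray(
    [1.0, 2.0, 3.0],
    dims="x",
    coords={"x": [0, 1, 2], "label": ("x", ["a", "b", "c"])},
)

# Without an index, label-based selection on "label" is expected to fail.
try:
    da.sel(label="b")
    failed = False
except (KeyError, ValueError) as err:
    failed = True
    print("selection without an index failed:", err)

# Assumption: set_xindex is available (it is absent from 2022.6.0).
# It attaches a default pandas-based index to the "label" coordinate.
indexed = da.set_xindex("label")
print(list(indexed.indexes))            # names of coordinates that have an index
print(indexed.sel(label="b").item())    # label-based selection now works
```

Whether a given coordinate can be used with `.sel()` therefore depends on whether it has an index, not on whether its name matches a dimension name.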
What happened?
Note that I'm using the stable version (2022.6.0).
First, I'm confused that both dimension coordinate / non-dimension coordinate and index coordinate / non-index coordinate appear in the documentation (search to see), but they seem to be the same thing.

Second, I found some errors in the documentation:

It says that "The index associated with dimension name x can be retrieved by arr.indexes[x]. By construction, len(arr.dims) == len(arr.indexes)", which is inconsistent with the actual behavior. See example code below:

It seems that arr.indexes only returns the indexes of dimensions that have coordinates. However, it's possible to get the index of dimension y through get_index():

It says that: (see link)

As you can see, even in the example code given in the official documentation, all the "virtual" coordinates are marked with * instead of -, which is a little confusing when handling multi-index coordinates, in my experience.

Have I missed something? Thanks in advance for the reply.
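The mismatch between arr.dims and arr.indexes described above can be reproduced with a sketch like the following. This is not the reporter's original snippet (which did not survive in this page); the array shape and coordinate values are assumptions chosen only to illustrate the behavior.

```python
import numpy as np
import xarray as xr

# Two dimensions, but only "x" is given a coordinate.
arr = xr.DataArray(
    np.zeros((2, 3)),
    dims=("x", "y"),
    coords={"x": [10, 20]},
)

print(arr.dims)            # both dimension names
print(list(arr.indexes))   # only dimensions that have a coordinate

# The documented claim len(arr.dims) == len(arr.indexes) does not hold here.
print(len(arr.dims), len(arr.indexes))

# get_index() falls back to a default integer range for "y".
print(arr.get_index("y"))
```

Here "y" has no coordinate, so it does not appear in arr.indexes, while get_index("y") still returns a default index built from the dimension size.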
What did you expect to happen?
No response
Minimal Complete Verifiable Example
No response
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.102.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.1
scipy: 1.3.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 45.2.0
pip: 22.2.1
conda: None
pytest: None
IPython: 7.13.0
sphinx: None