Expose "Coordinates" as part of Xarray's public API #7368

Merged
merged 76 commits into from
Jul 21, 2023
41f4fd8
add indexes argument to Dataset.__init__
benbovy Oct 25, 2022
4baa8af
make indexes arg public for DataArray.__init__
benbovy Oct 25, 2022
dbc058a
Indexes constructor updates
benbovy Oct 26, 2022
16a9983
use the generic Mapping[Any, Index] for indexes
benbovy Oct 26, 2022
3c076d5
add wrap_pandas_multiindex function
benbovy Oct 26, 2022
70e7a5d
do not create default indexes when not desired
benbovy Oct 26, 2022
00e1766
fix Dataset dimensions
benbovy Oct 26, 2022
3bf92cd
copy the coordinate variables of passed indexes
benbovy Oct 26, 2022
c9b6363
DataArray: check dimensions/shape of index coords
benbovy Oct 26, 2022
82dc5cc
docstrings tweaks
benbovy Oct 27, 2022
a58c9d0
more Indexes safety
benbovy Oct 27, 2022
9beeea7
ensure input indexes are Xarray indexes
benbovy Oct 27, 2022
c6e94b4
add .assign_indexes() method
benbovy Oct 27, 2022
ddd505e
Merge branch 'main' into indexes-arg-constructors
benbovy Dec 8, 2022
f97adb5
add `IndexedCoordinates` subclass
benbovy Dec 8, 2022
45709ef
rollback/update Dataset and DataArray constructors
benbovy Dec 8, 2022
4c559f1
update docstrings
benbovy Dec 8, 2022
1192948
fix Dataset creation internal error
benbovy Dec 8, 2022
a877a74
add IndexedCoordinates.merge_coords
benbovy Dec 9, 2022
9d6d2ae
drop IndexedCoordinates and reuse Coordinates
benbovy Dec 12, 2022
3ee26ef
update api docs
benbovy Dec 12, 2022
dd02eca
make Coordinates init args optional
benbovy Dec 12, 2022
0ee8f95
docstrings updates
benbovy Dec 12, 2022
fc6c948
convert to base variable when no index is given
benbovy Dec 12, 2022
0572b96
raise when an index is given with no variable
benbovy Dec 12, 2022
6f5114b
skip create default indexes...
benbovy Dec 12, 2022
e27830a
invariant checks: maybe skip IndexVariable checks
benbovy Dec 12, 2022
1649fb8
add Coordinates tests
benbovy Dec 12, 2022
298fccd
more Coordinates tests
benbovy Dec 12, 2022
e8c627c
add Dataset constructor tests with Coordinates
benbovy Dec 12, 2022
be86f87
fix mypy
benbovy Dec 12, 2022
75e2523
assign_coords: do not create default indexes...
benbovy Dec 12, 2022
82f0fb2
support alignment of Coordinates
benbovy Dec 12, 2022
883e67c
clean-up
benbovy Dec 12, 2022
28e9861
fix failing test (dataarray coords not extracted)
benbovy Dec 12, 2022
9a209a3
fix tests: prevent index conflicts
benbovy Dec 12, 2022
4f337e3
add Coordinates.equals and Coordinates.identical
benbovy Dec 13, 2022
43ddcf6
more tests, docstrings, docs
benbovy Dec 13, 2022
2437456
fix assert_* (Coordinates subclasses)
benbovy Dec 13, 2022
e60570f
review copy
benbovy Dec 13, 2022
d01cf01
another few tests
benbovy Dec 13, 2022
9fc49ff
fix mypy
benbovy Dec 13, 2022
7873c77
update what's new
benbovy Dec 13, 2022
e7998d1
Merge branch 'main' into indexes-arg-constructors-2
benbovy Dec 13, 2022
f7ec33e
do not copy indexes
benbovy Dec 13, 2022
b1a9688
add Coordinates fastpath constructor
benbovy Dec 14, 2022
38fdf1e
fix sphinx directive
benbovy Dec 14, 2022
d9e9e34
re-add coord indexes in merge (dataset constructor)
benbovy Dec 14, 2022
3999eff
create coords with default idx: try a cleaner impl
benbovy Dec 14, 2022
d5d8233
some useful comments for later
benbovy Dec 14, 2022
d2fcaa3
xr.merge: add support for Coordinates objects
benbovy Dec 14, 2022
193dad3
allow skip align for object(s) in merge_core
benbovy Dec 15, 2022
84c77a4
fix mypy
benbovy Dec 15, 2022
5e82d61
what's new tweaks
benbovy Dec 15, 2022
c6409fd
align Coordinates callbacks: don't reindex data vars
benbovy Dec 15, 2022
39294fc
fix Coordinates._overwrite_indexes callback
benbovy Dec 15, 2022
3fc1e8c
Merge branch 'main' into indexes-arg-constructors-2
benbovy Jan 13, 2023
8c65f85
remove merge_coords
benbovy Jan 13, 2023
cf6fcbb
futurewarning: pass multi-index via data vars
benbovy Jan 13, 2023
6a6444f
review comments
benbovy Jan 13, 2023
50cf057
Merge branch 'main' into indexes-arg-constructors-2
benbovy Jan 13, 2023
f5d1fe1
Merge branch 'main' into pr/7368
Illviljan Jul 14, 2023
1759ac9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 14, 2023
48f6950
Fix circulat imports
Illviljan Jul 14, 2023
a789f6b
Merge branch 'indexes-arg-constructors-2' of https://github.com/benbo…
Illviljan Jul 14, 2023
fa384f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 14, 2023
7628cb2
typing: add Alignable protocol class
benbovy Jul 17, 2023
c8821f9
try fixing mypy error (Self redefinition)
benbovy Jul 17, 2023
c71aadb
remove Coordinate alias of Variable
benbovy Jul 17, 2023
139b13a
fix groupby test
benbovy Jul 17, 2023
7ed6279
doc: remove merge_coords in api reference
benbovy Jul 18, 2023
3d94357
doc: improve docstrings and glossary
benbovy Jul 18, 2023
4a6e915
use Self type annotation in Coordinate class
benbovy Jul 18, 2023
31f66b4
better comment
benbovy Jul 18, 2023
4cb70d0
fix Self undefined error with python < 3.11
benbovy Jul 18, 2023
4ef5f17
Merge branch 'main' into indexes-arg-constructors-2
dcherian Jul 20, 2023
48 changes: 38 additions & 10 deletions doc/api-hidden.rst
@@ -9,17 +9,40 @@
 .. autosummary::
 :toctree: generated/

+Coordinates.from_pandas_multiindex
+Coordinates.get
+Coordinates.items
+Coordinates.keys
+Coordinates.values
+Coordinates.dims
+Coordinates.dtypes
+Coordinates.variables
+Coordinates.xindexes
+Coordinates.indexes
+Coordinates.to_dataset
+Coordinates.to_index
+Coordinates.update
+Coordinates.merge
+Coordinates.copy
+Coordinates.equals
+Coordinates.identical
+
 core.coordinates.DatasetCoordinates.get
 core.coordinates.DatasetCoordinates.items
 core.coordinates.DatasetCoordinates.keys
-core.coordinates.DatasetCoordinates.merge
-core.coordinates.DatasetCoordinates.to_dataset
-core.coordinates.DatasetCoordinates.to_index
-core.coordinates.DatasetCoordinates.update
 core.coordinates.DatasetCoordinates.values
 core.coordinates.DatasetCoordinates.dims
-core.coordinates.DatasetCoordinates.indexes
+core.coordinates.DatasetCoordinates.dtypes
 core.coordinates.DatasetCoordinates.variables
+core.coordinates.DatasetCoordinates.xindexes
+core.coordinates.DatasetCoordinates.indexes
+core.coordinates.DatasetCoordinates.to_dataset
+core.coordinates.DatasetCoordinates.to_index
+core.coordinates.DatasetCoordinates.update
+core.coordinates.DatasetCoordinates.merge
+core.coordinates.DataArrayCoordinates.copy
+core.coordinates.DatasetCoordinates.equals
+core.coordinates.DatasetCoordinates.identical

 core.rolling.DatasetCoarsen.boundary
 core.rolling.DatasetCoarsen.coord_func
@@ -47,14 +70,19 @@
 core.coordinates.DataArrayCoordinates.get
 core.coordinates.DataArrayCoordinates.items
 core.coordinates.DataArrayCoordinates.keys
-core.coordinates.DataArrayCoordinates.merge
-core.coordinates.DataArrayCoordinates.to_dataset
-core.coordinates.DataArrayCoordinates.to_index
-core.coordinates.DataArrayCoordinates.update
 core.coordinates.DataArrayCoordinates.values
 core.coordinates.DataArrayCoordinates.dims
-core.coordinates.DataArrayCoordinates.indexes
+core.coordinates.DataArrayCoordinates.dtypes
 core.coordinates.DataArrayCoordinates.variables
+core.coordinates.DataArrayCoordinates.xindexes
+core.coordinates.DataArrayCoordinates.indexes
+core.coordinates.DataArrayCoordinates.to_dataset
+core.coordinates.DataArrayCoordinates.to_index
+core.coordinates.DataArrayCoordinates.update
+core.coordinates.DataArrayCoordinates.merge
+core.coordinates.DataArrayCoordinates.copy
+core.coordinates.DataArrayCoordinates.equals
+core.coordinates.DataArrayCoordinates.identical

 core.rolling.DataArrayCoarsen.boundary
 core.rolling.DataArrayCoarsen.coord_func
1 change: 1 addition & 0 deletions doc/api.rst
@@ -1085,6 +1085,7 @@ Advanced API
 .. autosummary::
 :toctree: generated/

+Coordinates
 Dataset.variables
 DataArray.variable
 Variable
69 changes: 44 additions & 25 deletions doc/user-guide/terminology.rst
@@ -54,23 +54,22 @@ complete examples, please consult the relevant documentation.*
 Coordinate
 An array that labels a dimension or set of dimensions of another
 ``DataArray``. In the usual one-dimensional case, the coordinate array's
-values can loosely be thought of as tick labels along a dimension. There
-are two types of coordinate arrays: *dimension coordinates* and
-*non-dimension coordinates* (see below). A coordinate named ``x`` can be
-retrieved from ``arr.coords[x]``. A ``DataArray`` can have more
-coordinates than dimensions because a single dimension can be labeled by
-multiple coordinate arrays. However, only one coordinate array can be a
-assigned as a particular dimension's dimension coordinate array. As a
+values can loosely be thought of as tick labels along a dimension. We
+distinguish :term:`Dimension coordinate` vs. :term:`Non-dimension
+coordinate` and :term:`Indexed coordinate` vs. :term:`Non-indexed
+coordinate`. A coordinate named ``x`` can be retrieved from
+``arr.coords[x]``. A ``DataArray`` can have more coordinates than
+dimensions because a single dimension can be labeled by multiple
+coordinate arrays. However, only one coordinate array can be a assigned
+as a particular dimension's dimension coordinate array. As a
 consequence, ``len(arr.dims) <= len(arr.coords)`` in general.

 Dimension coordinate
 A one-dimensional coordinate array assigned to ``arr`` with both a name
-and dimension name in ``arr.dims``. Dimension coordinates are used for
-label-based indexing and alignment, like the index found on a
-:py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact,
-dimension coordinates use :py:class:`pandas.Index` objects under the
-hood for efficient computation. Dimension coordinates are marked by
-``*`` when printing a ``DataArray`` or ``Dataset``.
+and dimension name in ``arr.dims``. Usually (but not always), a
+dimension coordinate is also an :term:`Indexed coordinate` so that it can
+be used for label-based indexing and alignment, like the index found on
+a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`.

 Non-dimension coordinate
 A coordinate array assigned to ``arr`` with a name in ``arr.coords`` but
@@ -79,20 +78,40 @@ complete examples, please consult the relevant documentation.*
 example, multidimensional coordinates are often used in geoscience
 datasets when :doc:`the data's physical coordinates (such as latitude
 and longitude) differ from their logical coordinates
-<../examples/multidimensional-coords>`. However, non-dimension coordinates
-are not indexed, and any operation on non-dimension coordinates that
-leverages indexing will fail. Printing ``arr.coords`` will print all of
-``arr``'s coordinate names, with the corresponding dimension(s) in
-parentheses. For example, ``coord_name (dim_name) 1 2 3 ...``.
+<../examples/multidimensional-coords>`. Printing ``arr.coords`` will
+print all of ``arr``'s coordinate names, with the corresponding
+dimension(s) in parentheses. For example, ``coord_name (dim_name) 1 2 3
+...``.

+Indexed coordinate
+A coordinate which has an associated :term:`Index`. Generally this means
+that the coordinate labels can be used for indexing (selection) and/or
+alignment. An indexed coordinate may have one or more arbitrary
+dimensions although in most cases it is also a :term:`Dimension
+coordinate`. It may or may not be grouped with other indexed coordinates
+depending on whether they share the same index. Indexed coordinates are
+marked by ``*`` when printing a ``DataArray`` or ``Dataset``.
+
+Non-indexed coordinate
+A coordinate which has no associated :term:`Index`. It may still
+represent fixed labels along one or more dimensions but it cannot be
+used for label-based indexing and alignment.
+
 Index
-An *index* is a data structure optimized for efficient selecting and
-slicing of an associated array. Xarray creates indexes for dimension
-coordinates so that operations along dimensions are fast, while
-non-dimension coordinates are not indexed. Under the hood, indexes are
-implemented as :py:class:`pandas.Index` objects. The index associated
-with dimension name ``x`` can be retrieved by ``arr.indexes[x]``. By
-construction, ``len(arr.dims) == len(arr.indexes)``
+An *index* is a data structure optimized for efficient data selection
+and alignment within a discrete or continuous space that is defined by
+coordinate labels (unless it is a functional index). By default, Xarray
+creates a :py:class:`~xarray.indexes.PandasIndex` object (i.e., a
+:py:class:`pandas.Index` wrapper) for each :term:`Dimension coordinate`.
+For more advanced use cases (e.g., staggered or irregular grids,
+geospatial indexes), Xarray also accepts any instance of a specialized
+:py:class:`~xarray.indexes.Index` subclass that is associated to one or
+more arbitrary coordinates. The index associated with the coordinate
+``x`` can be retrieved by ``arr.xindexes[x]`` (or ``arr.indexes["x"]``
+if the index is convertible to a :py:class:`pandas.Index` object). If
+two coordinates ``x`` and ``y`` share the same index,
+``arr.xindexes[x]`` and ``arr.xindexes[y]`` both return the same
+:py:class:`~xarray.indexes.Index` object.

 name
 The names of dimensions, coordinates, DataArray objects and data
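
The glossary entries for indexed coordinates, non-indexed coordinates and shared indexes can be tried out with a short sketch. It assumes an xarray release that includes this pull request, so that `xr.Coordinates` and `Coordinates.from_pandas_multiindex` are available. The names `z`, `letter`, `number` and `label` are made up for the illustration and do not come from the diff.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical labels, chosen only for illustration.
midx = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1]], names=["letter", "number"]
)

# Three indexed coordinates ("z", "letter", "number") backed by one index.
ds = xr.Dataset(coords=xr.Coordinates.from_pandas_multiindex(midx, "z"))

# A coordinate along "z" whose name differs from the dimension name gets
# no default index, so it is a non-indexed coordinate.
ds = ds.assign_coords(label=("z", np.arange(4)))

indexed_names = set(ds.xindexes)
shares_index = ds.xindexes["z"] is ds.xindexes["letter"]
label_is_indexed = "label" in ds.xindexes

print(sorted(indexed_names))
print(shares_index, label_is_indexed)
```

Here `label` still labels positions along `z`, but because it has no index it cannot be used for label-based selection or alignment, which matches the "Non-indexed coordinate" entry.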
14 changes: 14 additions & 0 deletions doc/whats-new.rst
@@ -22,6 +22,20 @@ v2023.07.1 (unreleased)
 New Features
 ~~~~~~~~~~~~

+- :py:class:`Coordinates` can now be constructed independently of any Dataset or
+DataArray (it is also returned by the :py:attr:`Dataset.coords` and
+:py:attr:`DataArray.coords` properties). ``Coordinates`` objects are useful for
+passing both coordinate variables and indexes to new Dataset / DataArray objects,
+e.g., via their constructor or via :py:meth:`Dataset.assign_coords`. We may also
+wrap coordinate variables in a ``Coordinates`` object in order to skip
+the automatic creation of (pandas) indexes for dimension coordinates.
+The :py:class:`Coordinates.from_pandas_multiindex` constructor may be used to
+create coordinates directly from a :py:class:`pandas.MultiIndex` object (it is
+preferred over passing it directly as coordinate data, which may be deprecated soon).
+Like Dataset and DataArray objects, ``Coordinates`` objects may now be used in
+:py:func:`align` and :py:func:`merge`.
+(:issue:`6392`, :pull:`7368`).
+By `Benoît Bovy <https://github.com/benbovy>`_.
 - Visually group together coordinates with the same indexes in the index section of the text repr (:pull:`7225`).
 By `Justus Magin <https://github.com/keewis>`_.
 - Allow creating Xarray objects where a multidimensional variable shares its name
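
The release note's point about skipping the automatic creation of pandas indexes can be sketched as follows. This is a minimal example, assuming an xarray release that includes this pull request. It passes an explicit empty `indexes` mapping because the constructor's behavior when `indexes` is omitted may differ between releases. The variable name `a` and the sample values are hypothetical.

```python
import numpy as np
import xarray as xr

values = ("x", [10.0, 20.0, 30.0])

# Plain mapping: xarray builds a default (pandas) index for the
# dimension coordinate "x".
ds_default = xr.Dataset({"a": values}, coords={"x": np.arange(3)})

# Coordinates object with an explicit empty ``indexes`` mapping: the
# coordinate variable is kept, but no index is created for it.
coords = xr.Coordinates({"x": ("x", np.arange(3))}, indexes={})
ds_no_index = xr.Dataset({"a": values}, coords=coords)

default_has_index = "x" in ds_default.xindexes
wrapped_has_index = "x" in ds_no_index.xindexes
wrapped_has_coord = "x" in ds_no_index.coords

print(default_has_index, wrapped_has_index, wrapped_has_coord)
```

Skipping the index can be useful when the coordinate is large or when a custom index is going to be attached later. The trade-off is that label-based selection along `x` is unavailable until an index is set.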
4 changes: 3 additions & 1 deletion xarray/__init__.py
@@ -26,6 +26,7 @@
 where,
 )
 from xarray.core.concat import concat
+from xarray.core.coordinates import Coordinates
 from xarray.core.dataarray import DataArray
 from xarray.core.dataset import Dataset
 from xarray.core.extensions import (
@@ -37,7 +38,7 @@
 from xarray.core.merge import Context, MergeError, merge
 from xarray.core.options import get_options, set_options
 from xarray.core.parallel import map_blocks
-from xarray.core.variable import Coordinate, IndexVariable, Variable, as_variable
+from xarray.core.variable import IndexVariable, Variable, as_variable
 from xarray.util.print_versions import show_versions

 try:
@@ -100,6 +101,7 @@
 "CFTimeIndex",
 "Context",
 "Coordinate",
+"Coordinates",
 "DataArray",
 "Dataset",
 "Index",
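
With `Coordinates` exported from the top-level `xarray` namespace, the release note's statement that such objects can take part in `align` and `merge` can be tried directly. The sketch below assumes an xarray release that includes this pull request. The coordinate names and values are hypothetical, and the `Coordinates` objects are taken from `Dataset.coords` so that `x` and `y` carry default pandas indexes.

```python
import xarray as xr

# Coordinates objects taken from Datasets, so that "x" carries a default
# pandas index on both sides.
left = xr.Dataset(coords={"x": [0, 1, 2]}).coords
right = xr.Dataset(coords={"x": [1, 2, 3]}).coords

# Both objects are instances of the public Coordinates class.
both_are_coordinates = isinstance(left, xr.Coordinates) and isinstance(
    right, xr.Coordinates
)

# An inner join keeps only the labels present in both objects.
left_aligned, right_aligned = xr.align(left, right, join="inner")
aligned_labels = left_aligned["x"].values.tolist()

# Merging a Dataset with a Coordinates object returns a Dataset.
ds = xr.Dataset({"a": ("x", [1.0, 2.0, 3.0])}, coords={"x": [0, 1, 2]})
extra = xr.Dataset(coords={"y": [10, 20]}).coords
merged = xr.merge([ds, extra])

print(both_are_coordinates, aligned_labels)
print(sorted(merged.coords))
```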