Skip to content

Conversation

@headtr1ck
Copy link
Collaborator

@headtr1ck headtr1ck commented Oct 24, 2022

I have passed along some more memo dicts, which could prevent some double deep-copying of the same data (don't know how exactly, but who knows :P)
Also, I have found some copy calls that did not pass along the deep argument (I am not sure if that breaks things, lets find out).
And finally I have found some places where shallow copies are enough.

All together it should improve the performance a lot when copying things around.

@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Oct 25, 2022
@dcherian
Copy link
Contributor

Thanks @headtr1ck do we have a benchmark for this, if not can we add one please?

@headtr1ck
Copy link
Collaborator Author

Thanks @headtr1ck do we have a benchmark for this, if not can we add one please?

Since the benchmark didn't change we either don't have one or my change doesn't matter much, haha.

I think the most important change is the shallow copy for Variable.to_index_variable, but I will have to check how to test this in a useful way, probably better testing some functions that use it indirectly.

@dschwoerer
Copy link
Contributor

The change does matter - but deep copies are still much more expensive than they used to be (as to be expected, I guess)

@dcherian dcherian requested a review from benbovy November 2, 2022 14:51
@headtr1ck
Copy link
Collaborator Author

The change does matter - but deep copies are still much more expensive than they used to be (as to be expected, I guess)

Do you by any chance know which parts have improved, so we can add them as a benchmark here?

@headtr1ck
Copy link
Collaborator Author

headtr1ck commented Nov 5, 2022

I added a benchmark for swap_dims which uses Variable.to_index_variable when swapping into an existing coord, but I can not see any significant improvement...

Any ideas what else to test?
Maybe Indexes.copy_indexes, but I have not found a more high-level method that can take advantage of the memo dict...

@benbovy
Copy link
Member

benbovy commented Nov 7, 2022

The change in Variable.to_index_variable seems sensible (not sure when one wants a deep copy of an IndexVariable or an Xarray / Pandas index).

to_index_variable may be called in some core functions of Xarray internals (e.g., in as_variable()) so it might be tricky to benchmark its effect Xarray-wise. Perhaps it would be good to track it down in the original issue #7181?

@headtr1ck
Copy link
Collaborator Author

Are we merging this anyway, or should we try harder to find a benchmark that shows some improvement?

@dcherian
Copy link
Contributor

I don't think we need a benchmark to merge. Sorry that wasn't clear, it mostly for information purposes.

The change in Variable.to_index_variable seems sensible (not sure when one wants a deep copy of an IndexVariable or an Xarray / Pandas index).

Great! let's merge

@headtr1ck headtr1ck added the plan to merge Final call for comments label Nov 29, 2022
@dcherian dcherian merged commit 0aee4fe into pydata:main Nov 30, 2022
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 2, 2022
* upstream/main: (39 commits)
  Support the new compression argument in netCDF4 > 1.6.0 (pydata#6981)
  Remove setuptools-scm-git-archive, require setuptools-scm>=7 (pydata#7253)
  Fix mypy failures (pydata#7343)
  Docs: add example of writing and reading groups to netcdf (pydata#7338)
  Reset file pointer to 0 when reading file stream (pydata#7304)
  Enable mypy warn unused ignores (pydata#7335)
  Optimize some copying (pydata#7209)
  Add parse_dims func (pydata#7051)
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` (pydata#7229)
  Remove code used to support h5py<2.10.0 (pydata#7334)
  [pre-commit.ci] pre-commit autoupdate (pydata#7330)
  Fix PR number in what’s new (pydata#7331)
  Enable `origin` and `offset` arguments in `resample` (pydata#7284)
  fix doctests: supress urllib3 warning (pydata#7326)
  fix flake8 config (pydata#7321)
  implement Zarr v3 spec support (pydata#6475)
  Fix polyval overloads (pydata#7315)
  deprecate pynio backend (pydata#7301)
  mypy - Remove some ignored packages and modules (pydata#7319)
  Switch to T_DataArray in .coords (pydata#7285)
  ...
@headtr1ck headtr1ck deleted the copy branch December 8, 2022 20:09
@benbovy benbovy mentioned this pull request Feb 10, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-indexing topic-performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

xarray 2022.10.0 much slower then 2022.6.0

6 participants