Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAS-2067 - Migrate datatree io.py and common.py #9011

Merged
merged 11 commits into from
Jun 12, 2024

Conversation

owenlittlejohns
Copy link
Contributor

This PR migrates what I believe is the last actual source code out of the xarray/datatree_/datatree directory into xarray/core. This is another piece of #8572.

I made a couple of decisions that I'd love some feedback/extra eyes on:

  • I moved the remaining functions from xarray/datatree_/datatree/io.py into their own module, rather than xarray/backends/api.py. It felt like these were a little separate, but I can definitely be convinced otherwise.
  • I created a subclass for TreeAttrAccessMixin. I really just wanted to swap over to using AttrAccessMixin, but I ran into issues with __slots__. Looking at it, I think I would need to work on the DataTree, TreeNode and NamedNode classes. I can go down that path, but thought I'd put up this PR for discussion before beginning that effort.
  • Continues Track merging datatree into xarray #8572 (Contributed to last subitem of "Merge in code for DataTree class.")
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

xarray/core/types.py Show resolved Hide resolved
xarray/core/datatree_io.py Show resolved Hide resolved
@@ -347,6 +347,34 @@ def _ipython_key_completions_(self) -> list[str]:
return list(items)


class TreeAttrAccessMixin(AttrAccessMixin):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here's the biggest question I have... is this okay for now? (Maybe this is directed at @TomNicholas)

The use of __slots__ is a little new to me (but makes sense from reading up on it). I think to adopt __slots__ and remove __dict__ I would need to slightly rework DataTree, TreeNode and NamedNode. I wanted to check that the scope mainly seems to be self.children and self.parent attributes, and understand why those classes were implemented as they currently are before trying to alter things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to embarrass me, but why did you stop calling init_subclass on the Parent object? everything else with this merge seems reasonable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I accidentally commented out that last line. Will fix!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, my last comment was true, but... uncommenting out that line means that the same method in the parent class (AttrAccessMixin) gets called by super in addition to this method, right? That would mean we end up calling the validation logic that this child method is trying to avoid for now (checking that __dict__ is not present on the class).

Now if only I'd said that the first time around 😉

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label May 7, 2024
Copy link
Member

@flamingbear flamingbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just comments for now

xarray/core/types.py Show resolved Hide resolved
@@ -347,6 +347,34 @@ def _ipython_key_completions_(self) -> list[str]:
return list(items)


class TreeAttrAccessMixin(AttrAccessMixin):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to embarrass me, but why did you stop calling init_subclass on the Parent object? everything else with this merge seems reasonable to me.

xarray/core/datatree_io.py Show resolved Hide resolved
@@ -347,6 +347,34 @@ def _ipython_key_completions_(self) -> list[str]:
return list(items)


class TreeAttrAccessMixin(AttrAccessMixin):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this class necessary? Can't DataTree inherit directly from AttrAccessMixin? And override _attr_sources and so on?

I think the reason this class existed separately was just to do with hacking around limitations of datatree being in a separate repository...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially tried to use AttrAccessMixin. Just dropping it in as a replacement for TreeAttrAccessMixin doesn't work because of the checking done inside the AttrAccessMixin.__init_subclass__ method. The check for if __dict__ exists fails. DataTree, TreeNode and NamedNode do all define __slots__, but I think there are attributes being defined in places like DataTree.__init__, for example DataTree.parent and DataTree.children (versus the DataTree._parent and DataTree._children attributes defined in DataTree.__slots__).

I think we talked about this a little bit in one of our calls - things like DataTree.parent and DataTree.children seemed like tricky things to change over.

xarray/core/datatree_io.py Show resolved Hide resolved
@flamingbear
Copy link
Member

@TomNicholas @owenlittlejohns I think we decided at the Tuesday meeting we are leaving the slots/dict as is and this is ready for approvals. Which I am about to do.

Copy link
Member

@flamingbear flamingbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has some comments, but most of it was discussed in the special Xarray/Datatree meeting and I think this can be merged.

@owenlittlejohns
Copy link
Contributor Author

@TomNicholas - just checking in over this PR. I captured the outstanding work to make DataTree work with AttrAccessMixin in #9068. With that in mind, are we good to merge here?

xarray/core/common.py Outdated Show resolved Hide resolved
Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Merge when you like.

@owenlittlejohns owenlittlejohns merged commit cb3663d into pydata:main Jun 12, 2024
28 checks passed
@owenlittlejohns owenlittlejohns deleted the DAS-2067-io-and-common branch June 12, 2024 01:06
owenlittlejohns added a commit that referenced this pull request Jun 12, 2024
dcherian added a commit to dcherian/xarray that referenced this pull request Jun 13, 2024
* upstream/main:
  [skip-ci] Try fixing hypothesis CI trigger (pydata#9112)
  Undo custom padding-top. (pydata#9107)
  add remaining core-dev citations [skip-ci][skip-rtd] (pydata#9110)
  Add user survey announcement to docs (pydata#9101)
  skip the `pandas` datetime roundtrip test with `pandas=3.0` (pydata#9104)
  Adds Matt Savoie to CITATION.cff (pydata#9103)
  [skip-ci] Fix skip-ci for hypothesis (pydata#9102)
  open_datatree performance improvement on NetCDF, H5, and Zarr files (pydata#9014)
  Migrate datatree io.py and common.py into xarray/core (pydata#9011)
  Micro optimizations to improve indexing (pydata#9002)
  (fix): don't handle time-dtypes as extension arrays in `from_dataframe` (pydata#9042)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
Development

Successfully merging this pull request may close these issues.

5 participants