-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add support for DataTree to xarray.merge() #10790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward to this! Should the path of the node be added to the error message if the merge fails?
tree1 = xr.DataTree.from_dict({"/a/b": 1})
tree2 = xr.DataTree.from_dict({"/a/b": 2})
xr.merge([tree1, tree2])
e.g. using
xarray/xarray/core/datatree_mapping.py
Line 141 in 2b947e9
def _handle_errors_with_path_context(path: str): |
Great suggestion, done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! With this we can replace our custom datatree-merge
function (which uses map_over_datasets
) - I just checked and our tests still pass.
Thanks for checking! Let me know if you have any suggestions for further test coverage. |
xarray/structure/merge.py
Outdated
def depth(kv): | ||
return kv[0].count("/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you could use
xarray/xarray/core/treenode.py
Line 473 in ad51404
def depth(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tree.level
turns out to be the right property. depth
refers (somewhat confusing, IMO) to the number of levels below the given node.
# Merge datasets, including inherited indexes to ensure alignment. | ||
datasets = [node.dataset for node in nodes] | ||
with add_path_context_to_errors(key): | ||
merge_result = merge_core( | ||
datasets, | ||
compat=compat, | ||
join=join, | ||
combine_attrs=combine_attrs, | ||
) | ||
# Remove inherited coordinates/indexes/dimensions. | ||
for var_name in list(merge_result.coord_names): | ||
if not any(var_name in node._coord_variables for node in nodes): | ||
del merge_result.variables[var_name] | ||
merge_result.coord_names.remove(var_name) | ||
for index_name in list(merge_result.indexes): | ||
if not any(index_name in node._node_indexes for node in nodes): | ||
del merge_result.indexes[index_name] | ||
for dim in list(merge_result.dims): | ||
if not any(dim in node._node_dims for node in nodes): | ||
del merge_result.dims[dim] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain / add comments explaining why this can't be done by just using node.to_dataset(inherit=False)
then merging the resulting datasets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a comment about this above already: Merge datasets, including inherited indexes to ensure alignment
whats-new.rst