Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search for coord_names in separate_coords #191

Merged
merged 8 commits into from
Nov 5, 2024

Conversation

ayushnag
Copy link
Contributor

@ayushnag ayushnag commented Jul 15, 2024

@ayushnag
Copy link
Contributor Author

@TomNicholas this closes the issue however I think there is some existing functionality that can be refactored. My understanding is that the current logic is trying to find coordinates within attrs here. However this is accessing the dataset attributes whereas coordinates is within the attrs of each variable. When I debug the current code, coord_names is always empty. I can refactor the code and update this PR to remove the coord_names param from separate_coords and simply use the newly added search within separate_coords

@TomNicholas
Copy link
Member

TomNicholas commented Jul 21, 2024

I can refactor the code and update this PR to remove the coord_names param from separate_coords and simply use the newly added search within separate_coords

That would be great @ayushnag. I think the presence of the coord_names kwarg comes from a time when I didn't actually understand where I was supposed to get the information about which variable was a coordinate, so I had left it general. But it seems we don't need it, so let's get rid of it.

We should also do a few other things:

  • Strip the coordinates entry from the .zattrs (i.e. remove it from the xarray variable's .attrs) - this is redundant information once we have set the variable as an xarray coordinate.
  • Make sure the reverse process also works - saving a multi-dimensional coordinate to kerchunk json records the fact it should be a coordinate by re-adding an entry to the .zattrs.
  • Add a regression test, which should be another roundtripping test, which can use the xr.tutorial.open_dataset("ROMS_example.nc") you showed in open_virtual_dataset returns some coordinates as data variables #189.

@TomNicholas TomNicholas added bug Something isn't working CF conventions labels Jul 21, 2024
@TomNicholas
Copy link
Member

@ayushnag I would like to merge this important bugfix and issue a release of a new version of virtualizarr. Are you likely to have time to come back to this or should I merge this PR and open another?

@ayushnag
Copy link
Contributor Author

ayushnag commented Jul 30, 2024

@TomNicholas I was looking into this some more and it seems that the coord_names param is needed for non dimension coordinate cases (there is coordinate information in global dataset .zattrs) and we can't eliminate that function param. When I tried to remove it, the kerchunk roundtrip integration test failed

I unfortunately don't have time to implement your new suggestions but you can feel free to merge this or make an updated PR.

@TomNicholas TomNicholas added this to the v1.1.0 milestone Aug 1, 2024
TomNicholas added a commit that referenced this pull request Aug 23, 2024
@ayushnag
Copy link
Contributor Author

@TomNicholas I do have some time to add tests and get this merged. The current fix is in common.py however there is no test_common. Do you want me to make this file or just add the test to test_backend or test_utils? The test I am planning on adding is test_separate_coords() which will check the functionality of (1) and (2) from your comment here

@TomNicholas
Copy link
Member

That would be great! Test_backends would be fine I think.

@ayushnag
Copy link
Contributor Author

Great! Do you have an example netcdf that has some coordinates where CF decoding is required? It would be ideal to have a test that checks all three cases where two should pass after this PR and I can mark the third CF case as xfail with a TODO so it's a known issue/task for later. The problem with xr.tutorial.open_dataset("ROMS_example") is that it just has (1) and (2) so this PR actually finds all of the coordinates.

If there isn't one to easily download I can manually construct one, I just need to know the cases where CF decoding is needed

@TomNicholas
Copy link
Member

I think another one of the Xarray tutorial datasets (the roms one) might have this property. Does my half-finished PR to make (3) work include one?

@ayushnag
Copy link
Contributor Author

ayushnag commented Oct 30, 2024

I tested this change with ROMS_example and it does pass with just this PR. Your PR used the ROM_example as well

@TomNicholas
Copy link
Member

Ah damn. Well we can just leave this for another PR.

If you do want to go further down the rabbit hole I think that one example of CF metadata that would cause Xarray to set extra variables as coordinates is when one variable has a bounds attribute that contains the name of another variable.

@sharkinsspatial maybe you know this bit of CF lore?

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ayushnag !

@TomNicholas TomNicholas merged commit 3fa5cff into zarr-developers:main Nov 5, 2024
10 of 11 checks passed
TomNicholas added a commit that referenced this pull request Nov 7, 2024
TomNicholas added a commit that referenced this pull request Nov 7, 2024
* add new section to release notes for unreleased additions

* add release note for #191

* add release note for #266

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pre-commit

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CF conventions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

open_virtual_dataset returns some coordinates as data variables
2 participants