Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subsurface floats dataset #321

Merged
merged 18 commits into from
Nov 15, 2023

Conversation

philippemiron
Copy link
Contributor

@philippemiron philippemiron commented Nov 10, 2023

resolves #320

  • quick implementation
  • clean up variable names
  • create variables: rowsize, ids
  • variable attributes (add coordinates)
  • docstring
  • tests to at least validate that the url is valid (to identify update)
  • wrap this in datasets
  • fix output to netcdf
  • references (?)

Comments?

@milancurcic
Copy link
Member

Comments? Awesome. :)

@philippemiron
Copy link
Contributor Author

@milancurcic, I don't remember exactly how this works, but do we always create a wrapper in the datasets module that just calls the to_xarray() function?

@philippemiron
Copy link
Contributor Author

Someone asked me for the subsurface floats dataset recently, so I decided that instead of updating my code, I would do it here. I'm thinking of adding Andro (#319) or YoMaHa after this, based on Argo floats.

@milancurcic
Copy link
Member

Adapter is needed whenever a dataset needs to be re-organized to be a CloudDrift conformable ragged-array dataset (obs and traj dims, rowsize variable; can have different names but they must be defined). For example, adapter is not needed for the upcoming Spotter open dataset because we coordinated with them to ensure it's a CloudDrift conformable ragged array.

Dataset function is merely a convenience accessor to an existing ragged-array dataset, so that the user doesn't need to find and manage data URLs. For example, in case of GDP, we simply xr.open_dataset(url) and return ds. In case of MOSAiC, the dataset is relatively small so we invoke the adapter under the hood.

So, I expect that every clouddrift.adapter.* will have a corresponding clouddrift.datasets.*, but not all clouddrift.datasets.* need to have an adapter.

From the end-user (as in consumer of data for analysis; not a contributor to the library) point of view, they should only ever have to access clouddrift.datasets.

@philippemiron
Copy link
Contributor Author

philippemiron commented Nov 11, 2023

Thanks, it makes sense.

@@ -0,0 +1,170 @@
"""
This module defines functions used to adapt the subsurface float trajectories as
a ragged-array dataset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny bit more info here about the dataset perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to subsurfacefloat.py?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this dataset have a proper name? It's not clear to me from its website. If not, I suggest subsurface_floats.

Copy link
Contributor Author

@philippemiron philippemiron Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was also my hesitation. I don't see any real defined name for it.. They write subsurface float trajectories, that's why I didn't put the s. I don't really mind either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some details. Let me know if you think this should be further improved.

clouddrift/adapters/subsurface.py Outdated Show resolved Hide resolved
@selipot
Copy link
Member

selipot commented Nov 14, 2023

Looks good but it does not seem like the dataset can be used with subset for variables with dimension exp. Should not this be possible?

@milancurcic
Copy link
Member

@philippemiron I will test this locally today. Can you also cast all float variables except time, lat, lon to float32?

@philippemiron
Copy link
Contributor Author

philippemiron commented Nov 15, 2023

@milancurcic Yes. However, doesn't this in the glad adapter converts all variables?

    # Cast double floats to singles
    for var in ds.variables:
        if ds[var].dtype == "float64":
            ds[var] = ds[var].astype("float32")

@milancurcic
Copy link
Member

Yes. I special-cased GLAD because all longitudes are in the Gulf (rather than global) and the position error included in the dataset exceeds the precision of the single-precision ulp.

Since subsurface floats are global (as I understand it), it's safest to keep lat, lon as float64.

@philippemiron
Copy link
Contributor Author

Ok good! I will add this for the 3 datasets I'm working on.

    # Cast double floats to singles
    double_vars = ["lat", "lon"]
    for var in [v for v in ds.variables if v not in double_vars]:
        if ds[var].dtype == "float64":
            ds[var] = ds[var].astype("float32")

Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great; thank you. Will merge.

@milancurcic milancurcic merged commit 9a5f961 into Cloud-Drift:main Nov 15, 2023
12 checks passed
philippemiron added a commit to philippemiron/clouddrift that referenced this pull request Nov 16, 2023
* subsurface floats dataset

* more update

* add rowsize/ids

* remove ids

* more fixes

* small lint

* added to the doc

* we can deal with conflict later

* update

* lint

* a bit more details

* remove exp dimension

* docstring edits

* lint

* remove numbers to have simpler docstring

* double to single for data

---------

Co-authored-by: Philippe Miron <philippe.miron@dtn.com>
Co-authored-by: selipot <selipot@miami.edu>
Co-authored-by: milancurcic <caomaco@gmail.com>
@philippemiron philippemiron deleted the subsurface-floats branch December 16, 2023 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subsurface float trajectories dataset adapters
3 participants