Quilt: Reproducible Data Dependencies for Python #162
I've chatted a bit with one of the Quilt founders. I suspect they would be happy to add netCDF/xarray support to the open-source client if there's demand for it (especially if they get a pull request!).
I'm a developer on Quilt. Happy to collaborate. Quilt also supports unstructured and semi-structured data (e.g., large image corpora, GeoJSON). In next week's
We're also working on community-powered specs for reproducible data. If you'd like to be included in the discussion, let me know.
@akarve Can we already build netCDF-based packages in Quilt? If so, can you point to an example or documentation on how to do so?
In theory, yes. You can put any bits in and then use the … In practice, if you give me an example of a data round trip you'd like to accomplish with Quilt using netCDF data, I can try it and see how it might be improved.
To give you a concrete example, suppose
We would typically do this with xarray, e.g.:

```python
import xarray

ds = xarray.open_dataset(path)  # netCDF file -> xarray.Dataset
ds.to_netcdf(path)              # xarray.Dataset -> netCDF file
```
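To make the xarray round trip described above checkable, here is a minimal sketch that writes a Dataset to a temporary netCDF file, reads it back, and asserts that nothing changed. It assumes xarray, NumPy, and a netCDF backend (netCDF4, h5netcdf, or SciPy) are installed. The helper name `roundtrip_netcdf`, the file name `example.nc`, and the toy dataset are hypothetical illustrations, not part of Quilt or of the thread.

```python
import os
import tempfile

import numpy as np
import xarray


def roundtrip_netcdf(ds: xarray.Dataset) -> xarray.Dataset:
    """Hypothetical helper: write ds to a temporary netCDF file and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.nc")  # hypothetical file name
        ds.to_netcdf(path)                         # xarray.Dataset -> netCDF file
        # load_dataset reads everything into memory and closes the file,
        # so the temporary directory can be removed safely afterwards.
        return xarray.load_dataset(path)           # netCDF file -> xarray.Dataset


# A small illustrative Dataset with made-up values.
# Float coordinates avoid int64, which netCDF-3 (SciPy backend) cannot store.
original = xarray.Dataset(
    {"temperature": (("time", "x"), np.arange(6, dtype="float64").reshape(2, 3))},
    coords={"time": [0.0, 1.0], "x": [10.0, 20.0, 30.0]},
    attrs={"title": "toy example"},
)

restored = roundtrip_netcdf(original)

# Raises AssertionError if values, coordinates, names, or attributes differ.
xarray.testing.assert_identical(original, restored)
```

The same `assert_identical` check could be applied to a Dataset pulled back out of a data package, which is one way to confirm that a packaging round trip is lossless.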
Thanks. I can confirm that it is possible to round-trip xarray Datasets to and from a Quilt package. Here's a notebook. The round trip takes more lines of code than I'd like, but future versions of Quilt should bring it down to about two lines.
Intake is now officially being circulated: https://www.anaconda.com/blog/developer-blog/intake-taking-the-pain-out-of-data-access/
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.
I just stumbled across this post on the Jupyter blog:
https://blog.jupyter.org/reproducible-data-dependencies-for-python-guest-post-d0f68293a99
The Quilt project seems to be aimed at solving many of the data-discovery problems we have been discussing:
https://quiltdata.com/
It's a commercial product, but they have open-sourced the building blocks.
They seem focused on tabular data, but it's probably still worth looking into.