Add load_dataset and save_dataset functions #392
Codecov Report
@@            Coverage Diff            @@
##            master     #392    +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           31       32     +1
  Lines         2230     2247    +17
=========================================
+ Hits          2230     2247    +17
    """
    store = str(path)
    for v in ds:
        ds[v].encoding.pop("chunks", None)
Worth mentioning pydata/xarray#4380 in a comment?
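For readers following the thread, here is a minimal sketch of what the loop in the diff does, assuming (as I understand the linked xarray issue) that the concern is a stale `chunks` entry in a variable's `encoding` conflicting with the dataset's current chunking when it is written to Zarr. The helper name `drop_chunk_encoding` and the variable `x` are hypothetical and not part of this PR.

```python
import numpy as np
import xarray as xr


def drop_chunk_encoding(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: remove any "chunks" entry from each data variable's encoding.

    Mirrors the loop in the diff above. The dataset is modified in place
    and also returned for convenience.
    """
    for v in ds:
        # pop(..., None) is a no-op when no "chunks" encoding is present
        ds[v].encoding.pop("chunks", None)
    return ds


# Simulate a dataset that carries chunk encoding from an earlier store.
ds = xr.Dataset({"x": ("i", np.arange(6))})
ds["x"].encoding["chunks"] = (4,)

drop_chunk_encoding(ds)
```

With the entry removed, the writer is free to pick chunk sizes from the data as it currently is, rather than from whatever store the dataset was originally read from.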
Force-pushed from fcfa0c5 to 13559a9.
This fixes #298, and also incorporates some of the implementation from https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/blob/4f862e31b8093d25fdaa8da7f841b9be8583cda4/scripts/gwas.py#L53-L71
Unlike #298, it does not rechunk the dataset before saving. I worry about implicit rechunking that the user can't control, since we have seen it perform poorly in some cases. For the moment I think it's preferable for the user to rechunk explicitly before saving. This is what I have been doing in the MalariaGEN notebooks, and Eric has too, judging by this example.
I also haven't added `fsspec` support even though the GWAS pipeline uses it. This is because I was getting a warning that files were not being closed when running on local files. That could be addressed separately.