generated from snakemake-workflows/snakemake-workflow-template
Add option to write BIDS directly to cloud storage #35
Merged
- writing to cloud without writing to local first
  - e.g. with fsspec zarr
- couple things:
  - have to use no container to get gcloud authentication working
  - can't get stay_on_remote to do what I think it is meant to do (i.e. allow the rule to write the file itself, rather than using a local file that is copied by snakemake)
    - if so, then might need to use a flag file as the local output instead
  - also, having some issues overwriting files; may need explicit deletion
- otherwise seems to work and writes the zarr to the cloud
move this to an env var or find a way to access the CLI-based arg from within the workflow
- adds credentials file as rule input
use of the storage keyword gives an error if storage appears on its own line -- this was happening with a long list comprehension + snakefmt, so I changed to a loop to avoid this bug
akhanf changed the title from "Add option to write BIDS directly to cloud storage (gcs)" to "Add option to write BIDS directly to cloud storage" on May 6, 2024
and add missing derivatives and json files as final
accidentally committed downsampling to 8 (when quick-testing)..
akhanf commented on Jul 17, 2024
uri = snakemake.params.uri

if uri.startswith('gcs://'):
    uri = uri[6:]
looking back at this, there is probably a more elegant way of doing this, I think I did it in a hurry..
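One possible tidier version of the prefix handling, as a sketch only: the helper name `strip_scheme` and the list of schemes are assumptions, not code from this PR. It derives the slice length from the prefix instead of hard-coding 6, and leaves local paths untouched.

```python
def strip_scheme(uri, schemes=("gcs", "gs", "s3")):
    """Return the bucket/key part of a remote URI, or the input unchanged.

    Hypothetical helper: the scheme list is an assumption for illustration.
    """
    for scheme in schemes:
        prefix = f"{scheme}://"
        if uri.startswith(prefix):
            # Slice by the prefix length rather than a hard-coded offset
            return uri[len(prefix):]
    return uri


remote_path = strip_scheme("gcs://example-bucket/bids/sub-01.ome.zarr")
local_path = strip_scheme("/data/bids/sub-01.ome.zarr")
```

On Python 3.9+ `str.removeprefix` does the same for a single known prefix. fsspec also has `fsspec.core.url_to_fs`, which returns a filesystem object together with the path, and may be a better fit if the filesystem is needed anyway.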
- instead of a final() wrapper, the bids() function is overloaded to append storage() when the file has a remote URI in it
  - this way, we can just add the gcs:// or s3:// prefix to the root (output folder) config variable, and avoid having a write_to_remote flag
  - however, it complicates a couple other things, e.g. expand() cannot be applied to files with the storage tag, so we make another wrapper, expand_bids(), to make sure storage() is applied after expanding
- also refactored the fsspec code, which now lives in workflow/lib/cloud_io.py
  - I considered moving it to zarrnii, but it is actually snakemake-specific so probably better to stay as a helper function in the snakemake workflow
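A rough sketch of the wrapper idea described above, runnable outside Snakemake. Everything here is a simplified stand-in: `storage()` only tags the path (the real one is provided by Snakemake), `_bids_path()` is a toy path builder rather than the real bids() helper, and the signatures are assumptions, not the ones used in this PR. The point it illustrates is the ordering: expand first, then apply storage() to each resulting path.

```python
from itertools import product

REMOTE_PREFIXES = ("gcs://", "s3://")


def is_remote(path):
    return path.startswith(REMOTE_PREFIXES)


def storage(path):
    # Stand-in for Snakemake's storage(); it only tags the path here
    return ("storage", path)


def _bids_path(root, suffix, **entities):
    # Toy BIDS-like path builder (hypothetical, much simpler than the real one)
    name = "_".join([f"{k}-{v}" for k, v in entities.items()] + [suffix])
    return f"{root}/{name}"


def bids(root, suffix, **entities):
    """Build a path and wrap it in storage() only if the root is remote."""
    path = _bids_path(root, suffix, **entities)
    return storage(path) if is_remote(path) else path


def expand_bids(root, suffix, expand_kwargs, **entities):
    """Expand wildcards on the plain path first, then apply storage()."""
    template = _bids_path(root, suffix, **entities)
    keys = list(expand_kwargs)
    paths = [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(expand_kwargs[k] for k in keys))
    ]
    return [storage(p) if is_remote(p) else p for p in paths]


remote_targets = expand_bids(
    "gcs://example-bucket/bids",
    "SPIM.ome.zarr",
    {"subject": ["01", "02"]},
    sub="{subject}",
)
local_target = bids("/data/bids", "SPIM.ome.zarr", sub="01")
```

With this shape, switching between local and remote output only requires changing the root prefix in the config, which is the motivation given above for dropping the write_to_remote flag.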
One more thing for me to do: update the QC scripts to use the params.uri if it is remote
This adds a write_to_remote option that, if enabled, writes the output directly to cloud storage (s3 or gcs). It uses a touch file as the output instead of the zarr folder, with the URI passed separately as a param. This is so that an FSStore can be instantiated in the snakemake rule, instead of generating locally then copying.
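The touch-file pattern can be sketched roughly as follows. This is illustrative only: `write_remote_then_touch` and its `write_fn` argument are hypothetical names, and the stand-in writer just records the URI. In the actual rule, the writer would be whatever opens the FSStore at the URI and writes the zarr there.

```python
from pathlib import Path
import tempfile


def write_remote_then_touch(uri, flag_path, write_fn):
    """Write to the remote URI, then create a local flag file.

    The flag file is what the rule would declare as its output, so the
    workflow can track completion without a local copy of the data.
    """
    write_fn(uri)  # e.g. open a store at `uri` and write the zarr (assumed)
    flag = Path(flag_path)
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()  # only reached if the write did not raise
    return flag


# Usage with a stand-in writer that just records the URI it was given
written = []
with tempfile.TemporaryDirectory() as tmp:
    flag = write_remote_then_touch(
        "gcs://example-bucket/bids/sub-01_SPIM.ome.zarr",
        Path(tmp) / "flags" / "sub-01.done",
        written.append,
    )
    flag_exists = flag.exists()
```

Touching the flag only after the write returns means a failed upload leaves no output behind, so the rule would be rerun.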
Had to add a final() function around the outputs that are optionally on remote; this function adds the remote prefix and applies the storage() function.
To do: