Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to write BIDS directly to cloud storage #35

Merged
merged 26 commits into from
Jul 21, 2024
Merged

Conversation

akhanf
Copy link
Member

@akhanf akhanf commented May 2, 2024

This adds a write_to_remote option that if enabled, writes the output directly to cloud storage (s3 or gcs)

It uses a touch file as the output instead of the zarr folder, with the uri passed separately as a param. This is so that an FSStore can be instantiated in the snakemake rule, instead of generating locally then copying.

Had to add a final() function around the outputs that are optionally on remote, and this function adds the remote prefix, along with applying the storage() function.

To do:

  • add support for s3 as well (currently only gcs)
  • only the tif_stacks_to_ome_zarr script is updated, will need to also update the zarr_to_ome_zarr script similarly

akhanf added 7 commits April 29, 2024 02:11
- writing to cloud without writing to local first
- e.g. with fsspec zarr
  - couple things:
     - have to use no container to get gcloud authentication working
     - can't get stay_on_remote to do what I think it is meant to do (ie
allow the rule to write the file, not using a local file that is copied
by snakemake
     - if so, then might need to use a flag file as the local output
instead..
     - also, having some issues over-writing files - may need explicit
deletion
     - otherwise seems to work and writes the zarr to the cloud..
move this to an env var or find a way to access the CLI-based arg from within the
workflow
- adds credentials file as rule input
use of the storage keyword gives an error if storage appears on it's
own line -- this was happening with a long list comprehension + snakefmt, so I
changed to a loop to avoid this bug..
@akhanf akhanf changed the title Add option to write BIDS directly to cloud storage (gcs) Add option to write BIDS directly to cloud storage May 6, 2024
@akhanf akhanf marked this pull request as ready for review July 17, 2024 14:43
@akhanf akhanf requested review from Karl5766 and BenGros July 17, 2024 14:44
@akhanf akhanf added the enhancement New feature or request label Jul 17, 2024
accidentally committed downsampling to 8 (when quick-testing)..
uri = snakemake.params.uri

if uri.startswith('gcs://'):
uri = uri[6:]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking back at this, there is probably a more elegant way of doing this, I think I did it in a hurry..

akhanf added 6 commits July 17, 2024 17:03
- instead of a final() wrapper, the bids() function is overloaded to
append storage() when the file has a remote URI in it
- this way, we can just add the gcs:// or s3:// prefix to the root
(output folder) config variable, and avoid having a write_to_remote
flag.
- however, it complicates a couple other things, e.g. expand() cannot be
applied to files with the storage tag, so we make another wrapper,
expand_bids() to make sure storage() is applied after expanding..
- also refactored the fsspec code, which now lives in
workflow/lib/cloud_io.py - I considered moving it to zarrnii, but it is
actually snakemake specific so probably better to stay as a helper
function in the snakemake workflow
@akhanf
Copy link
Member Author

akhanf commented Jul 18, 2024

One more thing for me to do: update the QC scripts to use the params.uri if it is remote

@akhanf akhanf merged commit 15a01fc into main Jul 21, 2024
4 checks passed
@akhanf akhanf deleted the cloudoutput branch July 21, 2024 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant