MOP: reducing your cloud storage footprint
Cloud computing can seem like a boon because of its scalability and ubiquity. But it also forces you to think much more about cost, particularly for compute and storage, because cloud providers are only too happy to charge you more to run slow algorithms that consume bloated amounts of disk space. And because storage and compute are hidden within workspaces and methods in FireCloud, it is easy to fall prey to a "run it and forget it" mentality once you have your results, while the storage bills pile up month after month.
To counter this, FISS offers the mop command, which raises awareness of the storage a workspace consumes and parsimoniously cleans up orphaned results, i.e. anything that is not attached to a data model attribute. If you have a method that produces a lot of intermediate files, especially ones as large as BAMs, the mop command may be very helpful in reducing the storage charges for unwanted data in your workspace.
The current synopsis of the mop command is given below. We strongly recommend that you first invoke it in dry-run mode, to see what would be cleaned up. If it turns out that you don't want to delete everything, you can edit the list and feed it to a command such as gsutil rm to manually prune only the desired files.

usage: fissfc [OPTIONS] mop [-h] -w WORKSPACE [-p PROJECT] [--dry-run]

Remove unused files from a workspace's bucket

optional arguments:
  -h, --help            show this help message and exit
  -w WORKSPACE, --workspace WORKSPACE
                        Workspace name (required if no default workspace
                        configured)
  -p PROJECT, --project PROJECT
                        Project (workspace namespace). Required if no default
                        project was configured
  --dry-run             Show deletions that would be performed
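
The "dry run first, then prune by hand" workflow described above could be sketched roughly as follows. This is only an illustration, not part of FISS: it assumes that the dry-run output contains the candidate files as gs:// URLs without embedded whitespace (the exact output format is not documented here), and the function name select_for_deletion and its keep_patterns parameter are hypothetical.

```python
import fnmatch
import re

# Assumption: candidate files appear in the dry-run output as gs:// URLs
# that contain no whitespace.
GS_URL = re.compile(r"gs://\S+")


def select_for_deletion(dry_run_text, keep_patterns):
    """Return the gs:// URLs in dry_run_text that match none of keep_patterns.

    keep_patterns are shell-style globs (e.g. "*.vcf"); note that "*" also
    matches "/" here, so a pattern applies at any depth in the bucket.
    """
    urls = GS_URL.findall(dry_run_text)
    return [
        url
        for url in urls
        # fnmatchcase keeps matching case-sensitive on every platform.
        if not any(fnmatch.fnmatchcase(url, pat) for pat in keep_patterns)
    ]
```

The URLs that remain can be reviewed, written out one per line, and passed to gsutil rm -I, which reads the list of objects to remove from standard input.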