MOP: reducing your cloud storage footprint

noblem edited this page Dec 13, 2018 · 3 revisions

Cloud computing can seem like a boon because of its scalability and ubiquity. But it also forces one to think much harder about cost, particularly for compute and storage, because cloud providers are only too happy to charge you more for slow algorithms and bloated disk usage. And because storage and compute are hidden within workspaces and methods in FireCloud, it is easy to fall into a "run it and forget it" mentality once you have your results, while the bills for the storage consumed pile up month after month.

To counter this, FISS offers the mop command. It raises awareness of how much storage a workspace consumes and parsimoniously cleans up orphaned results, i.e. any file in the workspace bucket that is not attached to a data model attribute. If you have a method that produces many intermediate files, especially large ones such as BAM files, mop can help substantially reduce the storage charges for unwanted data in your workspace.
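The notion of an "orphaned" file can be illustrated as a set difference: every file in the bucket, minus every file path that some data model attribute points to. The sketch below is only an illustration of that idea, not FISS's actual mop implementation. It assumes entity attributes are plain dicts whose values are strings or lists of strings (a simplification of what the FireCloud API really returns), and the helper names `referenced_paths` and `orphaned_files` are hypothetical.

```python
from typing import Any, Iterable, List, Mapping, Set


def referenced_paths(entities: Iterable[Mapping[str, Any]]) -> Set[str]:
    """Collect every gs:// string found in entity attribute values.

    Simplifying assumption: attribute values are either a single value or a
    plain list/tuple of values; real FireCloud attribute lists are richer.
    """
    found: Set[str] = set()
    for attributes in entities:
        for value in attributes.values():
            # Treat a single value as a one-element list.
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                if isinstance(item, str) and item.startswith("gs://"):
                    found.add(item)
    return found


def orphaned_files(bucket_files: Iterable[str],
                   entities: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the bucket files that no attribute references, sorted."""
    referenced = referenced_paths(entities)
    return sorted(f for f in bucket_files if f not in referenced)


if __name__ == "__main__":
    # Hypothetical bucket contents and a single sample entity.
    bucket = [
        "gs://my-bucket/run1/sample1.bam",
        "gs://my-bucket/run1/tmp/sample1.sorted.bam",
        "gs://my-bucket/run1/sample1.vcf",
    ]
    samples = [{"bam": "gs://my-bucket/run1/sample1.bam",
                "vcfs": ["gs://my-bucket/run1/sample1.vcf"]}]
    print(orphaned_files(bucket, samples))
```

Here only the intermediate `sample1.sorted.bam` is reported, because the final BAM and VCF are both attached to the sample's attributes.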

The current synopsis of the mop command is given below. We strongly recommend that you first invoke it in dry-run mode (--dry-run) to see what would be deleted. If it turns out that you don't want to delete everything, you can edit that list and feed it to a command such as gsutil rm to prune only the desired files manually.
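The "edit the list, then delete" step can be scripted so that files you want to keep are filtered out before anything reaches gsutil. This is a minimal sketch under stated assumptions, not documented FISS behaviour: it assumes the dry-run output has been saved to a text file and that each candidate appears on its own line as a gs:// URL (any other line is ignored). The script name `prune_mop_list.py`, the function `select_deletions`, and the `KEEP_PATTERNS` values are all hypothetical.

```python
import fnmatch
import sys
from typing import Iterable, List

# Hypothetical shell-style patterns for files you decide to keep; edit to taste.
KEEP_PATTERNS = ["*.vcf", "*/final/*"]


def select_deletions(lines: Iterable[str], keep_patterns: Iterable[str]) -> List[str]:
    """Return gs:// URLs from a dry-run listing that match no keep pattern.

    Assumption: one gs:// URL per line; all other lines are skipped.
    """
    patterns = list(keep_patterns)
    selected = []
    for line in lines:
        url = line.strip()
        if not url.startswith("gs://"):
            continue  # skip headers, blank lines, or other output
        if any(fnmatch.fnmatchcase(url, p) for p in patterns):
            continue  # matches a keep pattern, so leave the file alone
        selected.append(url)
    return selected


def main(argv: List[str]) -> None:
    if len(argv) != 2:
        print("usage: prune_mop_list.py DRY_RUN_FILE", file=sys.stderr)
        return
    with open(argv[1]) as fh:
        for url in select_deletions(fh, KEEP_PATTERNS):
            print(url)


if __name__ == "__main__":
    main(sys.argv)
```

Assuming the dry-run listing goes to standard output, one possible workflow is `fissfc mop -w WORKSPACE -p PROJECT --dry-run > dryrun.txt`, then `python prune_mop_list.py dryrun.txt` to review the result, and only then `python prune_mop_list.py dryrun.txt | gsutil -m rm -I`, where `gsutil rm -I` reads the objects to remove from standard input.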

usage: fissfc [OPTIONS] mop [-h] -w WORKSPACE [-p PROJECT] [--dry-run]

Remove unused files from a workspace's bucket

optional arguments:
  -h, --help            show this help message and exit
  -w WORKSPACE, --workspace WORKSPACE
                        Workspace name (required if no default workspace
                        configured)
  -p PROJECT, --project PROJECT
                        Project (workspace namespace). Required if no default
                        project was configured
  --dry-run             Show deletions that would be performed
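If you drive FISS from a Python pipeline rather than a terminal, the synopsis above maps directly onto a subprocess call. The sketch below uses only the flags shown in the synopsis (-w, -p, --dry-run); the helper names `build_mop_command` and `run_mop` are hypothetical, and `dry_run` defaults to True so that a careless call never deletes anything.

```python
import subprocess
from typing import List, Optional


def build_mop_command(workspace: str, project: Optional[str] = None,
                      dry_run: bool = True) -> List[str]:
    """Assemble the fissfc mop argument list from the documented flags."""
    cmd = ["fissfc", "mop", "-w", workspace]
    if project is not None:
        # Omit -p to fall back on the configured default project.
        cmd += ["-p", project]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_mop(workspace: str, project: Optional[str] = None,
            dry_run: bool = True) -> str:
    """Run mop and return its standard output (requires fissfc on PATH)."""
    result = subprocess.run(build_mop_command(workspace, project, dry_run),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_mop("my-workspace", "my-project")` would return the dry-run listing as a string; passing `dry_run=False` performs the actual deletion, so it is worth reviewing the dry-run output first.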