taco locations #25

Open
charlesreid1 opened this issue Aug 24, 2018 · 3 comments

@charlesreid1
Member

We're thinking through where we are running taco, what workflows it submits, and where those workflows run.

Starting with the documentation's first page, installation, we give instructions for installing snakemake and singularity on the same node that will run taco. This seems to limit us to a single-node model. What if we are submitting jobs to clusters?

We should think of taco as just a thin wrapper around snakemake: whatever execution model we currently use for snakemake, we use for taco. The answer to the question "where does taco run?" is the same as the answer to the question "where does snakemake run?"

Therefore we want to use the following abstraction:

  • taco runs on a "master node" that may or may not run the actual compute tasks
  • the master node runs the snakemake workflow
  • use snakemake's own machinery to specify where the workflows run; don't reinvent the wheel by duplicating this in taco
  • we want to be able to run workflows either locally or remotely (see the sketch at the end of this comment)
  • target cluster platform: AWS, with some possibility of extending to HPC
  • dahak-bespin may help here: it can set up a ready-to-go cloud cluster + VPC, and then snakemake will have those nodes available to farm out its tasks

We also have to think about #8 (using a cloud/URL model for getting the taco rules/workflow instructions), which is intended to remove the need for the user to have a local copy of the workflow they want to run.
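
As a concrete illustration of the abstraction above, here is a minimal sketch of how taco could hand the "where does it run?" question entirely to snakemake. The function and argument names are hypothetical, not taco's actual API; only the snakemake CLI flags (--snakefile, --configfile, --cores, --jobs, --cluster) come from snakemake itself.

```python
# Minimal sketch of the "taco is a thin wrapper around snakemake" idea.
# Function/argument names are illustrative; only the snakemake flags are real.
import subprocess

def run_workflow(snakefile, configfile, cluster_cmd=None, jobs=4):
    """Run a snakemake workflow either locally or by submitting jobs to a cluster.

    taco decides *what* to run; snakemake decides *where* each rule runs.
    """
    cmd = ["snakemake", "--snakefile", snakefile, "--configfile", configfile]
    if cluster_cmd:
        # Cluster mode: the master node only submits jobs,
        # e.g. cluster_cmd = "sbatch -n {threads}" or "qsub -pe smp {threads}"
        cmd += ["--cluster", cluster_cmd, "--jobs", str(jobs)]
    else:
        # Local mode: run everything on the same node taco is running on.
        cmd += ["--cores", str(jobs)]
    subprocess.check_call(cmd)

# Local, single-node run:
#   run_workflow("Snakefile", "params.json")
# Cluster run (master node submits, compute happens elsewhere):
#   run_workflow("Snakefile", "params.json", cluster_cmd="sbatch -n {threads}", jobs=16)
```

Whether the compute happens on the master node or on cluster nodes is decided entirely by the flags taco passes through, which is what keeps taco a thin wrapper.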

@charlesreid1
Member Author

Proposal for work moving forward:

  • In the documentation, we should separate ((the installation of taco)) from ((the installation of things needed to run the workflows that taco wraps)); think of taco as a tool that is assumed to run separately from the compute node

  • We need to develop a cloud/URL model for rules; use the taco-simple workflow for that, since it does not require any extras and can easily run taco and the compute task on the same node (a rough sketch follows this list)

  • We need to make taco cluster-capable; the assembly/read filtering workflows are a good test for submitting jobs to clusters using taco
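
Rough sketch of the cloud/URL model for rules mentioned in the second bullet: fetch a workflow's Snakefile from a remote location into a local cache instead of requiring a local checkout. The base URL, cache directory, and function name here are hypothetical placeholders, not an agreed-on layout.

```python
# Sketch of a cloud/URL model for rules: download the remote Snakefile into a
# local cache, then hand the cached copy to snakemake. URL layout and cache
# path are hypothetical.
import os
import urllib.request

CACHE_DIR = os.path.expanduser("~/.taco/rules")

def fetch_rules(base_url, workflow):
    """Download <base_url>/<workflow>/Snakefile into the local cache and
    return the path to the cached copy."""
    os.makedirs(os.path.join(CACHE_DIR, workflow), exist_ok=True)
    local_path = os.path.join(CACHE_DIR, workflow, "Snakefile")
    remote_url = "{}/{}/Snakefile".format(base_url, workflow)
    urllib.request.urlretrieve(remote_url, local_path)
    return local_path

# Example with a placeholder URL (actual hosting location TBD), reusing the
# run_workflow() wrapper sketched in the previous comment:
#   snakefile = fetch_rules("https://example.com/taco-rules", "taco-simple")
#   run_workflow(snakefile, "params.json")
```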

@charlesreid1
Member Author

Conceptual point of clarification: taco is not intended to be the end-all, be-all workflow runner tool. It relies heavily on Snakemake's functionality. It is intended to do exactly what dahak-metagenomics/dahak does, which is run workflows with user-provided parameters, but with a cleaner, simpler user interface (and, hopefully, simpler snakemake rules on the back end).

@charlesreid1
Member Author

Tying in #6 (overlay model: workflows + metadata) here, since it seems relevant. This expands the scope toward making workflows "importable".

This is related to the cloud/URL model for rules, but it would shift the focus away from ((user translates their workflows into rules for taco)) and toward ((taco does all the hard work of importing the already-written workflow)).
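
A very rough sketch of what ((taco does all the hard work of importing the already-written workflow)) could look like, assuming the imported workflow is a git repo containing a Snakefile and a default config that taco overlays with user-provided parameters. The repo layout, filenames, and helper names are all hypothetical.

```python
# Overlay/"importable workflow" sketch: clone an already-written workflow,
# overlay the user's parameters on top of its default config, and let
# snakemake run it. Repo layout, filenames, and function names are hypothetical.
import json
import os
import subprocess

def import_workflow(repo_url, dest="imported_workflow"):
    """Clone an existing workflow repo (assumed to contain a Snakefile and
    a default config.json) and return its local path."""
    if not os.path.isdir(dest):
        subprocess.check_call(["git", "clone", repo_url, dest])
    return dest

def overlay_params(workflow_dir, user_params):
    """Merge user-provided parameters over the workflow's default config
    and write the merged config next to the Snakefile."""
    with open(os.path.join(workflow_dir, "config.json")) as f:
        config = json.load(f)
    config.update(user_params)  # user-provided values win
    merged = os.path.join(workflow_dir, "taco_config.json")
    with open(merged, "w") as f:
        json.dump(config, f, indent=2)
    return merged

# Usage sketch: the user never writes taco rules by hand; taco imports the
# workflow and only asks for parameters, then runs it via the run_workflow()
# wrapper sketched above:
#   wf = import_workflow("https://github.com/some-org/some-workflow.git")
#   cfg = overlay_params(wf, {"samples": ["sample_A"], "threads": 8})
#   run_workflow(os.path.join(wf, "Snakefile"), cfg)
```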
