taco locations #25

Open
charlesreid1 opened this issue Aug 24, 2018 · 3 comments

@charlesreid1
Member

We're thinking through where we are running taco, what workflows it submits, and where those workflows run.

Starting with the documentation's first page, installation, we give instructions for installing snakemake and singularity on the same node that will run taco. This seems to limit us to a single-node model. What if we are submitting jobs to clusters?

We should think of taco as just a thin wrapper around snakemake: whatever execution model we currently use for snakemake, we use for taco. The answer to the question "where does taco run?" is the same as the answer to the question "where does snakemake run?"

Therefore we want to use the following abstraction:

  • taco runs on a "master node" that may or may not run the actual compute tasks
  • the master node runs the snakemake workflow
  • use snakemake's own machinery to specify where the workflows run; don't reinvent the wheel by duplicating this in taco
  • we want to be able to run workflows either locally or remotely (see the sketch at the end of this comment)
  • target cluster platform: AWS, with some possibility of extending to HPC
  • dahak-bespin may help here: it can set up a ready-to-go cloud cluster + VPC, and then snakemake will have those nodes available to farm out its tasks

We also have to think about #8 (using a cloud/URL model for getting the taco rules/workflow instructions), which is intended to remove the need for the user to have a local copy of the workflow they want to run.
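
As a concrete illustration of the abstraction above, here is a minimal sketch of how taco could hand the "where does it run?" question entirely to snakemake. The function and argument names are hypothetical, not taco's actual API; only the snakemake CLI flags (--snakefile, --configfile, --cores, --jobs, --cluster) come from snakemake itself.

```python
# Minimal sketch of the "taco is a thin wrapper around snakemake" idea.
# Function/argument names are illustrative; only the snakemake flags are real.
import subprocess

def run_workflow(snakefile, configfile, cluster_cmd=None, jobs=4):
    """Run a snakemake workflow either locally or by submitting jobs to a cluster.

    taco decides *what* to run; snakemake decides *where* each rule runs.
    """
    cmd = ["snakemake", "--snakefile", snakefile, "--configfile", configfile]
    if cluster_cmd:
        # Cluster mode: the master node only submits jobs,
        # e.g. cluster_cmd = "sbatch -n {threads}" or "qsub -pe smp {threads}"
        cmd += ["--cluster", cluster_cmd, "--jobs", str(jobs)]
    else:
        # Local mode: run everything on the same node taco is running on.
        cmd += ["--cores", str(jobs)]
    subprocess.check_call(cmd)

# Local, single-node run:
#   run_workflow("Snakefile", "params.json")
# Cluster run (master node submits, compute happens elsewhere):
#   run_workflow("Snakefile", "params.json", cluster_cmd="sbatch -n {threads}", jobs=16)
```

Whether the compute happens on the master node or on cluster nodes is decided entirely by the flags taco passes through, which is what keeps taco a thin wrapper.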

@charlesreid1
Member Author

Proposal for work moving forward:

  • In the documentation, we should separate ((the installation of taco)) from ((the installation of things needed to run the workflows that taco wraps)); think of taco as a tool that is assumed to run separately from the compute node

  • We need to develop a cloud/URL model for rules; use the taco-simple workflow for that, since it does not require any extras and can easily run taco and the compute task on the same node (a rough sketch follows this list)

  • We need to make taco cluster-capable; the assembly/read filtering workflows are a good test for submitting jobs to clusters using taco
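
Rough sketch of the cloud/URL model for rules mentioned in the second bullet: fetch a workflow's Snakefile from a remote location into a local cache instead of requiring a local checkout. The base URL, cache directory, and function name here are hypothetical placeholders, not an agreed-on layout.

```python
# Sketch of a cloud/URL model for rules: download the remote Snakefile into a
# local cache, then hand the cached copy to snakemake. URL layout and cache
# path are hypothetical.
import os
import urllib.request

CACHE_DIR = os.path.expanduser("~/.taco/rules")

def fetch_rules(base_url, workflow):
    """Download <base_url>/<workflow>/Snakefile into the local cache and
    return the path to the cached copy."""
    os.makedirs(os.path.join(CACHE_DIR, workflow), exist_ok=True)
    local_path = os.path.join(CACHE_DIR, workflow, "Snakefile")
    remote_url = "{}/{}/Snakefile".format(base_url, workflow)
    urllib.request.urlretrieve(remote_url, local_path)
    return local_path

# Example with a placeholder URL (actual hosting location TBD), reusing the
# run_workflow() wrapper sketched in the previous comment:
#   snakefile = fetch_rules("https://example.com/taco-rules", "taco-simple")
#   run_workflow(snakefile, "params.json")
```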

@charlesreid1
Member Author

Conceptual point of clarification: taco is not intended to be the end-all, be-all workflow runner tool. It relies heavily on Snakemake's functionality. It is intended to do exactly what dahak-metagenomics/dahak does, which is run workflows with user-provided parameters, but with a cleaner, simpler user interface (and, hopefully, simpler snakemake rules on the back end).

@charlesreid1
Member Author

Tying in #6 (overlay model: workflows + metadata) here, since it seems relevant. This expands the scope toward making workflows "importable".

This is related to the cloud/URL model for rules, but it would shift the focus away from ((user translates their workflows into rules for taco)) and toward ((taco does all the hard work of importing the already-written workflow)).
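
A very rough sketch of what ((taco does all the hard work of importing the already-written workflow)) could look like, assuming the imported workflow is a git repo containing a Snakefile and a default config that taco overlays with user-provided parameters. The repo layout, filenames, and helper names are all hypothetical.

```python
# Overlay/"importable workflow" sketch: clone an already-written workflow,
# overlay the user's parameters on top of its default config, and let
# snakemake run it. Repo layout, filenames, and function names are hypothetical.
import json
import os
import subprocess

def import_workflow(repo_url, dest="imported_workflow"):
    """Clone an existing workflow repo (assumed to contain a Snakefile and
    a default config.json) and return its local path."""
    if not os.path.isdir(dest):
        subprocess.check_call(["git", "clone", repo_url, dest])
    return dest

def overlay_params(workflow_dir, user_params):
    """Merge user-provided parameters over the workflow's default config
    and write the merged config next to the Snakefile."""
    with open(os.path.join(workflow_dir, "config.json")) as f:
        config = json.load(f)
    config.update(user_params)  # user-provided values win
    merged = os.path.join(workflow_dir, "taco_config.json")
    with open(merged, "w") as f:
        json.dump(config, f, indent=2)
    return merged

# Usage sketch: the user never writes taco rules by hand; taco imports the
# workflow and only asks for parameters, then runs it via the run_workflow()
# wrapper sketched above:
#   wf = import_workflow("https://github.com/some-org/some-workflow.git")
#   cfg = overlay_params(wf, {"samples": ["sample_A"], "threads": 8})
#   run_workflow(os.path.join(wf, "Snakefile"), cfg)
```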
