Add script to run full pipeline locally #2799
Conversation
@@ -0,0 +1,22 @@
#!/bin/bash

# Reinitializes the database so there's no data or migrations run against it.
Fancy, I like it
A few small things
If there are no unstarted ProcessorJobs, then the most
recently created unstarted DownloaderJob will be prioritized.

The dict will have two top level keys: downloader_jobs and processor_jobs.
I think this line is outdated
def parse_args():
    description = """This script can be used to run the full pipeline.
🔥
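The rest of parse_args isn't shown in this excerpt; a minimal sketch of how it might look with argparse, using a hypothetical positional accession_code argument (the real script's options may differ):

```python
# Sketch only: the argument names below are assumptions for illustration,
# not the script's actual interface.
import argparse

def parse_args():
    description = """This script can be used to run the full pipeline."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "accession_code",
        help="Experiment accession (or organism/Ensembl division for transcriptome indices).",
    )
    return parser.parse_args()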
scripts/run_full_pipeline.py
Outdated
elif job_name == "NO_OP":
    image_name = "no_op"

completed_command = subprocess.check_call(
From here, it looks like check_call throws if the return code is nonzero, rather than returning the nonzero error code.
Also, does the output of this get sent to the terminal? That would be preferable, or at least somewhere that the output can be inspected.
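For reference, the difference looks like this (standard library behavior, not code from this PR):

```python
# check_call raises CalledProcessError on a nonzero exit code, while call
# returns the exit code and leaves it to the caller to check.
import subprocess

try:
    subprocess.check_call(["false"])          # exits with status 1, so this raises
except subprocess.CalledProcessError as err:
    print("check_call raised, returncode =", err.returncode)

print("call returned", subprocess.call(["false"]))  # prints 1 instead of raising
```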
Oh good catch. I guess I don't gotta check the return codes.
The output does get sent to the terminal. It's a little bit chatty at the moment because the run_job.sh and run_surveyor.sh scripts rebuild the docker images every time, but it is nice having them take care of that for you as you iterate.
I think that maybe in the future we could have the pipeline script look at what docker images would need to be built and build them once before kicking off work, but that's a larger refactor to the way the scripts work. In this PR I just wanted to get the flow working with the existing scripts and it already ended up being nontrivial.
scripts/run_full_pipeline.py
Outdated
image_name,
subcommand,
f"--job-name={job_name}",
# job_name,
I think you can get rid of these commented-out elements
scripts/run_full_pipeline.py
Outdated
def survey_accession(accession_code):
    completed_command = subprocess.check_call(
Same comment as above
job_to_run = get_job_to_run()

while job_to_run:
You might want to make this while job_to_run is not None: and then explicitly return None in get_job_to_run() if we fall through all the if statements.
There's only one if statement in get_job_to_run. The get_job_to_be_run management command has several, but it has to return JSON, and None isn't valid JSON. Instead get_job_to_be_run returns a dict representing a processor job, a downloader job, or an empty dict. An empty dict {} is both valid JSON and falsey in Python, so it works here I think. Is there any advantage to using an explicit None here?
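A quick illustration of the point about {} (standard Python/JSON behavior, not code from this PR):

```python
# {} is valid JSON and falsey in Python, so the management command can return it
# to mean "nothing to run" and the while loop above terminates naturally.
import json

print(json.loads("{}"))              # {} round-trips through JSON just fine
print(bool({}))                      # False: an empty dict is falsey
print(bool({"job_name": "NO_OP"}))   # True: a real job dict keeps the loop going
```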
Good point, I didn't think of that. It's good as is then.
Issue Number
#2775
Purpose/Implementation Notes
This adds a Python script that invokes bash scripts to run surveyor, downloader, and processor jobs for a given experiment accession (or organism/Ensembl division in the case of transcriptome indices).
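For reference, a minimal self-contained sketch of that flow, pieced together from the snippets in the diff above (the helper bodies and job names here are stand-ins, not the PR's actual code):

```python
# Hypothetical, self-contained sketch of the flow; the real script shells out
# to run_surveyor.sh and run_job.sh via subprocess rather than using these stubs.
def survey_accession(accession_code):
    print(f"surveying {accession_code}")  # stand-in for the surveyor bash script


_fake_queue = [{"job_name": "DOWNLOADER"}, {"job_name": "NO_OP"}]  # illustrative job names


def get_job_to_run():
    # The real helper asks the get_job_to_be_run management command for JSON;
    # an empty dict means there is nothing left to do.
    return _fake_queue.pop(0) if _fake_queue else {}


def run_job(job):
    print(f"running {job['job_name']}")  # stand-in for run_job.sh


survey_accession("GSE12345")  # hypothetical accession code
job_to_run = get_job_to_run()
while job_to_run:
    run_job(job_to_run)
    job_to_run = get_job_to_run()
```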
I also ended up doing some refactoring because the transcriptome jobs were failing due to an issue in needs_processing. Fixing that issue ended up creating a cyclical import, and the right way to fix that seemed to be to split the job_lookup namespace because it was really doing two things.
Types of changes
Functional tests
I've run transcriptome and no_op smoothly and I'm running others still.
Checklist