
Better snakemake cluster submission support #129

Merged
AroneyS merged 38 commits into dev from add-resources on Sep 7, 2023

Conversation

AroneyS
Collaborator

@AroneyS AroneyS commented Aug 29, 2023

Add rule to localrules if simple, otherwise add resources with estimated cpu/mem/time.

Not sure about all the long-reads assemblies resources.

  • Add multiplier for resources with retry (if submitted job fails)
  • Add proper arguments to support snakemake args
    • e.g. --profile qsub --retries 3, where profile points to ~/.config/snakemake/qsub/config.yaml which contains cluster, cluster-status, jobs, cluster-cancel and other snakemake arguments
  • Add logs for each rule (stdout/stderr captured by cluster software)
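The profile mentioned above (`~/.config/snakemake/qsub/config.yaml`) might look something like the following sketch; the `qsub` flags and the status/cancel script names are illustrative assumptions for a PBS-style cluster, not Aviary defaults:

```yaml
# ~/.config/snakemake/qsub/config.yaml -- illustrative values only
cluster: "qsub -l ncpus={threads},mem={resources.mem_gb}gb,walltime={resources.runtime} -j oe -o {log}"
cluster-status: "qsub_status.sh"   # hypothetical script reporting running/success/failed
cluster-cancel: "qdel"
jobs: 50
retries: 3
```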

better snakemake cluster submission
@AroneyS AroneyS changed the base branch from main to dev August 29, 2023 00:22
@AroneyS AroneyS requested review from rhysnewell and wwood August 29, 2023 00:22
rhysnewell previously approved these changes Aug 29, 2023
@wwood
Collaborator

wwood commented Aug 29, 2023 via email

@rhysnewell rhysnewell self-requested a review August 31, 2023 01:45
@rhysnewell rhysnewell dismissed their stale review August 31, 2023 01:46

More commits added, needs re-review

@rhysnewell
Owner

Let me know when you're ready for this to be reviewed

run all other independent jobs when job errors
@AroneyS
Collaborator Author

AroneyS commented Aug 31, 2023

@rhysnewell @wwood Found a bug in snakemake where it solves the DAG and then exits with no error. This only happened with >1 binner with cluster submission. After much debugging, it turns out that it's because the refinery rules don't have the group: keyword. So, since cluster submission tries to combine groups, it couldn't find anything to submit.

Anyway, solution is to either add the refinery rules to group: 'binning', or remove groups entirely. I'm leaning towards the latter (or to replace them with specific grouped rules) to make the submitted jobs smaller. Was there a reason that the groups were added in the first place? I see they are split into binning/assembly/annotation/isolate...
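The two options sketched in snakemake-rule form (the rule body is elided; only the group name comes from the discussion, the rest is illustrative):

```snakemake
# Option 1: attach the refinery rules to the existing group so cluster
# submission can bundle them with the other binning jobs
rule refine_rosella:
    group: 'binning'
    ...

# Option 2: remove the group: keyword entirely, so each rule is
# submitted as its own (smaller) cluster job
rule refine_rosella:
    ...
```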

@wwood
Collaborator

wwood commented Aug 31, 2023

Who wrote them in according to git blame? I thought they were only for cluster/cloud submission, so probably best to delete, I think, but maybe there's a reason I don't appreciate.

@AroneyS
Collaborator Author

AroneyS commented Sep 1, 2023

@rhysnewell @wwood Ran integration tests: test_short_read_recovery, test_long_read_recovery and test_short_read_recovery_queue_submission all pass.

The only issue I see now is that the refinery logs (from rosella_refinery.py) are all empty. Not sure why, since I used the same logging method as run_checkm.py, etc., and their logs are populated.

@rhysnewell
Owner

I just had a quick look, and stdout is being written to the log file for both the run_checkm and refine_rosella rules. All logging info for rosella comes from stderr, so you'd need to pipe stderr to the log file. Stdout should generally remain empty for a rosella run. I could be incorrect though, just from a glance

@AroneyS
Collaborator Author

AroneyS commented Sep 3, 2023

The stdout and stderr should both be going to log files since stderr is redirected to STDOUT. I did that since some of the tools aren't well-behaved...

import os
import subprocess

# bin_folder, bin_ext, output_folder, threads and the open log file
# handle logf are defined earlier in the script
subprocess.run(
    f"checkm2 predict -i {bin_folder}/ -x {bin_ext} -o {output_folder} -t {threads} --force".split(),
    env=os.environ,
    stdout=logf,
    stderr=subprocess.STDOUT,
)

@rhysnewell
Owner

Ah gotcha, hmm. I guess you'd have to check firstly that the log output isn't just being piped somewhere, and secondly that the refinement is even running? Like it might be getting skipped in your tests due to lack of bins or something

@AroneyS
Collaborator Author

AroneyS commented Sep 3, 2023

Hah, yeh of course. The input CheckM files are empty. I'll log that refinement is being skipped.

@AroneyS
Collaborator Author

AroneyS commented Sep 4, 2023

@rhysnewell @wwood I think it's ready. Good luck reading +1191/-676 lines; at least most of them are just removing groups and adding resources/logs.

Collaborator

@wwood wwood left a comment


Hi Sam, thanks for all this work. I didn't have time to go through each and every line, but the ones I looked at were good.

Happy to see there's further testing going on too.

I've only really requested a bit more documentation, I think a short example on using the cluster profile would work well enough.

Beyond this, I think it would be worth gathering a list of advantages of Aviary over other workflows, and listing those on the README. Can you please make a start on that, maybe in another PR?

Ta

base_group.add_argument(
'--snakemake-profile',
help='Snakemake profile (see https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles)\n'
'Create profile as `~/.config/snakemake/[CLUSTER_PROFILE]/config.yaml`. \n'
Collaborator

Unclear whether you mean the user or aviary does the creating. I think an example yml in the doco would go a long way here.

Collaborator Author

I added some stuff under Advanced Usage. Is that what you wanted?

@wwood
Collaborator

wwood commented Sep 5, 2023

Sounds good, but maybe add a subtitle rather than just advanced usage.

Also, is this still true? From that same doc. Would be worth adding a few words to say that it is only sorta true, e.g. reruns in cluster mode are requeued asking for increasingly more RAM:

When performing assembly, users are required to estimate how much RAM they will need to use via -m, --max-memory, --max_memory

@AroneyS
Collaborator Author

AroneyS commented Sep 5, 2023

@wwood
The memory arg is used as a cap on job submission memory. So the job memory will increase from a set amount with each retry but stop at the cap.
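A minimal sketch of that behaviour (the constant names and base/multiplier values are hypothetical illustrations, not Aviary's actual defaults; the callable follows Snakemake's convention that `attempt` starts at 1 and increments on each retry):

```python
# Illustrative: retry-scaled memory request capped at a user-supplied maximum.
BASE_MEM_GB = 32    # hypothetical starting request
MULTIPLIER = 2      # hypothetical per-retry multiplier
MAX_MEM_GB = 250    # e.g. the cap a user passed via --max-memory

def mem_gb(wildcards, attempt):
    """Memory to request for a given attempt, never exceeding the cap."""
    return min(BASE_MEM_GB * MULTIPLIER ** (attempt - 1), MAX_MEM_GB)
```

So a job would ask for 32 GB on its first try, 64 GB after one failure, and so on, but never more than the 250 GB cap.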

@AroneyS AroneyS merged commit de1a48a into dev Sep 7, 2023
@AroneyS AroneyS deleted the add-resources branch September 7, 2023 04:07
@AroneyS AroneyS mentioned this pull request Sep 13, 2023