Run snakemakefile-no results #116

Closed
LiZhihua1982 opened this issue Nov 30, 2022 · 8 comments

@LiZhihua1982

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 28
Rules claiming more threads will be scaled down.
Job stats:
job      count    min threads    max threads
-----  -------  -------------  -------------
all          1              1              1
total        1              1              1

Select jobs to execute...

[Wed Nov 30 18:04:13 2022]
Job 0:
WARNING: Be very careful when adding/removing any lines above this message.
The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly,
therefore adding/removing any lines before this message will likely result in parser malfunction.

Reason: Rules with a run or shell declaration but no output are always executed.

    echo "Gathering /media/lizhihua/software/metaGEM/qfiltered/GJ1/GJ1_R1.fastq.gz /media/lizhihua/software/metaGEM/qfiltered/GJ2/GJ2_R1.fastq.gz ... "

[Wed Nov 30 18:04:14 2022]
Finished job 0.
1 of 1 steps (100%) done
Complete log: .snakemake/log/2022-11-30T180413.551219.snakemake.log

@LiZhihua1982
Author

(base) lizhihua@lizhihua-T640:/media/lizhihua/software/metaGEM$ snakemake -s Snakefile --use-conda --rerun-incomplete --cores 28 -p
/home/lizhihua/miniconda3/lib/python3.8/site-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.25.11) or chardet (5.0.0) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 28
Rules claiming more threads will be scaled down.
Job stats:
job      count    min threads    max threads
-----  -------  -------------  -------------
all          1              1              1
total        1              1              1

Select jobs to execute...

[Wed Nov 30 18:27:36 2022]
Job 0:
WARNING: Be very careful when adding/removing any lines above this message.
The metaGEM.sh parser is presently hardcoded to edit line 22 of this Snakefile to expand target rules accordingly,
therefore adding/removing any lines before this message will likely result in parser malfunction.

Reason: Rules with a run or shell declaration but no output are always executed.

    echo "Gathering /media/lizhihua/software/metaGEM/qfiltered/GJ1/GJ1_R1.fastq.gz /media/lizhihua/software/metaGEM/qfiltered/GJ2/GJ2_R1.fastq.gz ... "

/home/lizhihua/miniconda3/lib/python3.8/site-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.25.11) or chardet (5.0.0) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Gathering /media/lizhihua/software/metaGEM/qfiltered/GJ1/GJ1_R1.fastq.gz /media/lizhihua/software/metaGEM/qfiltered/GJ2/GJ2_R1.fastq.gz ...
[Wed Nov 30 18:27:37 2022]
Finished job 0.
1 of 1 steps (100%) done
Complete log: .snakemake/log/2022-11-30T182736.460571.snakemake.log

@franciscozorrilla
Owner

franciscozorrilla commented Nov 30, 2022

Hi Li,

Please make sure you ask a question in your issue, otherwise I am not sure how I can help.
Similarly, please include the commands that you are running and what you are trying to accomplish.

It looks like you have successfully quality filtered your reads; now you will want to assemble them.

bash metaGEM.sh --task megahit

Running this command will re-configure your Snakefile so that rule all generates your desired assembly output.
Have a look at the tutorial and other resources on the repo for additional information.

Best wishes,
Francisco

@LiZhihua1982
Author

LiZhihua1982 commented Nov 30, 2022 via email

@franciscozorrilla
Owner

Hi Li, Good question 💎

You are correct that Snakemake can resolve the dependencies between multiple steps in one go; however, it gets a bit tricky with the wildcards going from sample IDs to genome/bin IDs. In practice, I generally run only one or two rules at a time and then check the outputs, since at many points in the pipeline it does not make sense to continue if some of your jobs/samples failed. Especially with larger datasets, you can end up wasting a lot of computational resources by submitting jobs that run on corrupted/incomplete data.

So for example, if I had 10 raw samples, I may start with the following command:

bash metaGEM.sh --task megahit

This would submit quality filtering jobs first, and then the assembly jobs, since assembly requires the qfiltered reads.
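The "check the outputs" step mentioned above could be scripted along these lines. This is only a minimal sketch, not part of metaGEM: the function name `samples_missing_output` is made up for illustration, and it assumes one sub-folder per sample containing a file named after the sample, as in the `qfiltered/GJ1/GJ1_R1.fastq.gz` paths from the log earlier in this thread.

```python
from pathlib import Path


def samples_missing_output(folder, suffix="_R1.fastq.gz"):
    """Return sample IDs whose expected output file is absent or empty.

    Hypothetical helper: assumes <folder>/<ID>/<ID><suffix>, the layout
    seen in the qfiltered paths of the log above.
    """
    missing = []
    for sample_dir in sorted(Path(folder).iterdir()):
        if not sample_dir.is_dir():
            continue  # ignore stray files next to the sample folders
        expected = sample_dir / f"{sample_dir.name}{suffix}"
        # A zero-byte file usually means the job died before writing anything.
        if not expected.is_file() or expected.stat().st_size == 0:
            missing.append(sample_dir.name)
    return missing
```

Calling `samples_missing_output("qfiltered")` from the metaGEM root would list the samples worth re-running before submitting the next task. Note that it only catches missing or empty files, not truncated ones.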

In theory, it should be possible to request to submit jobs like this at any point of the workflow, however the Snakefile rule inputs and outputs need to be carefully matched.

For example, try running the following:

bash metaGEM.sh --task binRefine

In the future please open a new issue if it is unrelated to the current one.

Best,
Francisco

@LiZhihua1982
Author

LiZhihua1982 commented Nov 30, 2022 via email

@franciscozorrilla
Owner

Yes, if you already have assembled contigs then simply place them in the corresponding folders where metaGEM expects the assemblies to be generated. This has been answered in issue #56, in particular see this comment.

@franciscozorrilla franciscozorrilla self-assigned this Nov 30, 2022
@franciscozorrilla franciscozorrilla added the question Further information is requested label Nov 30, 2022
@LiZhihua1982
Author

LiZhihua1982 commented Dec 2, 2022 via email

@franciscozorrilla
Owner

franciscozorrilla commented Dec 2, 2022

This is more of a general Snakemake question than a metaGEM-specific issue; please have a look at the Snakemake documentation to understand how rule dependencies are resolved.

If you ask the metaGEM Snakefile to produce refined bins (e.g. bash metaGEM.sh --task binRefine), then it will figure out what previous rules/tasks need to be submitted in order to produce the inputs needed for binRefine:

metaGEM/Snakefile

Lines 895 to 905 in 6285b93

rule binRefine:
    input:
        concoct = f'{config["path"]["root"]}/{config["folder"]["concoct"]}/{{IDs}}/{{IDs}}.concoct-bins',
        metabat = f'{config["path"]["root"]}/{config["folder"]["metabat"]}/{{IDs}}/{{IDs}}.metabat-bins',
        maxbin = f'{config["path"]["root"]}/{config["folder"]["maxbin"]}/{{IDs}}/{{IDs}}.maxbin-bins'
    output:
        directory(f'{config["path"]["root"]}/{config["folder"]["refined"]}/{{IDs}}')
    benchmark:
        f'{config["path"]["root"]}/{config["folder"]["benchmarks"]}/{{IDs}}.binRefine.benchmark.txt'
    shell:
        """
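A side note on the path syntax in the rule above, in case it looks odd: inside a Python f-string, single braces are filled in immediately from `config`, while doubled braces survive as a literal `{IDs}`, which Snakemake then treats as a wildcard. The small sketch below illustrates this; the `config` values are placeholders, not the real metaGEM configuration.

```python
# Hypothetical stand-in for the values metaGEM reads from its config file.
config = {
    "path": {"root": "/path/to/metaGEM"},
    "folder": {"concoct": "concoct"},
}

# Same pattern as the concoct input of binRefine.
pattern = f'{config["path"]["root"]}/{config["folder"]["concoct"]}/{{IDs}}/{{IDs}}.concoct-bins'
print(pattern)  # /path/to/metaGEM/concoct/{IDs}/{IDs}.concoct-bins

# Snakemake substitutes the wildcard per sample; str.format mimics that here.
print(pattern.format(IDs="GJ1"))  # /path/to/metaGEM/concoct/GJ1/GJ1.concoct-bins
```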

As you can see, binRefine requires concoct, maxbin, and metabat output, so Snakemake will check whether those files are present. If they are not, it will submit the binning jobs to produce those results, and then submit the binRefine jobs. However, Snakemake also has to check that the inputs for those binning tasks are present. For example, have a look at the concoct rule:

metaGEM/Snakefile

Lines 627 to 630 in 6285b93

rule concoct:
    input:
        table = f'{config["path"]["root"]}/{config["folder"]["concoct"]}/{{IDs}}/cov/coverage_table.tsv',
        contigs = rules.megahit.output

So if your concoct inputs (i.e. assembly and coverage table) are missing, then Snakemake will submit jobs to generate those as well, and so on to resolve file dependencies between rules/tasks.
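The backward walk described above can be pictured with a toy model. This is a rough sketch of the idea only, not how Snakemake is implemented: real Snakemake works on files and wildcards rather than rule names, and the dependency table below is simplified and partly assumed (the `qfilter` name, the metabat/maxbin edges, and the omitted coverage step are illustrative).

```python
# Toy model: each rule maps to the rules whose outputs it needs as inputs.
# Rule names follow the thread; the edges are simplified and partly assumed.
NEEDS = {
    "binRefine": ["concoct", "metabat", "maxbin"],
    "concoct": ["megahit"],
    "metabat": ["megahit"],
    "maxbin": ["megahit"],
    "megahit": ["qfilter"],
    "qfilter": [],
}


def jobs_to_run(target, done):
    """Return the rules to run, in order, to produce `target`.

    `done` is the set of rules whose outputs already exist on disk.
    """
    order = []

    def visit(rule):
        # Existing outputs stop the walk; nothing upstream is needed.
        if rule in done or rule in order:
            return
        for dep in NEEDS[rule]:
            visit(dep)
        order.append(rule)  # scheduled only after all of its inputs

    visit(target)
    return order


print(jobs_to_run("binRefine", done={"qfilter"}))
# ['megahit', 'concoct', 'metabat', 'maxbin', 'binRefine']
```

If the three binning outputs already exist, the same call returns just `['binRefine']`, which is why placing existing results in the expected folders lets you skip earlier steps.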

If you run a command like this:

bash metaGEM.sh --task megahit -j 10 -h 1 -m 60 -c 28

Then metaGEM.sh will update the cluster_config.json parameters to request 10 jobs, each with a maximum runtime of one hour, 60 GB RAM, and 28 cores.
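To make the flag meanings concrete, here is how the command above would be read, written as a small Python sketch. metaGEM.sh itself is a bash script and does this differently; the dictionary keys below are placeholders chosen for illustration and are not the actual cluster_config.json schema.

```python
import argparse
import json

# add_help=False frees up -h, which the command uses for hours of runtime.
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--task")
parser.add_argument("-j", type=int)  # number of jobs
parser.add_argument("-h", type=int)  # max runtime per job, in hours
parser.add_argument("-m", type=int)  # RAM per job, in GB
parser.add_argument("-c", type=int)  # cores per job

args = parser.parse_args("--task megahit -j 10 -h 1 -m 60 -c 28".split())

# Key names are placeholders, not the real cluster_config.json fields.
resources = {"jobs": args.j, "hours": args.h, "mem_gb": args.m, "cores": args.c}
print(json.dumps(resources))
# {"jobs": 10, "hours": 1, "mem_gb": 60, "cores": 28}
```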
