This repository has been archived by the owner on Mar 17, 2023. It is now read-only.

Second run of methylation analysis triggers alignment again #9

Closed
miguelpmachado opened this issue Jun 30, 2020 · 8 comments · Fixed by #14

Comments

@miguelpmachado
Contributor

Hi,
I'm testing Nanopype and it seems to be a very nice tool.
However, I'm seeing unexpected behaviour when I run the methylation analysis twice. The first time, when I ask for a bedGraph, it uses the existing alignment and proceeds from the nanopolish step. If I then run it a second time and ask for a bw file, it reruns the alignment step and everything after it.
I'm a newbie to Snakemake, but if I understood correctly, Snakemake only reruns a step if the modification time of its input files is more recent than that of its output files, or if a required intermediate file is no longer available. I tried to work out whether any input file was changed during the methylation analysis, but I couldn't find anything that changes one (from my inexperienced point of view). I also looked for output files flagged as temporary and found some (.fofn), but those are produced after the nanopolish step, so they should not have triggered the mapping step.
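For illustration, the timestamp rule described above can be sketched in a few lines of Python. This is a simplified model and not Snakemake's actual implementation; the function name `needs_rerun`, the helper `make_file`, and the file names are hypothetical.

```python
import os
import tempfile


def needs_rerun(inputs, outputs):
    """Return True if any output is missing or older than the newest input.

    Simplified model of a timestamp-driven rerun check; Snakemake's real
    logic is more involved.
    """
    if any(not os.path.exists(path) for path in outputs):
        return True
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return newest_input > oldest_output


def make_file(directory, name, mtime):
    """Create an empty file with an explicit modification time (in seconds)."""
    path = os.path.join(directory, name)
    with open(path, "w"):
        pass
    os.utime(path, (mtime, mtime))
    return path


with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical input/output pair: the output is newer than the input.
    reference = make_file(workdir, "reference.fasta", 1_000)
    alignment = make_file(workdir, "alignment.bam", 2_000)
    up_to_date = needs_rerun([reference], [alignment])

    # Touching the input after the output was written makes the output stale.
    os.utime(reference, (3_000, 3_000))
    stale = needs_rerun([reference], [alignment])

    # A missing output always has to be produced.
    missing = needs_rerun([reference], [os.path.join(workdir, "absent.bam")])
```

Under this model, a step is repeated only when one of its inputs is newer than its output or the output is gone, so an unexpected rerun points at an input whose timestamp changed.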
What do you think could be causing this behaviour?
Thanks in advance.
Miguel

P.s.1: I attached the dag files of both runs.
P.s.2: I've made a few changes and added a few things to the code (which I'm still testing locally) that improve flexibility and cluster usage. Can I open a pull request? Thanks again.

methylation_first_run
methylation_second_run

@giesselmann
Owner

Hi Miguel,

You're right, re-running for the .bw file should never trigger alignment and methylation detection. The dag plots don't help much. Can you post the exact commands you ran and attach both log files (they should be under ./log/ in the processing directory)?
Also which version of nanopype are you using, the src/singularity/docker version?
Of course you can do a pull request, if the changes apply in general, I'm happy to merge it!

Pay

@miguelpmachado
Contributor Author

Sorry, I never remember that I should always provide commands, versions, etc.!
The commands before methylation analysis were:

snakemake --snakefile nanopype/Snakefile \
          --directory nanopype/ \
          --profile profiles/slurm/ \
          --use-singularity \
          --printshellcmds \
          --local-cores 1 \
          alignments/ngmlr/guppy/697_original_1strun.GRCh38_p12_Release_96.bam

snakemake --snakefile nanopype/Snakefile \
          --directory nanopype/ \
          --printshellcmds \
          --local-cores 1 \
          sequences/guppy/697_original_1strun.fastq.gz

snakemake --snakefile nanopype/Snakefile \
          --directory nanopype/ \
          --profile profiles/slurm/ \
          --use-singularity \
          --printshellcmds \
          --local-cores 1 \
          sv/sniffles/ngmlr/guppy/697_original_1strun.GRCh38_p12_Release_96.vcf

The methylation commands were:

# First run for bedGraph
snakemake --snakefile nanopype/Snakefile \
          --directory nanopype/ \
          --profile profiles/slurm/ \
          --use-singularity \
          --printshellcmds \
          --local-cores 1 \
          methylation/nanopolish/ngmlr/guppy/697_original_1strun.2x.GRCh38_p12_Release_96.bedGraph

# Second run for bw
snakemake --snakefile nanopype/Snakefile \
          --directory nanopype/ \
          --profile profiles/slurm/ \
          --use-singularity \
          --printshellcmds \
          --local-cores 1 \
          methylation/nanopolish/ngmlr/guppy/697_original_1strun.2x.GRCh38_p12_Release_96.bw

I uploaded both logs and stdouts (which are stderrs): logs_stdouts.zip. I still don't have the log file for the bw run because it is still running. I'll send it as soon as I have it.
I'm running the latest version of nanopype (v0.11.1) with a local conda installation, together with Singularity images.
When I finish testing the changes I've made, I'll do the pull request.
Thanks a lot for the quick reply.
Miguel

@miguelpmachado
Contributor Author

Hi @giesselmann,
Here are the files for the second methylation analysis (the bw run): bw.stdout_log.zip. It didn't finish successfully because the libkrb5.so.3 library is missing from the methylation image. I will not include the installation of that library in the methylation Dockerfile in my pull request, because I don't know how you want to install it.
Do you already have any idea why it is triggering the alignment again?
Miguel

@giesselmann
Owner

Hi Miguel,
Sorry for the delay. The logs are quite informative: the second run to get the bw does not create any new files. Since snakemake decides what to process, it must be something to do with the file/folder modification times. I'll have a look into it here and try to reproduce the issue.
Can you post the output of one more thing? If you run snakemake with --reason it shows for each rule, why this one needs to run. No need for the full output, just for one alignment rule.
Also thanks for the missing lib, I'll fix that in the next release.
Pay

@miguelpmachado
Contributor Author

Hi @giesselmann,
I admit that I'm failing to reproduce this strange behavior:

  1. If I only run the second methylation analysis (the one that failed to produce the bw file because of the missing library), it only runs the final step;
  2. If I delete the bedGraph file and ask for it again (running the first methylation analysis), it only runs the four final steps. If I then ask for the bw file, it only runs the bedGraph to bw conversion step.

So, I decided to rerun everything with --reason. I'm guessing that this odd behavior will occur again, because I already saw it twice when I ran everything from the first command to the last one. But let's see how it goes.
After I started the clean run, I noticed that I had asked for the dag file before each run in the original runs, but not in the two attempts I described above. Now I'm not sure whether asking for the dag might disturb something (it shouldn't). After it finishes running, I'll test this too.
Anyway, thanks for all the attention.
Miguel

@giesselmann
Owner

Hi Miguel,

I moved the container builds from Docker to Travis. During that development I also automated the unit tests and could not observe/reproduce the behavior you described initially.
Since snakemake is filesystem driven, my guess is that it must be some changes in the modification dates.

Let me know if you're still working on this, otherwise I will close the issue.

Pay

@miguelpmachado
Contributor Author

Hi @giesselmann,
I found what was triggering this second run: while the methylation_nanopolish rule runs, a fasta index is produced in the reference directory that is an input of the ngmlr rule. I fixed this behaviour by adding a rule that indexes the fasta and by setting the index as an input of the ngmlr rule. I'll push that in a second.
Thank you for the brainstorming.
Miguel

@giesselmann
Owner

That makes perfect sense. These *.fai files are already present at our institute, so I missed adding a rule to produce them.
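As a rough illustration of this diagnosis, the sketch below reproduces the timestamp pattern in a temporary directory. It assumes the alignment step's inputs effectively include the fasta index (the thread does not say whether it is the index file itself or the reference directory that counts as the input), and all file names, timestamps, and the helpers `newer_inputs` and `write_file` are hypothetical.

```python
import os
import tempfile


def newer_inputs(output, inputs):
    """List the inputs whose modification time is later than the output's."""
    output_mtime = os.path.getmtime(output)
    return [path for path in inputs if os.path.getmtime(path) > output_mtime]


def write_file(path, mtime):
    """Create (or overwrite) an empty file with an explicit modification time."""
    with open(path, "w"):
        pass
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as workdir:
    fasta = os.path.join(workdir, "reference.fasta")
    index = fasta + ".fai"
    bam = os.path.join(workdir, "alignment.bam")

    # Problem case: the index appears as a side effect of a later step,
    # so it ends up newer than the alignment that (assumed) depends on it.
    write_file(fasta, 100)
    write_file(bam, 200)    # alignment finishes
    write_file(index, 300)  # methylation step creates the index afterwards
    side_effect_case = [
        os.path.basename(path) for path in newer_inputs(bam, [fasta, index])
    ]

    # Fixed case: a dedicated indexing step runs before the alignment,
    # so the index is older than the BAM and nothing looks stale.
    write_file(index, 150)
    write_file(bam, 200)
    fixed_case = newer_inputs(bam, [fasta, index])
```

In the first case the index is the only file newer than the BAM, which is the kind of change that makes a timestamp-driven workflow repeat the alignment. Producing the index in its own step before the alignment removes that ordering problem.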

giesselmann added a commit that referenced this issue Jul 23, 2020
Threads as consumables
Index reference fasta file
Fixes #9