Metabat fails when running with multiple input files. #32

Closed
grst opened this issue Jan 27, 2020 · 20 comments
Labels
bug Something isn't working
Milestone

Comments

@grst
Member

grst commented Jan 27, 2020

When I run the pipeline on multiple samples, the metabat2 step fails for me with the following error message:

 [Error!] the order of contigs in abundance file is not the same as the assembly file: k119_0

The pipeline runs fine when I include only a single sample in the input directory. This might be related to #27.

Command executed

./main.nf --reads "/home/sturm/projects/2020/metagenomics_test/test_data/*_R[1,2].fastq.gz" -profile singularity --skip_spades

Input data

I ran the pipeline on publicly available metagenomics samples from the ibdmdb project. The fastq files can be downloaded here: https://ibdmdb.org/tunnel/public/HMP2/WGS/1818/rawfiles. For testing, I ran the pipeline on the first 10 samples listed in the web portal.

Full log

N E X T F L O W  ~  version 19.10.0                                                                                                                                                                                                                                               
Launching `./main.nf` [focused_kalam] - revision: 91f2d5ee09                                                                                                                                                                                                                      
WARN: Access to undefined parameter `readPaths` -- Initialise it to a default value eg. `params.readPaths = some_value`                                                                                                                                                           
----------------------------------------------------
                                        ,--./,-.                                                                                                                                                                                                                                  
        ___     __   __   __   ___     /,-._.--~'                                                                                                                                                                                                                                 
  |\ | |__  __ /  ` /  \ |__) |__         }  {                                                                                                                                                                                                                                    
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,                                                                                                                                                                                                                                 
                                        `._,._,'                                                                                                                                                                                                                                  
  nf-core/mag v1.0.0                                                                                                                                                                                                                                                              
----------------------------------------------------                                                                                                                                                                                                                              
WARN: Access to undefined parameter `fasta` -- Initialise it to a default value eg. `params.fasta = some_value`                                                                                                                                                                   
Run Name          : focused_kalam                                                                                                                                                                                                                                                 
Reads             : /home/sturm/projects/2020/metagenomics_test/test_data/*_R[1,2].fastq.gz                                                                                                                                                                                       
Fasta Ref         : null                                                                                                                                                                                                                                                          
Data Type         : Paired-End                                                                                                                                                                                                                                                    
Busco Reference   : https://busco-archive.ezlab.org/v3/datasets/bacteria_odb9.tar.gz                                                                                                                                                                                              
Max Resources     : 128 GB memory, 16 cpus, 10d time per job                                                                                                                                                                                                                      
Container         : singularity - nfcore/mag:1.0.0                                                                                                                                                                                                                                
Output dir        : ./results                                                                                                                                                                                                                                                     
Launch dir        : /home/sturm/projects/2020/metagenomics_test/mag                                                                                                                                                                                                               
Working dir       : /data/scratch/sturm/scratch/test_metagenomics                                                                                                                                                                                                                 
Script dir        : /home/sturm/projects/2020/metagenomics_test/mag                                                                                                                                                                                                               
User              : sturm                                                                                                                                                                                                                                                         
Config Profile    : singularity                                                                                                                                                                                                                                          
----------------------------------------------------
executor >  sge (17)                                                                                                                                                                                                                                                              
[2a/f38182] process > get_software_versions                                         [100%] 1 of 1, cached: 1 ✔                                                                                                                                                                    
[-        ] process > porechop                                                      -                                                                                                                                                                                             
[-        ] process > nanolyse                                                      -                                                                                                                                                                                             
[-        ] process > filtlong                                                      -                                                                                                                                                                                             
[-        ] process > nanoplot                                                      -                                                                                                                                                                                             
[22/d10cd6] process > fastqc_raw (CSM5MCW6)                                         [100%] 10 of 10, cached: 10 ✔                                                                                                                                                                 
[a8/e30a96] process > fastp (CSM5MCXH)                                              [100%] 10 of 10, cached: 10 ✔                                                                                                                                                                 
[a8/7e6467] process > phix_download_db (GCA_002596845.1_ASM259684v1_genomic.fna.gz) [100%] 1 of 1, cached: 1 ✔                                                                                                                                                                    
[b5/bfc0ac] process > remove_phix (CSM5MCXD)                                        [100%] 10 of 10, cached: 10 ✔                                                                                                                                                                 
[97/5edf95] process > fastqc_trimmed (CSM5MCXD)                                     [100%] 10 of 10, cached: 10 ✔                                                                                                                                                                 
[-        ] process > centrifuge_db_preparation                                     -                                                                                                                                                                                             
[-        ] process > centrifuge                                                    -                                                                                                                                                                                             
[-        ] process > kraken2_db_preparation                                        -                                                                                                                                                                                             
[-        ] process > kraken2                                                       -                                                                                                                                                                                             
[-        ] process > krona_db                                                      -                                                                                                                                                                                             
[-        ] process > krona                                                         -                                                                                                                                                                                             
[c3/6e32ef] process > megahit (CSM5MCX3)                                            [100%] 10 of 10, cached: 10 ✔                                                                                                                                                                 
[-        ] process > spadeshybrid                                                  -                                                                                                                                                                                             
[-        ] process > spades                                                        -                                                                                                                                                                                             
[a7/eb6fb5] process > quast (MEGAHIT-CSM5MCXH)                                      [100%] 10 of 10, cached: 10 ✔                                                                                                                                                                 
[ac/92db93] process > bowtie2 (MEGAHIT-CSM5MCXJ)                                    [100%] 100 of 100, cached: 100 ✔                                                                                                                                                              
[70/15a36b] process > metabat (MEGAHIT-CSM5MCX3)                                    [100%] 10 of 10, failed: 3 ✘                                                                                                                                                                  
[76/f4dd2b] process > busco_download_db (bacteria_odb9.tar)                         [100%] 1 of 1, cached: 1 ✔                                                                                                                                                                    
[25/932b5f] process > busco (MEGAHIT-CSM5MCW6.2.fa)                                 [100%] 6 of 6                                                                                                                                                                                 
[-        ] process > busco_plot                                                    -                                                                                                                                                                                             
[b8/cc0d5f] process > quast_bins (MEGAHIT-CSM5MCW6)                                 [100%] 1 of 1                                                                                                                                                                                 
[-        ] process > merge_quast_and_busco                                         -                                                                                                                                                                                             
[-        ] process > cat_db                                                        -                                                                                                                                                                                             
[-        ] process > cat                                                           -                                                                                                                                                                                             
[-        ] process > multiqc                                                       -                                                                                                                                                                                             
[13/073856] process > output_documentation (1)                                      [100%] 1 of 1, cached: 1 ✔                                                                                                                                                                    
[nf-core/mag] Pipeline completed with errors
Error executing process > 'metabat (MEGAHIT-CSM5MCWQ)'

Caused by:
  Process `metabat (MEGAHIT-CSM5MCWQ)` terminated with an error exit status (1)

Command executed:

  jgi_summarize_bam_contig_depths --outputDepth depth.txt MEGAHIT-CSM5MCWQ-CSM5MCXJ.bam MEGAHIT-CSM5MCWQ-CSM5MCX3.bam MEGAHIT-CSM5MCWQ-CSM5MCW6.bam MEGAHIT-CSM5MCWQ-CSM5MCWQ.bam MEGAHIT-CSM5MCWQ-CSM5MCXN.bam MEGAHIT-CSM5MCWQ-CSM5MCXL.bam MEGAHIT-CSM5MCWQ-CSM5FZ4M.bam MEGAHIT-CSM5MCWQ-CSM5MCUO.bam MEGAHIT-CSM5MCWQ-CSM5MCXH.bam MEGAHIT-CSM5MCWQ-CSM5MCXD.bam
  metabat2 -t "8" -i "CSM5MCXL.contigs.fa" -a depth.txt -o "MetaBAT2/MEGAHIT-CSM5MCWQ" -m 1500
  
  #if bin folder is empty
  if [ -z "$(ls -A MetaBAT2)" ]; then
      cp CSM5MCXL.contigs.fa MetaBAT2/MEGAHIT-CSM5MCXL.contigs.fa
  fi

Command exit status:
  1

Command output:
  MetaBAT 2 (v2.13 (Bioconda)) using minContig 1500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200. 

Command error:
  Output depth matrix to depth.txt
  jgi_summarize_bam_contig_depths 2.13 (Bioconda) 2019-06-11T06:53:12
  Output matrix to depth.txt
  0: Opening bam: MEGAHIT-CSM5MCWQ-CSM5MCXJ.bam
  93: Opening bam: MEGAHIT-CSM5MCWQ-CSM5MCWQ.bam4: Opening bam: 
  MEGAHIT-CSM5MCWQ-CSM5MCXN.bam7: Opening bam: 
  : Opening bam: MEGAHIT-CSM5MCWQ-CSM5MCXD.bam
  MEGAHIT-CSM5MCWQ-CSM5MCUO.bam
  2: Opening bam: 5MEGAHIT-CSM5MCWQ-CSM5MCW6.bam: Opening bam: MEGAHIT-CSM5MCWQ-CSM5MCXL.bam
  
  1: Opening bam: MEGAHIT-CSM5MCWQ-CSM5MCX3.bam
  86: Opening bam: MEGAHIT-CSM5MCWQ-CSM5FZ4M.bam
  : Opening bam: MEGAHIT-CSM5MCWQ-CSM5MCXH.bam
  Processing bam files
  Thread 2 finished: MEGAHIT-CSM5MCWQ-CSM5MCW6.bam with 18135298 reads and 9888770 readsWellMapped
  Thread 9 finished: MEGAHIT-CSM5MCWQ-CSM5MCXD.bam with 22283812 reads and 6970361 readsWellMapped
  Thread 0 finished: MEGAHIT-CSM5MCWQ-CSM5MCXJ.bam with 22264616 reads and 8236430 readsWellMapped
  Thread 8 finished: MEGAHIT-CSM5MCWQ-CSM5MCXH.bam with 24528496 reads and 10391605 readsWellMapped
  Thread 4 finished: MEGAHIT-CSM5MCWQ-CSM5MCXN.bam with 23731976 reads and 7460350 readsWellMapped
  Thread 5 finished: MEGAHIT-CSM5MCWQ-CSM5MCXL.bam with 28607918 reads and 9303132 readsWellMapped
  Thread 1 finished: MEGAHIT-CSM5MCWQ-CSM5MCX3.bam with 26832004 reads and 6496630 readsWellMapped
  Thread 7 finished: MEGAHIT-CSM5MCWQ-CSM5MCUO.bam with 31554136 reads and 7481572 readsWellMapped
  Thread 3 finished: MEGAHIT-CSM5MCWQ-CSM5MCWQ.bam with 17952520 reads and 15715720 readsWellMapped
  Thread 6 finished: MEGAHIT-CSM5MCWQ-CSM5FZ4M.bam with 28383496 reads and 25071349 readsWellMapped
  Creating depth matrix file: depth.txt
  Closing most bam files
  Closing last bam file
  Finished
  [Error!] the order of contigs in abundance file is not the same as the assembly file: k119_0

Work dir:
  /data/scratch/sturm/scratch/test_metagenomics/be/42009fbbe39e43da5e595006558f28

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
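
For anyone debugging this in the work dir, the contig order can be compared directly. Below is a minimal Python sketch, not part of the pipeline. It assumes the depth file is tab-separated with one header row and the contig name in the first column, and that FASTA contig names are the first word of each `>` header. The function names are hypothetical, and the comparison is a strict one that may be stricter than MetaBAT's own check.

```python
from itertools import zip_longest
from typing import Iterable, List, Optional, Tuple


def fasta_contig_names(lines: Iterable[str]) -> List[str]:
    """Return contig names (first word of each '>' header) in file order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def depth_contig_names(lines: Iterable[str]) -> List[str]:
    """Return first-column names of a tab-separated depth table, skipping the header row."""
    names = []
    for index, line in enumerate(lines):
        if index == 0 or not line.strip():
            continue
        names.append(line.split("\t", 1)[0].strip())
    return names


def first_order_mismatch(
    fasta_names: List[str], depth_names: List[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (position, fasta_name, depth_name) of the first difference, or None if identical."""
    for index, (fa_name, depth_name) in enumerate(zip_longest(fasta_names, depth_names)):
        if fa_name != depth_name:
            return index, fa_name, depth_name
    return None


def check_files(fasta_path: str, depth_path: str):
    """Convenience wrapper: compare an assembly FASTA with a depth table on disk."""
    with open(fasta_path) as fasta, open(depth_path) as depth:
        return first_order_mismatch(fasta_contig_names(fasta), depth_contig_names(depth))
```

Calling `check_files("CSM5MCXL.contigs.fa", "depth.txt")` inside the failed task's work dir would return `None` if both files list the same contigs in the same order, and the first differing position otherwise.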
@HadrienG
Member

Hi!

Thanks for reporting this. I'll take a look and come back to you.

Best,
/Hadrien

@HadrienG HadrienG added the bug Something isn't working label Jan 29, 2020
@d4straub
Collaborator

d4straub commented Feb 4, 2020

I doubt it has anything to do with #27, because #27 is happening afterwards.

Maybe BAM sorting interferes? But as far as I understand, it doesn't sort the contigs, does it?

@HadrienG did you test that already?

@HadrienG
Member

HadrienG commented Feb 6, 2020

It's the channel mixing that seems to combine the wrong samples together.

I don't have a fix yet, but I will look at it more today.
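
To illustrate what this kind of mix-up looks like, here is a small Python sketch. It is not the pipeline's code: it models two streams of `(sample id, files)` items that arrive in different orders. The file names `CSM5MCWQ.contigs.fa` and `MEGAHIT-CSM5MCXL-CSM5MCXJ.bam` are made up to follow the naming pattern in the log, and both function names are hypothetical. Pairing by position reproduces the mismatch seen in the failing command (assembly `CSM5MCXL.contigs.fa` with `MEGAHIT-CSM5MCWQ-*` BAMs), while pairing by sample key does not.

```python
from typing import Dict, List, Tuple

Assembly = Tuple[str, str]        # (sample id, assembly FASTA)
BamSet = Tuple[str, List[str]]    # (sample id, BAM files mapped against that assembly)


def pair_by_position(assemblies: List[Assembly], bam_sets: List[BamSet]) -> List[Tuple[str, List[str]]]:
    """Pair items purely by arrival order; wrong whenever the two streams are ordered differently."""
    return [(fasta, bams) for (_, fasta), (_, bams) in zip(assemblies, bam_sets)]


def pair_by_key(assemblies: List[Assembly], bam_sets: List[BamSet]) -> List[Tuple[str, List[str]]]:
    """Pair items that share the same sample id, regardless of arrival order."""
    bams_by_sample: Dict[str, List[str]] = dict(bam_sets)
    return [
        (fasta, bams_by_sample[sample])
        for sample, fasta in assemblies
        if sample in bams_by_sample
    ]


# Illustrative inputs: the two streams finish in different orders.
assemblies = [
    ("CSM5MCXL", "CSM5MCXL.contigs.fa"),
    ("CSM5MCWQ", "CSM5MCWQ.contigs.fa"),
]
bam_sets = [
    ("CSM5MCWQ", ["MEGAHIT-CSM5MCWQ-CSM5MCXJ.bam"]),
    ("CSM5MCXL", ["MEGAHIT-CSM5MCXL-CSM5MCXJ.bam"]),
]

mixed = pair_by_position(assemblies, bam_sets)    # CSM5MCXL assembly gets CSM5MCWQ BAMs
matched = pair_by_key(assemblies, bam_sets)       # each assembly gets its own BAMs
```

In Nextflow, matching by key is what the `join` operator does; whether the eventual fix uses it is not stated in this thread.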

@d4straub
Collaborator

d4straub commented Feb 6, 2020

It's unfortunate that this slipped past QC before release. Maybe involve @maxulysse in solving this, since he added the channel magic?

I actually also have four samples that I wanted to have analyzed in one go instead of individually.

@maxulysse
Member

I'll have a look right away, thanks for mentioning me @d4straub

@HadrienG
Member

HadrienG commented Feb 7, 2020

So far I haven't been able to reproduce the bug. I just noticed @grst used --skip_spades and I didn't in my tests though.

It's possible that --skip_spades (or any of the skip assembler options) triggers that issue.

UPDATE: still not crashing for me with --skip_spades on my samples. I'm downloading the test data mentioned in the issue and will try with those.

@d4straub
Collaborator

d4straub commented Feb 7, 2020

Thanks for the updates, I just started to run my samples without skipping anything, will see what happens.

@ropolomx

ropolomx commented Mar 4, 2020

I got the same error as well, but I did not use --skip_spades.

@HadrienG
Member

@d4straub did you run into this as well?

@HadrienG HadrienG added this to the 1.1.0 milestone Mar 10, 2020
@d4straub
Collaborator

d4straub commented Mar 23, 2020

Sorry for the late answer. Yes I did run it and it worked fine.

edit: we need a dataset / command that reproduces this issue in order to troubleshoot it.

@d4straub
Collaborator

I still cannot reproduce this. Has anyone else had that problem?

@ropolomx

I ran into that problem again, but with different datasets than the ones I commented about on March 4.

@d4straub
Collaborator

Finally I also encountered that problem!

Currently, I suspect that it is caused by parallelization, and it seems to help to set the number of cores to 1 for that process. But my pipeline run isn't finished yet, so I will see whether it does indeed help.

@d4straub
Collaborator

Alright, I resumed the run that failed because of the described error with the additional parameter
-c "one-core-for-metabat.config"
where the file one-core-for-metabat.config contained:

process {
  // Workaround: restrict the metabat process to a single core
  withName: metabat {
    cpus = { 1 }
    memory = { 10.GB }
    time = { 4.h }
  }
}

and the run finished successfully.

@d4straub d4straub mentioned this issue Jun 19, 2020
8 tasks
@d4straub
Collaborator

I have opened a PR against dev that should solve it, at least I hope so.

@ropolomx

Thank you @d4straub ! I will give this a try.

@skrakau
Member

skrakau commented Jun 25, 2020

Hi @ropolomx,
it seems there is still a channel problem as @HadrienG already suspected. I am working on it and will keep you updated.
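
One quick way to spot this kind of mix-up in a task's inputs is to compare the sample IDs encoded in the file names. In the failing command from the original report, the BAMs are named `MEGAHIT-CSM5MCWQ-<reads sample>.bam`, while the assembly passed to metabat2 is `CSM5MCXL.contigs.fa`. The Python sketch below assumes exactly that naming pattern (`<assembler>-<assembly sample>-<reads sample>.bam` and `<sample>.contigs.fa`, with no hyphens inside sample IDs). The function names are hypothetical and this is not code from the pipeline.

```python
import re
from typing import Iterable, Set

# Assumed naming patterns, inferred from the file names in the log above.
BAM_PATTERN = re.compile(r"^(?P<assembler>[^-]+)-(?P<assembly>[^-]+)-(?P<reads>[^-]+)\.bam$")
FASTA_PATTERN = re.compile(r"^(?P<assembly>.+)\.contigs\.fa$")


def assembly_ids_from_bams(bam_names: Iterable[str]) -> Set[str]:
    """Collect the assembly sample IDs encoded in the BAM file names."""
    ids = set()
    for name in bam_names:
        match = BAM_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected BAM file name: {name}")
        ids.add(match.group("assembly"))
    return ids


def assembly_id_from_fasta(fasta_name: str) -> str:
    """Extract the sample ID from an assembly file name such as 'X.contigs.fa'."""
    match = FASTA_PATTERN.match(fasta_name)
    if match is None:
        raise ValueError(f"unexpected assembly file name: {fasta_name}")
    return match.group("assembly")


def inputs_are_consistent(fasta_name: str, bam_names: Iterable[str]) -> bool:
    """True only if every BAM was mapped against the assembly that is being binned."""
    return assembly_ids_from_bams(bam_names) == {assembly_id_from_fasta(fasta_name)}


# File names taken from the failing command in the original report.
bams = ["MEGAHIT-CSM5MCWQ-CSM5MCXJ.bam", "MEGAHIT-CSM5MCWQ-CSM5MCX3.bam"]
consistent = inputs_are_consistent("CSM5MCXL.contigs.fa", bams)
```

Here `consistent` is `False`, since the BAMs belong to the CSM5MCWQ assembly and the FASTA to CSM5MCXL, which is consistent with the channel explanation.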

@skrakau skrakau reopened this Jun 25, 2020
@skrakau skrakau mentioned this issue Jun 28, 2020
8 tasks
@skrakau
Member

skrakau commented Jun 29, 2020

This issue should be solved in the latest dev version.

@ropolomx

ropolomx commented Jul 8, 2020

Thank you @skrakau! Will give this a try!

@ropolomx

Hi @skrakau. This is working when I run the latest dev version. Thank you!

6 participants