Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASSESS_SIGNIFICANCE fails with Rscript error #1239

Closed
bounlu opened this issue Sep 18, 2023 · 11 comments
Closed

ASSESS_SIGNIFICANCE fails with Rscript error #1239

bounlu opened this issue Sep 18, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@bounlu
Copy link
Contributor

bounlu commented Sep 18, 2023

Description of the bug

ASSESS_SIGNIFICANCE step keeps failing at Rscript with the below error:

ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:ASSESS_SIGNIFICANCE (AF60_72U_vs_AF00_69U)'

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:ASSESS_SIGNIFICANCE (AF60_72U_vs_AF00_69U)` terminated with an error exit status (1)

Command executed:

  cat $(which assess_significance.R) | R --slave --args AF60_72U_vs_AF00_69U.tumor.mpileup.gz_CNVs AF60_72U_vs_AF00_69U.tumor.mpileup.gz_normal_CNVs AF60_72U_vs_AF00_69U.tumor.mpileup.gz_normal_ratio.txt AF60_72U_vs_AF00_69U.tumor.mpileup.gz_ratio.txt
  
  mv *.p.value.txt AF60_72U_vs_AF00_69U.p.value.txt
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:ASSESS_SIGNIFICANCE":
      controlfreec: $(echo $(freec -version 2>&1) | sed 's/^.*Control-FREEC  //; s/:.*$//' | sed -e "s/Control-FREEC v//g" )
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Loading required package: GenomicRanges
  Loading required package: stats4
  Loading required package: BiocGenerics
  Loading required package: parallel
  
  Attaching package: ‘BiocGenerics’
  
  The following objects are masked from ‘package:parallel’:
  
      clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
      clusterExport, clusterMap, parApply, parCapply, parLapply,
      parLapplyLB, parRapply, parSapply, parSapplyLB
  
  The following objects are masked from ‘package:stats’:
  
      IQR, mad, sd, var, xtabs
  
  The following objects are masked from ‘package:base’:
  
      anyDuplicated, append, as.data.frame, basename, cbind, colnames,
      dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
      grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
      order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
      rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
      union, unique, unsplit, which.max, which.min
  
  Loading required package: S4Vectors
  
  Attaching package: ‘S4Vectors’
  
  The following object is masked from ‘package:base’:
  
      expand.grid
  
  Loading required package: IRanges
  Loading required package: GenomeInfoDb
  Error in `$<-.data.frame`(`*tmp*`, Ratio, value = logical(0)) : 
    replacement has 0 rows, data has 701
  Calls: $<- -> $<-.data.frame
  Execution halted

Work dir:
  /nextflow/sarek/work/bc/7f398053d6401db4e9aa9a25eca18c

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Command used and terminal output

nextflow run nf-core/sarek \
-latest \
-profile docker \
--step mapping \
--tools freebayes,mpileup,mutect2,strelka,manta,tiddit,ascat,cnvkit,controlfreec,msisensorpro \
--input 'samplesheet_sarek.csv' \
--outdir '/nextflow/sarek/results/' \
-work-dir '/nextflow/sarek/work/' \
-c 'custom.config' \
-r master \
-resume

Relevant files

No response

System information

N E X T F L O W ~ version 23.08.1-edge
local
Docker
Linux
nf-core/sarek master v3.3.0

@bounlu bounlu added the bug Something isn't working label Sep 18, 2023
@FriederikeHanssen
Copy link
Contributor

FriederikeHanssen commented Sep 18, 2023

urgh controlfreec. I just updated the containers. You could check if the new version fixes it by adding this to a custom config:

process {
     withName: ASSESS_SIGNIFICANCE {
        container = 'biocontainers/control-freec:11.6b--hdbdd923_0'
       }
}

@nschcolnicov
Copy link
Contributor

nschcolnicov commented Sep 18, 2023

Hi! I Just got this same error, it doesn't appear to be related to the new container update, since I was able to reproduce with both the old and new container. @bounlu could you please share the nextflow.log file?

Looking at the .command.sh in my run, I saw that the script is being called with 4 arguments instead of 2, as the CONTROLFREEC_ASSESSSIGNIFICANCE process expects:

.command.sh:

cat $(which assess_significance.R) | R --slave --args dataset1_vs_control01.tumor.mpileup.gz_CNVs dataset1_vs_control01.tumor.mpileup.gz_normal_CNVs dataset1_vs_control01.tumor.mpileup.gz_normal_ratio.txt dataset1_vs_control01.tumor.mpileup.gz_ratio.txt

mv *.p.value.txt dataset1_vs_control01.p.value.txt

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:ASSESS_SIGNIFICANCE":
    controlfreec: $(echo $(freec -version 2>&1) | sed 's/^.*Control-FREEC  //; s/:.*$//' | sed -e "s/Control-FREEC v//g" )
END_VERSIONS

So instead of running the assess_significance.R script on the cnv and ratio file, it is being run on the cnv and cnv normal files. And the assess_significance.R script fails in the line 13:
ratio$Ratio[which(ratio$Ratio==-1)]=NA
Because the ratio dataframe doesn't contain any columns labeled "Ratio"

In my case I'm getting two files out of the FREEC_SOMATIC module associated to the same meta dict for both CNV and RATIO.
Looking at the way that the input tuple is generated, in the BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC subworkflow, in line 30.
In my run this is what the FREEC_SOMATIC.out.CNV and the FREEC_SOMATIC.out.ratio contained respectively:

FREEC_SOMATIC.out.CNV.view()
[[id:dataset1_vs_control01, normal_id:control01, patient:test3, sex:XX, tumor_id:dataset1, variantcaller:samtools], [tmpdir/dataset1_vs_control01.tumor.mpileup.gz_CNVs, tmpdir/dataset1_vs_control01.tumor.mpileup.gz_normal_CNVs]]

FREEC_SOMATIC.out.ratio.view()
[[id:dataset1_vs_control01, normal_id:control01, patient:test3, sex:XX, tumor_id:dataset1, variantcaller:samtools], [tmpdir/dataset1_vs_control01.tumor.mpileup.gz_normal_ratio.txt, tmpdir/dataset1_vs_control01.tumor.mpileup.gz_ratio.txt]]

FREEC_SOMATIC.out.CNV.join(FREEC_SOMATIC.out.ratio, failOnDuplicate: true, failOnMismatch: true).view()
[[id:dataset1_vs_control01, normal_id:control01, patient:test3, sex:XX, tumor_id:dataset1, variantcaller:samtools], [/scratch/nextflow/work/schcolni/7f/3b1037262fd54a07b5a24d5df8d559/dataset1_vs_control01.tumor.mpileup.gz_CNVs, /scratch/nextflow/work/schcolni/7f/3b1037262fd54a07b5a24d5df8d559/dataset1_vs_control01.tumor.mpileup.gz_normal_CNVs], [/scratch/nextflow/work/schcolni/7f/3b1037262fd54a07b5a24d5df8d559/dataset1_vs_control01.tumor.mpileup.gz_normal_ratio.txt, /scratch/nextflow/work/schcolni/7f/3b1037262fd54a07b5a24d5df8d559/dataset1_vs_control01.tumor.mpileup.gz_ratio.txt]]

I'm not sure what is the expected behavior of this module. @FriederikeHanssen, should it run on both the normal and non-normal files, or if it should only run on the non-normal files?

@FriederikeHanssen
Copy link
Contributor

Controlfreec should run on tumor-only and paired data.

@bounlu
Copy link
Contributor Author

bounlu commented Sep 19, 2023

@FriederikeHanssen Thanks for the updated container, I will also test with that and let you know.

@nschcolnicov My .command.sh also has 4 paramaters, and my data is paired.

@nschcolnicov
Copy link
Contributor

nschcolnicov commented Sep 19, 2023

Hi @bounlu @FriederikeHanssen I kept researching this issue, and found this:
When running the somatic tumor pair test dataset provided by sarek using this command line:
nextflow run ../../main.nf -profile bi,aws,test_cache,tools_somatic --tools strelka,controlfreec --outdir ./results
the "*gz_normal_CNVs" and the "*_normal_ratio.txt" files are not generated, we only find the "*gz_CNVs" and the "*gz_ratio.txt". But I'm not sure if this depends on the dataset.

I'm still not sure whether if we should run the assess_significance module on the two file pairs of cnv and ratio, those with the "normal" label and those without the "normal" label; or, if we should only run it on the file pairs that don't have the "normal" label. So I implemented a solution for the first case, since we can just ignore the results for the second pair if they are not relevant.

I implemented the following solution in my local version of the pipeline:
Edited the bam_variant_calling_somatic_controlfreec subworkflow main file
By replacing the lines between line 27 and line 30, inclusive, with these lines:

    assess_significance_input = Channel.empty()

    FREEC_SOMATIC(controlfreec_input, fasta, fasta_fai, [], dbsnp, dbsnp_tbi, chr_files, mappability, intervals_bed, [])

    //Filter the files that come out of freec somatic as ASSESS_SIGNIFICANCE only takes one cnv and one ratio file, there should be one normal and one non normal pair?
    //Creates empty channel if file is missing
    cnv_files = FREEC_SOMATIC.out.CNV
    .multiMap{ meta, cnv ->
        def meta_clone_tumor = meta.clone()
        def meta_clone_normal = meta.clone()
        meta_clone_tumor.id = meta.id + "_tumor" //updating meta id so that the p.value file is named differently 
        meta_clone_normal.id = meta.id + "_normal" //updating meta id so that the p.value file is named differently 

        def tumor_file = cnv instanceof List ? cnv.find { it.toString().endsWith("gz_CNVs") } : cnv //only find if its a list, else it returns only the filename without the path
        def normal_file = cnv instanceof List ? cnv.find { it.toString().endsWith("gz_normal_CNVs") } : null //only find if its a list, else it returns only the filename without the path

        normal: normal_file == null ? [] : [meta_clone_normal,normal_file] //only fill channel if file was found, else leave it empty
        tumor: tumor_file == null ? [] : [meta_clone_tumor,tumor_file] //only fill channel if file was found, else leave it empty
    }
    
    ratio_files = FREEC_SOMATIC.out.ratio
    .multiMap{ meta, ratio ->
        def meta_clone_tumor = meta.clone()
        def meta_clone_normal = meta.clone()
        meta_clone_tumor.id = meta.id + "_tumor" //updating meta id so that the p.value file is named differently 
        meta_clone_normal.id = meta.id + "_normal" //updating meta id so that the p.value file is named differently 

        def tumor_file = ratio instanceof List ? ratio.find { it.toString().endsWith("gz_ratio.txt") } : ratio //same here as cnv
        def normal_file = ratio instanceof List ? ratio.find { it.toString().endsWith("gz_normal_ratio.txt") } : null //same here as cnv
        
        normal: normal_file == null ? [] : [meta_clone_normal,normal_file] //same here as ratio
        tumor: tumor_file == null ? [] : [meta_clone_tumor,tumor_file] //same here as ratio
    }

    //Join the pairs
    normal_files = cnv_files.normal.join(ratio_files.normal, failOnDuplicate: true, failOnMismatch: true)
    tumor_files = cnv_files.tumor.join(ratio_files.tumor, failOnDuplicate: true, failOnMismatch: true)
    
    //Mix all the pairs into input channel
    assess_significance_input = assess_significance_input.mix(tumor_files)
    assess_significance_input = assess_significance_input.mix(normal_files)

    ASSESS_SIGNIFICANCE(assess_significance_input)

This is the full version of the bam_variant_calling_somatic_controlfreec subworkflow main file, you can just replace your with this:
bam_variant_calling_somatic_controlfreec_main.txt

This is how the channel "assess_significance_input" looks like, so the module runs first on one pair, and then on the other:

[[id:dataset1_vs_control01, normal_id:control01, patient:test3, sex:XX, tumor_id:dataset1, variantcaller:samtools], tmpdir/dataset1_vs_control01.tumor.mpileup.gz_normal_CNVs, tmpdir/dataset1_vs_control01.tumor.mpileup.gz_normal_ratio.txt]
[[id:dataset1_vs_control01, normal_id:control01, patient:test3, sex:XX, tumor_id:dataset1, variantcaller:samtools], tmpdir/dataset1_vs_control01.tumor.mpileup.gz_CNVs, tmpdir/dataset1_vs_control01.tumor.mpileup.gz_ratio.txt]

I tested this implementation on runs that generate the "normal" file pairs, and runs that don't generate the "normal" files, and they both ran succesfully, the only different in output is that the runs that generate "normal" cnv and ratio files will include the p.value.txt file for them too. This is what the controlfreec folder in my results/variant_calling directory looked like:

controlfreec/
├── dataset1_vs_control01
│   ├── config.txt
│   ├── dataset1_vs_control01_BAF.png
│   ├── dataset1_vs_control01.bed
│   ├── dataset1_vs_control01.circos.txt
│   ├── dataset1_vs_control01.normal.mpileup.gz_BAF.txt
│   ├── dataset1_vs_control01.normal.mpileup.gz_control.cpn
│   ├── dataset1_vs_control01.p.value.txt
│   ├── dataset1_vs_control01_ratio.log2.png
│   ├── dataset1_vs_control01_ratio.png
│   ├── dataset1_vs_control01.tumor.mpileup.gz_BAF.txt
│   ├── dataset1_vs_control01.tumor.mpileup.gz_CNVs
│   ├── dataset1_vs_control01.tumor.mpileup.gz_info.txt
│   ├── dataset1_vs_control01.tumor.mpileup.gz_normal_CNVs
│   ├── dataset1_vs_control01.tumor.mpileup.gz_normal_ratio.BedGraph
│   ├── dataset1_vs_control01.tumor.mpileup.gz_normal_ratio.txt
│   ├── dataset1_vs_control01.tumor.mpileup.gz_ratio.BedGraph
│   ├── dataset1_vs_control01.tumor.mpileup.gz_ratio.txt
│   └── dataset1_vs_control01.tumor.mpileup.gz_sample.cpn
├── dataset1_vs_control01_normal
│   └── dataset1_vs_control01_normal.p.value.txt
└── dataset1_vs_control01_tumor
    └── dataset1_vs_control01_tumor.p.value.txt

@bounlu
Copy link
Contributor Author

bounlu commented Sep 24, 2023

I confirm that I still get the same error with the updated container.

@bounlu
Copy link
Contributor Author

bounlu commented Oct 5, 2023

In the new version 3.3.2, I still get error from CONTROLFREEC, this time from MAKEGRAPH process:


Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:MAKEGRAPH (XXX_7D_vs_YYY_6D)'

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:MAKEGRAPH (XXX_7D_vs_YYY_6D)` terminated with an error exit status (1)

Command executed:

  cat $(which makeGraph.R) | R --slave --args 2 XXX_7D_vs_YYY_6D.tumor.mpileup.gz_normal_ratio.txt XXX_7D_vs_YYY_6D.tumor.mpileup.gz_ratio.txt XXX_7D_vs_YYY_6D.normal.mpileup.gz_BTT.txt XXX_7D_vs_YYY_6D.tumor.mpileup.gz_BTT.txt
  
  mv *_BAF.txt.png XXX_7D_vs_YYY_6D_BAF.png
  mv *_ratio.txt.log2.png XXX_7D_vs_YYY_6D_ratio.log2.png
  mv *_ratio.txt.png XXX_7D_vs_YYY_6D_ratio.png
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:MAKEGRAPH":
      controlfreec: $(echo $(freec -version 2>&1) | sed 's/^.*Control-FREEC  //; s/:.*$//' | sed -e "s/Control-FREEC v//g" )
  END_VERSIONS

Command exit status:
  1

Command output:
  null device 
            1 
  null device 
            1 

Command error:
  Warning message:
  In type.convert.default(args[4]) :
    'as.is' should be specified by the caller; using TRUE
  There were 50 or more warnings (use warnings() to see the first 50)
  null device 
            1 
  null device 
            1 
  Error in xy.coords(x, y, xlabel, ylabel, log) : 
    'x' and 'y' lengths differ
  Calls: plot -> plot.default -> xy.coords
  Execution halted

Work dir:
  /nextflow/sarek/2205/work/3f/795232eTT93c4eb7b46d3f67d55b05

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for detYYls
Join mismatch for the following entries: key=[patient:TT, sample:TTT_6D, sex:XY, status:0, id:TTT_6D, data_type:cram, variantcaller:bcftools, file_name:TTT_6D.bcftools.vcf] values=


Join mismatch for the following entries: 
- key=[patient:YY, sample:YYY_6D, sex:XX, status:0, id:YYY_6D, data_type:cram, variantcaller:bcftools, file_name:YYY_6D.bcftools.vcf] values= 
- key=[patient:TT, sample:TTT_6D, sex:XY, status:0, id:TTT_6D, data_type:cram, variantcaller:bcftools, file_name:TTT_6D.bcftools.vcf] values=


-[nf-core/sarek] Pipeline completed with errors-
WARN: Killing running tasks (17)

ERROR ~ Cannot invoke "org.iq80.leveldb.impl.Version.retYYn()" because "this.version" is null

@Teezi
Copy link

Teezi commented Oct 5, 2023

I am using the new version too (3.3.2), and have a similar problem as @bounlu, please find below

[91/c9321b] NOTE: Process NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:ASSESS_SIGNIFICANCE (AUD220512001_tumor_vs_AUD210903021_normal) terminated with an error exit status (1) -- Execution is retried (1)
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:MAKEGRAPH (AUD220512001_tumor_vs_AUD210903021_normal)'

Caused by:
Process NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:MAKEGRAPH (AUD220512001_tumor_vs_AUD210903021_normal) terminated with an error exit status (1)

Command executed:

cat $(which makeGraph.R) | R --slave --args 2 AUD220512001_tumor_vs_AUD210903021_normal.tumor.mpileup.gz_normal_ratio.txt AUD220512001_tumor_vs_AUD210903021_normal.tumor.mpileup.gz_ratio.txt AUD220512001_tumor_vs_AUD210903021_normal.normal.mpileup.gz_BAF.txt AUD220512001_tumor_vs_AUD210903021_normal.tumor.mpileup.gz_BAF.txt

mv *_BAF.txt.png AUD220512001_tumor_vs_AUD210903021_normal_BAF.png
mv *_ratio.txt.log2.png AUD220512001_tumor_vs_AUD210903021_normal_ratio.log2.png
mv *_ratio.txt.png AUD220512001_tumor_vs_AUD210903021_normal_ratio.png

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_CONTROLFREEC:MAKEGRAPH":
controlfreec: $(echo $(freec -version 2>&1) | sed 's/^.Control-FREEC //; s/:.$//' | sed -e "s/Control-FREEC v//g" )
END_VERSIONS

Command exit status:
1

Command output:
null device
1
null device
1

Command error:
Warning message:
In type.convert.default(args[4]) :
'as.is' should be specified by the caller; using TRUE
There were 50 or more warnings (use warnings() to see the first 50)
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
Calls: plot -> plot.default -> xy.coords
Execution halted

Work dir:
/fs04/bv48/workspace-nobackup/maria/sarek/jGCT/2_variantCalling/work/48/fd8e1287759ace6bd5f2685657448d

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details

@FriederikeHanssen
Copy link
Contributor

Hey! Sorry for the delay. Finally had some time to dig into the ASSESS_SIGNIFICANCE error. I think there are two things:

  1. We should only use tumor_CNVS and tumor_ratio.txt, so will fix that.

and
2. there seems to be an issue with the R script itself. When doing 1. manually I ran into the issue described here: BoevaLab/FREEC#127, updating the script as proposed seems to work.

@nschcolnicov
Copy link
Contributor

@bounlu @Teezi This PR fixed the issue for me, could you please check if you are still getting the error? Else we can close this issue

@bounlu
Copy link
Contributor Author

bounlu commented Aug 8, 2024

PR fixed the issue, closed.

@bounlu bounlu closed this as completed Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants