
TrimGalore Error due to python version #215

Closed
peter-yufan-zeng opened this issue Jun 8, 2020 · 27 comments

@peter-yufan-zeng

There seems to be a problem with enabling trim-galore in Sarek.

Running on both a cluster and a local computer, Sarek 2.6 throws the error:

Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file HPV5N_2_R1.fastq.gz
ERROR: Running in parallel is not supported on Python 2
Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...

From the error, this seems to be an issue with the python version used.

@ggabernet (Member) commented Jun 8, 2020

Hi, @mGauder had a similar issue with Sarek 2.6

-[nf-core/sarek] Pipeline completed with errors-
WARN: Killing pending tasks (64)
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'TrimGalore (GBM2-L004)'
Caused by:
  Process `TrimGalore (GBM2-L004)` terminated with an error exit status (1)
Command executed:
  trim_galore          --cores 4         --paired         --fastqc         --gzip                                      QLFGB013AT_T_L004_R1.fastq.gz QLFGB013AT_T_L004_R2.fastq.gz
  mv *val_1_fastqc.html "QLFGB013AT_T_L004_R1.trimmed_fastqc.html"
  mv *val_2_fastqc.html "QLFGB013AT_T_L004_R2.trimmed_fastqc.html"
  mv *val_1_fastqc.zip "QLFGB013AT_T_L004_R1.trimmed_fastqc.zip"
  mv *val_2_fastqc.zip "QLFGB013AT_T_L004_R2.trimmed_fastqc.zip"
Command exit status:
  1
Command output:
  (empty)
Command error:
  Cutadapt version: 1.18
  Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
  Letting the (modified) Cutadapt deal with the Python version instead
  Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 4 cores
  No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
  AUTO-DETECTING ADAPTER TYPE
  ===========================
  Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> QLFGB013AT_T_L004_R1.fastq.gz <<)
  Found perfect matches for the following adapter sequences:
  Adapter type  Count   Sequence        Sequences analysed      Percentage
  Illumina      4571    AGATCGGAAGAGC   1000000 0.46
  smallRNA      7       TGGAATTCTCGG    1000000 0.00
  Nextera       4       CTGTCTCTTATA    1000000 0.00
  Using Illumina adapter for trimming (count: 4571). Second best hit was smallRNA (count: 7)
  Writing report to 'QLFGB013AT_T_L004_R1.fastq.gz_trimming_report.txt'
  SUMMARISING RUN PARAMETERS
  ==========================
  Input filename: QLFGB013AT_T_L004_R1.fastq.gz
  Trimming mode: paired-end
  Trim Galore version: 0.6.4_dev
  Cutadapt version: 1.18
  Python version: could not detect
  Number of cores used for trimming: 4
  Quality Phred score cutoff: 20
  Quality encoding type selected: ASCII+33
  Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
  Maximum trimming error rate: 0.1 (default)
  Minimum required adapter overlap (stringency): 1 bp
  Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
  Running FastQC on the data once trimming has completed
  Output file(s) will be GZIP compressed
  Cutadapt seems to be reasonably up-to-date. Setting -j 4
  Writing final adapter and quality trimmed output to QLFGB013AT_T_L004_R1_trimmed.fq.gz
    >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file QLFGB013AT_T_L004_R1.fastq.gz <<<
  ERROR: Running in parallel is not supported on Python 2
  Cutadapt terminated with exit signal: '256'.
  Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...
Work dir:
  <path>/work/90/bd8420c35f4821fda03c387d8499bd
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

@maxulysse (Member)

What was your command line, so that I can reproduce this and try to figure out what the issue is?

@ggabernet (Member) commented Jun 8, 2020

nextflow run nf-core/sarek -r 2.6 --input input.tsv -profile cfc -c user.config --genome 'GRCh37' --tools 'Strelka,snpEff' --trim_fastq --save_trimmed --targetBED Twist_Exome_Target_hg19_new.bed -resume

Marie also tried setting the cpus in user.config but got the same error:

process {
    withName:TrimGalore {
        cpus = 2
    }
    withName:MultiQC {
        memory = {60.GB as nextflow.util.MemoryUnit}
    }
}

@ghost commented Jun 8, 2020

Update: I just deleted my work directory and started the pipeline from scratch, and the TrimGalore processes are now running. 🎉

@ggabernet (Member)

What seems weird is that even though the process's cores are set to 1, Cutadapt still seems to use 4 cores: "Cutadapt seems to be reasonably up-to-date. Setting -j 4"

@FelixKrueger

I hope this isn't an issue from my side....

@peter-yufan-zeng (Author)

@maxulysse This is my command line. In the custom config file I have set trim_fastq = true.
./nextflow run sarek/main.nf -profile singularity --input UWO192.tsv --custom_config_base sarek/conf -c nextflow.slurm.v6.config --genomes_base "igenomes_ref" --tools 'Strelka,mutect2,Manta,MSIsensor,SnpEff' -resume --no_gatk_spark
Neither deleting the work directory nor running it on my local computer solves the issue.

@maxulysse (Member)

No problem at all on my side.
The TrimGalore process succeeded without issues.
I definitely can't reproduce this bug.

@nf-core/core could it be a collision with the python on the system?

@apeltzer (Member) commented Jun 9, 2020

I'd suspect that there is a collision at the system level too. Harshil fixed it in ATACseq, for example:

https://github.com/nf-core/atacseq/blob/fa1e3f8993cd20e249b9df09d29c5498eff311d2/nextflow.config#L130

// Export this variable to prevent local Python libraries from conflicting with those in the container
env {
  PYTHONNOUSERSITE = 1
}
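For illustration, the effect of PYTHONNOUSERSITE can be sketched in a few lines of Python (a minimal sketch; user_site_enabled is a hypothetical helper, not part of any pipeline): when the variable is set, the interpreter skips the user's ~/.local site-packages, so host-installed libraries can no longer shadow the ones shipped in the container.

```python
import os
import subprocess
import sys

def user_site_enabled(env_overrides):
    """Report site.ENABLE_USER_SITE in a child interpreter with extra env vars."""
    env = {**os.environ, **env_overrides}
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Often "True" outside a virtualenv; with PYTHONNOUSERSITE set it is "False",
# meaning the user site-packages directory is excluded from sys.path.
print(user_site_enabled({}))
print(user_site_enabled({"PYTHONNOUSERSITE": "1"}))
```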

@ghost commented Jun 9, 2020

With the settings stated above by @ggabernet, the TrimGalore process now finished without any other errors occurring.

@apeltzer (Member) commented Jun 9, 2020

These here:

nextflow run nf-core/sarek -r 2.6 --input input.tsv -profile cfc -c user.config --genome 'GRCh37' --tools 'Strelka,snpEff' --trim_fastq --save_trimmed --targetBED Twist_Exome_Target_hg19_new.bed -resume

Marie also tried setting the cpus in user.config but got the same error:

process {
    withName:TrimGalore {
        cpus = 2
    }
    withName:MultiQC {
        memory = {60.GB as nextflow.util.MemoryUnit}
    }
}

Or these?

What seems weird is that even though the process's cores are set to 1, Cutadapt still seems to use 4 cores: "Cutadapt seems to be reasonably up-to-date. Setting -j 4"

@ghost commented Jun 9, 2020

The former; setting cpus = 2 seemed to fix the issue for me.

@apeltzer (Member) commented Jun 9, 2020

@ewels seems to have been right about it: nf-core/atacseq#65 (comment)

@FelixKrueger what would be the way to do it right then?

nf-core/atacseq#65 (comment)

This is the current state in Sarek too:

nf-core/atacseq#65 (comment)

@FelixKrueger

I think in the end we agreed that this was the right way to handle it:

nf-core/atacseq#65 (comment)

@apeltzer (Member) commented Jun 9, 2020

Yeah, after giving this another go, I've found a hint at the reason: both logfiles above complain about a multi-core analysis being started (see above: ERROR: Running in parallel is not supported on Python 2), which isn't possible with Python 2.

I checked the build logs of nf-core/sarek:2.6, and it seems to have installed python2 inside the environment, so this can fail. The tests don't cover this, as they are always started with fewer CPUs (n=2). I'm not entirely sure this is the case, but it would explain the behaviour above (Python 2 but requesting multi-core, thus the failure) and the success of @mGauder when limiting to 2 cores only.

I take back my comment from above; this seems to be more of a problem between the environment in the container and Cutadapt requiring Python 3 for multicore :-)
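The failure mode described here can be sketched as a simple version gate (illustrative only; check_cores is a hypothetical function, not Cutadapt's actual internals): Cutadapt 1.x refuses multi-core runs on Python 2, so requesting more than one core in a Python 2 environment aborts with exactly the error seen in the logs above.

```python
import sys

def check_cores(requested_cores):
    # Cutadapt 1.x allowed multiple cores only on Python 3; under Python 2
    # a multi-core request aborts (this mirrors the log message above).
    if requested_cores > 1 and sys.version_info[0] < 3:
        raise RuntimeError("Running in parallel is not supported on Python 2")
    return requested_cores

check_cores(1)  # a single core works on either Python version
check_cores(4)  # fine on Python 3; raises under Python 2, as in the logs
```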

@drpatelh (Member) commented Jun 9, 2020

Yep. Sorry for arriving late to the party but I believe you need Python 3 as well as pigz for this to work properly.

@drpatelh (Member) commented Jun 9, 2020

See here. I reviewed the software addition here and assumed you would be upgrading to Python 3 @maxulysse 😅

@maxulysse (Member)

I'll see if we can do that. If I remember correctly, I had some tools that required Python 2 when I last checked; that's why python 2 is installed and not python 3...

@ewels (Member) commented Jun 9, 2020

MultiQC will start breaking on Python 2 before long... it's officially unsupported from the v1.9 release.

@maxulysse (Member)

I was hoping to solve such issues with #132

@maxulysse (Member)

OK, so it's not possible to update Python, due to the Manta and Strelka recipes requiring Python 2.7:
Illumina/manta#180
Illumina/strelka#156
I'm afraid I can't solve this error currently without using multiple containers.
I'll sort it out with #132 as initially planned.

@FelixKrueger

Maybe I should quickly add here that Trim Galore (and Cutadapt) also work with the now-obsolete Python 2. Maybe some checks on the Nextflow side could be added?

But yeah, overall I would certainly argue in favour of Python3, and more importantly: DSL2!

@maxulysse (Member)

Maybe I should quickly add here that Trim Galore (and Cutadapt) also work with the now-obsolete Python 2. Maybe some checks on the Nextflow side could be added?

But yeah, overall I would certainly argue in favour of Python3, and more importantly: DSL2!

I already assumed they worked with Python 2, since this is what we currently have in the container ;-)

@maxulysse maxulysse added the DSL2 label Jul 23, 2020
@maxulysse maxulysse added this to the 3.0 milestone Aug 31, 2020
@alirizaaribas-ibg

No problem at all on my side.
The TrimGalore process succeeded without issues.
I definitely can't reproduce this bug.

@nf-core/core could it be a collision with the python on the system?

How can I solve this collision? I checked all versions with "which python{x}" and found that the nf-core environment provides python3, but Trim Galore ends up running under the system python, which is python2.
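A quick way to see this kind of shadowing (a diagnostic sketch, not part of the pipeline) is to print which interpreters are visible on PATH from inside the environment the process actually runs in:

```python
import shutil
import sys

# The interpreter actually executing this script:
print("running under:", sys.executable, "version", sys.version_info[:2])

# Every python-like name on PATH; if "python" resolves to the system's
# python2 while "python3" points into the container/conda env, any tool
# whose shebang says plain "python" picks up the wrong interpreter.
for name in ("python", "python2", "python3"):
    print(name, "->", shutil.which(name))
```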

@maxulysse (Member)

I'm afraid we can't fix that until we finish the switch to DSL2

@FriederikeHanssen (Contributor)

This should be fixed now on the dev branch.

@FriederikeHanssen (Contributor)

Will close this now, as it has been fixed in the dev branch with the newest Trim Galore version.

9 participants