fix readqc pipeline.ini fastq_screen databases #422

amayer21 · 2018-04-13T18:11:32Z

In the fastq_screen parameters of the readqc pipeline.ini, the databases are all active by default. The comment says to " Put a '#' in front of organisms you are not interested in" but that doesn't work as the entry still exists without '#' in the original pipeline.ini. So it will be loaded into the PARAMS. Therefore even if the user add '#' in front of a database in the pipeline.ini file in the working directory, reads will still be aligned to that genome.

I propose to add a '#' to all databases in the original pipeline.ini and ask the user to remove the '#' in front of the databases he is interested in.

Acribbs · 2018-04-16T11:36:59Z

hmm, Im not really sure what you mean by this commit. Maybe we can catch up at lunch and discuss.

amayer21 · 2018-04-16T13:56:32Z

Will try to be more explicit:
In each pipeline, parameters are retrieved from 3 pipeline.ini files: a central one, then the pipeline.ini in subfolder of pipeline script, then pipeline.ini in working directory. Each pipeline.ini overwrite parameters defined in previous one.

For fastq_screen database parameters (readqc pipeline), the pipeline.ini in script subfolder contains paths to bowtie indexes of different organism (human, mouse, rat, yeast, phix, bacteria and contaminants), so at that stage all path to databases are loaded into the parameters.

If I take the example of the mouse database:
In pipeline.ini in script subfolder, you'll find:
database_mouse=/ifs/mirror/genomes/bowtie/mm9
If I don't want to check for mouse contaminants in my reads, I'm told to add '#' in front of it in pipeline.ini in the working directory:
#database_mouse=/ifs/mirror/genomes/bowtie/mm9
but the parameter "fastq_screen_database_mouse" already exist, so even if I add the '#', my reads will still be aligned to mouse genome (which can be ressource consuming, even if fastq_screen sample only 100,000 reads).

I propose to add '#' to all databases in the pipeline.ini file in the pipeline script subfolder. Then the user can remove it from pipeline.ini in pipeline.ini in working directory if he wants to check for contaminants from chosen organisms.

Happy to discuss at the break if not clear (sorry I've been busy viewing posters at lunch)

Acribbs · 2018-04-16T14:06:17Z

Ah, ok now I get the issue now. The solution you propose sounds sensible.

sebastian-luna-valero · 2018-04-24T10:37:27Z

Thanks, Alice.

Commenting out the databases for fastq_screen causes the pipeline to fail with the following error message:

    Using fastq_screen v0.11.3
    Reading configuration from './ctmpisazwypq/fastq_screen.conf'
    Aligner (--aligner) not specified. Did not find Bowtie/Bowtie2/BWA paths and/or index files
    Please check: you have provided the full path to the aligner INCLUDING the executable filename
    Please check: the specified genome indices comprises the full path AND the basename of the index files
    See documentation for further details
    Please adjust configuration.

Provided that there have been previous requests to make fastq_screen optional (please see #156 and #385) I will then disable it by default so it is up to the user to enable and configure the databases properly.

amayer21 · 2018-04-24T11:10:55Z

Thanks Sebastian.
I get the same error message as you if I run it on CBRG server with pipeline.ini as it was, i.e. even without the databases being commented out, because the paths provided are all on CGAT server (as described in #385). Making it optional seems sensible.

fix readqc pipeline.ini fastq_screen databases

0312273

disable fastq_screen by default

59c56b6

sebastian-luna-valero merged commit 75adf24 into master Apr 24, 2018

sebastian-luna-valero deleted the AM-fixing-readqc-pipelineini branch April 24, 2018 10:46

sebastian-luna-valero added a commit to cgat-developers/cgat-flow that referenced this pull request May 31, 2018

migrating CGATOxford/CGATPipelines#422

cca4b1e

sebastian-luna-valero mentioned this pull request May 31, 2018

Disable fastq_screen in pipeline_readqc by default cgat-developers/cgat-flow#23

Merged

sebastian-luna-valero added a commit to cgat-developers/cgat-flow that referenced this pull request May 31, 2018

migrating CGATOxford/CGATPipelines#422 (#23)

c0d0f42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix readqc pipeline.ini fastq_screen databases #422

fix readqc pipeline.ini fastq_screen databases #422

amayer21 commented Apr 13, 2018

Acribbs commented Apr 16, 2018

amayer21 commented Apr 16, 2018

Acribbs commented Apr 16, 2018

sebastian-luna-valero commented Apr 24, 2018

amayer21 commented Apr 24, 2018

fix readqc pipeline.ini fastq_screen databases #422

fix readqc pipeline.ini fastq_screen databases #422

Conversation

amayer21 commented Apr 13, 2018

Acribbs commented Apr 16, 2018

amayer21 commented Apr 16, 2018

Acribbs commented Apr 16, 2018

sebastian-luna-valero commented Apr 24, 2018

amayer21 commented Apr 24, 2018