Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix readqc pipeline.ini fastq_screen databases #422

Merged
merged 2 commits into from
Apr 24, 2018

Conversation

amayer21
Copy link
Contributor

In the fastq_screen parameters of the readqc pipeline.ini, the databases are all active by default. The comment says to " Put a '#' in front of organisms you are not interested in" but that doesn't work as the entry still exists without '#' in the original pipeline.ini. So it will be loaded into the PARAMS. Therefore even if the user add '#' in front of a database in the pipeline.ini file in the working directory, reads will still be aligned to that genome.

I propose to add a '#' to all databases in the original pipeline.ini and ask the user to remove the '#' in front of the databases he is interested in.

@Acribbs
Copy link
Member

Acribbs commented Apr 16, 2018

hmm, Im not really sure what you mean by this commit. Maybe we can catch up at lunch and discuss.

@amayer21
Copy link
Contributor Author

Will try to be more explicit:
In each pipeline, parameters are retrieved from 3 pipeline.ini files: a central one, then the pipeline.ini in subfolder of pipeline script, then pipeline.ini in working directory. Each pipeline.ini overwrite parameters defined in previous one.

For fastq_screen database parameters (readqc pipeline), the pipeline.ini in script subfolder contains paths to bowtie indexes of different organism (human, mouse, rat, yeast, phix, bacteria and contaminants), so at that stage all path to databases are loaded into the parameters.

If I take the example of the mouse database:
In pipeline.ini in script subfolder, you'll find:
database_mouse=/ifs/mirror/genomes/bowtie/mm9
If I don't want to check for mouse contaminants in my reads, I'm told to add '#' in front of it in pipeline.ini in the working directory:
#database_mouse=/ifs/mirror/genomes/bowtie/mm9
but the parameter "fastq_screen_database_mouse" already exist, so even if I add the '#', my reads will still be aligned to mouse genome (which can be ressource consuming, even if fastq_screen sample only 100,000 reads).

I propose to add '#' to all databases in the pipeline.ini file in the pipeline script subfolder. Then the user can remove it from pipeline.ini in pipeline.ini in working directory if he wants to check for contaminants from chosen organisms.

Happy to discuss at the break if not clear (sorry I've been busy viewing posters at lunch)

@Acribbs
Copy link
Member

Acribbs commented Apr 16, 2018

Ah, ok now I get the issue now. The solution you propose sounds sensible.

@sebastian-luna-valero
Copy link
Member

Thanks, Alice.

Commenting out the databases for fastq_screen causes the pipeline to fail with the following error message:

    Using fastq_screen v0.11.3
    Reading configuration from './ctmpisazwypq/fastq_screen.conf'
    Aligner (--aligner) not specified. Did not find Bowtie/Bowtie2/BWA paths and/or index files
    Please check: you have provided the full path to the aligner INCLUDING the executable filename
    Please check: the specified genome indices comprises the full path AND the basename of the index files
    See documentation for further details
    Please adjust configuration.

Provided that there have been previous requests to make fastq_screen optional (please see #156 and #385) I will then disable it by default so it is up to the user to enable and configure the databases properly.

@sebastian-luna-valero sebastian-luna-valero merged commit 75adf24 into master Apr 24, 2018
@sebastian-luna-valero sebastian-luna-valero deleted the AM-fixing-readqc-pipelineini branch April 24, 2018 10:46
@amayer21
Copy link
Contributor Author

Thanks Sebastian.
I get the same error message as you if I run it on CBRG server with pipeline.ini as it was, i.e. even without the databases being commented out, because the paths provided are all on CGAT server (as described in #385). Making it optional seems sensible.

sebastian-luna-valero added a commit to cgat-developers/cgat-flow that referenced this pull request May 31, 2018
sebastian-luna-valero added a commit to cgat-developers/cgat-flow that referenced this pull request May 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants