Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference params set before iGenomes config is loaded #121

Closed
efratushava opened this issue Nov 25, 2019 · 18 comments
Closed

Reference params set before iGenomes config is loaded #121

efratushava opened this issue Nov 25, 2019 · 18 comments
Labels
bug Something isn't working high-priority

Comments

@efratushava
Copy link

I think in the nextflow.config file, in line
bwa_meth_index = params.genome ? params.genomes[ params.genome ].bwa_meth ?: false : false

Doesn't it have to be
bwa_meth_index = params.genome ? params.genomes[ params.genome ].bwa ?: false : false

Because in the igenome config, there is no field name bwa_meth, and there is a field bwa

@ewels ewels added the bug Something isn't working label Nov 25, 2019
@ewels
Copy link
Member

ewels commented Nov 25, 2019

Hi @efratushava,

Good spot, I'm actually not sure.. I've never personally used the bwameth pipeline with Human data / iGenomes I think.

Ok, looking at https://github.com/brentp/bwa-meth#index - I think that the indices for bwameth are not the same as bwa. Running bwameth.py index does call the regular bwa index command, but it's done with a Fasta file that is converted to a 3-letter code.

So if we try to use regular bwa iGenomes indices for this analysis, it will fail. Or worse, work but give really terrible results.

Ideally, we should add bwa_meth indices to iGenomes for all species. But in the short term, it's possible that people will create their own config files with different species using this notation. So I'm happy to leave the code as it is I think.

Does this make sense? Let me know if you have any suggestions for code / documentation that could be improved.

Phil

@ewels ewels closed this as completed Nov 25, 2019
@ewels ewels removed the bug Something isn't working label Nov 25, 2019
@efratushava
Copy link
Author

Maybe I was not clear.
The iGenome config file doesn't contin a bwa_meth field, so when running the pipeline with the bwameth aligner, it's looking tor the params.bwa_meth_index which will always be false because params.genomes[ params.genome ].bwa_meth does not exist.

So, now, in the iGenome config, the bwa field contains a regular bwa index?

@IsmailM
Copy link

IsmailM commented Nov 25, 2019

Hi Phil,

I have run into the same issue (when running the bismark aligner option)

Perhaps, the config code could be updated to check for aligner type before attempting to set the bwa_meth_index value (and look for the bwa_meth key that does not exist by default)?

Moreover, there is no fasta_index key in the igenomes config. Should we also set this to false.

@ewels
Copy link
Member

ewels commented Nov 25, 2019

Hi both,

These config keys are not unique to iGenomes. We use the same configuration structure in our custom configs for home-made genome indices. This is why I think it's useful to have that code, even if these keys are not available in the iGenomes config.

I was under the impression that if your current pipeline config doesn't have a key called bwa_meth then the channel will be set to false and the index will be built:

bwa_meth_index = params.genome ? params.genomes[ params.genome ].bwa_meth ?: false : false

So if params.genomes[ params.genome ].bwa_meth evaluates to false (such as when it's not set), then bwa_meth_index will be set to false and the index will be built. Same for Bismark and fasta_index.

Am I wrong about this - is the pipeline crashing for you?

So, now, in the iGenome config, the bwa field contains a regular bwa index?

Yes - used by other pipelines for regular genome alignments. A recent template change added all iGenomes index paths to all pipelines (previously these lines were absent from the pipeline).

Phil

@IsmailM
Copy link

IsmailM commented Nov 25, 2019

I initially saw the following error:

$ nextflow run nf-core/methylseq -profile docker -resume -c bismark.config --genome 'GRCh37' 
N E X T F L O W  ~  version 19.10.0
Launching `nf-core/methylseq` [grave_almeida] - revision: 40760ecffb [master]
No reference genome index or fasta file specified. Expression: (params.bismark_index || params.fasta)

-- Check script '/home/ucbtmog/.nextflow/assets/nf-core/methylseq/main.nf' at line: 109 or see '.nextflow.log' file for more details

Looking back at this, this seems to be less about the bwa_meth / fasta_index keys not being present in the key, but rather bismark_index and fasta not being set automatically.

Setting the following manually in my bismark.config fixes the issue:

params {
  ...
  bismark_index = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/"
  fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
  ...
}

@efratushava
Copy link
Author

I also had this problem, issue #122

@ewels
Copy link
Member

ewels commented Nov 25, 2019

Ok, getting somewhere.. @IsmailM - this should definitely work. You're using Bismark with a standard iGenomes reference. GRCh37 has both a fasta and Bismark ref specified.

Looking at the code a little more closely, I think this is because these values are being set in nextflow.config lines 16-19 and the iGenomes config is being loaded later in the same file, lines 100-103. So there are no iGenomes refs when the variable is set. @efratushava - your issue in #122 is due to the same problem I think.

In other pipelines, such as the RNAseq pipeline, those reference params are set in main.nf instead for exactly this reason.

This used to be the case for the methylseq pipeline too, but it looks like they were moved by @phue in the massive template sync in #92 - so this change would have come out in version 1.4 of the pipeline and broken stuff. Which makes sense.

The fix should be to either use {} to defer the execution of those config lines, or better still to just move them back to main.nf. I think this is the best approach. It should be pretty easy and I think that this is a serious enough problem to warrant a patch release.

@phue - what do you think?

Phil

@ewels ewels reopened this Nov 25, 2019
@ewels ewels changed the title Change nextflow config-bug report Reference params set before iGenomes config is loaded Nov 25, 2019
@ewels ewels added bug Something isn't working high-priority labels Nov 25, 2019
@ewels
Copy link
Member

ewels commented Nov 25, 2019

Prepping a PR now..

@ewels
Copy link
Member

ewels commented Nov 25, 2019

I think that this is a serious enough problem to warrant a patch release.

Actually, given that we have merged a couple of PRs, I propose we do a v1.4.1 minor release.

@ewels ewels mentioned this issue Nov 25, 2019
ewels added a commit that referenced this issue Nov 26, 2019
@efratushava
Copy link
Author

still not working...

~% nextflow -log methylseq_try run nf-core/methylseq -with-singularity /tmp/ekushele-singularity/biscuit.img --aligner 'bismark' --reads 'data/fastq_files/*/*sorted_{1,2}.fastq' --genome 'GRCh37' --outdir 'data/nextflow'  
N E X T F L O W  ~  version 19.10.0
Launching `nf-core/methylseq` [shrivelled_golick] - revision: 40760ecffb [master]
Unknown config attribute `params.genomes.GRCh37.bwa_meth` -- check config file: /cs/icore/ekushele/.nextflow/assets/nf-core/methylseq/nextflow.config

What's up with this?

@ewels
Copy link
Member

ewels commented Dec 4, 2019

It is not yet released, try with -r dev

@efratushava
Copy link
Author

You for sure noticed, but I'm just emphasizing-the parameters in the main.nf file should start with params.fasta, params.fasta_index etc... (instead of fasta, fasta_index)

But somehow-it's still not working! The pipeline can't run on my files, which exist for sure!
The script made a link from the files I specified to the working directory, but the fastqc interupts everything, showing the error
Skipping '149122-cf-M.ben_list.b37.sorted_1.fastq' which didn't exist, or couldn't be read.
And the file exists!!!

@ewels
Copy link
Member

ewels commented Dec 4, 2019

Please link to specific lines in the code (permalink with commit hash) if there are bits that you think are wrong.

Have you pulled the latest version of the dev branch? nextflow pull

@ewels
Copy link
Member

ewels commented Dec 4, 2019

Though the file not existing sounds like it could be a similar problem to the MultiQC issue. I suspect that these are not pipeline specific but relate to containers / Nextflow interaction with the underlying compute infrastructure...

@efratushava
Copy link
Author

Please link to specific lines in the code (permalink with commit hash) if there are bits that you think are wrong.

lines 100-103 in main.nf, the lines that were added in the dev version

Have you pulled the latest version of the dev branch? nextflow pull

yes, with git clone -b dev

@ewels
Copy link
Member

ewels commented Dec 4, 2019

methylseq/main.nf

Lines 100 to 103 in d0dffea

bismark_index = params.genome ? params.genomes[ params.genome ].bismark ?: false : false
bwa_meth_index = params.genome ? params.genomes[ params.genome ].bwa_meth ?: false : false
fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false
fasta_index = params.genome ? params.genomes[ params.genome ].fasta_index ?: false : false

@ewels
Copy link
Member

ewels commented Dec 10, 2019

This should now be fixed in dev and will be released in 1.4.1 as a minor release.

Thanks all for spotting the error and helping with the fix 👍

@ewels ewels closed this as completed Dec 10, 2019
@ewels
Copy link
Member

ewels commented Apr 9, 2020

Hi all,

It took a long time sorry, but we finally just released v1.5 of the pipeline, which includes these fixes.

Apologies for the delay,

Phil

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working high-priority
Projects
None yet
Development

No branches or pull requests

3 participants