Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #418 (iGenomes GRCm38 paths) #419

Merged
merged 1 commit into from
Oct 10, 2019
Merged

Conversation

apeltzer
Copy link
Member

This should fix the issue @drpatelh found!

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Copy link
Member

@drpatelh drpatelh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks. Im going to submit another PR adding in some of the UCSC genomes too 👍 and other standard fields used in the chipseq and atacseq pipelines like macs_gsize. Do you want to add anything else?

@drpatelh drpatelh merged commit 2358cdb into nf-core:dev Oct 10, 2019
@apeltzer
Copy link
Member Author

No, I think that makes sense - we should probably have almost everything in there that is existing in iGenomes at some point as many pipelines will use that stuff?

@apeltzer apeltzer deleted the fix-igenomes branch October 10, 2019 10:48
@drpatelh
Copy link
Member

The only other thing I can see that would be worth adding is the BismarkIndex that is used in methylseq. Ill add that in too:
https://github.com/nf-core/methylseq/blob/master/conf/igenomes.config

Will have to check all of the paths exists after Ive added everything in but it looks pretty well organised and consistent.

@ewels
Copy link
Member

ewels commented Oct 14, 2019

I don’t think that we need to add the Bismark refs to the template. I’d be surprised if any pipelines other than the methylseq pipeline need them, so it’s just something extra to clean up for each new pipeline.

GTF / Fasta etc is more fair game :)

@drpatelh
Copy link
Member

Morning! I've gone through all of the organisms that were in the current igenomes.config and added all of these files already. I've also added the latest version of the UCSC genomes and checked that the file paths exist against the AWS iGenomes listing. I guess if you don't use the parameter for a specific index in the pipeline itself then it doesn't matter? Probably easier to have these paths in as a full listing then to try and add/remove them later?
https://github.com/nf-core/chipseq/blob/dev/conf/igenomes.config

The great thing is that I can just copy the file for any of the pipelines using iGenomes AWS now. Couple of things that need to be resolved:

  • Adding blacklists to AWS iGenomes (:sweat_smile:)
  • For bwa we refer to the full path to the index i.e. index_dir/genome.fa because it gives you more flexibility with genome indices that are named differently. However, for all of the other indices we just refer to index_dir. Haven't explicitly checked how other pipelines are using the latter but may be good to be consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants