Add design file for functionality for differential expression analysis, multiple species #123

Closed
drpatelh opened this issue Nov 22, 2018 · 7 comments


@drpatelh
Member

@apeltzer

If we are able to provide some sort of experiment design file to the pipeline then it will be relatively straightforward to perform the differential analysis on the counts.

For example the atacseq pipeline uses the R script below with a matrix of counts as input:
https://github.com/nf-core/atacseq/blob/dev/bin/featurecounts_deseq2.r

The pipeline should still be able to run without this feature for backward compatibility? @ewels

@apeltzer
Member

I think making this available as an optional step would be really nice - especially if we can generate these anyways automatically in e.g. a tsv/csv table format 👍

@ewels
Member

ewels commented Dec 17, 2018

Yes, this would be great! Needs some thought about how to refactor the input channels whilst retaining backwards compatibility, but hopefully shouldn't be too tricky.

@olgabot
Contributor

olgabot commented May 29, 2019

For reference, here's an implementation of input sequences that can take SRA, **{R1,R2}*.fastq.gz, a csv file, or fastas: https://github.com/czbiohub/nf-kmer-similarity/blob/master/main.nf#L80

@olgabot olgabot changed the title Add functionality for differential expression analysis Add design file for functionality for differential expression analysis, multiple species Aug 23, 2019
@olgabot
Contributor

olgabot commented Aug 23, 2019

(updated title to include "design file" for easy searching and added species for my own suggestions :)

Is there any interest in multi-species support for rnaseq? For example, PRJNA143627 (https://ewels.github.io/sra-explorer/#) has 9 species, and it would be really awesome to supply a tab-delimited file (not csv, since commas appear in the {R1,R2} definition) with reads and genome columns so that all samples get aligned in one run. Even globs would be good, e.g.

reads                                genome     singleEnd
s3://data/human/**{R1,R2}*fastq.gz    GRCh38    false
s3://data/mouse/**.fastq.gz           GRCm38    true

I'm about to run 18 separate nf-core/rnaseq runs for 9 species x both single end and paired end so this pain is quite real for me right now :) Plus, having all species in one multiqc report would be super awesome!
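
To make the proposed file format concrete, here is a minimal sketch of how such a tab-delimited design file could be read and grouped per genome. It is written in Python purely for illustration (the pipeline itself is Nextflow), and the names `DesignRow`, `read_design` and `group_by_genome` are hypothetical, not part of nf-core/rnaseq. It assumes real tab characters between columns and a header of exactly reads, genome, singleEnd.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DesignRow:
    reads: str        # glob or path to the FASTQ files
    genome: str       # genome key, e.g. GRCh38
    single_end: bool  # True for single-end libraries


def parse_bool(value: str) -> bool:
    """Accept only the literal strings 'true' / 'false' (case-insensitive)."""
    lowered = value.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"expected true/false, got {value!r}")
    return lowered == "true"


def read_design(handle) -> List[DesignRow]:
    """Read a tab-delimited design file with a reads/genome/singleEnd header."""
    reader = csv.DictReader(handle, delimiter="\t")
    expected = ["reads", "genome", "singleEnd"]  # assumed column names
    if reader.fieldnames != expected:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [
        DesignRow(row["reads"], row["genome"], parse_bool(row["singleEnd"]))
        for row in reader
    ]


def group_by_genome(rows: List[DesignRow]) -> Dict[Tuple[str, bool], List[str]]:
    """Collect read globs per (genome, single_end) pair, one alignment batch each."""
    groups: Dict[Tuple[str, bool], List[str]] = {}
    for row in rows:
        groups.setdefault((row.genome, row.single_end), []).append(row.reads)
    return groups


# The two example rows from this comment, with tabs between the columns.
example = (
    "reads\tgenome\tsingleEnd\n"
    "s3://data/human/**{R1,R2}*fastq.gz\tGRCh38\tfalse\n"
    "s3://data/mouse/**.fastq.gz\tGRCm38\ttrue\n"
)
rows = read_design(io.StringIO(example))
groups = group_by_genome(rows)
```

Because the delimiter is a tab, the comma inside `{R1,R2}` passes through untouched, which is the reason for preferring tabs over csv here.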

@ewels
Member

ewels commented Aug 24, 2019

My suspicion is that support for multiple genomes per run will add quite a lot of complexity, and that your use case is relatively rare 😉 If your work folders are kicking around still then it's pretty easy to re-run MultiQC on the different MultiQC workdirs from each run. This is what I've done in the past when I've had similar situations.

@drpatelh
Member Author

Goodness this one 🙈

@drpatelh
Member Author

Initial functionality for samplesheet input added in #459. Input format will be:

group,replicate,fastq_1,fastq_2

which should be enough information to perform a basic differential analysis.
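
As a rough illustration of why those four columns are enough, the sketch below parses a samplesheet with that header and counts replicates per group, which is the minimum a differential analysis needs to know. It is a Python sketch under stated assumptions, not the code from #459: the function names, the `group_R<replicate>` sample naming, the example file names, and the rule that an empty `fastq_2` means single-end are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_samplesheet(handle):
    """Parse a group,replicate,fastq_1,fastq_2 samplesheet into sample dicts."""
    reader = csv.DictReader(handle)
    expected = ["group", "replicate", "fastq_1", "fastq_2"]
    if reader.fieldnames != expected:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    samples = []
    for row in reader:
        fastq_1 = (row["fastq_1"] or "").strip()
        fastq_2 = (row["fastq_2"] or "").strip()
        replicate = int(row["replicate"])
        samples.append({
            # Hypothetical sample naming scheme: <group>_R<replicate>
            "sample": f"{row['group']}_R{replicate}",
            "group": row["group"],
            "replicate": replicate,
            # Assumption: a missing second FASTQ marks a single-end library
            "single_end": fastq_2 == "",
            "fastqs": [f for f in (fastq_1, fastq_2) if f],
        })
    return samples


def replicates_per_group(samples):
    """Count distinct replicates in each group."""
    seen = defaultdict(set)
    for sample in samples:
        seen[sample["group"]].add(sample["replicate"])
    return {group: len(reps) for group, reps in seen.items()}


# Made-up example rows; the file names are placeholders.
example = (
    "group,replicate,fastq_1,fastq_2\n"
    "control,1,ctrl_1_R1.fastq.gz,ctrl_1_R2.fastq.gz\n"
    "control,2,ctrl_2_R1.fastq.gz,ctrl_2_R2.fastq.gz\n"
    "treated,1,trt_1_R1.fastq.gz,\n"
)
samples = read_samplesheet(io.StringIO(example))
counts = replicates_per_group(samples)
```

The `group` column gives the condition to contrast and `replicate` distinguishes samples within it, so a counts matrix with one column per sample can be matched back to its group without any extra metadata.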
