Add FastQ-Screen database multiplexing #53
I am currently working on this to update the docs and get the missing param in.
Rough summary of the current status after some investigating by me and @FranBonath:
We think we want the output files of the process to contain the names of both the sample and the reference used to generate them, and to make sure they all end up in the publishDir. NOTE
Okay I figured out a way around this. Works pretty well with MultiQC. Probably going to want to use https://seqera.io/blog/multiqc-grouped-samples/
Also got this for free, but IMO publishing them should just be skipped if you're only going to use the results inside MultiQC.
May I suggest an array job? I think that would make your HPC admins even happier. https://www.nextflow.io/docs/latest/reference/process.html
Ah okay, looking at the expected FastQ Screen data now: https://github.com/MultiQC/test-data/blob/main/data/modules/fastq_screen/v0.14.0/scRNAseq_HISAT_example1_screen.txt It's probably easier to handle all of the databases in one run per sample. So two options:
@edmundmiller thanks for taking the time to wrestle with this! I've also spent a fair bit of time on it and am unfortunately equally stumped.

I prefer the solution in which we put the tool to its intended use-case of mapping a single sample to multiple references simultaneously, since we then get the appropriate outputs for MultiQC for free. And an appropriate degree of parallelization imo.

I have a functional example that runs and illustrates the kind of solution I'd like:

    process TEST {
        input:
        tuple val(db_name), path(db_path, name: "db_path*"), val(aligner)

        script:
        """
        echo "DATABASE ${db_name} ./${db_path}/genome ${aligner}" >> fastq_screen.conf
        """
    }

    workflow {
        ch_db = Channel
            .fromList([
                ["Ecoli", "s3://ngi-igenomes/igenomes/Escherichia_coli_K_12_MG1655/NCBI/2001-10-15/Sequence/Bowtie2Index/", "bowtie2"],
            ])
            .collect()
            .view()
        TEST(ch_db)
    }

but I can't make it work for more than one reference. I was advised by Phil to make a post on the Seqera community
As of commit 4b278bf, the pipeline can be run in test profile using FastQ Screen as intended, at least for me on GitPod 👀 We use a .csv listing the names, paths and aligners of our references and feed it into the process to build the FastQ Screen config within the context of the work directory, using the mounted input files. Still need to
I had to change the way the versions.yaml was written; I got weird errors from the heredoc approach that I couldn't get to the bottom of.
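A minimal sketch of the CSV-to-config step described above, not the code from this PR. The function name `build_fastq_screen_conf`, the header row, and the column order (name, path, aligner) are assumptions for illustration. The `DATABASE` line layout mirrors the `echo` in the earlier TEST process. In the pipeline the path column would have to point at the staged copy of each reference inside the work directory, which this sketch does not attempt.

```shell
# Hypothetical helper (illustration only): convert a reference CSV into
# FastQ Screen DATABASE lines, one per reference.
# Assumed CSV layout: a header row, then name,path,aligner on each line.
build_fastq_screen_conf() {
    local csv="$1" conf="$2"
    # Skip the header row; the "|| [ -n ... ]" keeps a final line that has no trailing newline
    tail -n +2 "$csv" | while IFS=, read -r name path aligner || [ -n "$name" ]; do
        [ -z "$name" ] && continue   # ignore blank lines
        echo "DATABASE ${name} ${path} ${aligner}"
    done > "$conf"
}
```

Called as `build_fastq_screen_conf references.csv fastq_screen.conf`, this writes one line per reference into a single config file, so one FastQ Screen run per sample can see every database.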
… basename of reference / index files therein
… main workflow input
Looks like the nf-test CI is failing, but I'm fairly confident it's unrelated to this PR. I've asked in the nf-test channel on Slack now. In the meantime, this branch may finally be ready for review 😎
CI now patched 🚀
LGTM.
That's a nice collaborative PR.
I'd really recommend using the nft-utils plugin for the pipeline-level tests.
PR checklist
[ ] If necessary, also make a PR on the nf-core/seqinspector branch on the nf-core/test-datasets repository.
[ ] Make sure your code lints (nf-core lint).
[ ] Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
[ ] Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
[ ] Usage documentation in docs/usage.md is updated.
[ ] Output documentation in docs/output.md is updated.
[ ] CHANGELOG.md is updated.
[ ] README.md is updated (including new tool citations and authors/contributors).