Add FastQ-Screen database multiplexing #53

Merged
merged 46 commits into from
Dec 9, 2024
46 commits
b2e7be5
Fran's draft
kedhammar Oct 28, 2024
d833d91
nf-core modules update --> fastqscreen
kedhammar Oct 28, 2024
285a4b3
add fastq_screen.conf
kedhammar Oct 28, 2024
a081563
hack out some ideas
edmundmiller Oct 29, 2024
7a4dccd
test: Update path for database
edmundmiller Oct 29, 2024
f0b23a7
Hack ideas
edmundmiller Oct 29, 2024
3ea8d8b
Initial datasheet example
edmundmiller Oct 29, 2024
fea850f
refactor: Rework fastqscreen with new ideas
edmundmiller Oct 29, 2024
600c2fb
chore: Bump fastq-screen
edmundmiller Oct 29, 2024
89e67fa
fix: Give up on fastq_screen writing it's own config
edmundmiller Oct 29, 2024
a6565f7
test: Update databases
edmundmiller Oct 29, 2024
4af6dd4
fix: Run every sample and every DB
edmundmiller Oct 29, 2024
46805e8
fix: Give up on fastq_screen writing it's own config
edmundmiller Oct 29, 2024
4001a50
test: Update databases
edmundmiller Oct 29, 2024
7ebb6c1
fix: Run every sample and every DB
edmundmiller Oct 29, 2024
e560181
adding fastqscreen database parameter, updating docs
FranBonath Oct 30, 2024
ac7bb3f
Merge branch 'fastqscreen' into fastqscreen
FranBonath Oct 30, 2024
effa9c9
remove absolute path of fastqscreen config
FranBonath Nov 7, 2024
a2f2dfc
remove "projectDir" from path for fastqscreen config in nextflow.config
FranBonath Nov 7, 2024
c213e90
Merge pull request #64 from FranBonath/fastqscreen
FranBonath Nov 7, 2024
b1ab75b
Merge branch 'dev' into fastqscreen
kedhammar Nov 7, 2024
2c44320
remove conf/fastq_screen.conf and it's references, as the file is bui…
kedhammar Nov 7, 2024
fedeab2
remove diff dump, I don't see why it should be here
kedhammar Nov 7, 2024
588ad1d
spacing
kedhammar Nov 7, 2024
2001ccd
Revert remove diff dump, now I see why it should be here :)
kedhammar Nov 7, 2024
5bb8dfd
fix faulty params var name in test config
kedhammar Nov 7, 2024
3baa7fc
update snapshots for new multiqc citations
kedhammar Nov 7, 2024
843e2c6
module tests of fastqscreen are dependent on non-installed modules, s…
kedhammar Nov 7, 2024
66d08ad
Append changelog entry
kedhammar Nov 7, 2024
9567f47
Check for fastQ Screen files in multiqc report in pipeline tests and …
kedhammar Nov 7, 2024
5248fad
fix indent
kedhammar Nov 7, 2024
e2a290c
fix: Rename fastqscreen files so they're unique
edmundmiller Nov 14, 2024
a0ef751
Merge branch 'dev' into fastqscreen
edmundmiller Nov 14, 2024
46d1bfd
fix: collectFile fastqscreen rough draft
edmundmiller Nov 14, 2024
b8c363c
chore: Lock down arity in fastqscreen
edmundmiller Nov 14, 2024
d021ed0
feat: Getting close to making the config on the head node
edmundmiller Nov 14, 2024
4b278bf
make functional :')
kedhammar Dec 5, 2024
63ebd67
make input mounting more flexible by separating mounted dir path from…
kedhammar Dec 5, 2024
2608088
successfully use schema validation for fastqscreen reference .csv
kedhammar Dec 5, 2024
4f743e2
remove redundant operator
kedhammar Dec 5, 2024
5aeb8d1
update comment
kedhammar Dec 5, 2024
a31a392
switch to cleaner file and variable names, remove fastqscreen .csv as…
kedhammar Dec 5, 2024
7396d4c
update docs
kedhammar Dec 5, 2024
e3a8862
update snapshots
kedhammar Dec 5, 2024
586ac9b
re-generate module patch
kedhammar Dec 5, 2024
7b6c18b
try gha update
kedhammar Dec 5, 2024
16 changes: 4 additions & 12 deletions .github/workflows/nf-test.yml
@@ -5,21 +5,13 @@ on: [push, pull_request]
 jobs:
   test:
     runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        shard: [1, 2, 3, 4]
+
     steps:
       - name: Checkout
         uses: actions/checkout@v4

-      - name: Set up JDK 11
-        uses: actions/setup-java@v2
-        with:
-          java-version: "11"
-          distribution: "adopt"
-
       - name: Setup Nextflow latest-edge
-        uses: nf-core/setup-nextflow@v1
+        uses: nf-core/setup-nextflow@v2
         with:
           version: "latest-edge"

@@ -28,5 +20,5 @@ jobs:
           wget -qO- https://get.nf-test.com | bash
           sudo mv nf-test /usr/local/bin/

-      - name: Run Tests (Shard ${{ matrix.shard }}/${{ strategy.job-total }})
-        run: nf-test test --ci --shard ${{ matrix.shard }}/${{ strategy.job-total }} .
+      - name: Run Tests
+        run: nf-test test --ci tests
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -16,6 +16,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
 - [#50](https://github.com/nf-core/seqinspector/pull/50) Add an optional subsampling step.
 - [#51](https://github.com/nf-core/seqinspector/pull/51) Add nf-test to CI.
 - [#63](https://github.com/nf-core/seqinspector/pull/63) Contribution guidelines added about displaying results for new tools
+- [#53](https://github.com/nf-core/seqinspector/pull/53) Add FastQ-Screen database multiplexing and limit scope of nf-test in CI.

 ### `Fixed`

4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -18,6 +18,10 @@

   > Telatin A, Fariselli P, Birolo G. SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering 2021, 8, 59. doi.org/10.3390/bioengineering8050059

+- [FastQ Screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/)
+
+  > Wingett SW and Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control [version 2; referees: 4 approved]. F1000Research 2018, 7:1338 (https://doi.org/10.12688/f1000research.15931.2)
+
 - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

   > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
4 changes: 4 additions & 0 deletions assets/example_fastq_screen_references.csv
@@ -0,0 +1,4 @@
+name,dir,basename,aligner
+Ecoli,s3://ngi-igenomes/igenomes/Escherichia_coli_K_12_MG1655/NCBI/2001-10-15/Sequence/Bowtie2Index/,genome,bowtie2
+PhiX,s3://ngi-igenomes/igenomes/PhiX/Illumina/RTA/Sequence/Bowtie2Index/,genome,bowtie2
+Scerevisiae,s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/NCBI/build3.1/Sequence/Bowtie2Index/,genome,bowtie2
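
For readers who want to inspect a reference sheet like this outside the pipeline, here is a minimal Python sketch of how the four columns could be read. It is not part of this PR: `ScreenReference`, `load_references` and `index_prefix` are hypothetical names, and joining `dir` and `basename` with a single slash is an assumption about how the two columns relate.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Column names taken from the header row of the example sheet.
REQUIRED_COLUMNS = ("name", "dir", "basename", "aligner")


@dataclass(frozen=True)
class ScreenReference:  # hypothetical helper type, not defined in the pipeline
    name: str
    dir: str
    basename: str
    aligner: str

    @property
    def index_prefix(self) -> str:
        # Assumption: the index prefix is dir + "/" + basename, with one slash.
        return self.dir.rstrip("/") + "/" + self.basename


def load_references(handle) -> List[ScreenReference]:
    """Read a references sheet and fail early if a column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    return [
        ScreenReference(**{c: (row[c] or "").strip() for c in REQUIRED_COLUMNS})
        for row in reader
    ]


EXAMPLE = """\
name,dir,basename,aligner
PhiX,s3://ngi-igenomes/igenomes/PhiX/Illumina/RTA/Sequence/Bowtie2Index/,genome,bowtie2
"""

refs = load_references(io.StringIO(EXAMPLE))
print(refs[0].index_prefix)
```

The trailing slash on `dir` is stripped before joining, so sheets written with or without it yield the same prefix.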
35 changes: 35 additions & 0 deletions assets/schema_fastq_screen_references.json
@@ -0,0 +1,35 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://raw.githubusercontent.com/nf-core/seqinspector/master/assets/schema_fastq_screen_references.json",
+  "title": "nf-core/seqinspector pipeline - params.fastq_screen_references schema",
+  "description": "Schema for the file provided with params.fastq_screen_references",
+  "type": "array",
+  "items": {
+    "type": "object",
+    "properties": {
+      "name": {
+        "type": "string",
+        "pattern": "^\\S+$",
+        "errorMessage": "The reference name as referred to by FastQ Screen."
+      },
+      "dir": {
+        "type": "string",
+        "format": "file-path",
+        "exists": true,
+        "pattern": "^\\S+$",
+        "errorMessage": "Path to the dir containing the aligner reference and index. Can be remote."
+      },
+      "basename": {
+        "type": "string",
+        "pattern": "^\\S+$",
+        "errorMessage": "The shared basename of the reference and index files contained in the dir."
+      },
+      "aligner": {
+        "type": "string",
+        "enum": ["bowtie", "bowtie2", "bwa", "minimap2"],
+        "errorMessage": "Specify the aligner to use for the mapping. Valid arguments are 'bowtie', 'bowtie2' (default), 'bwa' or 'minimap2'."
+      }
+    },
+    "required": ["name", "dir", "basename", "aligner"]
+  }
+}
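
To make the constraints in this schema easier to reason about, the following Python sketch mirrors them by hand. It is illustrative only and assumes nothing about how the pipeline runs its validation: `validate_row` is a hypothetical function, and the `format` and `exists` checks on `dir` are deliberately left out because they depend on the validator in use.

```python
import re

# Mirrors the schema: name, dir and basename must match ^\S+$,
# and aligner must be one of the enumerated values.
NO_WHITESPACE = re.compile(r"\S+")
ALIGNERS = {"bowtie", "bowtie2", "bwa", "minimap2"}
REQUIRED = ("name", "dir", "basename", "aligner")


def validate_row(row: dict) -> list:
    """Return a list of problems for one sheet row; empty means it passes."""
    errors = []
    for key in REQUIRED:
        if key not in row:
            errors.append(f"missing required field '{key}'")
    for key in ("name", "dir", "basename"):
        value = row.get(key)
        # fullmatch rejects empty strings and anything containing whitespace.
        if value is not None and not NO_WHITESPACE.fullmatch(str(value)):
            errors.append(f"'{key}' must be non-empty and contain no whitespace")
    aligner = row.get("aligner")
    if aligner is not None and aligner not in ALIGNERS:
        errors.append(f"'aligner' must be one of {sorted(ALIGNERS)}")
    return errors


good = {"name": "PhiX", "dir": "refs/phix/", "basename": "genome", "aligner": "bowtie2"}
bad = {"name": "Phi X", "dir": "refs/phix/", "basename": "genome", "aligner": "star"}
print(validate_row(good))
print(validate_row(bad))
```

`re.fullmatch` is used instead of `re.match` with `$` because Python's `$` also matches before a trailing newline, which the schema pattern would not accept.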
1 change: 1 addition & 0 deletions conf/test.config
@@ -26,6 +26,7 @@ params {
     // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
     // TODO nf-core: Give any required params for the test so that command line flags are not needed
     input = params.pipelines_testdata_base_path + 'seqinspector/testdata/NovaSeq6000/samplesheet.csv'
+    fastq_screen_references = "${projectDir}/assets/example_fastq_screen_references.csv"

     // Genome references
     genome = 'R64-1-1'
28 changes: 27 additions & 1 deletion docs/output.md
@@ -13,6 +13,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [Seqtk](#seqtk) - Subsample a specific number of reads per sample
 - [FastQC](#fastqc) - Raw read QC
 - [SeqFu Stats](#seqfu_stats) - Statistics for FASTA or FASTQ files
+- [FastQ Screen](#fastqscreen) - Mapping against a set of references for basic contamination QC
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -41,7 +42,32 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

 [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).

-#### SeqFu Stats
+### FASTQSCREEN
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fastqscreen/`
+  - `*_screen.html`: Interactive graphical FastQ Screen report which summarises the mapping of your sequences against each of your libraries.
+  - `*_screen.pdf`: Static graphical FastQ Screen report which summarises the mapping of your sequences against each of your libraries.
+  - `*_screen.txt`: Text-based FastQ Screen report which summarises the mapping of your sequences against each of your libraries.
+
+</details>
+
+[FastQ Screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) allows you to set up a standard set of libraries against which all of your sequences can be searched. Your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, vectors or other contaminants commonly seen in sequencing experiments.
+
+It requires a `.csv` detailing:
+
+- the working name of the reference
+- the name of the aligner used to generate its index (which is also the aligner and index used by the tool)
+- the file basename of the reference and its index (e.g. the reference `genome.fa` and its index `genome.bt2` have the basename `genome`)
+- the path to a dir where the reference and index files both reside.
+
+See `assets/example_fastq_screen_references.csv` for an example.
+
+The `.csv` is provided through the pipeline parameter `fastq_screen_references`. It is used to construct a FastQ Screen configuration file inside the process work directory so that the references are mounted properly.
+
+### SeqFu Stats

 <details markdown="1">
 <summary>Output files</summary>
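
As a rough illustration of the last paragraph in the docs change above, this Python sketch turns reference rows into FastQ Screen configuration lines. It is a sketch under stated assumptions, not the module's actual logic: `render_conf` and `staged_dirs` are hypothetical names, the local directory for each reference is supplied by the caller rather than derived from the remote path, and only the three-column tab-separated `DATABASE` line layout is emitted.

```python
def render_conf(references, staged_dirs):
    """Build FastQ Screen config text from reference rows.

    references:  iterable of dicts with at least 'name' and 'basename'.
    staged_dirs: mapping of reference name -> directory as seen from the
                 task work dir (hypothetical; supplied by the caller).
    """
    lines = []
    for ref in references:
        local_dir = staged_dirs[ref["name"]].rstrip("/")
        # Index prefix = staged directory + shared basename of the index files.
        prefix = f"{local_dir}/{ref['basename']}"
        lines.append("\t".join(["DATABASE", ref["name"], prefix]))
    return "\n".join(lines) + "\n"


rows = [
    {"name": "PhiX", "basename": "genome", "aligner": "bowtie2"},
    {"name": "Ecoli", "basename": "genome", "aligner": "bowtie2"},
]
staged = {"PhiX": "PhiX_index/", "Ecoli": "Ecoli_index"}
print(render_conf(rows, staged), end="")
```

The `aligner` column is not written into the config here; in this sketch it is assumed to be passed to the tool separately, for example through its `--aligner` option. Keying `staged_dirs` by reference name avoids clashes when several remote paths end in the same directory name, as the three example references do.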
6 changes: 6 additions & 0 deletions modules.json
@@ -10,6 +10,12 @@
             "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
             "installed_by": ["modules"]
           },
+          "fastqscreen/fastqscreen": {
+            "branch": "master",
+            "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
+            "installed_by": ["modules"],
+            "patch": "modules/nf-core/fastqscreen/fastqscreen/fastqscreen-fastqscreen.diff"
+          },
           "multiqc": {
             "branch": "master",
             "git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
14 changes: 14 additions & 0 deletions modules/nf-core/fastqscreen/fastqscreen/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.


58 changes: 58 additions & 0 deletions modules/nf-core/fastqscreen/fastqscreen/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
