Merge pull request nf-core#597 from nf-core/update-docs

Change lane type from number to string
genomic-medicine-sweden · Aug 14, 2024 · 5ce1f18
2 parents ad98b97 + f99aa9e

Showing 4 changed files with 17 additions and 15 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -19,7 +19,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Changed`
 
-- Allow `0` as a valid value for `sex` in the samplesheet [#595](https://github.com/nf-core/raredisease/pull/587)
+- Changed the accepted type of the `lane` field in the samplesheet from number to string [#597](https://github.com/nf-core/raredisease/pull/597)
+- Allow `0` as a valid value for `sex` in the samplesheet [#595](https://github.com/nf-core/raredisease/pull/595)
 - Updated deepvariant to version 1.6.1 [#587](https://github.com/nf-core/raredisease/pull/587)
 - Parallelized vcfanno [#585](https://github.com/nf-core/raredisease/pull/585)
 - Skip ROH calling with bcftools if there are no affected samples [#579](https://github.com/nf-core/raredisease/pull/579)

diff --git a/assets/schema_input.json b/assets/schema_input.json
@@ -15,8 +15,9 @@
                 "errorMessage": "Sample name must be provided and cannot contain spaces"
             },
             "lane": {
-                "type": "number",
-                "meta": ["lane"]
+                "type": "string",
+                "meta": ["lane"],
+                "pattern": "^\\S+$"
             },
             "fastq_1": {
                 "type": "string",

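The schema change above can be approximated in Python as a rough sketch of what the new `lane` definition accepts. It assumes the value has already been read as text and checks only the JSON type and the `pattern` (the JSON string `"^\\S+$"` unescapes to the regex `^\S+$`). It does not reproduce the schema-validation plugin the pipeline actually uses, and `is_valid_lane` is a hypothetical helper.

```python
import re

# The schema stores the pattern as the JSON string "^\\S+$", i.e. the regex ^\S+$.
LANE_PATTERN = re.compile(r"^\S+$")


def is_valid_lane(value):
    """Approximate the new `lane` rule: a non-empty string with no whitespace.

    fullmatch is used because Python's `$` also matches just before a trailing
    newline, which a JSON Schema (ECMA-262) pattern would not accept.
    """
    return isinstance(value, str) and LANE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # "FC1_1" is a placeholder flowcell-plus-lane label.
    for candidate in ["1", "FC1_1", "lane 1", "", 1]:
        print(repr(candidate), is_valid_lane(candidate))
```

Under this sketch a plain lane number written as text, such as `"1"`, still passes, while an empty value or one containing whitespace is rejected.
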
diff --git a/docs/usage.md b/docs/usage.md
@@ -102,17 +102,17 @@ A samplesheet is used to pass the information about the sample(s), such as the p
 
 nf-core/raredisease will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The pedigree information in the samplesheet (sex and phenotype) should be provided as they would be for a [ped file](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format) (i.e. 1 for male, 2 for female, other for unknown).
 
-| Fields        | Description                                                                                                                                                                            |
-| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `sample`      | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
-| `lane`        | Used to generate separate channels during the alignment step.                                                                                                                          |
-| `fastq_1`     | Absolute path to FASTQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                         |
-| `fastq_2`     | Absolute path to FASTQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                         |
-| `sex`         | Sex (1=male; 2=female; for unknown sex use 0 or other).                                                                                                                                |
-| `phenotype`   | Affected status of patient (0 = missing; 1=unaffected; 2=affected).                                                                                                                    |
-| `paternal_id` | Sample ID of the father, can be blank if the father isn't part of the analysis or for samples other than the proband.                                                                  |
-| `maternal_id` | Sample ID of the mother, can be blank if the mother isn't part of the analysis or for samples other than the proband.                                                                  |
-| `case_id`     | Case ID, for the analysis used when generating a family VCF.                                                                                                                           |
+| Fields        | Description                                                                                                                                                                                             |
+| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `sample`      | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`).                  |
+| `lane`        | Used to generate separate channels during the alignment step. It is of string type, and we recommend using a combination of flowcell and lane to distinguish between different runs of the same sample. |
+| `fastq_1`     | Absolute path to FASTQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                                          |
+| `fastq_2`     | Absolute path to FASTQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                                          |
+| `sex`         | Sex (1=male; 2=female; for unknown sex use 0 or other).                                                                                                                                                 |
+| `phenotype`   | Affected status of patient (0 = missing; 1=unaffected; 2=affected).                                                                                                                                     |
+| `paternal_id` | Sample ID of the father, can be blank if the father isn't part of the analysis or for samples other than the proband.                                                                                   |
+| `maternal_id` | Sample ID of the mother, can be blank if the mother isn't part of the analysis or for samples other than the proband.                                                                                   |
+| `case_id`     | Case ID, for the analysis used when generating a family VCF.                                                                                                                                            |
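
To illustrate the recommendation for the `lane` field, the sketch below counts `(sample, lane)` pairs in a samplesheet. The sample name, case ID, paths and flowcell labels (`FC1`, `FC2`) are placeholders, and `make_sheet` and `duplicate_sample_lanes` are hypothetical helpers, not part of the pipeline. It shows that two runs of the same sample that are both labelled lane `1` collide, while flowcell-prefixed lane strings stay distinct.

```python
import csv
import io
from collections import Counter

HEADER = "sample,lane,fastq_1,fastq_2,sex,phenotype,paternal_id,maternal_id,case_id\n"


def make_sheet(lanes):
    """Build a placeholder samplesheet: one row per lane value, same sample."""
    rows = [
        f"sample1,{lane},/path/R1_{i}.fastq.gz,/path/R2_{i}.fastq.gz,1,2,,,case1\n"
        for i, lane in enumerate(lanes)
    ]
    return HEADER + "".join(rows)


def duplicate_sample_lanes(text):
    """Return the (sample, lane) pairs that occur on more than one row."""
    reader = csv.DictReader(io.StringIO(text))
    counts = Counter((row["sample"], row["lane"]) for row in reader)
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Lane 1 on two flowcells, written as a bare lane number: the pairs collide.
    print(duplicate_sample_lanes(make_sheet(["1", "1"])))  # [('sample1', '1')]
    # The same runs written as flowcell + lane strings: no collision.
    print(duplicate_sample_lanes(make_sheet(["FC1_1", "FC2_1"])))  # []
```

Distinct values matter here because the alignment subworkflow builds the read group ID from the sample name and the lane, so two rows with the same sample and lane would yield the same ID.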
 
 It is also possible to include multiple runs of the same sample in a samplesheet. For example, when you have re-sequenced the same sample more than once to increase sequencing depth. In that case, the `sample` identifiers in the samplesheet have to be the same. The pipeline will align the raw read/read-pairs independently before merging the alignments belonging to the same sample. Below is an example for a trio with the proband sequenced across two lanes:
 

diff --git a/subworkflows/local/alignment/align_bwa_bwamem2_bwameme.nf b/subworkflows/local/alignment/align_bwa_bwamem2_bwameme.nf
@@ -68,7 +68,7 @@ workflow ALIGN_BWA_BWAMEM2_BWAMEME {
         ch_align
             .map{ meta, bam ->
                     new_id   = meta.sample
-                    new_meta = meta + [id:new_id, read_group:"\'@RG\\tID:" + new_id + "\\tPL:" + val_platform + "\\tSM:" + new_id + "\'"] - meta.subMap('lane')
+                    new_meta = meta + [id:new_id, read_group:"\'@RG\\tID:" + new_id + "_" + meta.lane + "\\tPL:" + val_platform + "\\tSM:" + new_id + "\'"] - meta.subMap('lane')
                     [groupKey(new_meta, new_meta.num_lanes), bam]
                 }
             .groupTuple()
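
The updated read-group expression can be mirrored in Python to show the string it produces. This is a sketch that assumes `meta` is a plain dict and that `"illumina"` is passed as the platform (a placeholder for `val_platform`); `read_group` is a hypothetical helper name. As in the Groovy source, `\t` is kept as a literal backslash followed by `t` rather than a tab character, and the whole value is wrapped in single quotes.

```python
def read_group(meta, platform):
    """Build the read-group string the way the updated map closure does.

    The ID is now <sample>_<lane>, so each lane of a sample gets its own ID,
    while SM stays the sample name.
    """
    new_id = meta["sample"]
    return (
        "'@RG\\tID:" + new_id + "_" + str(meta["lane"])
        + "\\tPL:" + platform
        + "\\tSM:" + new_id + "'"
    )


if __name__ == "__main__":
    meta = {"sample": "sample1", "lane": "FC1_1"}  # placeholder values
    print(read_group(meta, "illumina"))
    # prints: '@RG\tID:sample1_FC1_1\tPL:illumina\tSM:sample1'
```

Before this change the ID was the sample name alone, so every lane of a sample carried the same read group ID.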