add sample_name as possible column in samplesheet #31

Merged
merged 7 commits on Nov 5, 2024
23 changes: 19 additions & 4 deletions .github/workflows/linting.yml
@@ -1,6 +1,6 @@
name: nf-core linting
# This workflow is triggered on pushes and PRs to the repository.
# It runs the `nf-core lint` and markdown lint tests to ensure
# It runs the `nf-core pipelines lint` and markdown lint tests to ensure
# that the code meets the nf-core guidelines.
on:
push:
@@ -41,17 +41,32 @@ jobs:
python-version: "3.12"
architecture: "x64"

- name: read .nf-core.yml
uses: pietrobolcato/action-read-yaml@1.1.0
id: read_yml
with:
config: ${{ github.workspace }}/.nf-core.yml

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install nf-core
pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }}

- name: Run nf-core pipelines lint
if: ${{ github.base_ref != 'master' }}
env:
GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md

- name: Run nf-core lint
- name: Run nf-core pipelines lint --release
if: ${{ github.base_ref == 'master' }}
env:
GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md
run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md

- name: Save PR number
if: ${{ always() }}
2 changes: 1 addition & 1 deletion .github/workflows/linting_comment.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Download lint results
uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3
uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6
with:
workflow: linting.yml
workflow_conclusion: completed
5 changes: 4 additions & 1 deletion .nf-core.yml
@@ -1,5 +1,5 @@
repository_type: pipeline
nf_core_version: "2.14.1"
nf_core_version: "3.0.1"
lint:
files_exist:
- assets/nf-core-gasclustering_logo_light.png
@@ -31,6 +31,9 @@ lint:
- custom_config
- manifest.name
- manifest.homePage
- params.max_cpus
- params.max_memory
- params.max_time
readme:
- nextflow_badge

10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,16 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## In Development

- Added the ability to include a `sample_name` column in the input samplesheet.csv, allowing compatibility with the IRIDA-Next input configuration (see the sketch below).

  - Special characters in `sample_name` (non-alphanumeric, with the exception of `_` and `.`) will be replaced with `_`
  - If no `sample_name` is supplied in the column, `sample` will be used
  - To avoid repeated identifiers, duplicate `sample_name` values will be suffixed with the unique `sample` value from the input file

- Fixed linting issues in CI caused by nf-core 3.0.1
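
The renaming rules can be sketched in Groovy as follows. This is a hedged illustration only: the helper name is hypothetical, the retained characters (`_`, `-`, `.`) follow the README wording, and none of it is taken from the pipeline source.

```groovy
// Hypothetical helper sketching the sample_name handling described above;
// not taken from the pipeline source.
def sanitizeSampleName(String sampleName, String sample, Set<String> seen) {
    // Fall back to the unique `sample` value when no sample_name is supplied
    def name = sampleName?.trim() ? sampleName.trim() : sample
    // Replace non-alphanumeric characters (keeping "_", "-" and ".") with "_"
    name = name.replaceAll(/[^A-Za-z0-9_\-.]/, '_')
    // Suffix duplicates with the unique `sample` value to keep identifiers distinct
    if (!seen.add(name)) {
        name = name + '_' + sample
    }
    return name
}

def seen = [] as Set
assert sanitizeSampleName('S 1', 'sample1', seen) == 'S_1'
assert sanitizeSampleName('S2#', 'sample2', seen) == 'S2_'
assert sanitizeSampleName('S2_', 'sample3', seen) == 'S2__sample3'
```

The three assertions mirror the rows in `tests/data/samplesheets/samplesheet-addsamplename.csv` and the expected identifiers `S_1`, `S2_`, and `S2__sample3` in the test data added by this PR.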

## [0.3.0] - 2024-09-10

### Changed
10 changes: 10 additions & 0 deletions README.md
@@ -16,6 +16,16 @@ The input to the pipeline is a standard sample sheet (passed as `--input samples

The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). Validation of the sample sheet is performed by [nf-validation](https://nextflow-io.github.io/nf-validation/). Details on the columns can be found in the [Full samplesheet](docs/usage.md#full-samplesheet) documentation.

## IRIDA-Next Optional Input Configuration

`gasclustering` accepts the [IRIDA-Next](https://github.com/phac-nml/irida-next) format for samplesheets, which can contain an additional column: `sample_name`.

`sample_name`: An **optional** column that overrides `sample` for outputs (filenames and sample names) and reference assembly identification.

`sample_name` allows more flexibility in naming output files and identifying samples. Unlike `sample`, `sample_name` is not required to contain unique values. `Nextflow` requires unique sample names, so when a `sample_name` is repeated, the corresponding `sample` will be appended as a suffix. Non-alphanumeric characters (excluding `_`, `-`, `.`) will be replaced with `"_"`.

An [example samplesheet](../tests/data/samplesheets/samplesheet-samplename.csv) has been provided with the pipeline.

# Parameters

The main parameters are `--input` as defined above and `--output` for specifying the output results directory. You may wish to provide `-profile singularity` to specify the use of singularity containers and `-r [branch]` to specify which GitHub branch you would like to run.
11 changes: 8 additions & 3 deletions assets/schema_input.json
@@ -1,5 +1,5 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$schema": "https://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/phac-nml/gasclustering/main/assets/schema_input.json",
"title": "phac-nml/gasclustering pipeline - params.input schema",
"description": "Schema for the file provided with params.input",
@@ -10,9 +10,14 @@
"sample": {
"type": "string",
"pattern": "^\\S+$",
"meta": ["id"],
"meta": ["irida_id"],
"unique": true,
"errorMessage": "Sample name must be provided and cannot contain spaces"
"errorMessage": "Sample name must be provided and cannot contain spaces."
},
"sample_name": {
"type": "string",
"meta": ["id"],
"errorMessage": "Sample name is optional, if provided will replace sample for filenames and outputs"
},
"mlst_alleles": {
"type": "string",
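For orientation, the meta-key change above can be sketched as follows. This is an assumption about how the schema's `meta` fields translate into a Nextflow meta map, with illustrative values; it is not pipeline source.

```groovy
// Illustrative only: approximate meta map for one samplesheet row under the new schema,
// where `sample` feeds the `irida_id` meta field and `sample_name` feeds `id`.
def row  = [sample: 'SAMPLE1', sample_name: 'S1', mlst_alleles: 'sample1.mlst.json.gz']
def meta = [irida_id: row.sample, id: row.sample_name ?: row.sample]
assert meta == [irida_id: 'SAMPLE1', id: 'S1']
```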
2 changes: 1 addition & 1 deletion conf/iridanext.config
@@ -5,7 +5,7 @@ iridanext {
overwrite = true
validate = true
files {
idkey = "id"
idkey = "irida_id"
global = [
"**/ArborView/arborview.clustered_data_arborview.html",
"**/clusters/gas.mcluster.clusters.text",
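With `idkey = "irida_id"`, IRIDA-Next output records stay keyed by the original `sample` value while `sample_name` (the `id` meta field) drives filenames. A hedged sketch of that split, using hypothetical variable names and a hypothetical filename pattern:

```groovy
// Illustrative only: the IRIDA-Next record key comes from `irida_id` (the samplesheet
// `sample`), while output filenames are built from `id` (the sanitized `sample_name`).
def meta       = [irida_id: 'SAMPLE3', id: 'S2__sample3']
def outputName = meta.id + '.mlst.json'   // hypothetical filename pattern
def iridaKey   = meta.irida_id            // key used when registering outputs in IRIDA-Next
assert outputName == 'S2__sample3.mlst.json'
assert iridaKey   == 'SAMPLE3'
```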
26 changes: 25 additions & 1 deletion docs/usage.md
@@ -12,7 +12,7 @@ You will need to create a samplesheet with information about the samples you wou
--input '[path to samplesheet file]'
```

### Full samplesheet
### Full Standard Samplesheet

The input samplesheet must contain 10 columns: `sample`, `mlst_alleles`, `metadata_1`, `metadata_2`, ..., `metadata_8`. The `sample` IDs within a samplesheet should be unique. All columns other than those listed above will be ignored.

@@ -33,6 +33,30 @@ SAMPLE3,sample3.mlst.subtyping.json.gz,Canada,2021,,,,,,

An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.

### IRIDA-Next Optional Samplesheet Configuration

`gasclustering` accepts the [IRIDA-Next](https://github.com/phac-nml/irida-next) format for samplesheets, which contain the following columns: `sample`, `sample_name`, `mlst_alleles`, `metadata_1`, `metadata_2`, ..., `metadata_8`. The `sample` IDs within a samplesheet should be unique. All columns other than those listed above will be ignored.

An IRIDA-Next compatible samplesheet may look something like the one below.

```csv title="samplesheet.csv"
sample,sample_name,mlst_alleles,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
SAMPLE1,S1,sample1.mlst.json.gz,Canada,2024,,,,,,
SAMPLE2,S2,sample2.mlst.json.gz,USA,2024,,,,,,
SAMPLE3, ,sample3.mlst.subtyping.json.gz,Canada,2021,,,,,,
```

| Column | Description |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sample` | Custom sample name. Samples should be unique within a samplesheet. |
| `sample_name` | Sample name used in outputs (filenames and sample names) |
| `mlst_alleles` | Full path to an MLST JSON file describing the loci/alleles for the sample against some MLST scheme. A way to generate this file is via [locidex](https://github.com/phac-nml/locidex). File can optionally be gzipped and must have the extension ".mlst.json", ".mlst.subtyping.json" (or with an additional ".gz" if gzipped). |
| `metadata_1` to `metadata_8` | Optional metadata values to integrate into the final visualization. |

An [example samplesheet](../tests/data/samplesheets/samplesheet-addsamplename.csv) has been provided with the pipeline.

## Running the pipeline

The typical command for running the pipeline is as follows:
10 changes: 5 additions & 5 deletions modules/local/appendmetadata/main.nf
@@ -3,11 +3,11 @@ process APPEND_METADATA {
label 'process_single'

input:
val clusters_path // cluster data as a TSV path
// this needs to be "val", because "path"
// won't stage the file correctly for exec
val metadata_rows // metadata rows (no headers) to be appended, list of lists
val metadata_headers // headers to name the metadata columns
val clusters_path // cluster data as a TSV path
// this needs to be "val", because "path"
// won't stage the file correctly for exec
val metadata_rows // metadata rows (no headers) to be appended, list of lists
val metadata_headers // headers to name the metadata columns

output:
path("clusters_and_metadata.tsv"), emit: clusters
2 changes: 1 addition & 1 deletion nextflow_schema.json
@@ -1,5 +1,5 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$schema": "https://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/phac-nml/gasclustering/main/nextflow_schema.json",
"title": "phac-nml/gasclustering pipeline parameters",
"description": "IRIDA Next Example Pipeline",
(diff of an additional tests/data file; filename not shown)
@@ -1,4 +1,4 @@
id address level_1 level_2 level_3 metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sampleA 1.1.1 1 1 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
sampleB 1.1.1 1 1 1 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
sampleC 1.2.2 1 2 2 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
id address level_1 level_2 level_3 sample metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sampleA 1.1.1 1 1 1 sampleA 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
sampleB 1.1.1 1 1 1 sampleB 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
sampleC 1.2.2 1 2 2 sampleC 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
(diff of an additional tests/data file; filename not shown)
@@ -1,4 +1,4 @@
id address level_1 level_2 level_3 metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sampleA 1.1.1 1 1 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
sampleB 1.1.1 1 1 1 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
sample3 1.2.2 1 2 2 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
id address level_1 level_2 level_3 sample metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sampleA 1.1.1 1 1 1 sampleA 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
sampleB 1.1.1 1 1 1 sampleB 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
sample3 1.2.2 1 2 2 sample3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
8 changes: 4 additions & 4 deletions tests/data/append/expected_clusters_and_metadata.tsv
@@ -1,4 +1,4 @@
id address level_1 level_2 level_3 myheader_1 myheader_2 myheader_3 myheader_4 myheader_5 myheader_6 myheader_7 myheader_8
sample1 1.1.1 1 1 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
sample2 1.1.1 1 1 1 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
sample3 1.2.2 1 2 2 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
id address level_1 level_2 level_3 sample myheader_1 myheader_2 myheader_3 myheader_4 myheader_5 myheader_6 myheader_7 myheader_8
sample1 1.1.1 1 1 1 sample1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
sample2 1.1.1 1 1 1 sample2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
sample3 1.2.2 1 2 2 sample3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
(new tests/data file; filename not shown)
@@ -0,0 +1,4 @@
id address level_1 level_2 level_3 sample metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
S_1 1.1.1 1 1 1 sample1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
S2_ 1.1.1 1 1 1 sample2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
S2__sample3 1.2.2 1 2 2 sample3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8
8 changes: 4 additions & 4 deletions tests/data/append/expected_clusters_and_metadata_hamming.tsv
@@ -1,4 +1,4 @@
id address level_1 level_2 level_3 metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sample1 1.1.1 1 1 1
sample2 1.1.2 1 1 2
sample3 2.2.3 2 2 3
id address level_1 level_2 level_3 sample metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sample1 1.1.1 1 1 1 sample1
sample2 1.1.2 1 1 2 sample2
sample3 2.2.3 2 2 3 sample3
(diff of an additional tests/data file; filename not shown)
@@ -1,4 +1,4 @@
id address level_1 level_2 level_3 metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sample1 1.1.1 1 1 1 1.4
sample2 1.1.1 1 1 1
sample3 1.2.2 1 2 2 3.1 3.2 3.8
id address level_1 level_2 level_3 sample metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sample1 1.1.1 1 1 1 sample1 1.4
sample2 1.1.1 1 1 1 sample2
sample3 1.2.2 1 2 2 sample3 3.1 3.2 3.8
(diff of an additional tests/data file; filename not shown)
@@ -1,4 +1,4 @@
id address level_1 level_2 level_3 metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sample1 1.1.1 1 1 1
sample2 1.1.1 1 1 1
sample3 1.2.2 1 2 2
id address level_1 level_2 level_3 sample metadata_1 metadata_2 metadata_3 metadata_4 metadata_5 metadata_6 metadata_7 metadata_8
sample1 1.1.1 1 1 1 sample1
sample2 1.1.1 1 1 1 sample2
sample3 1.2.2 1 2 2 sample3
4 changes: 4 additions & 0 deletions tests/data/samplesheets/samplesheet-addsamplename.csv
@@ -0,0 +1,4 @@
sample,sample_name,mlst_alleles,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
sample1,S 1,https://raw.githubusercontent.com/phac-nml/gasclustering/dev/tests/data/reports/sample1.mlst.json,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8
sample2,S2#,https://raw.githubusercontent.com/phac-nml/gasclustering/dev/tests/data/reports/sample2.mlst.json,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8
sample3,S2_,https://raw.githubusercontent.com/phac-nml/gasclustering/dev/tests/data/reports/sample3.mlst.json,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8