/!\ DO NOT MERGE: This is the FIRST PR for community review #57

maxulysse · 2023-07-27T21:58:02Z

Couldn't use the first commit on origin to create a new branch or as a base for the PR, so I used TEMPLATE, but the results should be similar.
So please DO NOT MERGE. If suggestions are made, please DO NOT ACCEPT them either; make a PR towards dev instead.

PR checklist

FriederikeHanssen

All custom scripts need to state their license and authors
Harshil alignment for code readability (i.e. on ? and : in the modules.config)

README.md

FriederikeHanssen · 2023-08-06T08:49:29Z

conf/test.config

-    // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
-    // TODO nf-core: Give any required params for the test so that command line flags are not needed
-    input  = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'
+    input  = "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/nf-core-pixelator/testdata/micro/test_samplesheet.csv"


Will these move to nf-core/test-datasets once you are ready to go public?

Yes, definitely.

If it's small you could just put it within the repo (tests/data/test_samplesheet.csv or something).

FriederikeHanssen · 2023-08-06T08:53:46Z

CITATIONS.md


- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
+- [cutadapt](http://dx.doi.org/10.14806/ej.17.1.200)


Is it used internally? I don't see the modules, same for fastp

Yes, we use cutadapt and fastp as subprocesses in the pixelator tool.
Should we remove them here to reduce confusion?

As an aside, does that hook into the demux process? I'm guessing it's to speed up by piping it all straight through?

Just wondering if that could be flipped off in the future to let Nextflow handle the job submission in this case.

Yes, cutadapt is used in both the demux and adapterqc processes.

> Just wondering if that could be flipped off in the future to let Nextflow handle the job submission in this case.

This is something @adamrtalbot was also wondering about.

The main reason to wrap cutadapt is because we want to make sure running a pixelator analysis on a single sample remains straightforward without using the pipeline.

I do not think that unwrapping it from the pixelator single-cell demux command is something we would want to do in the future.

Agreed, the same thing is done in bases2fastq with Element, for example. But there is a flag to flip it off as well.

FriederikeHanssen · 2023-08-06T08:55:22Z

workflows/pixelator.nf

-        ch_multiqc_logo.toList()
-    )
-    multiqc_report = MULTIQC.out.report.toList()
+    // TODO: Add MultiQC after plugins are implemented


will it be added for v1.0?

We will not make it before 1.0 (but we definitely plan to work on it soon after)

nvnieuwk

One small comment from me :)

lib/WorkflowPixelator.groovy

This functionality is now merged in nf-core/tools (nf-core/tools#2362) so we can remove this.

edmundmiller

Just a few style things for me. I don't see anything major!

A thought, could the pixelator report be added to the tower.yml?

.gitignore

edmundmiller · 2023-08-09T13:27:54Z

CITATIONS.md


- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
+- [cutadapt](http://dx.doi.org/10.14806/ej.17.1.200)


As an aside, does that hook into the demux process? I'm guessing it's to speed up by piping it all straight through?

Just wondering if that could be flipped off in the future to let Nextflow handle the job submission in this case.

edmundmiller · 2023-08-09T13:30:40Z

conf/modules.config

+
+    withName: 'SAMPLESHEET_CHECK' {
+        if (params.pixelator_container) {
+            container = params.pixelator_container


But this container shouldn't need pixelator, just python by itself? Or is there something we're missing here?

edmundmiller · 2023-08-09T13:32:19Z

conf/modules.config

+
+    withName: 'SAMPLESHEET_CHECK' {
+        if (params.pixelator_container) {
+            container = params.pixelator_container


Would just using process.container = '/path/to/container' in their local config for the run work if you just want every container to be pixelator?

edmundmiller · 2023-08-09T13:39:02Z

conf/modules.config

+        ext.args = {
+            [
+                "--design ${meta.design}",
+                (params.trim_front != null)? "--trim-front ${params.trim_front}": '',


I'd think the user input validation would get caught in the schema check at the beginning of the workflow.

What's the ideal arg to be passed? --trim-front 50 for example? And the problem is, you can't get --trim-front 0?
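The explicit `!= null` comparison in the hunk above matters because a plain truthiness test would silently drop a legitimate value of 0: Groovy, like Python, treats 0 as false. A minimal Python analogue of that `ext.args` closure follows; `build_args` and the design name `example-design` are hypothetical and not part of the pipeline.

```python
from typing import Optional


def build_args(design: str, trim_front: Optional[int] = None) -> str:
    """Assemble CLI arguments, keeping an explicit 0 (hypothetical helper)."""
    parts = [f"--design {design}"]
    # `if trim_front:` would skip 0 because 0 is falsy, so compare against None.
    if trim_front is not None:
        parts.append(f"--trim-front {trim_front}")
    return " ".join(parts)


print(build_args("example-design", trim_front=0))
print(build_args("example-design"))
```

With the `is not None` check, `--trim-front 0` is still emitted, while an unset option is omitted entirely.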

modules/local/samplesheet_check.nf

nextflow.config

edmundmiller · 2023-08-09T13:53:24Z

subworkflows/local/generate_reports.nf

+    ch_panel_files_grouped  = ch_report_data.map { id, data -> [ data[0], data[1] ] }
+    ch_amplicon_grouped     = ch_report_data.map { id, data -> data[2] ? data[2].flatten() : [] }
+    ch_preqc_grouped        = ch_report_data.map { id, data -> data[3] ? data[3].flatten() : [] }
+    ch_adapterqc_grouped    = ch_report_data.map { id, data -> data[4] ? data[4].flatten() : [] }
+    ch_demux_grouped        = ch_report_data.map { id, data -> data[5] ? data[5].flatten() : [] }
+    ch_collapse_grouped     = ch_report_data.map { id, data -> data[6] ? data[6].flatten() : [] }
+    ch_graph_grouped        = ch_report_data.map { id, data -> data[7] ? data[7].flatten() : [] }
+    ch_annotate_grouped     = ch_report_data.map { id, data -> data[8] ? data[8].flatten() : [] }
+    ch_analysis_grouped     = ch_report_data.map { id, data -> data[9] ? data[9].flatten() : [] }


I feel like there's a "fancier" way to write this, but I also just appreciate the verbosity and that it's easy to understand what's going on.

Yeah, it is quite verbose. Suggestions are welcome 🙂

edmundmiller · 2023-08-09T13:54:11Z

workflows/pixelator.nf

-WorkflowPixelator.initialise(params, log)
+// Inject the samplesheet SHA-1 into the params object
+ch_input               = file(params.input)
+params.samplesheet_sha = ch_input.bytes.digest('sha-1')


Love this, that's interesting.

We use this value with our AWS config to set the output path.
This makes sure we do not overwrite result files when rerunning previous experiments with modified parameters or a modified samplesheet.
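The same idea can be sketched in Python: hash the samplesheet bytes and use the digest as part of the output location, so a changed samplesheet lands in a different directory. The helper names and the bucket layout below are assumptions for illustration, not the actual AWS config.

```python
import hashlib
from pathlib import Path


def samplesheet_sha1(path: Path) -> str:
    """Return the hex SHA-1 digest of the samplesheet contents."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def output_dir(base: str, samplesheet: Path) -> str:
    """Build an output path keyed on the samplesheet digest (hypothetical layout)."""
    return f"{base.rstrip('/')}/{samplesheet_sha1(samplesheet)}"
```

For example, `output_dir("s3://my-results-bucket/runs", Path("samplesheet.csv"))` would yield a different prefix whenever the samplesheet bytes change. Note that only the samplesheet is hashed here, so separating reruns that differ only in parameters would need the parameters folded into the key as well.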

.github/workflows/ci.yml

.pre-commit-config.yaml

adamrtalbot · 2023-08-09T14:40:42Z

bin/check_samplesheet.py

+def make_absolute_path(path: str, base: PathLike = None) -> str:
+    """If `path` is a relative path without a scheme, resolve it as relative to `base`
+
+    Take into account that paths can be references to remote resources eg. [s3://, az://, gs://, file:///, https://, ... ]
+    """
+    url = urllib.parse.urlparse(path)
+    if url.scheme:
+        return path
+
+    if base is None:
+        url_props = list(url)
+        url_props[0] = "file"
+        path_component = PurePath(url_props[2])
+        url_props[2] = str(path_component) if path_component.is_absolute() else str(PurePath("/", str(path_component)))
+        urllib.parse.urlunparse(url_props)
+        return urllib.parse.urlunparse(url_props)
+
+    # If the base url has a scheme we need to keep that
+    # purepath will remove double slashes and this invalidates the base scheme
+    base_url = urllib.parse.urlparse(str(base))
+    scheme = base_url[0] or "file"
+    resolved_path = PurePath(base_url.path) / path
+    url = list(base_url)
+    url[0] = base_url[0] or "file"
+    url[2] = str(resolved_path)
+
+    # Make sure there are three /// (hidden netloc) in urls with file scheme.
+    # other schemes [s3, gs, az] only use two //
+    if scheme == "file":
+        resolved_path = str(resolved_path) if resolved_path.is_absolute() else str(PurePath("/", str(resolved_path)))
+        return f"{scheme}://{resolved_path}"
+
+    return urllib.parse.urlunparse(url)


This makes me nervous. Why is it necessary? You should never point to an absolute path in Nextflow really...

This is something that we found very useful at pixelgen.

We are rewriting relative paths in the samplesheet relative to the parent directory of the samplesheet.
This works with a remote samplesheet as well (http, s3, az, gs, ...).

This allows us to create a directory with the samplesheet and input data files and move that directory anywhere (local or remote) without having to edit paths in the samplesheet. (As long as inputs and samplesheet have a common root dir of course)
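A compact way to picture the behaviour described here, resolving relative entries against the samplesheet's parent directory while leaving absolute paths and URLs alone, is the sketch below. It is not the `make_absolute_path` function from the diff; `resolve_against_samplesheet` is a hypothetical helper that assumes POSIX-style paths and ignores query strings and fragments.

```python
import posixpath
from urllib.parse import urlparse, urlunparse


def resolve_against_samplesheet(path: str, samplesheet: str) -> str:
    """Resolve `path` against the directory that holds `samplesheet`."""
    # Entries that already carry a scheme (s3://, https://, ...) or are
    # absolute are returned unchanged.
    if urlparse(path).scheme or posixpath.isabs(path):
        return path
    sheet = urlparse(samplesheet)
    parent = posixpath.dirname(sheet.path)
    resolved = posixpath.normpath(posixpath.join(parent, path))
    # Keep the scheme and netloc (bucket or host) of the samplesheet location.
    return urlunparse(sheet._replace(path=resolved, query="", fragment=""))


print(resolve_against_samplesheet("data/a.fq.gz", "s3://bucket/run1/samplesheet.csv"))
print(resolve_against_samplesheet("a.fq.gz", "/data/run1/samplesheet.csv"))
```

Because the scheme and bucket come from the samplesheet itself, moving the whole directory (locally or to remote storage) changes the resolved paths without any edits to the samplesheet.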

@FriederikeHanssen This is also the reason why we are not yet using nf-validation to load the samplesheet.

Hmm it's a bit risky messing around with strings like they're paths on the process.

You could do this in Nextflow after the samplesheet has been parsed, then go through a little logic: if the file doesn't exist, add the "full" path and check again, and fail if it still isn't there. It might be a little bit cleaner as well.

If it works for you so far no problem, but something to consider replacing in the near future.

I have changed the relative path logic to be done in nextflow after the normal validation with nf-validation as you suggested in PixelgenTechnologies@1cc3116.

I think that solution is way nicer. It pulls on all the native Nextflow-y goodness. The main downside is it's not as obvious for a Python person but I think that's worth the cost.

adamrtalbot · 2023-08-09T14:59:35Z

conf/modules.config

+
+    withName: 'SAMPLESHEET_CHECK' {
+        if (params.pixelator_container) {
+            container = params.pixelator_container


It would be easier to separate the background data sources from the software and use the same container, but configure alternative inputs.

Does the SAMPLESHEET_CHECK need to use alternative containers? Seems like a Python process and probably doesn't need the extra code?

adamrtalbot · 2023-08-09T15:01:22Z

conf/modules.config

+        if (params.pixelator_container) {
+            container = params.pixelator_container
+        }


Again, feels weird on every process including the ones that don't use the Pixelator software. But if you are going to add this to every process you can just do:

process { if (params.pixelator_container) { container = params.pixelator_container } }

> feels weird on every process including the ones that don't use the Pixelator software.

As well as collecting info about the nextflow environment, COLLECT_METADATA is also reporting on the exact execution environment of the pixelator tool. So this step must be run with the pixelator container / conda env / ... .

Ah dang. This is why nf-core pipelines create a versions.yml in every process then combine them at the end.

The versions.yml go a long way, but for pixelator we also wanted to have a list of all python packages in the env to debug compatibility issues. I have renamed the process to PIXELATOR_COLLECT_METADATA to make it more clear that this is pixelator specific.
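For readers wondering what "a list of all python packages in the env" can look like in practice, here is a minimal sketch using only the standard library. It is an illustration of the idea, not the code that PIXELATOR_COLLECT_METADATA runs; `installed_packages` is a hypothetical helper.

```python
import json
from importlib.metadata import distributions


def installed_packages() -> dict:
    """Map each installed distribution name to its version string."""
    packages = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a Name field
            packages[name] = dist.version
    # Sort case-insensitively so the report is stable between runs.
    return dict(sorted(packages.items(), key=lambda item: item[0].lower()))


print(json.dumps(installed_packages(), indent=2))
```

Run inside the pixelator container or conda env, this reports what is installed there, which is why such a step has to execute in the same environment as the tool.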

adamrtalbot · 2023-08-09T15:06:02Z

conf/test.config

-    // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
-    // TODO nf-core: Give any required params for the test so that command line flags are not needed
-    input  = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'
+    input  = "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/nf-core-pixelator/testdata/micro/test_samplesheet.csv"


If it's small you could just put it within the repo (tests/data/test_samplesheet.csv or something).

docs/output.md

adamrtalbot · 2023-08-09T15:11:16Z

modules/local/collect_metadata.nf

+    def nextflowJson = builder.toPrettyString()
+
+    """
+    echo '${nextflowJson}' > nextflow-metadata.json


You could use jq for JSON validation here.

I am wondering what jq would add here? nextflowJson should already contain valid JSON since we generate it with the Groovy JSON builder.

True, I was just wanting to be cautious in case it gets messed up somehow.
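The cautious option mentioned here amounts to re-parsing the file after it has been written, which catches anything that goes wrong between the builder and the file on disk. A sketch of such a post-write check in Python, standing in for a jq call; `is_valid_json_file` is a hypothetical helper and not part of the module:

```python
import json


def is_valid_json_file(path: str) -> bool:
    """Return True if the file at `path` parses as JSON, False otherwise."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError):
        # json.JSONDecodeError and UnicodeDecodeError both derive from ValueError.
        return False
    return True
```

A process could call this on `nextflow-metadata.json` and exit non-zero when it returns `False`.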

adamrtalbot · 2023-08-09T15:12:15Z

modules/local/pixelator/single-cell/analysis/main.nf

+    tag "$meta.id"
+    label 'process_medium'
+
+    conda "local::pixelator=0.12.0"


Will this work for anyone else? How about on a clean cloud computer?

This is a placeholder to test with a local conda recipe. After the main tool is public in bioconda we will update this. I will add some todo's as reminders.

No release until it's on bioconda then 😉

adamrtalbot · 2023-08-09T15:15:30Z

modules/local/pixelator/single-cell/demux/main.nf

+
+    prefix = task.ext.prefix ?: "${meta.id}"
+    def args = task.ext.args ?: ''
+    def panelOpt = panel ?: panel_file


Suggested change

-    def panelOpt = panel ?: panel_file
+    def panelOpt = panel ? "--panel $panel_file" : ""

I might not understand what this is doing, but don't you want to add a panel if panel is true?

Also, you could simplify to this and drop the val(panel), where panel_file is an empty list for an optional input:

Suggested change

-    def panelOpt = panel ?: panel_file
+    def panelOpt = panel_file ? "--panel $panel_file" : ""

This is indeed a bit unclear.

The underlying --panel option in pixelator can take two types of inputs:

A string key to a built-in panel (most likely use case).

A path to a custom panel file.

In the pipeline we need to split this up in the samplesheet: panel for a known string key and panel_file for a path. One or the other must be given. If panel is set, we pass it on to --panel; if it is unset, we pass the file from panel_file to --panel. The samplesheet validation makes sure that exactly one of the two is set.

Interesting. Perhaps something like this, so that it's automatic based on whether panel or panel_file is populated?

input: tuple val(meta), path(reads), val(panel), path(panel_file)

def panelOpt = panel ? "--panel $panel" : panel_file ? "--panel $panel_file" : ""
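The chained ternary above reads as a three-way fallback: prefer the built-in key, otherwise use the custom file, otherwise pass nothing. A Python rendering of the same decision, using a hypothetical `panel_option` helper and made-up panel names, makes the precedence explicit:

```python
from typing import Optional, Sequence, Union


def panel_option(
    panel: Optional[str],
    panel_file: Union[str, Sequence[str], None],
) -> str:
    """Build the --panel argument from a built-in key or a custom file."""
    if panel:  # a non-empty key wins
        return f"--panel {panel}"
    if panel_file:  # empty string, empty list and None are all skipped
        return f"--panel {panel_file}"
    return ""


print(panel_option("my-builtin-panel", None))
print(panel_option(None, "custom_panel.csv"))
```

An empty list for `panel_file` mirrors the Nextflow convention, mentioned earlier in the thread, of passing `[]` for an optional path input.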

I have added this in a043d2b.

This is more robust since it will not trip up pixelator should "" or False slip through the parameter validation and be passed in.

Samplesheet validation and channel construction is now done using nf-validation. Relative paths in samplesheets are now resolved in nextflow itself instead of the old check_samplesheet.py. panel keys are now validated as well.

maxulysse · 2023-08-21T09:45:42Z

nextflow.config

@@ -57,7 +101,6 @@ params {
    validationSchemaIgnoreParams     = 'genomes'


you shouldn't need to add genomes to validationSchemaIgnoreParams since you're not using igenomes or any params.genomes

adamrtalbot

I'm happy with all the changes, all lookin' good now!

ewels

Good to go - just two issues left outstanding:

fbdtemme and others added 30 commits June 12, 2023 15:33

initial commit

8e4253c

syncronize with pixelator 0.11.0 release

7a9b0c5

use nf-validation for samplesheet

f9af04c

refactor and sync with pixelator dev

2ee0d64

chore: remove report files

b510391

chore: add CODEOWNERS file

47f4559

style: reformat

578912e

feat: add params-file template

01a54f2

style: fix formatting

b8c37f6

docs: update usage

7ffc1db

feat: set test profile data to public s3 datasets

3d7729c

docs: update README

46d0c3b

fix: add missing label to metromap

1f64f56

fix: update used container

d412f16

feat: fix container override

d46ee97

feat: bump container versions

dc60078

support panel and panel_file

4055280

fix: use pr-444 pixelator container

f006e55

docs: fixed in output.md

0447887

docs: update output.md

1810835

docs: fix pipeline overview bullet list order

81080a7

Clarifications in README

4d9596e

docs: update output.md

5267363

docs: remove todo

07454a4

docs: more output.md tweaks

7077233

Update output docs

881d8b7

Merge branch 'dev' into reading-the-output-docs

962c44b

docs: fix unmatched backticks

71d233c

docs: remove reference to --panel-file input

5c2924b

docs: add backticks

05a8313

ci: authenticate with ghcr.io

56e507b

FriederikeHanssen reviewed Aug 6, 2023

nvnieuwk reviewed Aug 9, 2023

fbdtemme added 5 commits August 9, 2023 13:23

chore: remove create-params-template.py

e9cbb60

This functionality is now merged in nf-core/tools (nf-core/tools#2362) so we can remove this.

fix: remove redundant params.input check

bbdc3c6

fix: sync with renamed parameter in pixelator

7611bac

docs: fix duplicated part of pipeline summary

502aeec

fix: do not auto-format generated nf-params.yml

abdcbe7

edmundmiller approved these changes Aug 9, 2023

fbdtemme added 3 commits August 9, 2023 17:07

style: fix alignment

37a2aad

chore: remove node_modules gitignore entry

21f2d57

style: fix formatting

bd3aedc

adamrtalbot reviewed Aug 9, 2023

fbdtemme added 3 commits August 9, 2023 17:40

chore: add todo's for conda channel update

b91a271

fix: use specific commit instead of moving dev

6f75bd6

chore: remove unneeded todo

2ca26ae

fbdtemme mentioned this pull request Aug 9, 2023

refactor: split out pixelator dependency in CHECK_SAMPLESHEET #58

Merged

fbdtemme added 7 commits August 10, 2023 10:14

fix: remove outdir from test profile

8bc5302

style: module reformatting

5354b30

refactor: remove unused pbs arguments

c645fd6

refactor: replace null checks in modules.conf for integer options

91c5db8

This is more robust since it will not trip up pixelator should "" or False slip through the parameter validation and be passed in.

refactor: split out pixelator dependency in CHECK_SAMPLESHEET

f177cf7

fix: fix process name in COLLECT_METADATA versions.yml

baeb639

refactor: improve panel/panel_file passing, rename collect metadata

a043d2b

fbdtemme force-pushed the dev branch from ca16d1f to a043d2b Compare August 10, 2023 13:01

refactor: rework samplesheet validation

1cc3116

Samplesheet validation and channel construction is now done using nf-validation. Relative paths in samplesheets are now resolved in nextflow itself instead of the old check_samplesheet.py. panel keys are now validated as well.

maxulysse commented Aug 21, 2023

adamrtalbot reviewed Aug 21, 2023

ewels approved these changes Aug 21, 2023

ambarrio closed this Aug 29, 2023

