-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
/!\ DO NOT MERGE: This is the FIRST PR for community review #57
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the custom scripts need to have licenses and the authors
Harshil alignment for code readability (i.e. on ? and : in the modules.config)
// TODO nf-core: Specify the paths to your test data on nf-core/test-datasets | ||
// TODO nf-core: Give any required params for the test so that command line flags are not needed | ||
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv' | ||
input = "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/nf-core-pixelator/testdata/micro/test_samplesheet.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will these move to nf-core/test-data once you are read to go public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's small you could just put it within the repo (tests/data/test_samplesheet.csv
or something).
|
||
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) | ||
- [cutadapt](http://dx.doi.org/10.14806/ej.17.1.200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it used internally? I don't see the modules, same for fastp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we use cutadapt and fastp as subprocesses in the pixelator
tool.
Should we remove them here to reduce confusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, does that hook into the demux
process? I'm guessing it's to speed up by piping it all straight through?
Just wondering if that could be flipped off in the future to let Nextflow handle the job submission in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, cutadapt is used in both the demux and adapterqc processes.
Just wondering if that could be flipped off in the future to let Nextflow handle the job submission in this case.
This is something @adamrtalbot was also wondering about.
The main reason to wrap cutadapt is because we want to make sure running a pixelator analysis on a single sample remains straightforward without using the pipeline.
I do not think that unwrapping it from the pixelator single-cell demux
command is something we would want to do in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the same thing is done in bases2fastq with Element, for example. But there is a flag to flip it off as well.
ch_multiqc_logo.toList() | ||
) | ||
multiqc_report = MULTIQC.out.report.toList() | ||
// TODO: Add MultiQC after plugins are implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it be added for v1.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will not make it before 1.0 (but we definitely plan to work on it soon after)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small comment from me :)
This functionality is now merged in nf-core/tools (nf-core/tools#2362) so we can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few style things for me. I don't see anything major!
A thought, could the pixelator report be added to the tower.yml
?
|
||
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) | ||
- [cutadapt](http://dx.doi.org/10.14806/ej.17.1.200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, does that hook into the demux
process? I'm guessing it's to speed up by piping it all straight through?
Just wondering if that could be flipped off in the future to let Nextflow handle the job submission in this case.
conf/modules.config
Outdated
|
||
withName: 'SAMPLESHEET_CHECK' { | ||
if (params.pixelator_container) { | ||
container = params.pixelator_container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this container shouldn't need pixelator, just python by itself? Or is there something we're missing here?
conf/modules.config
Outdated
|
||
withName: 'SAMPLESHEET_CHECK' { | ||
if (params.pixelator_container) { | ||
container = params.pixelator_container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would just using process.container = '/path/to/container'
in their local config for the run work if you just want every container to be pixelator?
conf/modules.config
Outdated
ext.args = { | ||
[ | ||
"--design ${meta.design}", | ||
(params.trim_front != null)? "--trim-front ${params.trim_front}": '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd think the user input validation would get caught in the schema check at the beginning of the workflow.
What's the ideal arg to be passed? --trim-front 50
for example? And the problem is, you can't get --trim-front 0
?
ch_panel_files_grouped = ch_report_data.map { id, data -> [ data[0], data[1] ] } | ||
ch_amplicon_grouped = ch_report_data.map { id, data -> data[2] ? data[2].flatten() : [] } | ||
ch_preqc_grouped = ch_report_data.map { id, data -> data[3] ? data[3].flatten() : [] } | ||
ch_adapterqc_grouped = ch_report_data.map { id, data -> data[4] ? data[4].flatten() : [] } | ||
ch_demux_grouped = ch_report_data.map { id, data -> data[5] ? data[5].flatten() : [] } | ||
ch_collapse_grouped = ch_report_data.map { id, data -> data[6] ? data[6].flatten() : [] } | ||
ch_graph_grouped = ch_report_data.map { id, data -> data[7] ? data[7].flatten() : [] } | ||
ch_annotate_grouped = ch_report_data.map { id, data -> data[8] ? data[8].flatten() : [] } | ||
ch_analysis_grouped = ch_report_data.map { id, data -> data[9] ? data[9].flatten() : [] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there's a "fancier" way to write this, but I also just appreciate the verbosity and that it's easy to understand what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is quite verbose. Suggestions are welcome 🙂
WorkflowPixelator.initialise(params, log) | ||
// Inject the samplesheet SHA-1 into the params object | ||
ch_input = file(params.input) | ||
params.samplesheet_sha = ch_input.bytes.digest('sha-1') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this, that's interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use this value with our AWS config to set the output path.
This makes sure we do not override result files when doing reruns of previous experiments with modified parameters or samplesheet.
bin/check_samplesheet.py
Outdated
def make_absolute_path(path: str, base: PathLike = None) -> str: | ||
"""If `path` is a relative path without a scheme, resolve it as relative to `base` | ||
|
||
Take into account that paths can be references to remote resources eg. [s3://, az://, gs://, file:///, https://, ... ] | ||
""" | ||
url = urllib.parse.urlparse(path) | ||
if url.scheme: | ||
return path | ||
|
||
if base is None: | ||
url_props = list(url) | ||
url_props[0] = "file" | ||
path_component = PurePath(url_props[2]) | ||
url_props[2] = str(path_component) if path_component.is_absolute() else str(PurePath("/", str(path_component))) | ||
urllib.parse.urlunparse(url_props) | ||
return urllib.parse.urlunparse(url_props) | ||
|
||
# If the base url has a scheme we need to keep that | ||
# purepath will remove double slashes and this invalidates the base scheme | ||
base_url = urllib.parse.urlparse(str(base)) | ||
scheme = base_url[0] or "file" | ||
resolved_path = PurePath(base_url.path) / path | ||
url = list(base_url) | ||
url[0] = base_url[0] or "file" | ||
url[2] = str(resolved_path) | ||
|
||
# Make sure there are three /// (hidden netloc) in urls with file scheme. | ||
# other schemes [s3, gs, az] only use two // | ||
if scheme == "file": | ||
resolved_path = str(resolved_path) if resolved_path.is_absolute() else str(PurePath("/", str(resolved_path))) | ||
return f"{scheme}://{resolved_path}" | ||
|
||
return urllib.parse.urlunparse(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me nervous. Why is it necessary? You should never point to an absolute path in Nextflow really...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something that we found very useful at pixelgen.
We are rewriting relative paths in the samplesheet relative to the parent directory of the samplesheet.
This works with a remote samplesheet as well (http, s3, az, gs, ...).
This allows us to create a directory with the samplesheet and input data files and move that directory anywhere (local or remote) without having to edit paths in the samplesheet. (As long as inputs and samplesheet have a common root dir of course)
@FriederikeHanssen This is also the reason why we are not yet using nf-validation to load the samplesheet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm it's a bit risky messing around with strings like they're paths on the process.
You could do this in Nextflow after the samplesheet has been parsed, then go through a little logic problem for if file doesn't exist add "full" path then check if the file exists again then if fail if it's not there? It might be a little bit cleaner as well.
If it works for you so far no problem, but something to consider replacing in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the relative path logic to be done in nextflow after the normal validation with nf-validation as you suggested in PixelgenTechnologies@1cc3116.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that solution is way nicer. It pulls on all the native Nextflow-y goodness. The main downside is it's not as obvious for a Python person but I think that's worth the cost.
conf/modules.config
Outdated
|
||
withName: 'SAMPLESHEET_CHECK' { | ||
if (params.pixelator_container) { | ||
container = params.pixelator_container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier to separate the background data sources from the software and use the same container, but configure alternative inputs.
Does the SAMPLESHEET_CHECK need to use alternative containers? Seems like a Python process and probably doesn't need the extra code?
conf/modules.config
Outdated
if (params.pixelator_container) { | ||
container = params.pixelator_container | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, feels weird on every process including the ones that don't use the Pixelator software. But if you are going to add this to every process you can just do:
process {
if (params.pixelator_container) {
container = params.pixelator_container
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels weird on every process including the ones that don't use the Pixelator software.
As well as collecting info about the nextflow environment, COLLECT_METADATA is also reporting on the exact execution environment of the pixelator tool. So this step must be run with the pixelator container / conda env / ... .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah dang. This is why nf-core pipelines create a versions.yml in every process then combine them at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The versions.yml go a long way, but for pixelator we also wanted to have a list of all python packages in the env to debug compatibility issues. I have renamed the process to PIXELATOR_COLLECT_METADATA to make it more clear that this is pixelator specific.
// TODO nf-core: Specify the paths to your test data on nf-core/test-datasets | ||
// TODO nf-core: Give any required params for the test so that command line flags are not needed | ||
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv' | ||
input = "https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/nf-core-pixelator/testdata/micro/test_samplesheet.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's small you could just put it within the repo (tests/data/test_samplesheet.csv
or something).
modules/local/collect_metadata.nf
Outdated
def nextflowJson = builder.toPrettyString() | ||
|
||
""" | ||
echo '${nextflowJson}' > nextflow-metadata.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use jq for JSON validation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering what jq
would add here? nextflowJson
should already contains valid json since we generate it with the groovy json builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I was just wanting to be cautious in case it gets messed up somehow.
tag "$meta.id" | ||
label 'process_medium' | ||
|
||
conda "local::pixelator=0.12.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work for anyone else? How about on a clean cloud computer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a placeholder to test with a local conda recipe. After the main tool is public in bioconda we will update this. I will add some todo's as reminders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No release until it's on bioconda then 😉
|
||
prefix = task.ext.prefix ?: "${meta.id}" | ||
def args = task.ext.args ?: '' | ||
def panelOpt = panel ?: panel_file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def panelOpt = panel ?: panel_file | |
def panelOpt = panel ? "--panel $panel_file" : "" |
I might not understand what this is doing, but don't you want to add a panel if panel is true?
Also, you could simplify to this and drop the val(panel)
, where panel_file is an empty list for an optional input:
def panelOpt = panel ?: panel_file | |
def panelOpt = panel_file ? "--panel $panel_file" : "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a bit unclear.
The underlying --panel option in pixelator
can take two types of inputs:
- A string key to a built-in panel (most likely use case).
- A path to a custom panel file.
In the pipeline we need to split this up in the samplesheet as panel
for a known string key and panel_file
for a a path. One or the other must be given. So if panel is set we pass that on to --panel
if unset we will pass the file from panel_file
to --panel
. The samplesheet validation will make sure that not both are set (or empty).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Perhaps this so it's automatic based on whether panel or panel_file is populated?
input:
tuple val(meta), path(reads), val(panel), path(panel_file)
def panelOpt = panel ? "--panel $panel" :
panel_file ? "--panel $panel_file" :
""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Perhaps this so it's automatic based on whether panel or panel_file is populated?
input:
tuple val(meta), path(reads), val(panel), path(panel_file)
def panelOpt = panel ? "--panel $panel" :
panel_file ? "--panel $panel_file" :
""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added this in a043d2b.
This is more robust since it will not trip up pixelator should "" or False slip through the parameter validation and be passed in.
Samplesheet validation and channel construction is now done using nf-validation. Relative paths in samplesheets are now resolved in nextflow itself instead of the old check_samplesheet.py. panel keys are now validated as well.
@@ -57,7 +101,6 @@ params { | |||
validationSchemaIgnoreParams = 'genomes' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you shouldn't need to add genomes to validationSchemaIgnoreParams
since you're not using igenomes or any params.genomes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with all the changes, all lookin' good now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go - just two issues left outstanding:
Couldn't use the first commit an origin to create a new branch or as a base for the PR, so I used TEMPLATE, but results should be similar.
So please, DO NOT MERGE, if suggestions are made, please DO NOT ACCEPT either, but makes PR towards dev.
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).