Publish operator #1540

Closed
szymonwieloch opened this issue Mar 23, 2020 · 4 comments
@szymonwieloch

New feature

Publish operator

Usage scenario

The current publishDir attribute of processes is not flexible enough. Please consider the following example:

process test {
input:
 file('in.txt') from inputChannel
script:
"""
somecmd in.txt -o out1.txt -p out2.txt -r out3.txt
"""
}

Now, what if I want to publish out1.txt using move mode and out2.txt using copy mode, and output (but not publish) out3.txt? The current approach is not flexible enough and quickly becomes complex. I think it would be good to replace the publishDir parameter with a publish operator, like this:

output: 
file('out1.txt') into out1
file('out2.txt') into out2
file('out3.txt') into out3

And then:

out1.publish('.', mode:'move')
out2.publish('output', mode: 'copy')
@tamuanand

I think you can add mv and cp commands after the first line of your script.

Something like:

publishDir "${params.outdir}/my_wanted_folder", mode:'copy'

input:
 file('in.txt') from inputChannel

output:
 file('*.txt')

"""
somecmd in.txt -o out1.txt -p out2.txt -r out3.txt
mv out1.txt my_move_out1.txt
cp out2.txt my_copy_out2.txt
"""
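
For reference, Nextflow also allows a process to carry more than one publishDir directive, each with its own mode and a pattern option that restricts which output files it applies to. A minimal sketch of the scenario from the original post, assuming the DSL1 syntax used in this thread (somecmd and inputChannel are taken from that example, not real tools or channels):

```groovy
// Sketch only: DSL1 syntax as used in this thread.
// `somecmd` and `inputChannel` come from the original example.
process test {
    // Each publishDir applies only to the outputs matching its pattern.
    publishDir '.', mode: 'move', pattern: 'out1.txt'
    publishDir 'output', mode: 'copy', pattern: 'out2.txt'

    input:
    file('in.txt') from inputChannel

    output:
    file('out1.txt') into out1
    file('out2.txt') into out2
    // Declared as an output but matched by no publishDir pattern,
    // so it is emitted on the channel without being published.
    file('out3.txt') into out3

    script:
    """
    somecmd in.txt -o out1.txt -p out2.txt -r out3.txt
    """
}
```

One caveat: move mode relocates the file out of the work directory, so it is only safe when no downstream process reads out1. This covers the per-file case but still ties publishing to the process definition, which is the limitation the proposed operator is meant to remove.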

@DaGaMs
Contributor

DaGaMs commented Apr 15, 2020

I think publish should indeed be an operator. That seems like the conceptually most "correct" way to handle this.

@Puumanamana

Puumanamana commented Aug 4, 2020

I also feel like a publish operator could be useful. For example, with the DSL2 syntax, I could not find an easy way to run the same process twice (within two different workflows) and publish its outputs, e.g. something like this:

nextflow.enable.dsl = 2

process fastqc {
    publishDir 'QC'

    input: file(fastq)
    output: file("*.{html,zip}")
    script: "fastqc $fastq"
}

process multiqc {
    publishDir 'QC'

    input: file(fqc_outputs)
    output: file("*.html")
    script: "multiqc ."
}

process fastp {
// snip
}

workflow qc {
    take: reads
    main: reads | fastqc | collect | multiqc
}

workflow trimming {
    take: reads
    main: reads | fastp | qc
}

workflow {
    Channel.fromPath(params.reads) | (qc & trimming)
}

In the above example, all files generated by multiqc will be published in the same folder (the same goes for fastqc, but at least those files will have different names). Since the multiqc process cannot distinguish the files that arrive directly from those that arrive after trimming, it cannot name the summary files differently. I could use an extra parameter to track the workflow execution, but that makes the code less clean.

EDIT: I just found out the variable task.process keeps track of the workflow being called, so it can be used in my example, something like this:

publishDir "QC/${task.process.replaceAll(':', '-')}"

I could not find it in the documentation, though.
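
To illustrate what that publishDir expression evaluates to, here is a small plain-Groovy sketch. The two process names are assumed values of task.process for the qc and trimming workflows above (the exact strings are not confirmed in this thread), and publishPath is a hypothetical helper used only for this illustration:

```groovy
// Hypothetical helper mirroring the expression used in publishDir above.
String publishPath(String processName) {
    // Replace the ':' separators of a qualified process name with '-'.
    return "QC/${processName.replaceAll(':', '-')}"
}

// Assumed values of task.process for the two invocations of multiqc.
def names = ['qc:multiqc', 'trimming:qc:multiqc']
names.each { name ->
    println publishPath(name)
}
// prints:
// QC/qc-multiqc
// QC/trimming-qc-multiqc
```

With distinct qualified names, each invocation of multiqc gets its own target directory, so the summary files no longer overwrite each other.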

@stale

stale bot commented Jan 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
