
Add Workflow Run RO-crate format #19

Open
wants to merge 1 commit into
base: master


bentsherman
Member

Close #6

cc @stain @simleo

Happy to receive any feedback. It's far from complete, but I wanted to share what I have so far.

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@simleo

simleo commented Oct 23, 2023

I've built the plugin and run:

nextflow run tests/test.nf

with the following (top level) nextflow.config:

plugins {
    id 'nf-prov'
}

params {
    outdir = 'results'
}

prov {
    enabled = true
    formats {
        wrroc {
            file = "${params.outdir}/ro-crate-metadata.json"
            overwrite = true
        }
    }
}

And got this ro-crate-metadata.json. For a first commit it's already looking pretty good! Runcrate can read the resulting RO-Crate (runcrate report results) without errors. However, there are several issues. Here is what I've found:

  • The test.nf workflow is listed in the metadata, but it's not in the crate directory (results).
  • The output files are in the crate directory, but they are listed with the wrong path, e.g. work/e1/80ed247039cd71794ba71091aedf2b/r1.foo.2.txt, while it should simply be r1.foo.2.txt, since that's the path relative to the crate directory.
  • The crate claims conformance to Provenance Run Crate, so in addition to the CreateAction corresponding to the workflow run, it should list additional CreateActions corresponding to individual tool executions. In the case of the workflow I ran, there is only one process (RNG). Assuming we can consider processes to be the tools orchestrated by the workflow, there should be a SoftwareApplication representing RNG, referenced from the workflow's hasPart (currently empty). There should also be a HowToStep instance corresponding to the RNG step. Finally, there should be three additional CreateAction instances, since the tool is executed with r1, r2 and r3 as the values for the prefix.
  • The workflow has outdir as a formal parameter, but the only parameter you can actually set when launching the workflow is constant.
  • author1 is listed as an agent, but I guess it's actually the workflow author that's being read from the other nextflow.config (the one in the same directory as the workflow). The agent of an action should represent whoever executed the action. BTW, the @id should be #author1, since it's a contextual entity internal to the crate.
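As a rough illustration of the Provenance Run Crate structure described in the list above, here is a minimal Python sketch that builds the missing entities for the test workflow: a SoftwareApplication for the RNG process referenced from the workflow's hasPart, a HowToStep for the RNG step (linked to the tool via workExample and listed in the workflow's step), and one CreateAction per execution (r1, r2, r3). It assumes processes map to tools; the @id scheme (test.nf#RNG, test.nf#main/RNG, #RNG-r1, ...) and the #runner agent are hypothetical placeholders, not what the plugin emits, and other properties (inputs, outputs, timestamps) are omitted.

```python
import json


def provenance_entities(workflow_id, tool_name, prefixes, agent_id):
    """Return hypothetical Provenance Run Crate entities for one Nextflow
    process (treated as a tool) that was executed once per prefix."""
    tool_id = f"{workflow_id}#{tool_name}"       # hypothetical @id scheme
    step_id = f"{workflow_id}#main/{tool_name}"  # hypothetical @id scheme

    # The workflow references the tool via hasPart and the step via step.
    workflow = {
        "@id": workflow_id,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"],
        "hasPart": [{"@id": tool_id}],
        "step": [{"@id": step_id}],
    }
    tool = {"@id": tool_id, "@type": "SoftwareApplication", "name": tool_name}
    # The HowToStep points at the tool it runs via workExample.
    step = {
        "@id": step_id,
        "@type": "HowToStep",
        "position": "0",
        "workExample": {"@id": tool_id},
    }
    # One CreateAction per tool execution; the agent is whoever ran the
    # workflow (a local contextual entity, hence the leading '#').
    actions = [
        {
            "@id": f"#{tool_name}-{prefix}",
            "@type": "CreateAction",
            "name": f"Run of {tool_name} with prefix {prefix}",
            "instrument": {"@id": tool_id},
            "agent": {"@id": agent_id},
        }
        for prefix in prefixes
    ]
    return [workflow, tool, step, *actions]


entities = provenance_entities("test.nf", "RNG", ["r1", "r2", "r3"], "#runner")
print(json.dumps(entities, indent=2))
```

In a real crate these entities would be added to the @graph of ro-crate-metadata.json next to the existing workflow-level CreateAction.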

I know next to nothing about Nextflow, but my impression is that the outputs are copied to the results directory because of the line:

    publishDir "results", mode: 'copy'

However, to export as RO-Crate, the relevant files (input, output, workflow, ...) should always be inside the crate's directory tree. This should not depend on the specific workflow, so the plugin needs to take care of this.
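One way the plugin could keep every data entity inside the crate, independently of publishDir, is sketched below in Python: a file already under the crate directory keeps its relative path as its @id, and anything else (a task output under work/, or test.nf itself) is copied into the crate root. The function name add_to_crate and the copy-to-root policy are hypothetical; a real implementation would also need to handle name collisions, e.g. two tasks producing files with the same name.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def add_to_crate(src: Path, crate_dir: Path, dest_name: Optional[str] = None) -> str:
    """Ensure src lives inside crate_dir and return its crate-relative @id."""
    crate_dir = crate_dir.resolve()
    src = src.resolve()
    if src.is_relative_to(crate_dir):  # Path.is_relative_to needs Python 3.9+
        # Already inside the crate tree (e.g. copied there by publishDir).
        rel = src.relative_to(crate_dir)
    else:
        # Outside the crate (e.g. a task work directory): copy it in.
        # Note: an existing file with the same name would be overwritten.
        rel = Path(dest_name or src.name)
        target = crate_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
    return rel.as_posix()


# Usage: simulate a task output in a work directory and a "results" crate.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    task_dir = root / "work" / "e1" / "80ed247039cd71794ba71091aedf2b"
    task_dir.mkdir(parents=True)
    (task_dir / "r1.foo.2.txt").write_text("0.42\n")
    crate = root / "results"
    crate.mkdir()
    print(add_to_crate(task_dir / "r1.foo.2.txt", crate))  # prints r1.foo.2.txt
```

Resolving both paths first means symlinked temporary or work directories still compare correctly.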

@stain

stain commented Mar 28, 2024

Hi, let us know if you would like some help looking at this.

@bentsherman
Member Author

Thank you guys for your feedback. It's exactly what I needed to make sure I'm going in the right direction.

I thought I was going to get to this sooner, which is why I didn't respond at the time, but that never happened. Sorry for the radio silence.

I've been too busy with other priorities to put any time into this, so this will likely not move until I get some free time or someone else picks it up. If you know anyone who would like to work on it, I would be happy to work with them.

I believe there is also a parallel effort to implement the Workflow Run RO-Crate in the nf-core tooling; it might be worth checking in with them.

@stain

stain commented May 23, 2024

See also nf-core/tools#2680

@fbartusch may be able to have a look at this

@fbartusch

fbartusch commented Jun 6, 2024

I've worked on most of the issues @simleo pointed out.
@bentsherman How can I add my changes to this pull request? Can you give me permission to commit my changes to the PR?

@bentsherman
Member Author

@fbartusch I suggest that you fork the repo, push a new branch with your changes, then you should be able to make a PR for it.

@fbartusch

@bentsherman I created a PR: #33
@simleo: I think I fixed most of the issues you mentioned here: #19 (comment). Can you check whether any information required for a valid Workflow Run RO-Crate is still missing?

@simleo

simleo commented Jun 20, 2024

@fbartusch I've posted my comment on #33

Development

Successfully merging this pull request may close these issues.

Add RO crate format
4 participants