Added initial WfExS-backend examples based on toy workflow. #53
I'll start from cosifer-cwl_provenance.
- The root dataset's conformsTo needs to change to that of Workflow Run Crate (i.e., it should not conform to https://w3id.org/ro/wfrun/provenance/0.1, but only to the other three profiles), since the tool execution is not recorded. The recorded actions are the "consolidation" (what is this, by the way?) and three executions of the workflow.
- The crate needs to specify a license.
- The root dataset needs to link to the CreateAction instances via mentions.
- There are duplicate entries in the workflows' input and output that need to be eliminated. This is not handled by ro-crate-py, so it has to be taken care of in the library user's code. Other lists in the crate also have duplicates, e.g. the workflow's softwareRequirements. BTW, softwareRequirements is a property of SoftwareApplication, so SoftwareApplication should be added to the workflow's type list.
- The workflow lists itself in hasPart. It should only list the tools (only one in this case). However, since it's a Workflow Run Crate, hasPart can be omitted altogether.
- The additionalType for CWL string parameters should be Text, not String. See https://www.researchobject.org/workflow-run-crate/cwl_param_mapping
- Input and output files should either be included in the crate or be retrievable via download. So data_matrix.csv is ok (although it's better to include it in the crate), but the nih: URIs are another matter. What should the crate user do with e.g. nih:sha-256;95fe-414b-b4e0-e193-6230-ea67-cd55-4ca8-cf30-6281-0870-0748-5d76-2df3-67b7-8bf4;9? How can the file be retrieved?
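Since ro-crate-py does not deduplicate lists, several of the remarks above could be applied as a post-processing step on the parsed ro-crate-metadata.json. The following is only a minimal sketch, not how WfExS-backend does it: it assumes the root dataset has the @id "./", the three profile URIs other than the provenance one are assumed from the published 0.1 profiles, and the function names and the license URI in the usage note are hypothetical. It does not address the hasPart or nih: URI remarks.

```python
import json

# Assumed profile URIs (Process Run Crate, Workflow Run Crate, Workflow RO-Crate).
# Only the Provenance Run Crate URI is quoted in the review; it is left out on purpose.
WANTED_PROFILES = [
    "https://w3id.org/ro/wfrun/process/0.1",
    "https://w3id.org/ro/wfrun/workflow/0.1",
    "https://w3id.org/workflowhub/workflow-ro-crate/1.0",
]


def as_list(value):
    """Return a JSON-LD property value as a list (absent, single value or list)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def dedup(values):
    """Drop repeated values while keeping the original order."""
    seen, out = set(), []
    for v in as_list(values):
        key = json.dumps(v, sort_keys=True)  # dicts are not hashable, so compare serialized
        if key not in seen:
            seen.add(key)
            out.append(v)
    return out


def fix_crate(metadata, license_id):
    """Apply the remarks above, in place, to a parsed ro-crate-metadata.json (hypothetical helper)."""
    graph = metadata["@graph"]
    root = next(e for e in graph if e["@id"] == "./")  # assumption: root dataset is "./"
    root["conformsTo"] = [{"@id": p} for p in WANTED_PROFILES]
    root["license"] = {"@id": license_id}
    # Link every CreateAction from the root dataset via mentions.
    actions = [e for e in graph if "CreateAction" in as_list(e.get("@type"))]
    root["mentions"] = dedup(
        as_list(root.get("mentions")) + [{"@id": a["@id"]} for a in actions]
    )
    for e in graph:
        types = as_list(e.get("@type"))
        if "ComputationalWorkflow" in types:
            for prop in ("input", "output", "softwareRequirements"):
                if prop in e:
                    e[prop] = dedup(e[prop])
            # softwareRequirements is a SoftwareApplication property.
            if "softwareRequirements" in e and "SoftwareApplication" not in types:
                e["@type"] = types + ["SoftwareApplication"]
        # CWL string parameters map to Text, not String.
        if "FormalParameter" in types and e.get("additionalType") == "String":
            e["additionalType"] = "Text"
    return metadata
```

It could be called as fix_crate(json.load(f), "https://spdx.org/licenses/Apache-2.0") before writing the file back; the license shown here is just a placeholder.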
cosifer-cwl_staged is similar to cosifer-cwl_provenance, but does not have the workflow executions. Remarks are similar to the ones above, with the following differences:
- Since the only action is the consolidation, which is not a workflow execution, this crate should conform to Process Run Crate and Workflow RO-Crate, but not Workflow Run Crate or Provenance Run Crate.
- The workflow has input but not output. They could both be omitted though.
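The rule used in the remarks above for choosing the profiles can be sketched as a small check over the crate's @graph. This is a rough heuristic under stated assumptions, not part of any profile specification: an action counts as a workflow execution when its instrument is a ComputationalWorkflow, and as a tool execution when its instrument is listed in a workflow's hasPart. All profile URIs except the provenance one are assumed, and expected_profiles is a hypothetical name.

```python
PROCESS = "https://w3id.org/ro/wfrun/process/0.1"              # assumed URI
WORKFLOW_RUN = "https://w3id.org/ro/wfrun/workflow/0.1"        # assumed URI
PROVENANCE_RUN = "https://w3id.org/ro/wfrun/provenance/0.1"    # quoted in the review
WORKFLOW_ROCRATE = "https://w3id.org/workflowhub/workflow-ro-crate/1.0"  # assumed URI


def _as_list(value):
    """Return a JSON-LD property value as a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _ids(value):
    """Collect the @id of every reference in a property value."""
    return {v["@id"] for v in _as_list(value) if isinstance(v, dict) and "@id" in v}


def expected_profiles(graph):
    """Guess which profiles a crate should conform to, given its recorded actions."""
    workflows = {e["@id"] for e in graph
                 if "ComputationalWorkflow" in _as_list(e.get("@type"))}
    tools = set()
    for e in graph:
        if e["@id"] in workflows:
            tools |= _ids(e.get("hasPart"))
    tools -= workflows  # a workflow listing itself in hasPart is not a tool
    instruments = set()
    for e in graph:
        if "CreateAction" in _as_list(e.get("@type")):
            instruments |= _ids(e.get("instrument"))
    profiles = [PROCESS, WORKFLOW_ROCRATE]
    if instruments & workflows:          # at least one workflow execution recorded
        profiles.insert(1, WORKFLOW_RUN)
        if instruments & tools:          # tool executions recorded as well
            profiles.insert(2, PROVENANCE_RUN)
    return profiles
```

For a crate like cosifer-cwl_staged, where no action has the workflow as its instrument, this returns only the Process Run Crate and Workflow RO-Crate URIs.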
The cosifer-nxf* crates have issues similar to the ones above. In addition:
- The workflow's input lists both an outputsDir and an outputs-dir parameter, but only the former is present in the workflow.
- The workflow lists nextflow.config in hasPart, but hasPart should only list the tools orchestrated by the workflow. nextflow.config should be listed in the action's "object" (see referencing configuration files). hasPart in the workflow can be omitted altogether.
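The hasPart remark could be handled with a small transformation over the @graph. This is a sketch only: relocate_config_files is a hypothetical helper, it assumes references are {"@id": ...} objects, and it treats every CreateAction whose instrument is the workflow as an execution that used the configuration file.

```python
def _as_list(value):
    """Return a JSON-LD property value as a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def relocate_config_files(graph, workflow_id, config_ids):
    """Move configuration files from the workflow's hasPart to the actions' object."""
    by_id = {e["@id"]: e for e in graph}
    workflow = by_id[workflow_id]
    # Neither configuration files nor the workflow itself belong in hasPart.
    excluded = set(config_ids) | {workflow_id}
    parts = [p for p in _as_list(workflow.get("hasPart")) if p["@id"] not in excluded]
    if parts:
        workflow["hasPart"] = parts
    else:
        workflow.pop("hasPart", None)  # hasPart can be omitted altogether
    for entity in graph:
        if "CreateAction" not in _as_list(entity.get("@type")):
            continue
        instruments = {i["@id"] for i in _as_list(entity.get("instrument"))}
        if workflow_id not in instruments:
            continue
        objects = _as_list(entity.get("object"))
        present = {o["@id"] for o in objects}
        entity["object"] = objects + [
            {"@id": c} for c in config_ids if c not in present
        ]
    return graph
```

For the Nextflow crates this would be called with config_ids=["nextflow.config"] and the @id of the workflow file.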
The workflow run crates will have to |
I have been updating the previous examples, as well as adding new, real-life ones. The new examples in this repo only include the generated ro-crate-metadata.json and either a copy of the workflow or its packed version, as the inputs and containers used take up several GB.
Thanks for the updates, José María. Looking at the previous examples, progress has been made. However, some issues remain, and the new workflows bring some other issues. I'll list what I've found below. All (or most) crates
"@context": [
"https://w3id.org/ro/crate/1.1/context",
"https://w3id.org/ro/terms/workflow-run"
]
|
I have addressed the main issues. I still have to go through the other ones, and figure out the best way to add a README file describing the meaning and usage of the different schemes, and to relate it semantically to the generated RO-Crates.
The biggest issue with the recent changes is that Other issues I've found with the latest changes:
|
Understood, I have applied the change. The reason I added multiple instruments to the CreateAction is that I realized a workflow can be run in different containerization modes, so fully describing the workflow execution (not the workflow itself) requires either adding all the software requirements under So, my conclusion is that the containers and the container engine should not appear under softwareRequirements of the computational workflow as such, as they might not appear in a different execution. So, in the meantime, I'm declaring them under softwareAddOn.
In the long term, I think we should distinguish between the workflow as SoftwareSourceCode (the targetPlatform is the workflow engine) and the instantiated workflow as SoftwareApplication (the softwareRequirements are the workflow engine, the containers, etc.). What do you think?
Thanks! I didn't realize I had put the relation the wrong way round.
These JSON files contain metadata gathered by WfExS-backend, and their format depends on whether Singularity/Apptainer or Docker/Podman is used. They help WfExS-backend to detect when the original container and the contents of the cache (or RO-Crate) do not match. I have just added a couple of paragraphs to the automatically included README.md files, giving some details. Regarding your last questions: when docker or podman modes have been used to run the workflow, the metadata of the images can be obtained using When Singularity/Apptainer mode is used for the workflow execution, container images can come either from HTTP requests or from Docker registries. Any of them are materialized by So, although part of those JSON files contains original metadata, they are augmented with additional details.
I think distinguishing between "just code" and actual running application would be very hard at this point, since the whole model (and the tooling) is based on code/application being on the prospective part and actions on the retrospective part. Even at the lowest level of Process Run Crate it says that the application's type should include SoftwareApplication, SoftwareSourceCode or ComputationalWorkflow. The main problem with
{
"@id": "#e0d55b35-b042-420e-8cf3-c8424644f17b",
"@type": "CreateAction",
"containerImage": "#cosifer-image"
},
{
"@id": "#cosifer-image",
"@type": "ContainerImage",
"additionalType": "DockerImage",
"registry": "docker.io",
"name": "tsenit/cosifer",
"tag": "b4d5af45d2fc54b6bff2a9153a8e9054e560302e"
}
We could add
{
"@id": "#e0d55b35-b042-420e-8cf3-c8424644f17b",
"@type": "CreateAction",
"containerImage": "docker://tsenit/cosifer:b4d5af45d2fc54b6bff2a9153a8e9054e560302e"
}
Regarding the JSON files about the container images, the container images should not refer to them via
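To relate the two forms shown in the JSON above (a ContainerImage entity versus a plain docker:// string), the entity could be derived from the string. This is a simplified sketch, not a full Docker reference parser: container_image_entity is a hypothetical helper, docker.io is assumed as the default registry when the first path component does not look like a host, and digest references (containing "@") are rejected rather than parsed.

```python
def container_image_entity(ref, entity_id):
    """Build a ContainerImage entity from a docker:// reference (simplified parsing)."""
    prefix = "docker://"
    if not ref.startswith(prefix):
        raise ValueError(f"not a docker:// reference: {ref}")
    rest = ref[len(prefix):]
    if "@" in rest:
        raise ValueError("digest references are not handled by this sketch")
    first, sep, remainder = rest.partition("/")
    # Heuristic: a first component with a dot or a port, or "localhost", is a registry host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, remainder
    else:
        registry, path = "docker.io", rest  # assumed default registry
    name, _, tag = path.partition(":")
    entity = {
        "@id": entity_id,
        "@type": "ContainerImage",
        "additionalType": "DockerImage",
        "registry": registry,
        "name": name,
    }
    if tag:
        entity["tag"] = tag
    return entity
```

With the reference from the example, container_image_entity("docker://tsenit/cosifer:b4d5af45d2fc54b6bff2a9153a8e9054e560302e", "#cosifer-image") produces the same fields as the ContainerImage entity in the first JSON fragment.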
…l diclose the full path
…s" who ran the workflow.
I have updated all the examples, so they are now using |
Looking good for the most part. In Other remarks (for all crates):
{
"@id": "docker://docker.io/node:slim",
"@type": [
"ContainerImage",
"SoftwareApplication"
],
"additionalType": "DockerImage",
"applicationCategory": "https://www.wikidata.org/wiki/Q51294208",
"name": "docker.io/node",
"operatingSystem": "linux",
"processorRequirements": "amd64",
"registry": "docker.io",
"softwareRequirements": {
"@id": "https://apptainer.org/"
},
"softwareVersion": "library/node@sha256:dc1906714d1993d291e1e7b5f236291236b0a0b6dfacdb164e4a9ea44d09c52e",
"tag": "slim",
"sha256": "dc1906714d1993d291e1e7b5f236291236b0a0b6dfacdb164e4a9ea44d09c52e"
} |
Version 0.3 of the profiles, which includes the specs on container images, has been released, so you can change the ro-crate-py 0.9.0 has also been released and includes ResearchObject/ro-crate-py#162, so you can use it to add the whole
Nice! I'll be doing it over the next few days!
Merging to give a home to the examples. Remaining updates can be done in a subsequent PR.
I have just added the generated RO-Crates from the execution of two toy workflows with WfExS-backend.