In its current state, the PWD also contains the inputs with concrete values, e.g., for the simple arithmetic workflow:
{
"nodes": [
{"id": 0, "function": "workflow.get_prod_and_div"},
{"id": 1, "function": "workflow.get_sum"},
{"id": 2, "value": 1},
{"id": 3, "value": 2}
],
"edges": [
{"target": 0, "targetPort": "x", "source": 2, "sourcePort": null},
{"target": 0, "targetPort": "y", "source": 3, "sourcePort": null},
{"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
{"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"}
]
}
We see this as problematic for a few reasons. On one hand, it is somewhat inconsistent: including the inputs implies that data nodes should be an explicit part of the workflow definition, yet neither the `result` output nor the intermediate values, e.g., `prod` and `div`, appear as data nodes in the PWD graph representation. Instead, in the current representation, the `prod` and `div` outputs are represented only by the edges. If some data objects are part of the PWD, then all of them should be.
Further, providing concrete input values means that the definition contained in the JSON PWD corresponds to a concrete workflow "instance" rather than just the general workflow logic. I recall that this was also brought up in one of the comments on the paper draft, which mentioned that we should showcase that it’s possible to modify the input values after loading a workflow into a framework from the PWD.
Therefore, we propose removing the data nodes from the "nodes" section of the PWD JSON and, to keep the relevant information, moving it into the "edges" section instead. This would give the following PWD:
{
"nodes": [
{"id": 0, "function": "workflow.get_prod_and_div"},
{"id": 1, "function": "workflow.get_sum"}
],
"edges": [
{"target": 0, "targetPort": "x", "source": null, "sourcePort": null},
{"target": 0, "targetPort": "y", "source": null, "sourcePort": null},
{"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
{"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
{"target": null, "targetPort": null, "source": 1, "sourcePort": "result"}
]
}
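To make the proposed change concrete, here is a minimal sketch of how a PWD in the current format could be converted. The helper name `strip_data_nodes` and the returned `defaults` mapping are hypothetical and not part of any existing API. The sketch assumes that data nodes are exactly the nodes carrying a "value" key.

```python
import json


def strip_data_nodes(pwd):
    """Hypothetical helper: drop value nodes and turn their edges into dangling inputs."""
    # Assumption: a node with a "value" key is a data node of the current format.
    values = {node["id"]: node["value"] for node in pwd["nodes"] if "value" in node}
    nodes = [node for node in pwd["nodes"] if "value" not in node]

    edges, defaults = [], {}
    for edge in pwd["edges"]:
        edge = dict(edge)  # copy, so the input PWD is left untouched
        if edge["source"] in values:
            # Remember the concrete value, keyed by the port it used to feed.
            defaults[(edge["target"], edge["targetPort"])] = values[edge["source"]]
            edge["source"] = None  # the edge now dangles: a global input
        edges.append(edge)
    return {"nodes": nodes, "edges": edges}, defaults


current = {
    "nodes": [
        {"id": 0, "function": "workflow.get_prod_and_div"},
        {"id": 1, "function": "workflow.get_sum"},
        {"id": 2, "value": 1},
        {"id": 3, "value": 2},
    ],
    "edges": [
        {"target": 0, "targetPort": "x", "source": 2, "sourcePort": None},
        {"target": 0, "targetPort": "y", "source": 3, "sourcePort": None},
        {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
        {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    ],
}

proposed, defaults = strip_data_nodes(current)
print(json.dumps(proposed, indent=2))
```

Note that this sketch cannot produce the dangling output edge for `result`, because the current format does not record that port anywhere. That edge would have to come from the framework exporting the workflow.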
That means that the "global" inputs and outputs of the workflow are represented by "dangling" edges of the workflow graph. While this is not ideal, we think it is fine for now, as it keeps the modifications minimal. One consequence of this modification is that any unused return values of intermediate functions would automatically become global outputs of the workflow. To expose the global inputs and outputs explicitly instead, one could add additional keys to the edges, such as `workflowInputName` and `workflowOutputName`, e.g.:
{
"nodes": [
{"id": 0, "function": "workflow.get_prod_and_div"},
{"id": 1, "function": "workflow.get_sum"}
],
"edges": [
{"target": 0, "targetPort": "x", "source": null, "sourcePort": null, "workflowInputName": "a"},
{"target": 0, "targetPort": "y", "source": null, "sourcePort": null, "workflowInputName": "b"},
{"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
{"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
{"target": null, "targetPort": null, "source": 1, "sourcePort": "result", "workflowOutputName": "c"}
]
}
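As an illustration of how a framework might read the global interface from such a PWD, here is a minimal sketch. The function `workflow_interface` is hypothetical. Falling back to the port name when no explicit name is given is an assumption of this sketch, not part of the proposal, and it could lead to name collisions if two nodes share a port name.

```python
def workflow_interface(pwd):
    """Hypothetical helper: collect global inputs/outputs from dangling edges."""
    inputs, outputs = {}, {}
    for edge in pwd["edges"]:
        if edge["source"] is None:
            # Dangling on the source side: a global input of the workflow.
            # Assumption: fall back to the target port name if no name is given.
            name = edge.get("workflowInputName", edge["targetPort"])
            inputs[name] = (edge["target"], edge["targetPort"])
        elif edge["target"] is None:
            # Dangling on the target side: a global output of the workflow.
            name = edge.get("workflowOutputName", edge["sourcePort"])
            outputs[name] = (edge["source"], edge["sourcePort"])
    return inputs, outputs


pwd = {
    "nodes": [
        {"id": 0, "function": "workflow.get_prod_and_div"},
        {"id": 1, "function": "workflow.get_sum"},
    ],
    "edges": [
        {"target": 0, "targetPort": "x", "source": None, "sourcePort": None, "workflowInputName": "a"},
        {"target": 0, "targetPort": "y", "source": None, "sourcePort": None, "workflowInputName": "b"},
        {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
        {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
        {"target": None, "targetPort": None, "source": 1, "sourcePort": "result", "workflowOutputName": "c"},
    ],
}

inputs, outputs = workflow_interface(pwd)
print(inputs)   # {'a': (0, 'x'), 'b': (0, 'y')}
print(outputs)  # {'c': (1, 'result')}
```

With a mapping like this, a framework could let users set `a` and `b` after loading the workflow, without the PWD itself carrying any concrete values.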
Alternatively, one could add a `global_ports` section to the PWD. This is to be thought about in the future.
Ping @mbercx and @giovannipizzi.