Output nodes, global parameter names, and default port handling in PWD format #88

@GeigerJ2

In its current state, the PWD also contains the inputs with concrete values, e.g., for the simple arithmetic workflow:

{
  "nodes": [
    {"id": 0, "function": "workflow.get_prod_and_div"},
    {"id": 1, "function": "workflow.get_sum"},
    {"id": 2, "value": 1},
    {"id": 3, "value": 2}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": 2, "sourcePort": null},
    {"target": 0, "targetPort": "y", "source": 3, "sourcePort": null},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"}
  ]
}

We see this as problematic for a few reasons. First, it is somewhat inconsistent: it implies that data nodes should be an explicit part of the workflow definition, yet neither the result output nor the intermediate values, e.g., prod and div, appear as data nodes in the PWD graph representation. Instead, in the current representation, the prod and div outputs are represented only by edges. If some data objects are part of the PWD, then all of them should be.

Further, providing concrete input values means that the definition contained in the JSON PWD corresponds to a concrete workflow "instance" rather than just the general workflow logic. I recall that this was also brought up in one of the comments on the paper draft, which mentioned that we should showcase that it’s possible to modify the input values after loading a workflow into a framework from the PWD.

Therefore, we propose removing the data nodes from the "nodes" section of the PWD JSON and, to retain the relevant information, representing the inputs in the "edges" section instead. This would give the following PWD:

{
  "nodes": [
    {"id": 0, "function": "workflow.get_prod_and_div"},
    {"id": 1, "function": "workflow.get_sum"}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": null, "sourcePort": null},
    {"target": 0, "targetPort": "y", "source": null, "sourcePort": null},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    {"target": null, "targetPort": null, "source": 1, "sourcePort": "result"}
  ]
}
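
As a rough illustration of the proposed change, the sketch below converts a PWD dict in the current format into the edge-only form. The function name strip_data_nodes and its outputs parameter are hypothetical and not part of any existing PWD tooling. Because the current format does not record global outputs, they have to be passed in explicitly here. The concrete values are returned separately so that they are not lost.

```python
def strip_data_nodes(pwd, outputs=()):
    """Return (new_pwd, defaults) with value nodes replaced by dangling edges.

    `outputs` is an assumed list of (node_id, port) pairs to expose as
    global outputs; the current format carries no such information.
    """
    # Map the id of each data node to its concrete value.
    values = {n["id"]: n["value"] for n in pwd["nodes"] if "value" in n}
    nodes = [n for n in pwd["nodes"] if "value" not in n]

    edges, defaults = [], {}
    for edge in pwd["edges"]:
        if edge["source"] in values:
            # Remember the concrete value, keyed by the port it fed into.
            defaults[(edge["target"], edge["targetPort"])] = values[edge["source"]]
            edge = {**edge, "source": None, "sourcePort": None}
        edges.append(dict(edge))

    # Add one dangling edge per requested global output.
    for node_id, port in outputs:
        edges.append(
            {"target": None, "targetPort": None, "source": node_id, "sourcePort": port}
        )
    return {"nodes": nodes, "edges": edges}, defaults


current = {
    "nodes": [
        {"id": 0, "function": "workflow.get_prod_and_div"},
        {"id": 1, "function": "workflow.get_sum"},
        {"id": 2, "value": 1},
        {"id": 3, "value": 2},
    ],
    "edges": [
        {"target": 0, "targetPort": "x", "source": 2, "sourcePort": None},
        {"target": 0, "targetPort": "y", "source": 3, "sourcePort": None},
        {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
        {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    ],
}

proposed, defaults = strip_data_nodes(current, outputs=[(1, "result")])
```

Here defaults maps (node id, port) pairs to the values that were previously stored as data nodes, so a framework could still apply them or let the user override them after loading.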

With this change, the "global" inputs and outputs of the workflow are represented by "dangling" edges of the workflow graph. While it's not ideal, we think this is fine for now to keep the modifications minimal. One consequence of this modification is that any unused return values of intermediate functions would automatically become global outputs of the workflow. One could instead add additional keys to the edges, such as workflowInputName and workflowOutputName, to expose those explicitly, e.g.:

{
  "nodes": [
    {"id": 0, "function": "workflow.get_prod_and_div"},
    {"id": 1, "function": "workflow.get_sum"}
  ],
  "edges": [
    {"target": 0, "targetPort": "x", "source": null, "sourcePort": null, "workflowInputName": "a"},
    {"target": 0, "targetPort": "y", "source": null, "sourcePort": null, "workflowInputName": "b"},
    {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
    {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
    {"target": null, "targetPort": null, "source": 1, "sourcePort": "result", "workflowOutputName": "c"}
  ]
}
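
To show how a framework might read these keys, here is a minimal sketch that collects the global inputs and outputs from the dangling edges. The function name collect_global_ports is hypothetical, and falling back to the port name when no workflow-level name is given is an assumption made for this example, not something the proposal specifies.

```python
def collect_global_ports(pwd):
    """Return (inputs, outputs), each mapping a name to a (node_id, port) pair."""
    inputs, outputs = {}, {}
    for edge in pwd["edges"]:
        if edge["source"] is None:
            # Dangling source: a global input of the workflow.
            # Assumed fallback: use the target port name if no explicit name exists.
            name = edge.get("workflowInputName", edge["targetPort"])
            inputs[name] = (edge["target"], edge["targetPort"])
        elif edge["target"] is None:
            # Dangling target: a global output of the workflow.
            name = edge.get("workflowOutputName", edge["sourcePort"])
            outputs[name] = (edge["source"], edge["sourcePort"])
    return inputs, outputs


named = {
    "nodes": [
        {"id": 0, "function": "workflow.get_prod_and_div"},
        {"id": 1, "function": "workflow.get_sum"},
    ],
    "edges": [
        {"target": 0, "targetPort": "x", "source": None, "sourcePort": None,
         "workflowInputName": "a"},
        {"target": 0, "targetPort": "y", "source": None, "sourcePort": None,
         "workflowInputName": "b"},
        {"target": 1, "targetPort": "x", "source": 0, "sourcePort": "prod"},
        {"target": 1, "targetPort": "y", "source": 0, "sourcePort": "div"},
        {"target": None, "targetPort": None, "source": 1, "sourcePort": "result",
         "workflowOutputName": "c"},
    ],
}

inputs, outputs = collect_global_ports(named)
```

Note that the port-name fallback can produce name collisions when two nodes share a port name such as x, which is one argument for making the names explicit.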

Alternatively, one could add a global_ports section to the PWD. This is something to think about in the future.
Ping @mbercx and @giovannipizzi.
