Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fully replace WorkflowTask.args, in patch-WorkflowTask endpoint #758

Closed
tcompa opened this issue Jun 20, 2023 · 13 comments · Fixed by #759
Closed

Fully replace WorkflowTask.args, in patch-WorkflowTask endpoint #758

tcompa opened this issue Jun 20, 2023 · 13 comments · Fixed by #759
Labels
High Priority Current Priorities & Blocking Issues

Comments

@tcompa
Copy link
Collaborator

tcompa commented Jun 20, 2023

Current status

The patch-workflowtask endpoint only updates the args attribute in an incremental way, see

Proposed update

When the patch-workflowtask endpoint receives a args attribute, this one should fully replace the existing workflowtask.args value (rather than updating it). This will provide more flexibility to fractal-web, and unlock fractal-analytics-platform/fractal-web#196 (@rkpasia).
(In passing, it will also contribute to #5)

Use cases

Let's consider this task function

def function(x: int, y: str = "default_y"):
    pass

Custom task

  • The user adds a task to fractal-server by hand, and they do not create/add a JSON Schema for arguments.
  • The default_y string does not appear anywhere in the database, when the Task is imported as a WorkflowTask.
  • The user may modify WorkflowTask.args at will. If x is missing, the task will fail. If y is missing, the task will use "default_y". If y is set to "some_y", the task will use "some_y".

Common task

  • The user includes a JSON Schema for the task.
  • When the Task is imported as a WorkflowTask, WorkflowTask.args is set to {"y": "default_y"}.
  • The user may modify args["y"] and set it to "some_y", and call the patch-workflowtask endpoint (note: they should also include a value for x). The task will then run with "some_y".
  • The user may remove the "y" entry from args through the CLI client. The task will then run with the python-function default value "default_y".
  • If "y" is not present in WorkflowTask.args, then upon loading the WorkflowTask in fractal-web this parameter will be automatically filled with the default value. This means that after any interaction with fractal-web the y parameter will be either set to a custom value or to its default value.

A (uncommon) use case that would not be supported any more

The following use case currently would work, but it won't work any more if we move as described in this issue:

  • I create a custom task, with the CLI or web client, with/without a schema
  • I import it as a WorkflowTask, via the CLI client
  • I modify its x parameter via the CLI client
  • I modify its y parameter via the CLI client (with a new client call, i.e. with a new API call)
  • Both x and y are now up-to-date in the DB.

I don't think anyone was relying on this incremental-update behavior of the patch-WorkflowTask endpoint - right @jluethi? [or, to be more explicit, I don't think the patch-WorkflowTask endpoint is used often in the CLI client]. And I'm not fully sure of why we implemented this behavior in the first place, since ex-post it does not seem very intuitive.

@jluethi
Copy link
Collaborator

jluethi commented Jun 21, 2023

I don't think anyone was relying on this incremental-update behavior of the patch-WorkflowTask endpoint - right @jluethi?

Hmm, agreed, it doesn't break old behaviors (pre 1.3.0). It's very optimized for Fractal web and removes some improvements in 1.3.0 though of having the workflow json / workflow task contain all the ground-truth of all parameters that would be run if a user uses the CLI client though (see below for explanation).

The user may remove the "y" entry from args through the CLI client. The task will then run with the python-function default value "default_y".
If "y" is not present in WorkflowTask.args, then upon loading the WorkflowTask in fractal-web this parameter will be automatically filled with the default value. This means that after any interaction with fractal-web the y parameter will be either set to a custom value or to its default value.

Hmm, that behavior does seem a bit tricky, but I guess it wont be an issue on the fractal web side. At first, it seems a bit weird that a user would "unset" a parameter (e.g. remove it from the parameter list or set it to None), but the server then just uses the default value for it, without the user being able to see what that default value is.
In the web client, we don't allow removing of arguments in a schema though, right? And the schema validation would stop a user from setting "y" = None, right? => should be safe in the web client.

It's less optimal on the CLI side: If the cellpose task has 20 parameters with default values, upon initially adding the task, all those default values would now get set in the workflow task, right (improvement of 1.3.0 fractal server)? But if a user then provides just 3 new values or so (e.g. level and model_type + channels, see e.g. example 01), then download the workflow, the workflow tasks now only contain the newly set parameters, not all of the defaults. So if I look at the workflow, all the defaults appear to be unset and just use some default values the user can't see.
What happens if they then open that workflow in Fractal web, where all the defaults have been unset? Fractal web with schemas still needs to show all the values. Does it then just show the defaults again for each value?

Big picture, this potentially moves ground-truth away from the workflow task & the workflow json export (in the CLI case), which I'm not the biggest fan of. It is similar to the pre 1.3.0 state though, so I could live with that change for the time being and we can consider an update to the CLI API later. The web should be the main way of building workflows and there it's fine.

@jluethi
Copy link
Collaborator

jluethi commented Jun 21, 2023

tldr: I think this new API makes a lot of sense for fractal web and is what it would probably always want to use. It's also compatible with 1.2.0 CLI behavior.
So far, 1.3.0 did bring an improvement to CLI client behavior though, making the workflow a more reliable source of truth. This shouldn't influence reproducibility for packaged tasks (maybe slightly for custom tasks when used in non-standard ways), but will limit the transparency of what's happening.
Because that would be a new behavior of improving the CLI client for 1.3.0 and the CLI client has less priority than the web, I'm ok with postponing this feature if necessary. But I think we should then open a new issue to eventually add support for this again

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 21, 2023

I generally agree with these comments. It would be indeed a change which is optimized for fractal-web, where you wouldn't remove an argument by accident (that is, just because you did not include it in a API call). Note that this holds both with/without schemas, because also when you have no schema fractal-web always shows the current values and you would need to explicitly remove them if you want to.

Why we need to change the PATCH endpoint

However, the purely "incremental" behavior of the PATCH endpoint has to change, IMO. Main reasons are not even related to default values:

  1. I generally don't find it intuitive, but that could be a personal opinion
  2. It leaves Deleting Workflow Task properties after adding them #629 open, which is essentially a bug
  3. It's extremely hard to guess what it does with nested objects. When the current args["allowed_channels"] has one element, and I send a payload where args["allowed_channels"] includes a different element, what happens? Do they get merged, or do we replace the old value with the new one? (I think we currently do replace the old value with the old one, but it's up to the dict.update() method - see also make dictionary updating safe #5).

At least for these reasons, I think we have to move forward.

Handling default values

A different question concerns the defaults and our procedure for setting them. Let's consider the two cases without/with schemas.

Task without schema

If we have no JSON Schemas, then there is not much to say. The defaults do not exist in fractal-server or in the DB or in fractal-web, but they only exist in the task Python function.

Task with schema

In this case we already have multiple sources of truth: the function and the JSON schema (when present). I think we should work under the assumption that the JSON Schema was produced correctly, and that it is not something we want to update often (it could happen, during development, but then one should make sure to review it by hand).
However, to try and combine the multiple sources into a single one, in 1.3 we introduced the following feature: "when a Task is imported as a WorkflowTask, the default values are read from the schema and used to pre-populate WorkflowTask.args". This, as @jluethi correctly stresses, means that with the current version all arguments will be present in WorkflowTask.args at the execution time (the user must set the required ones, and the optional ones are already present anyway).

New proposal

My goal is to:

  • Remove the incremental-update PATCH behavior
  • Preserve the "args has all arguments at execution time" feature

Here is a new proposal for the PATCH-workflowtask endpoint which may work:

  • We receive a new object args_new from the CLI/web client
  • We set WorkflowTask.args = args_new (full replacement)
  • If there is a schema, we scan its properties and for all properties that are not set in args and do have a default in the schema, we set their value to the default

This has some similarity with old versions (1.2), where we would dynamically merge the Task.default_args (now replaced by the schema) with the WorkflowTask.args, with an important difference. In 1.2, this merge would happen during execution of the workflow, while in 1.3 this merge only happens as part of the POST/PATCH-workflowtask endpoints and is written in the DB. To me, this looks like an important improvement in terms of making it clear what the server will use as task parameters.

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 21, 2023

It's less optimal on the CLI side: If the cellpose task has 20 parameters with default values, upon initially adding the task, all those default values would now get set in the workflow task, right (improvement of 1.3.0 fractal server)? But if a user then provides just 3 new values or so (e.g. level and model_type + channels, see e.g. example 01), then download the workflow, the workflow tasks now only contain the newly set parameters, not all of the defaults. So if I look at the workflow, all the defaults appear to be unset and just use some default values the user can't see.
What happens if they then open that workflow in Fractal web, where all the defaults have been unset? Fractal web with schemas still needs to show all the values. Does it then just show the defaults again for each value?

I think this use case would be covered by the new proposal, right @jluethi?

@jluethi
Copy link
Collaborator

jluethi commented Jun 21, 2023

Task without schema

If we have no JSON Schemas, then there is not much to say. The defaults do not exist in fractal-server or in the DB or in fractal-web, but they only exist in the task Python function.

Agreed!

In 1.2, this merge would happen during execution of the workflow, while in 1.3 this merge only happens as part of the POST/PATCH-workflowtask endpoints and is written in the DB. To me, this looks like an important improvement in terms of making it clear what the server will use as task parameters.

Yes, great summary! That's what I really like about the 1.3.0 improvements as we had them for the CLI so far, merging upon Post/PATCH, not execution!

If there is a schema, we scan its properties and for all properties that are not set in args and do have a default in the schema, we set their value to the default

Ah, that's a very nice way of maintaining this new functionality! How tricky is this feature? I'm generally a big fan of this way. It wouldn't be a hard requirement for 1.3.0 to have this, in case there are edge cases we need to consider here. But if it's straightforward, then this allows us to go to the new args replacement while maintaining the merging at POST/PATCH, not at execution.

@jluethi
Copy link
Collaborator

jluethi commented Jun 21, 2023

Quickly thinking through some implications of the "If there is a schema, we scan its properties and for all properties that are not set in args and do have a default in the schema, we set their value to the default" workflow:
From Fractal web, the user should never be able to remove a parameter when there is a schema and they can't set it to None (would fail during web client side validation). Thus, this should be transparent to how Fractal web uses this.

From the command line side, this seems to solve my issue I raised above, because it "reintroduces" the benefits of incremental updates through using the schema.

Conclusion: I don't see issues with this approach that I can think of :)

@jluethi
Copy link
Collaborator

jluethi commented Jun 21, 2023

I think this use case would be covered by the new proposal, right @jluethi?

Correct, the new proposal would handle this well!

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 21, 2023

How tricky is this feature?

It should be straightforward. We already do this in the POST endpoint, and we need to do the same in the PATCH endpoint, that's it.
The current code (which can be cleaned up a bit) looks like

        db_task = await db.get(Task, task_id)
        default_args = {}
        if db_task.args_schema is not None:
            try:
                properties = db_task.args_schema["properties"]
                for prop_name, prop_schema in properties.items():
                    default_value = prop_schema.get("default", None)
                    if default_value:
                        default_args[prop_name] = default_value
            except KeyError as e:
                logging.warning(
                    "Cannot set default_args from args_schema="
                    f"{json.dumps(db_task.args_schema)}\n"
                    f"Original KeyError: {str(e)}"
                )
        # Override default_args with args
        actual_args = default_args.copy()
        if args is not None:
            for k, v in args.items():
                actual_args[k] = v
        if not actual_args:
            actual_args = None

You can immediately see that we are essentially re-building the Task.default_args attribute that we removed, because that now has to come from the schema. This is a minor performance issue, which make sense for what we are doing. We could do it only upon Task insertion, in principle, but then we would always need to check that the task schema did not change -> not worth it.

In the end we are really doing the same thing as in 1.2, in that we first set the default values and then override them with the user-provided values. The big change in upcoming 1.3.0, with respect to 1.2.0, will be that we do this only when acting on the DB (through the POST/PATCH endpoints), and then we store the output of this merge in the DB itself.
Another (minor) change is that we replace the dict.update method with a more explicit loop over items, to make it more transparent what the code does.

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 21, 2023

From Fractal web, the user should never be able to remove a parameter when there is a schema

Agreed

and they can't set it to None (would fail during web client side validation)

They can unset an optional argument x.
This leads to the following implications:

  • fractal-web sets x to null
  • upon "save", fractal-web strips the null argument x from the payload that will go to fractal-server (before the ajv validation, which would otherwise fail)
  • fractal-server receives a set of new arguments without x
  • fractal-server pre-populates args with the default value of x
  • fractal-server has no new value for x, so the default value remains in args

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 21, 2023

From Fractal web, the user should never be able to remove a parameter when there is a schema

Agreed

and they can't set it to None (would fail during web client side validation)

They can unset an optional argument x. This leads to the following implications:

* fractal-web sets `x` to null

* upon "save", fractal-web strips the null argument `x` from the payload that will go to fractal-server (before the ajv validation, which would otherwise fail)

* fractal-server receives a set of new arguments without `x`

* fractal-server pre-populates `args` with the default value of `x`

* fractal-server has no new value for `x`, so the default value remains in `args`

This was just to say it out loud, not to say that anything is wrong. At the moment I'm not aware of any glitch in the new proposal.

@jluethi
Copy link
Collaborator

jluethi commented Jun 21, 2023

They can unset an optional argument x.
This leads to the following implications:

fractal-web sets x to null
upon "save", fractal-web strips the null argument x from the payload that will go to fractal-server (before the ajv validation, which would otherwise fail)
fractal-server receives a set of new arguments without x
fractal-server pre-populates args with the default value of x
fractal-server has no new value for x, so the default value remains in args

This sounds great!

The big change in upcoming 1.3.0, with respect to 1.2.0, will be that we do this only when acting on the DB (through the POST/PATCH endpoints), and then we store the output of this merge in the DB itself.

Sounds great!

=> Your new proposal sounds good, let's go with that!

@tcompa tcompa closed this as completed in 4ebdb59 Jun 21, 2023
tcompa added a commit that referenced this issue Jun 21, 2023
…lace-workflowtaskargs-in-patch-workflowtask-endpoint

Fully replace WorkflowTask.args, in patch endpoint (close #758)
@jluethi
Copy link
Collaborator

jluethi commented Jun 21, 2023

That was fast!
👏🏻👏🏻👏🏻

@tcompa
Copy link
Collaborator Author

tcompa commented Jun 21, 2023

It was already set up, I just needed some minor updates and tests ;)
Now released in 1.3.0a9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
High Priority Current Priorities & Blocking Issues
Projects
None yet
2 participants