Remove reliance on key order in "do" #875

Closed
matthias-pichler opened this issue Jun 3, 2024 · 93 comments · Fixed by #882
Labels
area: spec Changes in the Specification
@matthias-pichler
Collaborator

matthias-pichler commented Jun 3, 2024

What would you like to be added:

I propose to:

  1. only use the continue FlowDirective to continue a loop
  2. make then required in all tasks
  3. add a start field to the workflow that denotes the first task to execute
document:
  dsl: '1.0.0-alpha1'
  namespace: test
  name: order-pet
  version: '1.0.0'
  title: Order Pet - 1.0.0
  summary: >
    # Order Pet - 1.0.0
    ## Table of Contents
    - [Description](#description)
    - [Requirements](#requirements)
    ## Description
    A sample workflow used to process a hypothetical pet order using the [PetStore API](https://petstore.swagger.io/)
    ## Requirements
    ### Secrets
     - my-oauth2-secret
use:
  authentications:
    petStoreOAuth2:
      oauth2: my-oauth2-secret
  functions:
    getAvailablePets:
      call: openapi
      with:
        document:
          uri: https://petstore.swagger.io/v2/swagger.json
        operation: findByStatus
        parameters:
          status: available
  secrets:
  - my-oauth2-secret
start: getAvailablePets
do:
  getAvailablePets:
    call: getAvailablePets
    output:
      from: '$input + { availablePets: [.[] | select(.category.name == "dog" and (.tags[] | .breed == $input.order.breed))] }'
    then: submitMatchesByMail
  submitMatchesByMail:
    call: http
    with:
      method: post
      endpoint:
        uri: https://fake.smtp.service.com/email/send
        authentication: petStoreOAuth2
      body:
        from: noreply@fake.petstore.com
        to: ${ .order.client.email }
        subject: Candidates for Adoption
        body: >
          Hello ${ .order.client.preferredDisplayName }!

          Following your interest to adopt a dog, here is a list of candidates that you might be interested in:

          ${ .pets | map("-\(.name)") | join("\n") }

          Please do not hesitate to contact us at info@fake.petstore.com if you have questions.

          Hope to hear from you soon!

          ----------------------------------------------------------------------------------------------
          DO NOT REPLY
          ----------------------------------------------------------------------------------------------
    then: end

Why is this needed:

Currently the task execution order is determined by the order in which the tasks appear in the "do" field.
This is very convenient for writing small workflows.
I however see three major drawbacks with this approach:

  1. similar problems as with a "switch fallthrough". If one forgets a "then" field the execution might resume at a task that is different from the one I expected.

Another interesting case is that the then field is allowed at the top level in switch tasks. So the way I understand it, this would execute the processElectronicOrder task when the default case is reached, because it sets a continue clause.

document:
  dsl: '1.0.0-alpha1'
  namespace: test
  name: sample-workflow
  version: '0.1.0'
do:
  processOrder:
    switch:
      case1:
        when: .orderType == "electronic"
        then: processElectronicOrder
      case2:
        when: .orderType == "physical"
        then: processPhysicalOrder
      default:
        then: continue
  processElectronicOrder:
    execute:
      sequentially:
        validatePayment: {...}
        fulfillOrder: {...}
    then: exit
  processPhysicalOrder:
    execute:
      sequentially:
        checkInventory: {...}
        packItems: {...}
        scheduleShipping: {...}
    then: exit
  handleUnknownOrderType:
    execute:
      sequentially:
        logWarning: {...}
        notifyAdmin: {...}
  2. reliance on object key order forces implementers to always process/store definitions as text blobs or strings.
    Some problems this causes:
  • One cannot store the definition as a JSON object in SQL or NoSQL databases, as they usually do not preserve object key order. Thus one cannot use the built-in filtering of those databases.
  • When building an API to create workflows, the definition has to be accepted as a string or file blob, then parsed and validated against the JSON schema, and even then the parsed object cannot be used because key order is not preserved. This might look something like this in TypeScript:
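import * as yaml from "yaml";

// Workflow, Task, IDefinition and validate() are assumed to be defined elsewhere.
function parse(input: string): IDefinition {
  // parse into plain objects for schema validation (key order is not reliable here)
  const obj: Workflow = JSON.parse(input);

  validate(obj);

  // parse the document again as nested Maps, which keep insertion order
  const map: Map<"do", Map<string, Task>> = yaml.parse(input, {
    mapAsMap: true,
  });
  // get task names in order of appearance
  const taskNames = Array.from(map.get("do")?.keys() ?? []);
  const tasks = new Map<string, Task>();
  taskNames.forEach((name) => {
    // look the task up on the validated object so it is a plain object instead of a Map
    const task = obj.do[name];
    if (task) {
      tasks.set(name, task);
    }
  });

  return { ...obj, do: tasks, text: input };
}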

This only works because a Map preserves insertion order, and it assumes the yaml library never changes its implementation in a way that alters that insertion order.

  3. when providing a UI to build/visualize workflows, the logic to create a valid and correct workflow definition might become incredibly complex because one has to be precise about the ordering of keys, handle exit and fallthrough cases etc...
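
To illustrate the proposal, here is a minimal sketch (not part of the spec) of how a runtime could walk a workflow using only start and then, so that the key order of do never matters. The FlowTask and FlowDefinition shapes and the executionOrder helper are hypothetical names made up for this example.

```typescript
// Hypothetical shapes for illustration; not the official schema.
interface FlowTask {
  then: string; // name of the next task, or the flow directive "end"
}

interface FlowDefinition {
  start: string;
  do: Record<string, FlowTask>;
}

// Walks the graph from `start` by following `then`, never looking at key order.
function executionOrder(def: FlowDefinition): string[] {
  const visited: string[] = [];
  let current = def.start;
  while (current !== "end") {
    const task = def.do[current];
    if (task === undefined) {
      throw new Error(`Unknown task: ${current}`);
    }
    if (visited.includes(current)) {
      throw new Error(`Cycle detected at task: ${current}`);
    }
    visited.push(current);
    current = task.then;
  }
  return visited;
}

// The tasks are deliberately declared "out of order".
const orderPetFlow: FlowDefinition = {
  start: "getAvailablePets",
  do: {
    submitMatchesByMail: { then: "end" },
    getAvailablePets: { then: "submitMatchesByMail" },
  },
};

const resolvedOrder = executionOrder(orderPetFlow);
// resolvedOrder is ["getAvailablePets", "submitMatchesByMail"]
```

Because every hop is an explicit reference, the definition can be stored or transmitted as a plain JSON object without changing its meaning.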
@JBBianchi
Member

Note that continue might still be useful in loops (e.g.: continue triggers the next iteration, exit exits the loop, end ends the workflow from within the loop).

@matthias-pichler
Collaborator Author

matthias-pichler commented Jun 3, 2024

Note that continue might still be useful in loops (e.g.: continue triggers the next iteration, exit exits the loop, end ends the workflow from within the loop).

Thanks, good to know 👍 I will update the initial comment.

On another note: could there be value in renaming exit to break, since this more closely aligns with what programmers are already used to? I always confuse end and exit, and thinking of exiting a process with process.exit(0) makes exit even harder to remember correctly.

@JBBianchi
Member

Note that continue might still be useful in loops (e.g.: continue triggers the next iteration, exit exits the loop, end ends the workflow from within the loop).

Thanks, good to know 👍 I will update the initial comment.

On another note: could there be value in renaming exit to break, since this more closely aligns with what programmers are already used to? I always confuse end and exit, and thinking of exiting a process with process.exit(0) makes exit even harder to remember correctly.

I understand your point but I have 2 remarks:

  • supposedly, the DSL should be accessible to non-developers (but, ofc, devs will probably be the main target)
  • break kinda makes sense in the context of a loop, but does it really work for a branch? We went for exit because it felt more natural to "exit" the branch than "breaking" it. It's still very opinionated and subject to discussion.

@matthias-pichler
Collaborator Author

Note that continue might still be useful in loops (e.g.: continue triggers the next iteration, exit exits the loop, end ends the workflow from within the loop).

Thanks, good to know 👍 I will update the initial comment.
On another note: could there be value in renaming exit to break, since this more closely aligns with what programmers are already used to? I always confuse end and exit, and thinking of exiting a process with process.exit(0) makes exit even harder to remember correctly.

I understand your point but I have 2 remarks:

  • supposedly, the DSL should be accessible to non-developers (but, ofc, devs will probably be the main target)
  • break kinda makes sense in the context of a loop, but does it really work for a branch? We went for exit because it felt more natural to "exit" the branch than "breaking" it. It's still very opinionated and subject to discussion.

I completely understand. I am also unsure about that. Exiting a branch/loop is also how I've come to remember it now.
Interestingly exit and break seem to align 1 to 1 (loops & switch statements).

But yeah, a lot of overloaded terms in general.

@fjtirado
Collaborator

fjtirado commented Jun 3, 2024

I think one of the coolest features of the new DSL is the possibility to write sequential workflows with very few words. See also this somewhat related discussion
If we remove that reliance on order, I feel we would be losing a pretty nice feature. Of the three issues, two of them (sorting and UI) are related to implementation difficulties that can be addressed.
This leaves us with the switch, which maybe can be addressed by changing it, so that once you start a switch, everything has to be nested inside.

@JBBianchi
Member

A possibility would be to add a parameter in the document object, like executionSequence with values like strict or loose. Loose would be the default, current behavior, and strict would require to provide a start and mandatory then.
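
As a rough sketch of what such a switch could mean for validation, the check below only enforces start and then when executionSequence is strict. The SequencedWorkflow and SequencedTask shapes and the checkExecutionSequence helper are hypothetical; only the executionSequence, strict, loose, start and then names come from the proposal above.

```typescript
// Hypothetical shapes for illustration; not the official schema.
type ExecutionSequence = "strict" | "loose";

interface SequencedTask {
  then?: string;
}

interface SequencedWorkflow {
  document: { executionSequence?: ExecutionSequence };
  start?: string;
  do: Record<string, SequencedTask>;
}

// Returns a list of problems; an empty list means the definition is acceptable.
function checkExecutionSequence(wf: SequencedWorkflow): string[] {
  const mode = wf.document.executionSequence ?? "loose"; // loose is the default
  if (mode === "loose") {
    return []; // current behaviour: order of appearance, nothing to enforce
  }
  const problems: string[] = [];
  if (wf.start === undefined) {
    problems.push("strict mode requires a 'start' task");
  } else if (!(wf.start in wf.do)) {
    problems.push(`'start' refers to unknown task '${wf.start}'`);
  }
  for (const [taskName, task] of Object.entries(wf.do)) {
    if (task.then === undefined) {
      problems.push(`task '${taskName}' is missing a mandatory 'then'`);
    }
  }
  return problems;
}
```

A loose definition passes unchanged, while a strict one reports every task that still relies on its position.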

@matthias-pichler
Collaborator Author

matthias-pichler commented Jun 3, 2024

I think one of the coolest features of the new DSL is the possibility to write sequential workflows with very few words. See also this somewhat related discussion. If we remove that reliance on order, I feel we would be losing a pretty nice feature.

I agree that for purely sequential workflows this is a very nice feature. However I personally do not think it is worth the footgun. I have two scenarios in mind here:

  1. having worked with many workflow languages (Pipedream, Twilio Workflows, AWS Step Functions, Zapier, Cadence, Temporal) and having implemented a custom workflow DSL before, the number of times a purely sequential workflow was enough is near zero. (A single API call would already require handling a 404.) In many ways the value of a workflow engine is to declaratively define complex workflows. Some might have started out purely sequential but very few stayed that way, which leads me to my second point:

  2. evolving sequential flows:

Assume I have a sequential workflow that doesStuff:

do:
  doStuff1: {}
  doStuff2: {}

when I run into issues where an even number is required, I might add a check

do:
  checkParity:
    switch:
      isOdd:
        when: ${ .x % 2 == 1 }
        then: double
      default:
        then: continue
  double:
    set:
      x: ${ .x * 2 }
  doStuff1: {}
  doStuff2: {}

I would immediately introduce a bug because x is now always doubled. And I would scratch my head about where I should even place double for it to work.

I believe it would be easier to update a single reference to point to the newly added task than to go around and suddenly have to add then-references everywhere. Having a required then field would allow me to add tasks anywhere in the execution graph by simply changing a single existing then property to point to the new subgraph.
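
The fallthrough described above can be reproduced with a toy interpreter. This is only a sketch under the assumption that continue means "go to the next task in definition order"; ToyTask, toyTasks and runToy are hypothetical names, not part of any runtime.

```typescript
// A toy model of the example above; shapes and names are hypothetical.
type ToyTask =
  | { kind: "switch"; cases: { when?: (x: number) => boolean; then: string }[] }
  | { kind: "set"; apply: (x: number) => number }
  | { kind: "noop" };

// Tasks in order of appearance, as an order-based runtime would see them.
const toyTasks: [string, ToyTask][] = [
  ["checkParity", {
    kind: "switch",
    cases: [
      { when: (x: number) => x % 2 === 1, then: "double" }, // isOdd
      { then: "continue" },                                  // default
    ],
  }],
  ["double", { kind: "set", apply: (x: number) => x * 2 }],
  ["doStuff1", { kind: "noop" }],
  ["doStuff2", { kind: "noop" }],
];

function runToy(initial: number): number {
  let x = initial;
  let index = 0;
  while (index < toyTasks.length) {
    const entry = toyTasks[index];
    if (entry === undefined) break;
    const task = entry[1];
    let next = "continue";
    if (task.kind === "switch") {
      const matched = task.cases.find((c) => c.when === undefined || c.when(x));
      if (matched !== undefined) next = matched.then;
    } else if (task.kind === "set") {
      x = task.apply(x);
    }
    // "continue" falls through to the next task in definition order.
    index = next === "continue"
      ? index + 1
      : toyTasks.findIndex(([taskName]) => taskName === next);
    if (index < 0) throw new Error(`Unknown task: ${next}`);
  }
  return x;
}
```

Under that assumption, runToy(3) takes the isOdd branch and runToy(4) takes the default branch, yet both pass through double, so the results are 6 and 8.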

Of the three issues, two of them (sorting and UI) are related to implementation difficulties that can be addressed.

Both the JSON spec as well as the YAML spec define maps/objects as "unordered". So as long as YAML and/or JSON is chosen as the definition language it seems wrong to me to go against these standards.
I believe this makes it incredibly hard for any implementor since basically no existing parsing library can be used. It would require everyone to implement a custom yaml/json parser.

This leaves us with the switch, which maybe can be addressed by changing it, so that once you start a switch, everything has to be nested inside.

I think this would be even worse. Because then there would be no way for me to merge multiple branches of a switch task. Then I'd have to either copy and paste all tasks which makes it very verbose and unmaintainable or I'd have to create a completely separate workflow that all branches invoke.

@cdavernas
Member

cdavernas commented Jun 3, 2024

I would immediately introduce a bug because x is now always doubled. And I would scratch my head about where I should even place double for it to work.

No, this is not the intended behavior. The next of the switch task is only performed if no default case has been supplied. Your example would therefore work as expected.

@matthias-pichler
Collaborator Author

I would immediately introduce a bug because x is now always doubled. And I would scratch my head about where I should even place double for it to work.

No, this is not the intended behavior. The next of the switch task is only performed if no default case has been supplied. Your example would therefore work as expected.

but what about a default with then: continue?

@cdavernas
Member

Having a required then field would allow me to add tasks anywhere in the execution graph by simply changing a single existing then property to point to the new subgraph.

That is true in theory, but in practice, I never used a serializer implementation which did not preserve the definition order. Plus, that would then theoretically be true for lists, which should then be indexed thanks to some sort of additional property.

@cdavernas
Member

but what about a default with then: continue?

Then it proceeds to "double", which is the next task in line.

@matthias-pichler
Collaborator Author

matthias-pichler commented Jun 3, 2024

Having a required then field would allow me to add tasks anywhere in the execution graph by simply changing a single existing then property to point to the new subgraph.

That is true in theory, but in practice, I never used a serializer implementation which did not preserve the definition order. Plus, that would then theoretically be true for lists, which should then be indexed thanks to some sort of additional property.

Arrays are defined with order in both specs and parsers respect that.

examples of not preserving order:

JSON.parse

> JSON.parse('{"3+": 1, "2": 2}')
{ '2': 2, '3+': 1 }

https://www.npmjs.com/package/yaml ... also not

@cdavernas
Member

cdavernas commented Jun 3, 2024

Arrays are defined with order in both specs and parsers respect that.

I overlooked that.

JSON.parse('{"3+": 1, "2": 2}')

Damn.

@matthias-pichler
Collaborator Author

matthias-pichler commented Jun 3, 2024

Arrays are defined with order in both specs and parsers respect that.

I overlooked that.

Referring to my previous comment, though, have you ever faced the case where a serializer did NOT preserve the order?

JavaScript/TypeScript

Any parser in JavaScript/TypeScript that uses an Object representation for mappings (most do, some have support for a Map).

While the traversal order of objects is defined, it is not the same as the insertion order: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...in#:~:text=The%20traversal%20order,of%20property%20creation.

JSON.parse('{"3+": 1, "2": 2}')
// { '2': 2, '3+': 1 }
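
A slightly larger sketch of the same effect, contrasted with a Map, which does keep insertion order. The task names are made up for the example.

```typescript
// Integer-like keys are moved to the front in ascending order;
// all other string keys keep their insertion order.
const parsedTasks = JSON.parse('{"validate": {}, "2": {}, "notify": {}, "1": {}}');
const objectOrder: string[] = Object.keys(parsedTasks);
// objectOrder is ["1", "2", "validate", "notify"]

// A Map keeps the order in which the entries were inserted.
const tasksAsMap = new Map<string, unknown>([
  ["validate", {}],
  ["2", {}],
  ["notify", {}],
  ["1", {}],
]);
const mapOrder: string[] = Array.from(tasksAsMap.keys());
// mapOrder is ["validate", "2", "notify", "1"]
```

So a task named with digits only would silently jump to the front of the execution order once the definition goes through a plain object.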

Python

pyyaml uses dict by default and one has to subclass with an OrderedDict

import yaml

from collections import OrderedDict

def represent_ordereddict(dumper, data):
    value = []

    for item_key, item_value in data.items():
        node_key = dumper.represent_data(item_key)
        node_value = dumper.represent_data(item_value)

        value.append((node_key, node_value))

    return yaml.nodes.MappingNode(u'tag:yaml.org,2002:map', value)

yaml.add_representer(OrderedDict, represent_ordereddict)

source: https://stackoverflow.com/questions/16782112/can-pyyaml-dump-dict-items-in-non-alphabetical-order

Newer versions of Python (3.7+) preserve insertion order in dict, but that again depends on the parser implementation inserting keys in order 🤷

@cdavernas
Member

cdavernas commented Jun 3, 2024

Well, if there's no way around it, I believe you have made your point. Unhappily. The js use case was enough to prove you right.

It will make authoring a flow a tad more tedious though 😞

Anyways, thanks a lot for the extremely useful insights!

@ricardozanini ricardozanini added the area: spec Changes in the Specification label Jun 3, 2024
@cdavernas
Member

cdavernas commented Jun 3, 2024

An alternative would be to do what Google Workflows do, thus solving everyone's problems:

do:
- task1:
    call: http
- task2:
    call: http

We ensure proper ordering while keeping brevity.

@matthias-pichler
Collaborator Author

An alternative would be to do what Google Workflows do, thus solving everyone's problems:

do:
- task1:
    call: http
- task2:
    call: http

We ensure proper ordering while keeping brevity.

That would work for me 👍 My goal really is to not have to bend over backwards passing the definition around as a string and parsing it every time, and to be able to send it via our API as JSON and store it in our DB as a JSON blob instead of a string.
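
A minimal sketch of why the list form survives a JSON round trip: the order lives in the array, and each item carries exactly one key, the task name. TaskBody, TaskItem and toOrderedEntries are hypothetical names for this example, not part of the spec or of any SDK.

```typescript
type TaskBody = Record<string, unknown>;
type TaskItem = Record<string, TaskBody>; // expected to hold exactly one key: the task name

// Turns the list form into ordered [name, task] pairs and rejects malformed items.
function toOrderedEntries(items: TaskItem[]): [string, TaskBody][] {
  return items.map((item, index): [string, TaskBody] => {
    const keys = Object.keys(item);
    if (keys.length !== 1) {
      throw new Error(`do[${index}] must contain exactly one task, found ${keys.length}`);
    }
    const taskName = keys[0] as string;
    return [taskName, item[taskName] as TaskBody];
  });
}

// "1" is an integer-like key, which a plain object would have moved to the front.
const doList: TaskItem[] = [{ task2: { call: "http" } }, { "1": { call: "http" } }];

// Simulates sending the definition through an API or storing it as a JSON blob.
const roundTripped: TaskItem[] = JSON.parse(JSON.stringify(doList));
const taskNamesInOrder = toOrderedEntries(roundTripped).map(([taskName]) => taskName);
// taskNamesInOrder is ["task2", "1"]
```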

@cdavernas
Member

That would work for me 👍

Great!

My goal really is to not have to bend over backwards

I know. And you saved us from a painful realisation in a near future. Thanks again!

Wanna take care of the PR to address it?

@fjtirado
Collaborator

fjtirado commented Jun 3, 2024

If we are going to use an array, I guess we might introduce names for tasks after all, isn't it?
See #849

@matthias-pichler
Collaborator Author

If we are going to use an array, I guess we might introduce names for tasks after all, isn't it? See #849

that makes sense 👍

@cdavernas
Member

If we are going to use an array, I guess we might introduce names for tasks after all, isn't it?
See #849

@fjtirado I don't know how others feel about it, but I'd prefer keeping the name in the path, therefore using the solution I proposed, like Google does (otherwise, it would be the same as going back to where we were).

@fjtirado
Collaborator

fjtirado commented Jun 3, 2024

@cdavernas I think we need to compare the resulting schemas.
If there is no name, the object can be identified unambiguously by its position in the array.

@fjtirado
Collaborator

fjtirado commented Jun 3, 2024

Btw, when using Jackson the map sorting is not an issue. It uses a LinkedHashMap, which preserves the order. I'm saying that because we can still keep the map if we want. The concept of keeping order in a map is not an alien thing (it might not be the default for Python, but it does exist in Java and is the default for one of the most popular Jackson deserializers)

@fjtirado
Collaborator

fjtirado commented Jun 3, 2024

But to be honest I prefer an array with an optional name property, and that the do schema, as proposed here, is the same for the main flow, loops and retries. If users want to use names for tasks, they can. If they do not want to, they can leave them out. One way or the other, when there is an error, the error task can be identified (by name or index)

@cdavernas
Member

cdavernas commented Jun 3, 2024

@cdavernas I think we need to compare the resulting schemas.
If there is no name, the object can be identified unambiguously by its position in the array.

Sure, whatever you guys want is fine. I'm just sad we have to go down that path again 😭

Btw, when using Jackson the map sorting is not an issue. It uses a LinkedHashMap, which preserves the order. I'm saying that because we can still keep the map if we want. The concept of keeping order in a map is not an alien thing (it might not be the default for Python, but it does exist in Java and is the default for one of the most popular Jackson deserializers)

No, it's not acceptable to say that, because the libs we use work fine that way, we should dismiss the issue with others, especially when we are speaking of some of the most popular languages on earth, JS and Python. As much as I hate it, I believe we are forced to make that change.

One way or the other, when there is an error, the error task can be identified (by name or index)

No. It would be identified only by index, regardless of whether the name has been set or not, otherwise JSON pointers using name would fail to evaluate.
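
To make the JSON pointer argument concrete, here is a minimal RFC 6901 style resolver (a sketch that returns undefined when a path does not resolve and does not support the "-" array token). resolvePointer and both sample documents are hypothetical; they only contrast "name as a property" with "name as the key".

```typescript
// Minimal JSON pointer resolver for illustration purposes.
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === "") return doc;
  if (!pointer.startsWith("/")) throw new Error("pointer must start with '/'");
  let current: unknown = doc;
  for (const raw of pointer.slice(1).split("/")) {
    // Unescape in the order mandated by RFC 6901: "~1" first, then "~0".
    const token = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (Array.isArray(current)) {
      if (!/^(0|[1-9][0-9]*)$/.test(token)) return undefined; // arrays only accept indexes
      current = current[Number(token)];
    } else if (typeof current === "object" && current !== null) {
      current = (current as Record<string, unknown>)[token];
    } else {
      return undefined;
    }
  }
  return current;
}

// Name as a property: the name never appears in the path, only the index does.
const namedByProperty = {
  do: [{ name: "task1", call: "http" }, { name: "task2", call: "grpc" }],
};

// Name as the key of a single-entry mapping: index and name both appear in the path.
const namedByKey = {
  do: [{ task1: { call: "http" } }, { task2: { call: "grpc" } }],
};

const byName = resolvePointer(namedByProperty, "/do/task2");      // undefined
const byIndex = resolvePointer(namedByProperty, "/do/1/call");    // "grpc"
const byKey = resolvePointer(namedByKey, "/do/1/task2/call");     // "grpc"
```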

@fjtirado
Collaborator

fjtirado commented Jun 3, 2024

I would say that neither JS nor Python are serious programming languages, but that's another discussion ;)

@fjtirado
Collaborator

fjtirado commented Jun 3, 2024

Once we give up the map thing because of constraints in "popular" programming languages that are not really programming languages ;), I think we should pick the JSON schema that is easiest to parse (in other words, the JSON schema with the minimum number of anyOf/oneOf constructs).
That's why I feel that, to decide whether we go with name as the value of a property called name or name as a property itself, we should take a look at the resulting schemas.

@ricardozanini
Member

Oh, it's hard to follow all the discussions at once. Nice to see some momentum and engagement, though!

I vote for the proposal that lets me use do/name-of-task/... in JSON pointers/paths. I'd hate to be forced to use indexes again to map objects within the DSL.

@cdavernas
Member

Good point, but I think you'll agree that it would completely defeat the fluent and linguistic approach to the workflow; read out loud, it won't mean anything anymore.

Whereas just renaming /for/do seems the least painful.

@matthias-pichler
Collaborator Author

@cdavernas Another possibility is to completely remove do: from everywhere and just use composition in any place where you need arrays of task (including main flow)

you mean to basically replace do with execute:

document:
  dsl: "1.0.0-alpha1"
  namespace: test
  name: execute-all-the-things
  version: "0.1.0"
use:
  extensions:
    - externalLogging:
        extend: all
        before:
          call: http
          with:
            method: post
            uri: https://fake.log.collector.com
            body:
              message: msg
        after:
          call: http
          with:
            method: post
            uri: https://fake.log.collector.com
            body:
              message: msg
execute:
  sequentially:
    - processOrder:
        switch:
          - case1:
              when: .orderType == "electronic"
              then: processElectronicOrder
          - case2:
              when: .orderType == "physical"
              then: processPhysicalOrder
          - default:
              then: handleUnknownOrderType
    - processElectronicOrder:
        execute:
          sequentially:
            - validatePayment:
                set:
                  validate: true
            - fulfillOrder:
                set:
                  status: fulfilled
          then: exit
    - processPhysicalOrder:
        execute:
          concurrently:
            - checkInventory:
                set:
                  inventory: clear
            - packItems:
                set:
                  items: 1
            - scheduleShipping:
                set:
                  address: Elmer St
          then: exit
    - handleUnknownOrderType:
        for:
          each: order
          in: .orders
          execute:
            sequentially:
              - logWarning:
                  set:
                    log: warn
              - notifyAdmin:
                  try:
                    execute:
                      sequentially:
                        - setMessage:
                            set:
                              message: something's wrong
                        - logMessage:
                            set:
                              message2: ${ .message }

@cdavernas
Member

Note that do: is now used at main flow, in loops and catch

Right! So why not leave do at top level to keep fluency, and remove it from, while and catch?

@cdavernas
Member

you mean to basically replace do with execute:

Yeah, why not! That's also a pretty good idea. Even though it might be a tad more cumbersome when wanting only a single task.

@matthias-pichler
Collaborator Author

you mean to basically replace do with execute:

Yeah, why not! That's also a pretty good idea. Even though it might be a tad more cumbersome when wanting only a single task.

true, but do I really need a workflow for a single task? 😅

@cdavernas
Member

true, but do I really need a workflow for a single task? 😅

Workflow no. However try/for/... fosho!

@matthias-pichler
Collaborator Author

matthias-pichler commented Jun 4, 2024

true, but do I really need a workflow for a single task? 😅

Workflow no. However try/for/... fosho!

touché ... what if we introduced a shorthand to write execute.sequentially: map[string, task][] with execute: map[string, task][]

document:
  dsl: "1.0.0-alpha1"
  namespace: test
  name: execute-all-the-things
  version: "0.1.0"
use:
  extensions:
    - externalLogging:
        extend: all
        before:
          call: http
          with:
            method: post
            uri: https://fake.log.collector.com
            body:
              message: msg
        after:
          call: http
          with:
            method: post
            uri: https://fake.log.collector.com
            body:
              message: msg
execute:
  - processOrder:
      switch:
        - case1:
            when: .orderType == "electronic"
            then: processElectronicOrder
        - case2:
            when: .orderType == "physical"
            then: processPhysicalOrder
        - default:
            then: handleUnknownOrderType
  - processElectronicOrder:
      execute:
        sequentially:
          - validatePayment:
              set:
                validate: true
          - fulfillOrder:
              set:
                status: fulfilled
        then: exit
  - processPhysicalOrder:
      execute:
        concurrently:
          - checkInventory:
              set:
                inventory: clear
          - packItems:
              set:
                items: 1
          - scheduleShipping:
              set:
                address: Elmer St
        then: exit
  - handleUnknownOrderType:
      for:
        each: order
        in: .orders
        execute:
          sequentially: # <- this is also optional
            - logWarning:
                set:
                  log: warn
            - notifyAdmin:
                try:
                  execute:
                    sequentially:
                      - setMessage:
                          set:
                            message: something's wrong
                      - logMessage:
                          set:
                            message2: ${ .message }

@fjtirado
Collaborator

fjtirado commented Jun 4, 2024

@matthias-pichler-warrify @JBBianchi Just do that here #884 (comment)
I like @JBBianchi's proposal, which I think addresses the logical root cause of the issue, so +1 for that.

@cdavernas
Member

cdavernas commented Jun 4, 2024

touché ... what if we introduced a shorthand to write execute.sequentially: map[string, task][] with execute: map[string, task][]

I personally do not like it because of OneOfs. It's something that is a huge PITA for us OO devs (isn't it @fjtirado ?). I'd rather go with @JBBianchi's proposal in #884
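
For context on the oneOf concern, this sketch shows what the shorthand would look like on the consuming side: execute becomes a union of shapes that every implementation has to tell apart before it can do anything. The Execute, NormalizedExecute and normalizeExecute names are hypothetical and only mirror the shapes discussed above.

```typescript
type TaskList = Record<string, unknown>[];

// With the shorthand, `execute` is a oneOf: either a bare list or an object.
type Execute =
  | TaskList
  | { sequentially: TaskList }
  | { concurrently: TaskList };

interface NormalizedExecute {
  mode: "sequentially" | "concurrently";
  tasks: TaskList;
}

// Collapses the three accepted shapes into a single internal representation.
function normalizeExecute(execute: Execute): NormalizedExecute {
  if (Array.isArray(execute)) {
    return { mode: "sequentially", tasks: execute }; // the shorthand form
  }
  if ("sequentially" in execute) {
    return { mode: "sequentially", tasks: execute.sequentially };
  }
  return { mode: "concurrently", tasks: execute.concurrently };
}
```

In TypeScript the union is cheap to express, but in languages without union types each oneOf typically needs a custom deserializer, which is presumably the pain point raised here.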

@fjtirado
Collaborator

fjtirado commented Jun 4, 2024

so basically we do not have execute; we have do: for a single task and concurrently or sequentially for multiple tasks. That I think works for everyone. Let's see it reflected in the schema ;)

@matthias-pichler
Collaborator Author

so basically we do not have execute; we have do: for a single task and concurrently or sequentially for multiple tasks. That I think works for everyone. Let's see it reflected in the schema ;)

perfect I'll get to it 👍

@JBBianchi
Member

I think we overlooked the case of switch with the adopted solution. It's relegated to a second-class citizen, only available in execute:

do:
  execute:
    sequentially:
      - processOrder:
          switch:
          # ...

https://github.com/serverlessworkflow/specification/blob/main/dsl-reference.md#switch

Shouldn't we go back to do being a map[string, task][] instead of a single task?

@cdavernas cdavernas reopened this Jun 10, 2024
@cdavernas
Member

@JBBianchi Yes, indeed. Do as a mapping array is what we should opt for:

  • It resolves the consistency of the keyword do: it's always a mapping array of tasks
  • It allows for top-level switch statements
  • It's more consistent with what a workflow is to most people: a succession of multiple tasks, not a single top-level one, which is then... a task

@matthias-pichler
Collaborator Author

That is true. And do as an array is absolutely fine by me, however as @cdavernas pointed out: what's the purpose of sequentially then?

Which given the discussion around concurrently and the execute keyword in general, might mean:

  • drop sequentially (not needed as we have arrays everywhere)
  • only keep concurrently (or replace with branch to make it a verb for consistency)

@cdavernas
Member

cdavernas commented Jun 10, 2024

only keep concurrently (or replace with branch to make it a verb for consistency)

That's a good idea. However, even though I like branch, I'm not sure it conveys the idea of running concurrently 😞.
multitask, or something like that?

@matthias-pichler
Collaborator Author

That's a good idea. However, even though I like branch, I'm not sure it conveys the idea of running concurrently 😞.

multitask, or something like that?

Hmm multitask certainly isn't consistent or a verb 😅 I'd prefer keeping concurrently over it.

The idea for branch also came from AWS StepFunctions (they run in parallel there)

@cdavernas
Member

cdavernas commented Jun 10, 2024

multitask
/ˈmʌltɪtɑːsk/
verb

  1. (of a person) deal with more than one task at the same time.
    "I managed my time efficiently and multitasked"

It is a verb, though certainly not the best term. But there are only a few alternatives, such as the (horrible) parallelize

The idea for branch also came from AWS StepFunctions (they run in parallel there)

Interesting. Might be worth digging.

@JBBianchi
Member

Do you mean:

do:
  execute:
    branch:
      - task1:
      - task2:
      - task3:    

or

do:
  branch:
    - task1:
    - task2:
    - task3: 

(an alternative to branch could also be fork)

In the first case, we keep the "controversial" execute but have an extra level to put config like compete: true. In the second one, we get rid of the controversial keyword but we might need an extra level of depth.

@fjtirado
Collaborator

fjtirado commented Jun 10, 2024

@JBBianchi
I'm trying to understand the issue with switch. I understood that rather than writing

do:
  execute:
    sequentially:
      - processOrder:
          switch:
            - case1:
                when: .orderType == "electronic"
                then: processElectronicOrder
            - case2:
                when: .orderType == "physical"
                then: processPhysicalOrder
            - default:
                then: handleUnknownOrderType
      - processElectronicOrder:
          execute:
            sequentially:
              - validatePayment: {}
              - fulfillOrder: {}
          then: exit
      - processPhysicalOrder:
          execute:
            sequentially:
              - checkInventory: {}
              - packItems: {}
              - scheduleShipping: {}
          then: exit
      - handleUnknownOrderType:
          execute:
            sequentially:
              - logWarning: {}
              - notifyAdmin: {}

we should be able to write

do:
  - processOrder:
      switch:
        - case1:
            when: .orderType == "electronic"
            then: processElectronicOrder
        - case2:
            when: .orderType == "physical"
            then: processPhysicalOrder
        - default:
            then: handleUnknownOrderType
  - processElectronicOrder:
      execute:
        sequentially:
          - validatePayment: {}
          - fulfillOrder: {}
      then: exit
  - processPhysicalOrder:
      execute:
        sequentially:
          - checkInventory: {}
          - packItems: {}
          - scheduleShipping: {}
      then: exit
  - handleUnknownOrderType:
      execute:
        sequentially:
          - logWarning: {}
          - notifyAdmin: {}

isn't it?
But if we remove execute, as in option 2 of this discussion, it would be

do:
  sequentially:
    - processOrder:
        switch:
          - case1:
              when: .orderType == "electronic"
              then: processElectronicOrder
          - case2:
              when: .orderType == "physical"
              then: processPhysicalOrder
          - default:
              then: handleUnknownOrderType
    - processElectronicOrder:
        sequentially:
          - validatePayment: {}
          - fulfillOrder: {}
        then: exit
    - processPhysicalOrder:
        sequentially:
          - checkInventory: {}
          - packItems: {}
          - scheduleShipping: {}
        then: exit
    - handleUnknownOrderType:
        sequentially:
          - logWarning: {}
          - notifyAdmin: {}

Am I missing anything? (probably I am, my apologies in advance)

@JBBianchi
Member

@JBBianchi I'm trying to understand the issue with switch. I understood that rather than writing

do:
  execute:
    sequentially:
      - processOrder:
          switch:
            - case1:
                when: .orderType == "electronic"
                then: processElectronicOrder
            - case2:
                when: .orderType == "physical"
                then: processPhysicalOrder
            - default:
                then: handleUnknownOrderType
      - processElectronicOrder:
          execute:
            sequentially:
              - validatePayment: {}
              - fulfillOrder: {}
          then: exit
      - processPhysicalOrder:
          execute:
            sequentially:
              - checkInventory: {}
              - packItems: {}
              - scheduleShipping: {}
          then: exit
      - handleUnknownOrderType:
          execute:
            sequentially:
              - logWarning: {}
              - notifyAdmin: {}

we should be able to write

do:
  - processOrder:
      switch:
        - case1:
            when: .orderType == "electronic"
            then: processElectronicOrder
        - case2:
            when: .orderType == "physical"
            then: processPhysicalOrder
        - default:
            then: handleUnknownOrderType
  - processElectronicOrder:
      execute:
        sequentially:
          - validatePayment: {}
          - fulfillOrder: {}
      then: exit
  - processPhysicalOrder:
      execute:
        sequentially:
          - checkInventory: {}
          - packItems: {}
          - scheduleShipping: {}
      then: exit
  - handleUnknownOrderType:
      execute:
        sequentially:
          - logWarning: {}
          - notifyAdmin: {}

isn't it?

That's my concern indeed.

But if we remove execute, as in option 2 of this discussion, it would be

do:
  sequentially:
    - processOrder:
        switch:
          - case1:
              when: .orderType == "electronic"
              then: processElectronicOrder
          - case2:
              when: .orderType == "physical"
              then: processPhysicalOrder
          - default:
              then: handleUnknownOrderType
    - processElectronicOrder:
        execute:
          sequentially:
            - validatePayment: {}
            - fulfillOrder: {}
        then: exit
    - processPhysicalOrder:
        execute:
          sequentially:
            - checkInventory: {}
            - packItems: {}
            - scheduleShipping: {}
        then: exit
    - handleUnknownOrderType:
        execute:
          sequentially:
            - logWarning: {}
            - notifyAdmin: {}

Am I missing anything? (probably I am, my apologies in advance)

The latter example doesn't seem to match any of the recent suggestions made in this issue. I didn't mention removing execute per se, as that's another topic. But if I understand what @matthias-pichler-warrify and @cdavernas were discussing just after I raised my concern, sequentially is the default behavior of a do, so execute.sequentially doesn't really have a reason to exist. That leaves us with concurrently, which could be replaced by branch (or multitask, parallelize, or fork). What's unclear is whether we keep execute.branch (or an alternative) or get rid of execute altogether and opt for branch (or an alternative) directly, which then ties in with point 2 of #889 and makes the discussion obsolete. In the latter case, though, we'd probably need an extra level of depth for some parameters like racing (or thread count, for instance).

e.g.:

do:
  - processOrder:
      switch:
        - case1:
            when: .orderType == "electronic"
            then: processElectronicOrder
        - case2:
            when: .orderType == "physical"
            then: processPhysicalOrder
        - default:
            then: handleUnknownOrderType
  - processElectronicOrder:
      do:
        - validatePayment: {}
        - fulfillOrder: {}
      then: exit
  - processPhysicalOrder:
      execute:
        compete: true
        branch:  # `multitask`, `parallelize` or `fork`
          - checkInventory: {}
          - packItems: {}
          - scheduleShipping: {}
      then: exit
  - handleUnknownOrderType: #...

or

do:
  - processOrder:
      switch:
        - case1:
            when: .orderType == "electronic"
            then: processElectronicOrder
        - case2:
            when: .orderType == "physical"
            then: processPhysicalOrder
        - default:
            then: handleUnknownOrderType
  - processElectronicOrder:
      do:
        - validatePayment: {}
        - fulfillOrder: {}
      then: exit
  - processPhysicalOrder:
      branch:
        compete: true
        on:  # `multitask`, `parallelize` or `fork`
          - checkInventory: {}
          - packItems: {}
          - scheduleShipping: {}
      then: exit
  - handleUnknownOrderType: #...

@cdavernas @matthias-pichler-warrify am I getting it right ?

@ricardozanini
Member

ricardozanini commented Jun 10, 2024

Guys, what about just focusing on solving the problem that was overlooked and making the tasks always an array? Then we open another issue to tackle the execute/concurrently debate.

It's already hard to follow up on the discussion on this issue. What about a PR to solve the do: in a single task, we close this one and open another?

@fjtirado
Collaborator

fjtirado commented Jun 11, 2024

@JBBianchi There was a typo in my last example; I edited it. My point was that this is related to the solution for do: execute:. If we simplify how multitask is written, it's not really an issue that switch is a task which only makes sense within a multitask (which is why we are proposing to make do: multitask again).
Also, making do: always a multitask (so if you want a single task you just write one named task, as long as this also applies to loops, retries and catch) and removing sequentially (because it's implicit in do:), leaving concurrently as a special case of multitask, will also work.
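
For illustration, a minimal sketch of that reading, assuming do is an array everywhere, including when it holds a single task or is nested inside another task. The task names are reused from the examples above and the shape is an assumption, not agreed syntax.

```yaml
# Sketch only, assuming `do` is an array everywhere; not agreed syntax.
do:
  - handleUnknownOrderType:   # a single task is just a one-element array
      do:                     # a nested `do` uses the same array syntax
        - logWarning: {}      # children run in order, no `sequentially` wrapper
        - notifyAdmin: {}
```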

@cdavernas
Member

cdavernas commented Jun 11, 2024

it's not really an issue that switch is a task which only makes sense within a multitask

I strongly disagree: it is a problem, and a big one.
The fact that do could have different meanings is blocking for you in terms of consistency, but not the fact that the switch task - and the switch task only - cannot be used at the top level?
Furthermore, as said above, a single task per workflow goes against the definition of a workflow. It also is a problem when it comes to visualization.

(which is why we are proposing to make do: multitask again)

That is not what we are proposing. We are proposing:

  1. To make do a (sequential) array everywhere
  2. To get rid of execute/sequentially
  3. Create a new task to "parallelize" tasks
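
The three points above could look roughly like the sketch below. The names branch, on and compete are placeholders borrowed from earlier comments in this thread, and prepareShipment is a made-up task name; none of this is settled syntax.

```yaml
# Sketch only: `branch`, `on` and `compete` are placeholder names from this
# thread, and `prepareShipment` is a hypothetical task name.
do:                        # 1. `do` is always a sequential array
  - validatePayment: {}    # 2. no `execute`/`sequentially` wrapper around tasks
  - prepareShipment:
      branch:              # 3. dedicated task that runs its children in parallel
        compete: false     # assumed flag: wait for all children, no racing
        on:
          - checkInventory: {}
          - packItems: {}
  - scheduleShipping: {}   # runs after the parallel task completes
```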

@fjtirado
Collaborator

@cdavernas
I think we are saying the same thing, so there is no need to strongly disagree ;)
When I said that we are proposing to make do: a multitask, I meant do: being an array (which I said is fine as long as do: has the same syntax everywhere, not only in the main flow)
And removing execute/sequentially is part of the do:execute: redundancy discussion.

@cdavernas
Member

I think we are saying the same thing, so there is no need to strongly disagree ;)

Hehe, sorry, seems I misunderstood you indeed 🤣

And removing execute/sequentially is part of the do:execute: redundancy discussion.

Indeed.

@JBBianchi
Member

Please close this issue and continue the discussion in #894

Labels
area: spec Changes in the Specification
Projects
Status: Done

5 participants