Support workflow versioning #526
Comments
I appreciate the effort you put into outlining this proposal, and while in principle I support the idea of versioning, I do not think the Fundamentally, the It looks like this "list" idea aligns with your open questions. I'm not sure performance is going to be that big of a hit. Building an object and applying a finite number of diffs from it is likely not that hard.... we may want to always keep the "latest" version but keep diffs to earlier versions to "roll back".
Even if we maintain a list of templates (or a base template + list of diffs), as we've found with maintaining a state index with resources, it's not exactly trivial to edit/append/rollback a single document. (Deprovisioning just wipes the whole thing and rewrites it.) I do think there's one possibility here in the fact that we've designed for a "list of workflows". Currently we only have one such workflow named "provision". We could keep multiple different "launch" workflows in that section while preserving the metadata (except that also includes a version so we'd have to figure out what to do with that). But changing that will make it more complex for backend-only users. From a front-end / UX perspective, I don't see the need for a user to see the actual document ID. Can't we just use the name + version to do all that you suggest, while keeping the document ID (workflow ID) hidden from the user (mapped to the "Launch ID")? We have complete/full documents with different IDs, copied from each other, independently CRUD-friendly without overwriting or using painless script shenanigans. Searching for all workflows matching a given name is easy. |
So after thinking about this far too much for a weekend, I'm coming to see the benefit of keeping a list of versioned templates assigned to a single workflow ID. The main advantage here is the 1-to-1 correlation with provisioned resources. It'll be a complex issue trying to jump back and forth between versions and trying to figure out which resources to deprovision/provision, but that seems solvable and makes more sense with a single workflow id. |
@dbwiddis thanks for the input. Definitely valid concerns about fitting multiple workflow versions under the workflow ID which is uniquely derived from the doc ID. But the way I view this change is a workflow is now only unique by a combination of the base workflow ID (initially set when the workflow is created) + version ID (set by the user or given some default). Here's some detailed examples of my thinking: User creates an initial workflow:
User creates new version under same workflow ID:
My reasoning for proposing maintaining separate docs for each version is to try to keep the document-level granularity paradigm the same; each document contains a unique workflow (via workflow ID + version ID), a standalone template, and a standalone set of resources. This keeps alignment with the existing backend design, and fully aligns with the frontend. Also agreed this adds some complexity to the provision/deprovision APIs (and all APIs frankly) to route requests to the correct version. I would hope we can have a single piece of shared logic for (1) determining if there are multiple workflow versions, (2) fetching the latest, default version, and (3) fetching a user-specified version. These can be run top-level for each API.
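As a rough illustration of the three shared pieces of logic listed above, here is a minimal in-memory sketch in Python. The `WorkflowDoc` type, its field names, and the helper names are hypothetical and are not part of the plugin; treating "latest" as the highest `last_update_time` is an assumption borrowed from the original proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkflowDoc:
    """Hypothetical stand-in for one indexed workflow document."""
    workflow_id: str
    version: str
    last_update_time: int  # assumed to be epoch millis


def versions_of(docs: List[WorkflowDoc], workflow_id: str) -> List[WorkflowDoc]:
    """All documents that share the given workflow ID."""
    return [d for d in docs if d.workflow_id == workflow_id]


def has_multiple_versions(docs: List[WorkflowDoc], workflow_id: str) -> bool:
    """(1) Determine whether a workflow ID has more than one version."""
    return len(versions_of(docs, workflow_id)) > 1


def resolve_version(docs: List[WorkflowDoc], workflow_id: str,
                    version: Optional[str] = None) -> WorkflowDoc:
    """(2) Fall back to the latest version, or (3) fetch the requested one."""
    candidates = versions_of(docs, workflow_id)
    if not candidates:
        raise KeyError(f"no workflow with ID {workflow_id}")
    if version is None:
        # Assumed default: the most recently updated version wins.
        return max(candidates, key=lambda d: d.last_update_time)
    for doc in candidates:
        if doc.version == version:
            return doc
    raise KeyError(f"workflow {workflow_id} has no version {version}")
```

Each API handler could call something like `resolve_version` once at its entry point and then proceed exactly as it does today with the resolved document.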
I am still a little confused about this @dbwiddis. Will we just tell users that the only way to use fine-grained provisioning is when the new workflow is under the same workflow ID but uses a different version?
This concept overall sounds good to me, but I have a few more questions. It seems like we are changing the create workflow API to now take a workflowID:
For example: If we create the original workflow and we also assign version as v1. Let's say the document ID assigned is
*We still haven't defined whether the default for the GET API without the version is the original or the last version (I assume we can decide that later). Then let's say I create a new version of the workflow, do I then do:
then based on your above comment I get this:
What would these return me then?
Essentially would "version 2" of abc1234 return me the def456?
@ohltyler I think I'm interpreting your previous comment as using an "id" field as a "tag" which auto-populates with the id of the first workflow (in that "group") document id. That seems reasonable as a grouping field transparent to the user, but I'm still a bit concerned about how these multiple documents interact with the created resources. See below.
@amitgalitz Maybe? I don't know. I am not sure we've adequately addressed this question. The issue is where the resources "live" if they have been effectively provisioned by multiple workflows (or launches or whatever they are called.). Consider this concrete example:
Questions for this example: Do the new created resources "live" under the new/current workflow/document ID or the old one, or both? Does the old id even have any record of created resources or do we just remove them all and transfer them to the new one?
It's this second bullet that's central to my "list" comment above. I have to search over all workflows to find a matching group/id tag. Compare that to keeping all workflows in the same "group" in a list in a single document. All of the various iterations are right there and we can keep track for created resources with a single read-in of the document and no complex searching and comparing. |
@amitgalitz thanks for your thoughts here-
Correct, this would just be for creating a new version
A maximum of two - (1) calling get workflow API without specifying a version (which would fall back to a default version), and (2) calling get workflow API and specifying the version. Note creating a new version should not create a new workflow ID - in my example, the
It would return the unique workflow "version 2" under workflow
Agreed, I have details on that in the original proposal. We can have defaults for everything.
Yeah fair point. I think I agree with this as well, in that it is all under the same resource path, but just filtering by different versions via path parameter. |
A new document. Editing and re-launching wouldn't have any special logic for sharing resources across launches/versions, I feel that could add a lot of unnecessary complications. My proposal is a new launch/version is doing nothing different than how workflows currently work - see previous reply below:
You couldn't. If a user deprovisions a launch they can't use it for prototyping. We could have some frontend logic for filtering out versions to only make available provisioned launches to test with.
You can't. Similar to how we aren't tracking created resources after we create them - it isn't feasible for our plugin to try to inject logic in say, delete model API, to route through our plugin to verify if all workflows are still valid. I guess we could add some enhancement logic to check whether some resource ID exists in another document, but that's nothing specific to this versioning proposal.
If you see a path forward where all workflow versions can be persisted in one document / one workflow ID, I would have no objections to that. This points back to my Open Question 2:
The frontend requirement is just to have multiple versions/launches fall under a parent workflow ID, with CRUD (including provision/deprovision) functionality for each individual version/launch, and persisted resources/template/state for each version/launch. I think at the API-level, we can still do something like I've proposed above. But if there are optimizations regarding how they are indexed and persisted in one/multiple documents, I have no strong opinion on this piece.
Thanks @ohltyler I think I understand better now. If a user creates a workflow abc1234 and provisions it, we now have two documents indexed in the template index and state index with the docId being abc1234. This makes it easy to get the state back to the user. We essentially do a Now if a user creates a new version under abc1234, v2, would the response back be this?:
now a user never knows of when user does any GET operations on state or template index they would either provide
Adding a versionId to the APIs would require us to change all the APIs. Instead, I believe the grouping can be treated as an additional feature rather than changing our current setup. I understand that this feature is requested from the frontend to group workflows of the same type together for easy listing. To address this request on the backend, we can use a Since most of the discussion on this issue revolves around the For each workflow template, we can introduce a new field called
When creating a new workflow, instead of linking both workflows together, we will assign the same group:
This would look like the below in the data structure point of view:
This approach maintains a 1:1 mapping between the workflowID and existing resources, requiring no changes to the existing APIs. To list the workflows together, we can introduce a new GET API:
This API would list the workflows together, and we can also retrieve details about the resources created for each workflow through the status API. |
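To make the grouping idea in this comment concrete, here is a small Python sketch of listing workflows by a shared group field. The field name `group_label` is a placeholder chosen for this example (the actual field name from the comment is not shown above), and the documents are plain dicts rather than real index hits.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def list_workflows_by_group(workflows: Iterable[dict]) -> Dict[str, List[str]]:
    """Group workflow IDs by their (hypothetical) group_label field.

    Each workflow keeps its own ID, so the 1:1 mapping between a
    workflow ID and its resources is untouched by the grouping.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for wf in workflows:
        grouped[wf["group_label"]].append(wf["workflow_id"])
    return dict(grouped)
```

For example, two workflows `abc1234` and `def456` carrying the same label would be returned together under that label, while each remains independently provisionable.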
@amitgalitz right, with a multiple-index approach, this would be the changed logic. Instead of searching on doc id, we would need to search on the underlying workflow ID. In other words, only relying on the doc ID to generate the ID for a new workflow, and after that, using that same ID to find all versions of that workflow. To @dbwiddis's point, maybe there are optimizations to still persist multiple versions within an existing doc, and just append new templates / new resource lists / new states to such a doc for each new version. But regardless, at an interface level, it should remain the same.
@owaiskazi19 my concern with introducing a workflow group concept is it creates a large gap between backend APIs and the frontend. The frontend requirement is to have a single ID with multiple workflows / workflow versions / etc. under a single workflow ID. If we went with this approach, we would have to map frontend workflow -> backend workflow group, which would cause user confusion and messy frontend logic. Even with frontend tricks to "hide" the workflow group concept & have defaults, it is not a clean full-stack design. Understood this changes the APIs, but my proposal is that this would be quite minimal changes. You would just need one or two shared pieces of logic across them. See proposal:
Fundamentally the existing APIs remain the same, and users who do not care about them don't need to make any changes. |
Which is why we would group multiple workflows under the same umbrella. Why would it cause confusion to the user? This issue mostly addresses the communication between frontend and the backend of flow framework. A frontend workflowId would be mapped to backend workflowId. Only when frontend wants to talk to multiple workflows at the same time, workflow group would come into the picture.
From
Introducing a versionID with all the APIs and then appending the docId of the 1st workflow in the document of the new workflow looks more confusing if the user just uses the backend plugin. We need to address both types of users: |
This would be required by default - a workflow ID could not map to a backend ID in this scenario. All of the workflow editing / set of launches / set of individual resources / prototyping falls under a single workflow with a single ID. The workflow list page, the workflow details pages (including the ID in the url) would all now be hidden under a workflow group ID instead. This introduces two differences between the frontend and backend: Frontend "workflow" -> backend "workflow group" This is where the confusion happens when there is a large disconnect between what the frontend is portraying, and what all of our documentation / APIs / backend is doing. For example, a user sees there are 3 "workflows" from the workflow list page. However, because these are underlying backend workflow "groups", maybe each group contains 10 workflows. So a user may think they have 3 workflows, and then query using our search workflow API, and see there are 30.
How so? From an interface level, there is no difference to the user. This is just a low-level implementation detail/example on how we may persist multiple versions of a single workflow. Users would not be querying against raw doc IDs - they only care about workflow IDs and version IDs and using our APIs. And, users who don't care about versioning, don't need to do anything different than what they do today. Again, splitting into multiple documents is just one approach, perhaps appending existing documents within a single workflow ID as @dbwiddis has mentioned is more streamlined. |
Won't we list all the workflows present in all the workflow groups here? Still trying to understand the gap which is being called out. For example: There will be 10 workflows in a workflow group. Similarly, we can have 2 more workflow groups with the same number of workflows. On the workflow list page, we can list all the workflows together by combining the workflows of the groups. We can have the below API
Yes, from the interface level it would look the same. I am talking about in terms of design level. I feel like keeping same type of workflows in the group looks cleaner and this way every workflow would be different and can act as one single entity without any dependency. |
No, because the frontend design is a single workflow which may contain x number individual launches/versions within it, and there is no concept of workflow groups. The workflow list, for example, would list out all workflows. Again, even if we do frontend tricks to abstract / hide things (mapping frontend "workflow" to backend "workflow group"), this misalignment leads to the confusion and mismatch between frontend and rest of documentation/apis/backend.
Somewhat agreed and this is a fair point, although imo it also adds complexity and more terminology users need to learn. Additionally, there is more friction when users are trying to perform rapid prototyping, a common use case (if they decide to rapid prototype and want to organize their workflows, they need to now retroactively create a workflow group, add their existing workflow to it, create a new workflow, join it to the workflow group, and copy over the template from the first workflow). By just supporting versioning, there are fewer user-facing changes and it doesn't introduce any new APIs besides one to create a new version with an existing workflow. It would require fewer API calls for handling rapid prototyping. There are definitely pros and cons to each. But, unfortunately the workflow grouping idea has gotten a lot of pushback from the product and UX side and isn't desirable from the customer side, hence we need to find a middle ground to provide a clean experience full-stack for users. And to reiterate, we can explore more on how to persist them within the system indices. It doesn't have to be separate documents for a separate version. What I'm trying to finalize on is (at the least) the interface level in how to communicate with the backend plugin to handle frontend requirements.
Which is fine. It's totally up to the frontend to decide how it wants to have different terminology for different APIs/fields. Other external frontends using our backend plugin could call a workflow a flow or code or anything else.
Need to understand more of the gaps and pushbacks here. The reason I am pushing on grouping is to keep all the workflows separate and reduce dependency. With versioning we might have to track versions of the template later for fine-grained provisioning and it will introduce some complications on the backend side.
Very minor name changes may be acceptable, but UX will not allow this much of terminology changes. This particular name change adds a lot of user-facing confusion, in that a frontend "workflow" would correspond to a backend "workflow group". We can't have a single term "workflow" mean something entirely different between frontend and backend. We need to have consistency on this piece. Pointing back to the example of the frontend workflow list page would mean something entirely different than what's returned from our search workflow APIs.
The pushback is this is the proposed UX that has gone through months of review with customers and product. We need to support this on the backend, which is a single workflow and x versions/launches related to it.
Fine-grained provisioning and any other workflow-related features should not be impacted, could you help explain? This proposed change is keeping document-level granularity the same for each unique workflow. The only difference is a unique workflow is now determined by the workflow ID AND version, but all other functionality remains the same. If users want to provision/deprovision/fine-grained provisioning, the APIs would remain the same. They either only specify workflow ID (defaulting to some version), or specify workflow ID + version ID. Backend just needs initial routing logic at the API entry points, but other than that, there would be no complications I foresee. |
From further discussion with @owaiskazi19 and @dbwiddis, we can take the following approach, a combination of proposals from above. We can introduce a new concept of workflow groups on the backend, used as an internal way for persisting associated workflows differing only by versions, without exposing/changing the proposed APIs/interfaces described above. User-facing (both frontend and APIs), everything will remain as workflow IDs. Internally, these will be stored as workflow group IDs.
List out some user examples below: Scenario 1: user creates a new workflow
Scenario 2: user creates a new version of existing workflow
Alternatives discussed
The above approach could lead to more race conditions and edge cases to handle if a user kicks off multiple API calls referencing the same workflow ID and the backend has to manage concurrent index updates. Rather than introducing a new workflow group system index, alternatively we could omit that altogether, since its primary use here is for backend management and cleaner designs for handling common filtering logic (fetching latest version, fetching all versions, etc.). Alternatively, this could be handled all at the search request level when querying the workflows system index. We just change the searching of unique workflows (workflow ID + version ID) to be a search request specifying these specific field values, or sort by latest Also, rather than "workflow group ID" we could call it something like "shared workflow ID" or something else. The main idea is, regardless of either of these approaches, we are adding a layer of abstraction on workflow such that a workflow ID does not match 1:1 to a doc ID. Instead, it represents an internal backend concept of some shared ID where multiple individual workflows (differentiated by version ID) have a shared ID. But the interfaces and functionality remain unchanged. These both closely resemble the original proposal; the main difference is really naming and the potential addition of a new system index for managing multiple workflow versions. My original proposal was just using the first generated doc ID as the "workflow group ID" - here, the idea is to have a separately generated ID entirely.
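To picture the layer of abstraction described in this comment, here is a minimal Python sketch where the user-facing workflow ID is generated separately from each version's doc ID. The class and method names are hypothetical, the "index" is just a dict, and `uuid4` stands in for whatever ID generation the backend would actually use.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VersionedWorkflow:
    doc_id: str       # unique per stored document (one per version)
    workflow_id: str  # shared ID, the only ID a user would see
    version: str


class WorkflowIndex:
    """Toy stand-in for the workflows system index."""

    def __init__(self) -> None:
        self._docs: Dict[str, VersionedWorkflow] = {}

    def _index(self, workflow_id: str, version: str) -> VersionedWorkflow:
        doc = VersionedWorkflow(uuid.uuid4().hex, workflow_id, version)
        self._docs[doc.doc_id] = doc
        return doc

    def create_workflow(self, version: str) -> VersionedWorkflow:
        # Scenario 1: a brand-new workflow gets a freshly generated shared ID.
        return self._index(uuid.uuid4().hex, version)

    def create_version(self, workflow_id: str, version: str) -> VersionedWorkflow:
        # Scenario 2: a new version reuses the shared ID but gets its own doc.
        if not self.find(workflow_id):
            raise KeyError(f"unknown workflow ID {workflow_id}")
        return self._index(workflow_id, version)

    def find(self, workflow_id: str) -> List[VersionedWorkflow]:
        return [d for d in self._docs.values() if d.workflow_id == workflow_id]
```

The point of the sketch is only that a workflow ID no longer maps 1:1 to a doc ID; whether the shared ID lives in a separate system index or is just a field on each document is left open, as in the discussion.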
I think this minor change actually helps us all align here.
Naming things is hard! I'm happy with whatever name we use and I think we've addressed the issue of shared resources so I'm GTG with this concept. |
I wouldn't say I am completely aligned here but looking at the discussion on this thread, this looks like a better approach for majority of us. Keeping the docId of the group index as workflowId rather the docId of the first workflow created looks cleaner. |
But the first generated doc ID is the original workflow; it isn't a group. A group, as an idea, wouldn't be provisioned. Also, I don't know if I fully get the benefit of having another system index instead of just keeping the same indices with separate documents for everything. The less we have to update a document that already exists, the better, I think. |
We can do that, and/or allow the user to specify it. Definitely several different options for where/how to get this ID.
Yeah, pros/cons to each. Personally no strong opinion on this part. Listed out more alternatives as well, it can be boiled down to just query DSL at the REST layer to achieve the same thing. Helper fns to fetch all versions or the latest version can be achieved from a query or navigating a system index. |
@dbwiddis @owaiskazi19 do you both prefer having an additional system index plus having to do updates on a single document multiple times over changing internal search to something like:
Why do we need an additional index for just updates? Can you provide a use case from any other plugin we have currently? |
Does having a system index give us a "source of truth" that makes things easier than searching? Or does it introduce more complexity of updates? Search should be just as fast, probably? Adding an index smells of premature optimization. I'd lean toward implementing right now with search. We can always "make it more efficient later" if needed (which it may not be). |
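For the search-only option favored here, a rough sketch of what the request body could look like, built as a plain Python dict in standard query DSL (`bool` filter with `term` clauses plus a descending sort). The field names `workflow_id`, `version`, and `last_update_time` are assumptions for illustration and would need to match the real index mapping, with the ID and version fields mapped as keywords.

```python
from typing import Any, Dict, List, Optional


def build_version_query(workflow_id: str,
                        version: Optional[str] = None) -> Dict[str, Any]:
    """Build a search body that resolves one version of a workflow.

    With no version given, the newest document for the workflow ID is
    returned by sorting on the (assumed) last_update_time field.
    """
    filters: List[Dict[str, Any]] = [{"term": {"workflow_id": workflow_id}}]
    if version is not None:
        filters.append({"term": {"version": version}})
    return {
        "size": 1,
        "query": {"bool": {"filter": filters}},
        "sort": [{"last_update_time": {"order": "desc"}}],
    }
```

Fetching all versions would be the same filter without `size: 1`, so no extra index has to be kept in sync.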
Background
Proposed @opensearch-project/opensearch-ux mockups for the frontend plugin introduce a concept of launches within a single workflow. This allows for easy prototyping for a particular use-case, allowing users to construct and test different workflow configurations, persisting the history and associated resources with each deployment for reference.
The current backend design does not support this paradigm. One approach to support this is introducing versioning to workflows, opening up functionality to iterate on a workflow at the API level to persist and provision multiple versions of a single workflow. Originally discussed in #109, this issue expands on this idea.
Implementation
Currently there is a `version` field (see here) but it is not being used or validated yet. We can refactor this into being a required field with a `String` value that users can set to whatever they want. This way, there will always be a default `version` for a workflow. (If we need to persist `compatibility`, we can refactor that into a standalone template field.)
Internally, workflows with the same ID but different versions can behave identically to the current workflow implementation; users can perform CRUD operations on them, they each persist their own provisioning state, created/associated resources, etc.
To handle defaults at the API level, backend logic can filter through the workflow ID, and if there are multiple instances of that workflow ID under different versions, choose the one with the latest `last_update_time` or equivalent such field.
API changes
Users need the capability to run CRUD operations on individual versions of a workflow if they want. Users who don't want/care about versioning shouldn't have to, with abstracted out logic and sufficient defaults. Most changes listed below are just adding an additional path to existing APIs that include a version ID.
Create workflow
Users specify a (now mandatory) `version` field in the template.
[NEW] Create new version of a workflow
Same as create, but now include the already-created `workflow-id` in the path. The JSON body should include a `version` that does not clash with any existing version of that workflow, else throw an error. See open questions below on what to include in this JSON body.
Update workflow
Same as before, with a new optional `version-id` to specify which version of the workflow to update. If no version specified, default to the latest version.
Get workflow
Same as before, with a new optional `version-id` to specify which version of the workflow to get. If no version specified, default to the latest version.
Provision workflow
Same as before, with a new optional `version-id` to specify which version of the workflow to provision. If no version specified, default to the latest version.
Get workflow state
Same as before, with a new optional `version-id` to specify which version of the workflow to get state. If no version specified, default to the latest version.
Deprovision workflow
Same as before, with a new optional `version-id` to specify which version of the workflow to deprovision. If no version specified, default to the latest version.
Delete workflow
Same as before, with a new optional `version-id` to specify which version of the workflow to delete. If no version specified, default to all versions.
No changes/updates needed for the Get Workflow Steps or Search Workflow APIs.
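The defaulting rules in the API list above can be summarized in a small in-memory Python sketch. `VersionStore` and its methods are hypothetical and only model the semantics (version name clash on create, latest version as the default for reads, all versions as the default for delete); a simple counter stands in for `last_update_time`.

```python
from itertools import count
from typing import Dict, Optional, Tuple


class VersionStore:
    """Toy model of per-workflow versions, keyed by workflow ID."""

    def __init__(self) -> None:
        # workflow_id -> {version: (update_sequence, template)}
        self._store: Dict[str, Dict[str, Tuple[int, dict]]] = {}
        self._clock = count(1)  # stand-in for last_update_time

    def create_version(self, workflow_id: str, version: str, template: dict) -> None:
        versions = self._store.setdefault(workflow_id, {})
        if version in versions:
            # Mirrors "else throw an error" for clashing version names.
            raise ValueError(f"version {version} already exists for {workflow_id}")
        versions[version] = (next(self._clock), template)

    def get(self, workflow_id: str, version_id: Optional[str] = None) -> dict:
        versions = self._store[workflow_id]
        if version_id is None:
            # Default for get/update/provision/deprovision: latest version.
            version_id = max(versions, key=lambda v: versions[v][0])
        return versions[version_id][1]

    def delete(self, workflow_id: str, version_id: Optional[str] = None) -> None:
        if version_id is None:
            # Default for delete: remove every version of the workflow.
            del self._store[workflow_id]
            return
        del self._store[workflow_id][version_id]
        if not self._store[workflow_id]:
            del self._store[workflow_id]

    def exists(self, workflow_id: str) -> bool:
        return workflow_id in self._store
```

Users who never pass a version would see the same behavior as today, since every call falls back to a default.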
Open questions
`name`/`description`/`use_case` overridden by some later version. Fundamentally these fields should be static. Internally maybe we propagate such static fields to the newly-versioned workflow when indexing.
(ASIDE) Frontend implications
This approach seamlessly fits into the frontend design for handling multiple launches/deployments/versions. Matching up UX action vs. API calls:
Updates
3/5: updated example API paths to have `version-id` be a query param instead of in the path itself