Add System Labels incl. user and correlationId (an execution reference field) #2059

Closed
tchiotludo opened this issue Sep 7, 2023 · 7 comments
Assignees
Labels
area/backend Needs backend code changes area/frontend Needs frontend code changes enhancement New feature or request kind/customer-request Requested by one or more customers kind/highlight One of the highlights of the upcoming release

Comments

@tchiotludo
Member

see discussion here

There is a common notion of a key bound to an execution instance. Existing flow-processing tools refer to this key as a business key or reference key. Such a key is set at execution invocation and eventually gets copied to spawned child processes/subflows. The key can be used for effective search based on the data being processed.

Such a reference can be set via a combination of input and label, although having a dedicated execution property might bring some benefits.

You have a data set which is characterized by a single ID - let's say a batch number. After you process that data, you receive a question regarding processing of that particular piece of data.
It's like: "Show me the logs from processing of batch XYZ" or "When did we process the batch XYZ?"
So you need to query the processing history and obtain all related executions.

We could add a simple reference on the execution. Each subflow or trigger that starts from that flow would copy the reference, and the field needs to be visible in the executions list and searchable.
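
To make the "batch XYZ" question concrete, here is a minimal sketch of the lookup such a reference would enable. It works on hypothetical in-memory records; the field names (`id`, `flow`, `reference`, `start`) are assumptions for illustration and not Kestra's execution schema.

```python
from datetime import datetime

# Hypothetical execution records; field names are illustrative only.
executions = [
    {"id": "e1", "flow": "ingest", "reference": "batch-XYZ", "start": datetime(2023, 9, 1, 8, 0)},
    {"id": "e2", "flow": "validate", "reference": "batch-XYZ", "start": datetime(2023, 9, 1, 8, 5)},
    {"id": "e3", "flow": "ingest", "reference": "batch-ABC", "start": datetime(2023, 9, 2, 9, 0)},
]


def find_by_reference(records, reference):
    """Return all executions carrying the given reference, oldest first."""
    matches = [r for r in records if r.get("reference") == reference]
    return sorted(matches, key=lambda r: r["start"])


related = find_by_reference(executions, "batch-XYZ")
print([r["id"] for r in related])  # -> ['e1', 'e2']
print(related[0]["start"].isoformat())  # when the batch was first processed
```

Because child executions would copy the reference, a single lookup returns the parent and every subflow that touched the same batch.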

@tchiotludo tchiotludo added the enhancement New feature or request label Sep 7, 2023
@loicmathieu
Member

In the integration world (EAI/ESB/SOA) we refer to this as a correlation identifier (correlationId), as it allows correlating multiple messages (here, executions) with each other.

@anna-geller
Member

anna-geller commented Sep 8, 2023

Problem

The user's main problem is the heavy reliance on labels. To analyze the state of workflows and task run metrics, some users filter those metrics based on Execution labels, which can be complex.

Use cases

"Our use case is to build parametrized flows orchestrating different processes with steps like input data validation via provided schema, modifying the data, and sending the data to an external system. Such flow is parametrized by inputs to use different validation schemas, perform different payload modifications, and use different connection parameters. We then use labels to differentiate those sets of parameters."

The reference key can be described as "a label identifying particular processed input data". Such a key is identified prior to feeding the data to Kestra. It is often a "business-relevant ID" contained in the data or a concatenated value from multiple sources.

It could be configured manually by the user in the flow definition.

Possible Solutions

The idea of this reference field is to create a surrogate key composed of multiple fields for easier filtering of relevant operational data, to answer questions such as "What are the top 10 failing executions in the past 1h containing label env:prod, grouped by label x and flowId?", i.e. which PROD flows of a "type" specified by the given label x are most troublesome.
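
The example question above can be sketched as a filter plus a group-by. This is only an illustration over hypothetical in-memory records; the field names (`flow_id`, `state`, `start`, `labels`), the `FAILED` state string, and the `top_failing` helper are assumptions, not Kestra's API.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_failing(executions, now, label_key, required=("env", "prod"),
                window=timedelta(hours=1), limit=10):
    """Count failed executions inside the window that carry the required label,
    grouped by (value of label_key, flow id), most frequent first."""
    req_key, req_value = required
    counts = Counter()
    for ex in executions:
        if ex["state"] != "FAILED":
            continue
        if not (now - window <= ex["start"] <= now):
            continue
        if ex["labels"].get(req_key) != req_value:
            continue
        counts[(ex["labels"].get(label_key), ex["flow_id"])] += 1
    return counts.most_common(limit)


now = datetime(2024, 1, 1, 12, 0)
sample = [
    {"flow_id": "load", "state": "FAILED", "start": now - timedelta(minutes=5),
     "labels": {"env": "prod", "type": "sftp"}},
    {"flow_id": "load", "state": "FAILED", "start": now - timedelta(minutes=20),
     "labels": {"env": "prod", "type": "sftp"}},
    {"flow_id": "load", "state": "FAILED", "start": now - timedelta(minutes=30),
     "labels": {"env": "prod", "type": "api"}},
    {"flow_id": "load", "state": "SUCCESS", "start": now - timedelta(minutes=10),
     "labels": {"env": "prod", "type": "sftp"}},
    {"flow_id": "load", "state": "FAILED", "start": now - timedelta(hours=3),
     "labels": {"env": "prod", "type": "sftp"}},
    {"flow_id": "load", "state": "FAILED", "start": now - timedelta(minutes=10),
     "labels": {"env": "dev", "type": "sftp"}},
]

print(top_failing(sample, now, label_key="type"))
# -> [(('sftp', 'load'), 2), (('api', 'load'), 1)]
```

The grouping key is the pair (label value, flow id), which is the "surrogate key" idea in miniature: several fields combined into one value to filter and rank on.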

@anna-geller anna-geller added this to the v0.15.0 milestone Dec 4, 2023
@anna-geller anna-geller added the kind/quick-win Seems to be quick to do label Dec 5, 2023
@anna-geller anna-geller modified the milestones: v0.15.0, v0.16.0 Jan 14, 2024
@anna-geller anna-geller removed the kind/quick-win Seems to be quick to do label Feb 16, 2024
@anna-geller anna-geller changed the title Add an execution reference field Add an execution reference field (correlationId) Mar 22, 2024
@anna-geller anna-geller modified the milestones: v0.16.0, v0.17.0 Apr 2, 2024
@anna-geller anna-geller modified the milestones: v0.17.0, v0.18.0 Apr 15, 2024
@damienkilgannon

Since this has been pushed back in priorities ... does anybody have an example workaround they use to achieve something similar for the time being?

@anna-geller
Member

@damienkilgannon you can leverage the label task and filter executions by that label, e.g.:

  - id: label
    type: io.kestra.plugin.core.execution.Labels
    labels:
      url: "{{ outputs.some_task.value }}"

@anna-geller anna-geller removed this from the v0.18.0 milestone Jun 17, 2024
@paulgrainger85 paulgrainger85 added the kind/customer-request Requested by one or more customers label Jul 17, 2024
@anna-geller
Member

anna-geller commented Oct 9, 2024

Discussed with Ludo, and we will solve it by introducing System Labels and one of them will be system_correlationId.

System Labels are all labels that have keys starting with system_. Those labels will be by default hidden in the UI and will only be displayed if you explicitly search for them in the labels key-value field.

The system_correlationId will be by default set to the executionId of the first flow in the chain and it will be propagated to all subflow executions downstream of the first flow in the chain. This will be implemented for Subflow and ForEachItem tasks but not for Flow triggers (at least for now, as it's more complex for flow triggers).

If some users would like to set a custom correlation ID, it's possible by defining a custom system_correlationId label in the flow e.g.:

id: flow
namespace: company.team
labels:
  system_correlationId: myCustomKey

tasks:
  - id: hello
    type: io.kestra.plugin.core.log.Log
    message: Hello {{ labels.system_correlationId }}
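
The defaulting and propagation rules described above can be modeled in a few lines. This is a sketch of the described behavior, not Kestra's implementation: the `start_execution` helper is hypothetical, and the precedence of an inherited value over a child flow's own custom value is an assumption, since the thread does not specify it.

```python
CORRELATION_KEY = "system_correlationId"


def start_execution(execution_id, flow_labels=None, parent_labels=None):
    """Return the labels of a new execution.

    Resolution order in this sketch (an assumption): a value inherited from
    the parent execution, then a custom value declared in the flow, then the
    execution's own id.
    """
    labels = dict(flow_labels or {})
    if parent_labels and CORRELATION_KEY in parent_labels:
        labels[CORRELATION_KEY] = parent_labels[CORRELATION_KEY]
    elif CORRELATION_KEY not in labels:
        labels[CORRELATION_KEY] = execution_id
    return labels


root = start_execution("exec-1")
child = start_execution("exec-2", parent_labels=root)
grandchild = start_execution("exec-3", parent_labels=child)
custom = start_execution("exec-4", flow_labels={CORRELATION_KEY: "myCustomKey"})

print(root[CORRELATION_KEY], child[CORRELATION_KEY], grandchild[CORRELATION_KEY])
# -> exec-1 exec-1 exec-1
print(custom[CORRELATION_KEY])  # -> myCustomKey
```

Every execution in the chain ends up with the id of the first one, so filtering on that single value returns the whole chain.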

The System Labels listed below will be automatically added to relevant executions and will be hidden in the UI:

IMPORTANT NOTE: those should be read-only (not possible to overwrite those).


What about custom labels that users may want to hide in the UI? As an extra scope (if we have time), we can add internal labels set from the configuration file.

kestra:
  internal-labels:
    prefixes:
      - internal_
      - admin_
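
As a rough illustration of prefix-based hiding, the sketch below filters a label map the way a UI might. The `visible_labels` helper and its `search_key` parameter are hypothetical; only the `system_` prefix and the configurable prefix list come from the proposal above.

```python
DEFAULT_HIDDEN_PREFIXES = ("system_",)


def visible_labels(labels, extra_prefixes=(), search_key=None):
    """Return the labels a UI would show by default.

    A hidden label is shown only when it is searched for explicitly by key.
    """
    prefixes = DEFAULT_HIDDEN_PREFIXES + tuple(extra_prefixes)
    return {
        key: value
        for key, value in labels.items()
        if not key.startswith(prefixes) or key == search_key
    }


labels = {
    "env": "prod",
    "system_correlationId": "exec-1",
    "internal_owner": "ops",
    "admin_tier": "gold",
}
configured = ["internal_", "admin_"]  # mirrors the config fragment above

print(visible_labels(labels, configured))
# -> {'env': 'prod'}
print(visible_labels(labels, configured, search_key="system_correlationId"))
# -> {'env': 'prod', 'system_correlationId': 'exec-1'}
```

Keeping the prefixes in configuration means hiding a new family of labels would not require a code change.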

@anna-geller anna-geller added area/backend Needs backend code changes area/frontend Needs frontend code changes labels Oct 10, 2024
@anna-geller anna-geller added the kind/highlight One of the highlights of the upcoming release label Oct 11, 2024
@anna-geller anna-geller changed the title Add an execution reference field (correlationId) Add System Labels incl. user and correlationId (an execution reference field) Oct 13, 2024
@anna-geller
Member

TBD: we likely will not allow users' custom system labels

@loicmathieu loicmathieu self-assigned this Oct 22, 2024
This was referenced Oct 22, 2024
@loicmathieu
Member

loicmathieu commented Oct 23, 2024

Two labels have been added as system labels: system_username and system_correlationId.
System labels have been made read-only and writable only by Kestra itself.
Hidden labels have been implemented for the system_ and internal_ prefixes.

Only system_appId still needs to be added; that needs to be part of the Apps feature, so I am closing this issue.
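
For readers wondering what "read-only and writable only by Kestra itself" means in practice, the rule could be sketched as a guard on label updates. The `merge_labels` helper, the `by_platform` flag, and the error type are hypothetical; the actual check in Kestra may differ.

```python
SYSTEM_PREFIX = "system_"


class ReadOnlyLabelError(ValueError):
    """Raised when a caller other than the platform tries to write a system label."""


def merge_labels(existing, updates, by_platform=False):
    """Return existing labels merged with updates, rejecting system keys from users."""
    if not by_platform:
        blocked = sorted(k for k in updates if k.startswith(SYSTEM_PREFIX))
        if blocked:
            raise ReadOnlyLabelError(f"read-only system labels: {', '.join(blocked)}")
    return {**existing, **updates}


current = {"env": "prod"}
current = merge_labels(current, {"system_username": "alice"}, by_platform=True)
current = merge_labels(current, {"team": "data"})

try:
    merge_labels(current, {"system_username": "mallory"})
except ReadOnlyLabelError as err:
    print(err)
```

The user update fails as a whole rather than silently dropping the system key, which makes the restriction visible to the caller.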

Projects
Status: Done
5 participants