Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC]Observability Workflow API #1805

Open
YANG-DB opened this issue Feb 12, 2024 · 2 comments
Open

[RFC]Observability Workflow API #1805

YANG-DB opened this issue Feb 12, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@YANG-DB
Copy link
Member

YANG-DB commented Feb 12, 2024

Is your feature request related to a problem?
This RFC proposes the introduction of a new feature within the OpenSearch Observability plugin: a workflow-based API named "Workflow".

This API is designed to orchestrate multiple steps, each corresponding to a separate API call to a core API within OpenSearch. The primary goal of the Workflow API is to facilitate complex data preparation processes involving enrichment, aggregation, reindexing, and other transformations.

By enabling users to define multi-step workflows, this feature aims to simplify the preparation of raw data for visualization, thereby providing deeper insights into system status and health.

Proposal
Workflow Definition
A workflow is defined as a sequence of steps, each of which corresponds to an API call within OpenSearch. Users can specify the details of each step, including the API endpoint to be called, the HTTP method to use, and the body of the request. Workflows are defined in JSON format, allowing for easy creation, modification, and sharing.

Workflow Structure
A typical workflow definition includes the following components:

  • Name: A unique identifier for the workflow.
  • Description: A brief explanation of the workflow's purpose and its operations.
  • Version: The version of the workflow definition.
  • Parameters: A list of parameters that can be used throughout the workflow. These parameters can be dynamically replaced at runtime.
  • Steps: An ordered list of steps that the workflow will execute. Each step includes:
    • Name: A unique identifier for the step.
    • Method: The HTTP method (e.g., GET, POST, PUT) to be used for the API call.
    • Endpoint: The API endpoint to call. This can include placeholders for parameters defined in the workflow.
    • Body: The request body for the API call, if applicable.

Example

The workflow metadata parameters:

{
  "name": "Reds-Services-Workflow",
  "description": "This workflow creates a RED transformation metrics top measure services performance",
  "version": "1.0.0",
  "parameters": [
    "index_name",
    "rollup_name",
    "current_time",
    "start_time"
  ],

The workflow steps parameters:

   ....
  "steps": [
    {
      "name": "add_timestamp_field_to_otel_spans_index",
      "method": "PUT",
      "endpoint": "/${index_name}",
      "body": { ... }
     },
     ....
    ]

Workflow Backend Support

The workflow json request will be stored as a versioned separate document, this document will be enriched with the http response returned by the step's endpoint.

Each workflow request will be assigned with an ID which can be later queries for status.

A status API will be available for query and will give the workflow state according to each step status status:

.../workflow/${id}/status - will return the step's status according to the current state.

Example

{
  "name": "Reds-Services-Workflow",
  "description": "This workflow creates a RED transformation metrics top measure services performance",
  "version": "1.0.0",
  "@timestamp": "2024-01-28T02:31:51.379677283Z"
  "steps": [
    {
      "name": "add_timestamp_field_to_otel_spans_index",
      "method": "PUT",
      "endpoint": "/${index_name}",
      "startTime": "2024-01-28T02:31:51.379677283Z",
      "endTime": "2024-01-28T02:31:51.379677283Z",
      "status": "complete"
    },
    {
      "name": "reindex_old_otel_spans_index_to_new",
      "method": "POST",
      "endpoint": "_reindex",
      "startTime": "2024-01-28T02:31:51.379677283Z",
      "endTime": "2024-01-28T02:31:51.379677283Z",
      "status": "complete"
    },
    {
      "name": "rollup_REDS_services_metrics_index",
      "method": "PUT",
      "endpoint": "_plugins/_rollup/jobs/${rollup_name}",
      "startTime": "2024-01-28T02:31:51.379677283Z",
      "status": "running",
      "task": {
        "nodes": {
          "Mgqdm0r9SEGClWxp_RbnaQ": {
            "name": "opensearch-node1",
            "transport_address": "172.18.0.3:9300",
            "host": "172.18.0.3",
            "ip": "172.18.0.3:9300",
            "roles": [
              "data",
              "ingest",
              "master",
              "remote_cluster_client"
            ],
            "tasks": {
              "Mgqdm0r9SEGClWxp_RbnaQ:17416": {...}
            },
            "Mgqdm0r9SEGClWxp_RbnaQ:17413": {...},
            "Mgqdm0r9SEGClWxp_RbnaQ:17366": {...}
          }
        }
      }
    }
  ]
}


Example Workflow: RED Metrics for Service Performance

Below is an example workflow definition request designed to create RED (Rate, Errors, Duration) transformation metrics to measure service performance.

The next workflow (described here) is responsible of preparing a transformation aggregative index that contains the RED (requests, errors, duration) summaries for the Observability services as they defined by the OTEL schema.

{
  "name": "Reds-Services-Workflow",
  "description": "This workflow creates a RED transformation metrics top measure services performance",
  "version": "1.0.0",
  "parameters": [
    "index_name",
    "rollup_name",
    "current_time",
    "start_time"
  ],
  "steps": [
    {
      "name": "add_timestamp_field_to_otel_spans_index",
      "method": "PUT",
      "endpoint": "/${index_name}",
      "body": {
        "mappings": { ... }
      }
    },
    {
      "name": "reindex_old_otel_spans_index_to_new",
      "method": "POST",
      "endpoint": "_reindex",
      "body": {
        "source": {
          "index": "otel-v1-apm-span-*"
        },
        "dest": {
          "index": "${index_name}"
        },
        "script": {
          "source": "ctx._source['@timestamp'] = ctx._source.startTime;"
        }
      }
    },
    {
      "name": "rollup_REDS_services_metrics_index",
      "method": "PUT",
      "endpoint": "_plugins/_rollup/jobs/${rollup_name}",
      "body": {

        "rollup": {
          "rollup_id": "rollup_REDS_services_metrics_index",
          "enabled": true,
          "schedule": {
            "interval": {
              "start_time": 1706146764865,
              "period": 1,
              "unit": "Minutes",
              "schedule_delay": 0
            }
          },
          "description": "",
          "schema_version": 19,
          "source_index": "otel-v1-apm-span-000072-fixed",
          "target_index": "rollup_services_metrics_index",
          "metadata_id": "58NGPo0B1DvKaqdigtmw",
          "page_size": 1000,
          "delay": 0,
          "continuous": true,
          "dimensions": [
            {
              "date_histogram": {
                "fixed_interval": "1h",
                "source_field": "@timestamp",
                "target_field": "@timestamp",
                "timezone": "UTC",
                "format": null
              }
            },
            {
              "terms": {
                "source_field": "serviceName",
                "target_field": "serviceName"
              }
            },
            {
              "terms": {
                "source_field": "@timestamp",
                "target_field": "@timestamp"
              }
            },
            {
              "histogram": {
                "source_field": "status.code",
                "target_field": "status.code",
                "interval": 1
              }
            },
            {
              "terms": {
                "source_field": "span.attributes.http@status_code",
                "target_field": "span.attributes.http@status_code"
              }
            }
          ],
          "metrics": [
            {
              "source_field": "durationInNanos",
              "metrics": [
                {
                  "max": {}
                },
                {
                  "avg": {}
                }
              ]
            }
          ]
        }
      }
    }
  ]
}

Do you have any additional context?

@dbwiddis
Copy link
Member

This looks like a fantastic use case for https://github.com/opensearch-project/flow-framework

@minalsha
Copy link

@dbwiddis is working on this. See opensearch-project/flow-framework#522

@Swiddis Swiddis removed the untriaged label Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants