[FEATURE] Flatten result index mapping for visualizing nested objects in Dashboards #1306

jackiehanyang · 2024-09-10T21:32:19Z

Flatten Result Index

Problem Statement

In Anomaly Detection, many values are not flattened, making it difficult to view them on the dashboard. For instance, entity values are nested objects, and features are arrays. The requirement is to reference a feature by name and apply conditions like f1 > 3. Additionally, there is a need to perform terms aggregation on categorical fields. This will require adjustments to the mapping and the addition of new fields in the result index.

What to Flatten

Original result index mapping when a detector has anomalies:

{
    "detector_id": "fylE53wBc9MCt6q12tKp",
    "schema_version": 0,
    "data_start_time": 1635927900000,
    "data_end_time": 1635927960000,
    "feature_data": [
        {
            "feature_id": "processing_bytes_max",
            "feature_name": "processing bytes max",
            "data": 2291
        },
        {
            "feature_id": "processing_bytes_avg",
            "feature_name": "processing bytes avg",
            "data": 1677.3333333333333
        },
        {
            "feature_id": "processing_bytes_min",
            "feature_name": "processing bytes min",
            "data": 1054
        },
        {
            "feature_id": "processing_bytes_sum",
            "feature_name": "processing bytes sum",
            "data": 5032
        },
        {
            "feature_id": "processing_time_max",
            "feature_name": "processing time max",
            "data": 11422
        }
    ],
    "anomaly_score": 1.1986675882872033,
    "anomaly_grade": 0.26806225550178464,
    "confidence": 0.9607519742565531,
    "entity": [
        {
            "name": "process_name",
            "value": "process_3"
        }
    ],
    "approx_anomaly_start_time": 1635927900000,
    "relevant_attribution": [
        {
            "feature_id": "processing_bytes_max",
            "data": 0.03628638020431366
        },
        {
            "feature_id": "processing_bytes_avg",
            "data": 0.03384479053991436
        },
        {
            "feature_id": "processing_bytes_min",
            "data": 0.058812549572819096
        },
        {
            "feature_id": "processing_bytes_sum",
            "data": 0.10154576265526988
        },
        {
            "feature_id": "processing_time_max",
            "data": 0.7695105170276828
        }
    ],
    "expected_values": [
        {
            "likelihood": 1,
            "value_list": [
                {
                    "feature_id": "processing_bytes_max",
                    "data": 2291
                },
                {
                    "feature_id": "processing_bytes_avg",
                    "data": 1677.3333333333333
                },
                {
                    "feature_id": "processing_bytes_min",
                    "data": 1054
                },
                {
                    "feature_id": "processing_bytes_sum",
                    "data": 6062
                },
                {
                    "feature_id": "processing_time_max",
                    "data": 23379
                }
            ]
        }
    ],
    "threshold": 1.0993584705913992,
    "execution_end_time": 1635898427895,
    "execution_start_time": 1635898427803,
    "past_values": [
        {
            "feature_id": "processing_bytes_max",
            "data": 905
        },
        {
            "feature_id": "processing_bytes_avg",
            "data": 479
        },
        {
            "feature_id": "processing_bytes_min",
            "data": 128
        },
        {
            "feature_id": "processing_bytes_sum",
            "data": 1437
        },
        {
            "feature_id": "processing_time_max",
            "data": 8440
        }
    ]
}

After flattening:

{
    ......SAME ORIGINAL CONTENT AS ABOVE......
 
    // flattened feature_data fields
    "feature_data_processing_bytes_max": 2291,
    "feature_data_processing_bytes_avg": 1677.3333333333333,
    "feature_data_processing_bytes_min": 1054,
    "feature_data_processing_bytes_sum": 5032,
    "feature_data_processing_time_max": 11422,
    
    // flattened entity fields
    "entity_process_name_value": "process_3",
    
    // flattened relevant_attribution fields
    "relevant_attribution_processing_bytes_max": 0.03628638020431366,
    "relevant_attribution_processing_bytes_avg": 0.03384479053991436,
    "relevant_attribution_processing_bytes_min": 0.058812549572819096,
    "relevant_attribution_processing_bytes_sum": 0.10154576265526988,
    "relevant_attribution_processing_time_max": 0.7695105170276828,
    
    // flattened expected_values fields
    "expected_values_processing_bytes_max": 2291,
    "expected_values_processing_bytes_avg": 1677.3333333333333,
    "expected_values_processing_bytes_min": 1054,
    "expected_values_processing_bytes_sum": 6062,
    "expected_values_processing_time_max": 23379,
    
    // flattened past_values fields
    "past_values_processing_bytes_max": 905,
    "past_values_processing_bytes_avg": 479,
    "past_values_processing_bytes_min": 128,
    "past_values_processing_bytes_sum": 1437,
    "past_values_processing_time_max": 8440
}
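
The transformation from the nested document to the flattened fields can be sketched in a few lines of Python. This is only an illustration of the intended key naming, not the plugin's implementation: the function name `flatten_result` is hypothetical, and it assumes that `feature_id` (rather than `feature_name`) forms the key suffix, as the example keys suggest.

```python
from typing import Any, Dict

# Fields whose items are {"feature_id": ..., "data": ...} objects.
FEATURE_LIST_FIELDS = ("feature_data", "relevant_attribution", "past_values")


def flatten_result(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an anomaly result with flattened top-level fields added."""
    flat = dict(doc)  # shallow copy; the nested originals are kept unchanged

    for field in FEATURE_LIST_FIELDS:
        for item in doc.get(field) or []:
            flat[f"{field}_{item['feature_id']}"] = item["data"]

    for entity in doc.get("entity") or []:
        flat[f"entity_{entity['name']}_value"] = entity["value"]

    # Assumption: if several expected_values entries exist, later entries
    # overwrite earlier ones. The example document has a single entry.
    for expected in doc.get("expected_values") or []:
        for item in expected.get("value_list") or []:
            flat[f"expected_values_{item['feature_id']}"] = item["data"]

    return flat


# A trimmed-down version of the example result document.
sample = {
    "detector_id": "fylE53wBc9MCt6q12tKp",
    "feature_data": [
        {"feature_id": "processing_bytes_max",
         "feature_name": "processing bytes max", "data": 2291},
    ],
    "entity": [{"name": "process_name", "value": "process_3"}],
    "expected_values": [
        {"likelihood": 1,
         "value_list": [{"feature_id": "processing_bytes_max", "data": 2291}]},
    ],
    "past_values": [{"feature_id": "processing_bytes_max", "data": 905}],
}
flat = flatten_result(sample)
```

With flattened keys like these, a condition such as `feature_data_processing_bytes_max > 3` or a terms aggregation on `entity_process_name_value` can target a plain top-level field.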

Difficulties:

The following outlines the difficulties encountered during this project, with each point logically flowing as a consequence of the previous one.

When OpenSearch Visualization loads index data, it relies on the static mapping of the index rather than the actual content structure. Consequently, to enable accurate visualizations, we need to create a separate result index as a flattened copy of the original result index. This flattened index ensures that the data structure aligns with our visualization requirements.
While dynamic index mapping could alleviate concerns about mapping structure, it is unsuitable for cases where the flattening process is dynamically influenced by detector configurations. Therefore, if dynamic mapping is not an option, we must carefully put together a static index mapping that accommodates these dynamic flattening requirements.
OpenSearch currently lacks support for aggregation on nested fields, which presents additional challenges. Although nested fields can appear in dotpath format on the IndexPattern and Discover pages, they are unavailable for aggregation on the Visualization page. Even if dotpath formats were supported in Visualization, that approach would not fulfill our need to flatten result indices for AD and enable customer-friendly aggregations. To achieve this, we need to extract specific field values as keys during the flattening process. This limitation necessitates using a Painless script to dynamically flatten and reconstruct the data.
However, Painless scripts in OpenSearch do not support making client calls within the script. This restriction means it is impossible to directly ingest transformed data from one index into another within the script itself. As a result, Painless scripts can only handle flattening nested fields, and we must hydrate the separate result index outside the script.
To hydrate the separate result index with flattened data, we could use the reindex API to copy flattened results from the original result index. However, the reindex API operates as a one-time action, meaning it cannot accommodate cases where the data flattening needs to occur on a recurring or scheduled basis.
To perform reindexing on a schedule, we could utilize an ISM policy that includes a reindex action, associating it with the original result index. This approach would enable scheduled reindexing to keep the separate result index up to date. However, this introduces a dependency on ISM, which we want to avoid in order to maintain flexibility and reduce reliance on additional OpenSearch components.
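
To make the one-time reindex option concrete, here is a rough sketch of the request body that a `POST _reindex` call could carry, built as a Python dictionary. The index names and the helper `build_reindex_body` are hypothetical, the Painless source is an untested sketch that flattens only `feature_data`, and the use of `feature_id` as the key suffix is an assumption based on the example keys.

```python
import json
from typing import Any, Dict

# Painless source (untested sketch). In a reindex script the document body
# is exposed as ctx._source.
FLATTEN_FEATURE_DATA = """
if (ctx._source.feature_data != null) {
  for (def item : ctx._source.feature_data) {
    ctx._source['feature_data_' + item.feature_id] = item.data;
  }
}
""".strip()


def build_reindex_body(source_index: str, dest_index: str) -> Dict[str, Any]:
    """Build the body for a single POST _reindex request."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": FLATTEN_FEATURE_DATA},
    }


# Hypothetical index names, for illustration only.
reindex_body = build_reindex_body("ad-result-example", "ad-result-example-flattened")
print(json.dumps(reindex_body, indent=2))
```

Each such request copies only the documents that exist when it runs, which is why a recurring trigger would still be needed to keep the flattened index current.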

Solutions:

| | separate index needed? | is dynamic mapping enabled for this separate index? | ingest pipeline needed? | index processor needed? | script processor needed? | when to hydrate the separate index | complexity/LOE |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Approach 1 | Y | Y | Y | Y | Y | the index processor will take care of it | medium |
| Approach 2 | Y | Y | Y | N | Y | when writing to the existing result index, directly write results into this separate index | small |
| Approach 3 | Y | N | N | N | N | when writing to the existing result index, dynamically write results into this separate index according to its mapping | large |
| Approach 4 | N | N | N | N | N | N/A | extra large to unknown |

Approach 1. Set up a separate index and an ingest pipeline. Use an index processor to hydrate the separate index, and a script processor to flatten its nested fields.
OpenSearch does not currently support an index processor in its ingest pipelines.

Approach 2 (proposed). Set up a separate index and hydrate it alongside the existing result index. Use an ingest pipeline with a script processor to flatten the nested fields in the separate index.
Set up a separate index alongside the custom result index, using the same mapping as the result index but with dynamic mapping enabled. After creating the index, configure an ingest pipeline with a script processor that uses a Painless script to flatten all five nested list fields into the desired flattened format. Whenever results are written to the existing result index, also write them to this separate index so that the two stay consistent. The ingest pipeline and its script processor run on each write to the separate index and flatten the nested fields.
Pros:

requires the smallest effort among all approaches

Cons:

an additional index will be created for customers
an ingest pipeline will be created for customers
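
As a rough sketch of what Approach 2 could look like, the Python below builds the two request bodies involved: an ingest pipeline with a single script processor, and an index-creation body that enables dynamic mapping. Several things here are assumptions rather than the final design: the pipeline ID and helper names are hypothetical, the Painless source is untested, `feature_id` is assumed to form the key suffix, and attaching the pipeline through the `index.default_pipeline` setting is only one option (passing `?pipeline=` on each write is another). The existing result-index field mappings are omitted for brevity.

```python
from typing import Any, Dict

# Painless source for an ingest script processor (untested sketch).
# In the ingest context, document fields are read and written through ctx.
FLATTEN_SCRIPT = """
for (def field : ['feature_data', 'relevant_attribution', 'past_values']) {
  if (ctx[field] != null) {
    for (def item : ctx[field]) {
      ctx[field + '_' + item.feature_id] = item.data;
    }
  }
}
if (ctx.entity != null) {
  for (def e : ctx.entity) {
    ctx['entity_' + e.name + '_value'] = e.value;
  }
}
if (ctx.expected_values != null) {
  for (def ev : ctx.expected_values) {
    if (ev.value_list != null) {
      for (def item : ev.value_list) {
        ctx['expected_values_' + item.feature_id] = item.data;
      }
    }
  }
}
""".strip()


def build_pipeline_body() -> Dict[str, Any]:
    """Body for PUT _ingest/pipeline/<pipeline_id>."""
    return {
        "description": "Flatten nested anomaly result fields (sketch)",
        "processors": [
            {"script": {"lang": "painless", "source": FLATTEN_SCRIPT}}
        ],
    }


def build_flattened_index_body(pipeline_id: str) -> Dict[str, Any]:
    """Body for PUT <flattened_index>; result-index properties omitted here."""
    return {
        "settings": {"index.default_pipeline": pipeline_id},
        "mappings": {"dynamic": True},
    }


pipeline_body = build_pipeline_body()
index_body = build_flattened_index_body("flatten-ad-result-example")  # hypothetical ID
```

Because dynamic mapping is enabled on the separate index, the new top-level fields written by the script get mapped on first write without any per-detector mapping work.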

Approach 3. Set up a separate index and programmatically generate its index mapping. Hydrate the separate index alongside the existing result index.

Set up a separate index alongside the custom result index without defining a static mapping. Instead, programmatically generate the mapping by iterating through the config file (detector settings) to extract information from nested fields, such as the Feature list. During the hydration process, cross-compare the data with the config file to ensure results are appropriately written into this separate index.

Pros:

no ingest pipeline will be created for customers (a separate index is still needed)

Cons:

requires a large amount of effort to implement
requires adding numerous if-else branches throughout the codebase to handle this optional feature correctly
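
To give a feel for the mapping generation in Approach 3, here is a minimal sketch that derives flattened field mappings from a detector's feature IDs and category fields. The helper name and its inputs are hypothetical, and the field types (`double` for numeric values, `keyword` for entity values) are assumptions chosen so that range conditions and terms aggregations would work.

```python
from typing import Dict, Iterable

# Nested fields that carry one value per feature.
PER_FEATURE_PREFIXES = (
    "feature_data",
    "relevant_attribution",
    "expected_values",
    "past_values",
)


def build_flattened_properties(
    feature_ids: Iterable[str], category_fields: Iterable[str]
) -> Dict[str, Dict[str, str]]:
    """Generate mapping properties for the flattened fields of one detector."""
    properties: Dict[str, Dict[str, str]] = {}
    for feature_id in feature_ids:
        for prefix in PER_FEATURE_PREFIXES:
            # Assumption: all per-feature values are stored as doubles.
            properties[f"{prefix}_{feature_id}"] = {"type": "double"}
    for name in category_fields:
        # Assumption: entity values are keywords so terms aggregation works.
        properties[f"entity_{name}_value"] = {"type": "keyword"}
    return properties


properties = build_flattened_properties(
    feature_ids=["processing_bytes_max", "processing_time_max"],
    category_fields=["process_name"],
)
```

The generated properties would then be merged into the static mapping of the separate index, and would have to be regenerated whenever the detector's features or category fields change.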

Approach 4. No action is needed on the AD side. All flattening happens on the visualization side.
Pros:

the best practice solution for customers
brings broader impact for OpenSearch as a whole

Cons:

requires an extra-large amount of effort and involves many unknowns

dblock · 2024-09-30T16:15:29Z

[Catch All Triage - 1, 2, 3, 4]

jackiehanyang · 2024-10-28T20:39:32Z

After setting up the ingest pipeline to flatten the nested fields, I can see the new flattened fields on the index pattern page. However, on the visualization side, the Field dropdown list is not loading the newly added flattened fields. I have created an issue on the OSD side regarding this matter - opensearch-project/OpenSearch-Dashboards#8722

jackiehanyang added enhancement untriaged labels Sep 10, 2024

jackiehanyang self-assigned this Sep 10, 2024

kaituo mentioned this issue Sep 17, 2024

[META] AD enhancment in 2.17, 2.18 #1278

Open

5 tasks

dblock removed the untriaged label Sep 30, 2024

minalsha added the v2.18.0 label Oct 14, 2024

minalsha added this to OpenSearch Roadmap Oct 14, 2024

github-project-automation bot moved this to New in OpenSearch Roadmap Oct 14, 2024

minalsha moved this from New to In Progress in OpenSearch Roadmap Oct 14, 2024

minalsha added v2.9.1 and removed v2.18.0 labels Oct 23, 2024

minalsha added v2.19.0 and removed v2.9.1 labels Nov 4, 2024

jackiehanyang mentioned this issue Jan 29, 2025

[DOC] Flatten Custom Result Index in Anomaly Detection opensearch-project/documentation-website#9132

Closed

4 tasks

jackiehanyang closed this as completed Feb 14, 2025
