Flatten Result Index
Problem Statement
In Anomaly Detection (AD), many result values are not flattened, which makes them difficult to view on dashboards. For instance, entity values are stored as nested objects and features as arrays. The requirement is to reference a feature by name and apply conditions such as f1 > 3. There is also a need to run terms aggregations on categorical fields. Both require adjusting the mapping and adding new fields to the result index.
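To make the requirement concrete, here is a hedged sketch of the search body that becomes possible once a feature value and an entity value live in their own top-level fields. The flattened field names (`feature_data_<feature_name>`, `entity_<field_name>`) are a hypothetical naming convention chosen for illustration, not the plugin's actual scheme; the `range` query and `terms` aggregation are standard OpenSearch query DSL.

```python
import json


# Hypothetical naming convention for flattened fields (an assumption,
# not the AD plugin's actual scheme).
def feature_field(feature_name: str) -> str:
    return f"feature_data_{feature_name}"


def entity_field(field_name: str) -> str:
    return f"entity_{field_name}"


def build_query(feature_name: str, threshold: float, category_field: str, size: int = 10) -> dict:
    """Search body that filters on one flattened feature value (e.g. f1 > 3)
    and buckets the matching results by a flattened categorical field."""
    return {
        "size": 0,  # only the aggregation buckets are of interest
        "query": {"range": {feature_field(feature_name): {"gt": threshold}}},
        "aggs": {
            "by_entity": {
                "terms": {"field": entity_field(category_field), "size": size}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_query("f1", 3, "host"), indent=2))
```

For the terms aggregation to work, the flattened entity field would need to be mapped as `keyword`.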
What to Flatten
Original result index mapping when a detector has anomalies:
After flattening:
Difficulties:
The following outlines the difficulties encountered in this project; each point follows logically from the previous one.
When OpenSearch Visualization loads index data, it relies on the index's static mapping rather than on the actual structure of the documents. Consequently, to enable accurate visualizations, we need to create a separate result index as a flattened copy of the original result index. This flattened index ensures that the data structure matches our visualization requirements.
Although dynamic index mapping could remove concerns about the mapping structure, it is unsuitable here because the flattening process itself depends on each detector's configuration. If dynamic mapping is not an option, we must carefully design a static index mapping that accommodates these configuration-dependent flattening requirements.
OpenSearch Dashboards visualizations currently lack support for aggregating on nested fields, which presents additional challenges. Although nested fields can appear in dot-path format on the Index Pattern and Discover pages, they are unavailable for aggregation on the Visualization page. Even if dot-path formats were supported in Visualization, that would not fulfill our need to flatten the AD result index and enable customer-friendly aggregations. To achieve this, we need to extract specific field values as keys during the flattening process (for example, turning a feature's name into a field name). This is why we need a Painless script to flatten and restructure the data dynamically.
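The "extract field values as keys" step can be sketched in Python as follows. It mirrors what a Painless script would do but is only an illustration: the source field names (`feature_data`, `feature_name`, `data`, `entity`, `name`, `value`) are assumed to match the AD result index, the `feature_data_<name>` and `entity_<name>` keys are a hypothetical naming convention, and only two of the nested list fields are handled.

```python
import copy
from typing import Any


def flatten_result(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an AD result document in which nested list entries
    are lifted into top-level keys named after a value inside each entry.

    Assumed source shape (not confirmed against the plugin):
      feature_data: [{"feature_id": ..., "feature_name": ..., "data": ...}]
      entity:       [{"name": ..., "value": ...}]
    """
    flat = copy.deepcopy(doc)  # keep the original nested fields untouched
    # Each feature's value becomes a field keyed by the feature's name.
    for feature in doc.get("feature_data") or []:
        name = feature.get("feature_name")
        if name is not None and "data" in feature:
            flat[f"feature_data_{name}"] = feature["data"]
    # Each categorical entity value becomes a field keyed by its field name.
    for attr in doc.get("entity") or []:
        name = attr.get("name")
        if name is not None and "value" in attr:
            flat[f"entity_{name}"] = attr["value"]
    return flat
```

A feature named `f1` thus becomes a plain numeric field that a range condition such as f1 > 3 can target directly.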
However, Painless scripts in OpenSearch cannot make client calls. As a result, a script cannot take transformed data from one index and write it into another; it can only flatten nested fields, and hydrating the separate result index must be handled outside the script.
To hydrate the separate result index with flattened data, we could use the reindex API to copy flattened results from the original result index. However, the reindex API operates as a one-time action, meaning it cannot accommodate cases where the data flattening needs to occur on a recurring or scheduled basis.
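As a hedged illustration of why reindexing is a one-time action, the sketch below builds the body of a single `_reindex` request. It assumes the flattening is done by an ingest pipeline referenced through `dest.pipeline`, that results carry an `execution_end_time` date field accepting epoch milliseconds, and that the index and pipeline names are hypothetical. Each call copies only one time window, so keeping the flattened index current would need repeated, externally triggered calls.

```python
import json


def build_reindex_body(source_index: str, dest_index: str, pipeline_id: str,
                       start_ms: int, end_ms: int) -> dict:
    """Body for one POST _reindex call that copies a single time window of
    results into the flattened index, flattening them via an ingest pipeline."""
    return {
        "source": {
            "index": source_index,
            # Only results in [start_ms, end_ms); later results need another call.
            "query": {"range": {"execution_end_time": {"gte": start_ms, "lt": end_ms}}},
        },
        # The ingest pipeline (assumed to exist) performs the actual flattening.
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }


if __name__ == "__main__":
    body = build_reindex_body("ad-results", "ad-results-flattened",
                              "flatten-ad-results", 1700000000000, 1700003600000)
    print(json.dumps(body, indent=2))
```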
To reindex on a schedule, we could use an Index State Management (ISM) policy that includes a reindex action and attach it to the original result index. Scheduled reindexing would then keep the separate result index up to date. However, this introduces a dependency on ISM, which we want to avoid to stay flexible and reduce reliance on additional OpenSearch components.
Solutions:
| | Separate index needed? | Dynamic mapping enabled for the separate index? | Ingest pipeline needed? | Index processor needed? | Script processor needed? | When to hydrate the separate index | Complexity/LOE |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Approach 1 | Y | Y | Y | Y | Y | The index processor handles it | Medium |
| Approach 2 | Y | Y | Y | N | Y | When writing to the existing result index, also write results directly into the separate index | Small |
| Approach 3 | Y | N | N | N | N | When writing to the existing result index, dynamically write results into the separate index according to its mapping | Large |
| Approach 4 | N | N | N | N | N | N/A | Extra large to unknown |
Approach 1. Set up a separate index and an ingest pipeline. Use an index processor to hydrate the separate index and a script processor to flatten its nested fields.
OpenSearch currently does not support an index processor in its ingest pipelines.
Approach 2 (proposing). Set up a separate index and hydrate it alongside the existing result index. Use an ingest pipeline with a script processor to flatten the nested fields in the separate index.
Set up a separate index alongside the custom result index, using the same mapping as the result index but with dynamic mapping enabled. After creating the index, configure an ingest pipeline with a script processor that uses a Painless script to flatten all five nested list fields into the desired format. Whenever results are written to the existing result index, also write them to this separate index so the two stay consistent. The ingest pipeline and its script processor run on every write, so the nested fields are flattened as part of ingestion.
Pros:
- Requires the least effort of all approaches
Cons:
- An additional index is created for customers
- An ingest pipeline is created for customers
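A minimal sketch of the pipeline definition for this approach follows, built as a Python dict. It assumes the nested fields are named `feature_data` and `entity` with the entry shapes used above, uses hypothetical `feature_data_<name>` / `entity_<name>` output keys, and flattens only two of the five nested list fields. The `script` processor with `lang` and `source` is a standard ingest processor; the Painless source reads and writes the document through `ctx`.

```python
import json

# Painless source for the script processor (field names are assumptions).
FLATTEN_PAINLESS = """
if (ctx.feature_data != null) {
  for (def f : ctx.feature_data) {
    if (f.feature_name != null && f.data != null) {
      ctx['feature_data_' + f.feature_name] = f.data;
    }
  }
}
if (ctx.entity != null) {
  for (def e : ctx.entity) {
    if (e.name != null && e.value != null) {
      ctx['entity_' + e.name] = e.value;
    }
  }
}
"""


def build_flatten_pipeline(description: str = "Flatten AD result nested fields") -> dict:
    """Body for PUT _ingest/pipeline/<pipeline_id> with one script processor."""
    return {
        "description": description,
        "processors": [
            {"script": {"lang": "painless", "source": FLATTEN_PAINLESS.strip()}}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_flatten_pipeline(), indent=2))
```

Writes to the separate index could then pass `?pipeline=<pipeline_id>` on each request, or the index could set `index.default_pipeline` so every write goes through the pipeline without changing the write path.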
Approach 3. Set up a separate index and programmatically generate its index mapping. Hydrate the separate index alongside the existing result index.
Set up a separate index alongside the custom result index without defining a static mapping up front. Instead, generate the mapping programmatically by iterating through the config (detector settings) to extract information about nested fields, such as the feature list. During hydration, cross-check the data against the config so that results are written correctly into the separate index.
Pros:
- No additional resources, such as an index or pipeline, are created for customers
Cons:
- Requires a large amount of effort
- Requires adding numerous if-else branches throughout the codebase to handle this optional feature correctly
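The mapping-generation step of this approach might look like the hedged sketch below. It assumes a detector config with a `feature_attributes` list (entries carrying `feature_name` and `feature_enabled`) and a `category_field` list, and reuses the hypothetical `feature_data_<name>` / `entity_<name>` naming. A real mapping would also need the original result index fields; only the generated part is shown.

```python
from typing import Any


def build_flattened_mapping(detector: dict[str, Any]) -> dict[str, Any]:
    """Generate explicit properties for the flattened index from detector settings."""
    properties: dict[str, Any] = {}
    for feature in detector.get("feature_attributes") or []:
        if feature.get("feature_enabled", True):
            # Numeric, so range conditions such as f1 > 3 can be applied.
            properties[f"feature_data_{feature['feature_name']}"] = {"type": "double"}
    for field in detector.get("category_field") or []:
        # keyword, so terms aggregations can bucket on the entity value.
        properties[f"entity_{field}"] = {"type": "keyword"}
    # Dynamic mapping stays off: every field is known from the config.
    return {"mappings": {"dynamic": False, "properties": properties}}
```

Every change to a detector's features would require regenerating this mapping, which is part of why this approach carries a large effort.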
Approach 4. No action is needed on the AD side; all flattening happens on the visualization side.
Pros:
- The best-practice solution for customers
- Broader impact for OpenSearch as a whole
Cons:
- Requires an extra-large amount of effort and involves many unknowns
After setting up the ingest pipeline to flatten the nested fields, I can see the new flattened fields on the index pattern page. However, on the visualization side, the Field dropdown list is not loading the newly added flattened fields. I have created an issue on the OSD side regarding this matter - opensearch-project/OpenSearch-Dashboards#8722