[Proposal] Rule Matching #16888

kaushalmahi12 · 2024-12-20T03:05:08Z

Pre-read for this issue: #16797

Please describe the end goal of this project

This document covers rule matching for incoming search requests in depth. After reading it, the reader should have a sound understanding of the following:

How are the request attributes extracted?
How exactly does the matching algorithm work on the extracted attributes?
What are the new constructs, and how do they work with each other?

Assumptions

The number of rule attributes (cardinality) will be limited, both per feature and in aggregate.
- When a rule is created, control first goes to the feature, which performs:
  - Cardinality validation
  - Attribute validation
Each feature will define its own attributes and will be responsible for validating them.

New Rule insertion process

Rule schema

{
   "attribute1": ["value*"],
   "attribute2": ["value*"],
   "label": "fjagjag9243421_425285",
   "updatedAt": "12-03-2024T18:00:23Z",
   "feature": "WLM"
}

Inserting a new rule works much like insertion into a compressed trie. However, since each attribute can take up to N values (N will be limited, e.g. 5 at most), two or more rules could share one of the values in their respective attribute value lists.
In that case, whichever rule was created or modified last overrides the label for that attribute value in the trie.
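The override behavior can be illustrated with a minimal sketch. It uses a plain character trie rather than a compressed one to stay short, and every name in it (`AttributeTrie`, `insert_rule`, `label_at`, the `index_pattern` attribute) is hypothetical rather than taken from the proposal. It also assumes `updatedAt` is an ISO-8601 string, so that string comparison matches time order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    label: Optional[str] = None
    updated_at: str = ""  # timestamp of the rule that last set the label


class AttributeTrie:
    """Plain (uncompressed) character trie holding the rules of one attribute."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, pattern: str, label: str, updated_at: str) -> None:
        key = pattern.rstrip("*")  # "logs*" is stored under the prefix "logs"
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        # Last writer wins: an older rule never replaces a newer label.
        if node.label is None or updated_at >= node.updated_at:
            node.label = label
            node.updated_at = updated_at

    def label_at(self, key: str) -> Optional[str]:
        """Return the label stored exactly at `key`, or None."""
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.label


def insert_rule(tries: Dict[str, AttributeTrie], rule: dict, attributes: List[str]) -> None:
    """Insert every value of every listed attribute into that attribute's trie."""
    for attr in attributes:
        trie = tries.setdefault(attr, AttributeTrie())
        for pattern in rule.get(attr, []):
            trie.insert(pattern, rule["label"], rule["updatedAt"])
```

With two rules that both list `logs*`, the trie node for `logs` ends up carrying the label of the rule with the later `updatedAt`, regardless of insertion order.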

Attribute extraction process

Since rule attributes are specific to a feature, each feature will provide a construct to extract its attributes. For example, the WLM feature would use the threadContext to retrieve the authN info and the request itself to retrieve the indices.
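One possible shape for such a construct is a per-feature table of extractor functions. The sketch below is only an illustration in Python, not the proposed OpenSearch API: the header name `user_principal`, the attribute names, and all function names are assumptions, and the thread context is modeled as a plain mapping of headers.

```python
from typing import Callable, Dict, List, Mapping, Sequence

# An extractor maps (thread-context headers, requested indices) to attribute values.
Extractor = Callable[[Mapping[str, str], Sequence[str]], List[str]]


def extract_principal(headers: Mapping[str, str], indices: Sequence[str]) -> List[str]:
    # Hypothetical header name; the real authN header is not given in the proposal.
    user = headers.get("user_principal")
    return [user] if user else []


def extract_indices(headers: Mapping[str, str], indices: Sequence[str]) -> List[str]:
    return list(indices)


# Each feature registers its own attribute extractors (hypothetical WLM example).
WLM_EXTRACTORS: Dict[str, Extractor] = {
    "principal": extract_principal,
    "index_pattern": extract_indices,
}


def extract_attributes(
    extractors: Mapping[str, Extractor],
    headers: Mapping[str, str],
    indices: Sequence[str],
) -> Dict[str, List[str]]:
    """Run every extractor of a feature and collect the values per attribute."""
    return {name: fn(headers, indices) for name, fn in extractors.items()}
```

Keeping the extractors in a table owned by the feature means the matching code only ever sees attribute names and value lists, never the request type itself.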

Rule Matching Process

In a nutshell, rule matching will be trie (prefix tree) based matching per attribute. Hence we will ultimately be dealing with a list of candidate labels per attribute. How this per-attribute list is calculated is best understood with the help of some examples.

Let's say the picture above captures a subpart of the in-memory rules trie. The following scenarios can occur while calculating the possible labels for an attribute value.

Scenario - 1

When the search for the attribute value ends on a node which has a label

search_value: log_q3_2024

In this case the resulting label would be abc.

Scenario - 2

When the search ends on a node which doesn’t have a label, there are the following sub-scenarios:

There is an empty prefix match for this value, i.e. the search ends at the root node, which holds the empty value (e.g. order_q3_2024).
- In this case we assign nothing, and downstream systems can treat this as default behavior.
There is a partial match. In this case the search should return the top 10 results from the subtree, ordered by the depth of the node.
- For example, for the search string logs_q2_2024 it would return [abc, pqr].
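The scenarios above can be sketched as follows. This is one reading of the description, not the proposed implementation: it uses a plain character trie, walks the value until it is exhausted or no child matches, and uses a breadth-first traversal so that shallower labels come first. The trie contents are hypothetical and are not the ones in the picture; the label names merely mirror the examples.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.label: Optional[str] = None


def insert(root: Node, prefix: str, label: str) -> None:
    node = root
    for ch in prefix:
        node = node.children.setdefault(ch, Node())
    node.label = label


def match(root: Node, value: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Return (label, depth) candidates for one attribute value."""
    node, depth = root, 0
    for ch in value:
        child = node.children.get(ch)
        if child is None:
            break
        node, depth = child, depth + 1
    if node.label is not None:  # Scenario 1: search ends on a labeled node
        return [(node.label, depth)]
    if node is root:  # Scenario 2a: empty prefix match, assign nothing
        return []
    # Scenario 2b: partial match; BFS yields labels in order of depth.
    found: List[Tuple[str, int]] = []
    queue = deque([(node, depth)])
    while queue and len(found) < limit:
        current, d = queue.popleft()
        if current.label is not None:
            found.append((current.label, d))
        for ch in sorted(current.children):
            queue.append((current.children[ch], d + 1))
    return found


# Hypothetical rules: label abc has two values, label pqr has one.
root = Node()
insert(root, "log_", "abc")
insert(root, "logs_q2_2024_a", "abc")
insert(root, "logs_q2_2024_bb", "pqr")
```

Returning the depth alongside each label keeps the information needed later, when the per-attribute lists are combined and depth is used as the tie-breaker.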

The rundown above shows how labels are decided per attribute value. But given that a request carries multiple values and a rule contains multiple attributes, how is the final value calculated? We select the label that occurs in all of the per-attribute label lists.

Let's say that for a feature's label evaluation we are considering two attributes, and the corresponding label lists from the algorithm above are [abc, xyz] and [abc, pqr]. The resulting label for this request will be abc.

Even with multiple lists, the following scenarios can occur:

All lists have nothing in common ==>> In this case the request will be treated with a default label.
All lists have only one value in common ==>> This will be the resultant label for the request.
All lists have multiple values in common ==>> In this case depth is the tie-breaker for deciding the resulting label, i.e. the label with the minimum depth will be selected.
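A minimal sketch of this final step is shown below. It assumes each per-attribute list carries (label, depth) pairs, and that a label appearing at several depths is ranked by its smallest depth across all lists; the proposal does not specify that detail, so treat it as an assumption. The function name and the `DEFAULT` label are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def resolve_label(
    per_attribute: List[List[Tuple[str, int]]],
    default: Optional[str] = None,
) -> Optional[str]:
    """Pick the label common to every attribute's candidate list."""
    if not per_attribute:
        return default
    # For each attribute, keep the smallest depth seen per label.
    best_depth: List[Dict[str, int]] = []
    for matches in per_attribute:
        depths: Dict[str, int] = {}
        for label, depth in matches:
            depths[label] = min(depth, depths.get(label, depth))
        best_depth.append(depths)
    # Intersect the label sets of all attributes.
    common = set(best_depth[0])
    for depths in best_depth[1:]:
        common &= set(depths)
    if not common:
        return default  # nothing in common: fall back to the default label
    # Tie-break on minimum depth; the label name makes the result deterministic.
    return min(common, key=lambda label: (min(d[label] for d in best_depth), label))
```

For the example lists [abc, xyz] and [abc, pqr], the only common label is abc, so it is returned without the tie-breaker coming into play.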

Low Level Design

The following UML and interaction diagrams show the new constructs, their relationships, and how they interact with each other.

InMemoryRuleEngine will save the label in ThreadContext which downstream transport actions can consume.

Supporting References

Issues

#16797

Related component

Search


kaushalmahi12 · 2025-01-03T21:25:04Z

@reta @msfroh @jainankitk
Can you review this and provide your suggestions ?

msfroh · 2025-01-03T21:34:21Z

How the request attributes are extracted ?

I still don't understand how attributes are extracted from a SearchRequest. What are the attributes of a SearchRequest?

msfroh · 2025-01-03T21:36:04Z

This proposal is very abstract. I would like some concrete examples to understand what's going on.

kaushalmahi12 · 2025-01-03T23:50:49Z

@msfroh Thanks for reviewing this!

I still don't understand how attributes are extracted from a SearchRequest. What are the attributes of a SearchRequest?

Since it would be hard to enumerate all the features and the feature-specific attributes they will use, I have kept the attribute extraction part abstract. But for WLM we will use the user principal and the indices to determine a label: we can use the SearchRequest#indices method to get the request indices and ThreadContext#getHeader to get the user principal info.

I would like some concrete examples to understand what's going on.

Can you provide more details about it? I am not sure whether you got a chance to review #16797. If that issue still doesn't fill the gap, I can answer here.

dblock · 2025-01-06T17:27:48Z

[Catch All Triage - 1, 2, 3, 4, 5, 6]

kaushalmahi12 added the Meta and untriaged labels Dec 20, 2024

opensearch-infra bot added this to OpenSearch Roadmap Dec 20, 2024

github-project-automation bot moved this to New in OpenSearch Roadmap Dec 20, 2024

kaushalmahi12 self-assigned this Dec 20, 2024

kaushalmahi12 mentioned this issue Dec 20, 2024

[META] Automatic labeling using Rules #16813

Open

dblock removed the untriaged label Jan 6, 2025

kaushalmahi12 mentioned this issue Jan 7, 2025

[WLM Auto tagging] Add compressed trie structure to store Rules #16971

Open

3 tasks
