Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WLM] Synchronizing Rules Across Nodes #16889

Open
ruai0511 opened this issue Dec 20, 2024 · 5 comments
Open

[WLM] Synchronizing Rules Across Nodes #16889

ruai0511 opened this issue Dec 20, 2024 · 5 comments
Assignees
Labels
Meta Meta issue, not directly linked to a PR untriaged

Comments

@ruai0511
Copy link
Contributor

ruai0511 commented Dec 20, 2024

Please describe the end goal of this project

Recently, we launched the WLM subfeature, i.e., multi-tenant search resiliency, to allow managing multi-tenant environments in OpenSearch. However, this feature still relies on external hints to be sent along with each request via an HTTP header.

This becomes cumbersome in programmatic access, and without proper planning, it can lead to unmanageable multi-tenant access. A more efficient solution would allow users to define rules that determine the appropriate tenant for certain types of requests (eg. requests from a specific user or targeting some index). In order for this to work efficiently, these rules (both index and in-memoey storage) must be synchronized consistently and up-to-date across all nodes in the cluster. The approach must be both consistent and efficient.

Assumptions

  1. Rule Changes Are Infrequent: Rule updates are rare.
  2. Rules Storage: The rules for determining the tenant are stored in a system index, and each node maintains an in-memory rule Trie for faster lookups.
  3. Need for Consistency: It's crucial that all nodes have the same set of rules (index and in-memory) at any given time to ensure consistent decision-making when processing requests.
  4. Performance Concerns: The synchronization mechanism must not introduce significant overhead, especially in large clusters.

Synchronization Approaches

  1. Refresh-Based Synchronization
  • Description: Each node periodically refreshes its local Trie data from the central system index. The refresh would involve pulling the latest set of rules from the node that holds the up-to-date system index data and updating the local Trie.
  • Advantages:
    • Simpler implementation.
    • User can control the refresh interval.
  • Disadvantages:
    • Synchronization Delay: As nodes refresh periodically, there will be some delay before a rule change propagates to all nodes. This could potentially lead to inconsistency for short periods.
    • Performance Impact: Frequent refreshes could impose additional load on nodes.
  1. Push to Sync
  • Description: When a rule change occurs, the coordinator node saves the rule in the system index and updates its local Trie, then pushes the updated rule to all other nodes so that they can update their in-memory Tries. This needs to be done via custom mechanisms to notify all nodes of the change.
  • Advantages:
    • Real-time synchronization: As soon as a rule changes, all nodes can be immediately updated.
    • Ensures consistency across nodes with minimal delay.
    • Avoids unnecessary overhead of continuously polling the system index.
  • Disadvantages:
    • Complexity: Requires additional infrastructure to manage rule change notifications, and push mechanisms may introduce additional development overhead.
  1. Pull to Sync
  • Description: Nodes pull the latest rule set from the node with system index when they detect a need to synchronize (e.g., when they receive a request or when a specific trigger is activated).
  • Disadvantages:
    • Increased Load: Queries are constantly coming in and this adds significant pressure to the system.
  1. Full Replication with Local Sync
  • Description: Each node must have a full copy of the system index data, and each node’s Trie is updated from its local index.
  • Advantages:
    • Data is fully localized, avoiding cross-node network requests.
  • Disadvantages:
    • Against the Principles of Distributed Systems: Forcing all data to be stored on every node can lead to redundant data and increase the storage burden on each node.
    • Data Synchronization Delays: Even though data exists locally, if the system index replicas on different nodes are not synchronized, it can cause data inconsistency. The update delay can also increase, especially in larger clusters.
    • Cost and Storage Overhead: Each node needs to store all rule data, which can create significant storage pressure and increase the cost.

Conclusion

After evaluating the synchronization approaches, we recommend adopting the Push to Sync method. This approach guarantees that as soon as a rule is updated, all nodes are immediately notified and update their local Trie, maintaining consistency across the cluster without delay. Besides, this approach also generates minimal system load since the rule update event happens rarely.

While Refresh-Based Synchronization may be simpler to implement and gives user freedom to control the refresh interval, it introduces a risk of synchronization delays and unnecessary overhead of periodically refreshing rules (since the rule updates will be minimal). These delays could result in different nodes having outdated rules temporarily, and could lead to inconsistent results given by different nodes for similar queries. Since consistency is important in our use case, this approach may not be ideal.

In summary, Push to Sync is the optimal synchronization mechanism to meet our needs for consistency, low latency, and minimal system impact.

Supporting References

#16813
#16797

Issues

#16797

Related component

Search

@ruai0511 ruai0511 added Meta Meta issue, not directly linked to a PR untriaged labels Dec 20, 2024
@ruai0511 ruai0511 self-assigned this Dec 20, 2024
@kaushalmahi12
Copy link
Contributor

@ruai0511 Thanks for proposing this! Can we add low level details on how this new framework will integrate into OpenSearch ?

@ruai0511
Copy link
Contributor Author

ruai0511 commented Dec 26, 2024

@msfroh @reta @jainankitk @backslasht @andrross
Could you review this and provide any suggestions/comments? Thanks

@reta
Copy link
Collaborator

reta commented Dec 28, 2024

Could you review this and provide any suggestions/comments? Thanks

Thanks @ruai0511 ,I believe security plugin follows the same approach (Push to Sync) to propagate the config changes. But I am not sure, do we need this complexity here? I believe in all these approaches, certain delays (between rules being updated and finally refreshed on each node) are inevitable, the search requests should never be impacted by rules refresh (please correct me if I am wrong here) so we can minimize the delay only. In this regards, Push to Sync is indeed looks like an optimal solution.

@jainankitk
Copy link
Collaborator

Thanks @ruai0511 for this proposal. I don't see any concerns at all with the rule synchronization being eventually consistent. Hence, I don't necessarily view the Synchronization Delay as disadvantage of the Refresh-Based Synchronization approach. Regarding performance impact, I am wondering if the following approach can work:

On primary node (assumes single primary shard):

  • There is a version number record with the system index for storing the rules
  • The version number is also maintained in memory when the rules index is loaded
  • The rule is written/updated in the system index along with the version number
  • The version number record is incremented (in-memory) and written into the system index. Fail the rule creation/updation request if the version number is not incremented

@sgup432
Copy link
Contributor

sgup432 commented Jan 21, 2025

@ruai0511 @kaushalmahi12
Like the overall ideas proposed.

Regarding approach 2 which seems to be the recommended one, seems complex and involves building a custom replication protocol, which feels overly complicated for this use case at the moment.

I am more inclined towards a combination of approach 1 and 4. I believe having redundant data is a better approach than building our own replication protocol. Do we have an estimate of the expected storage size, considering the size per rule and the maximum number of rules? I don't imagine it exceeding a few megabytes.
And then with approach 1, we can have maybe a low refresh interval(configurable) to update in-memory data from local shard which should work pretty well. Something @jainankitk also mentioned.

Also regarding approach 4 con:

Data Synchronization Delays: Even though data exists locally, if the system index replicas on different nodes are not synchronized, it can cause data inconsistency. The update delay can also increase, especially in larger clusters.

I suppose this issue also applies to approach 2, correct? If one or more nodes fail to acknowledge, they would end up with stale rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Meta Meta issue, not directly linked to a PR untriaged
Projects
Status: New
Development

No branches or pull requests

5 participants