[WLM] Synchronizing Rules Across Nodes #16889
Comments
@ruai0511 Thanks for proposing this! Can we add low-level details on how this new framework will integrate into OpenSearch?
@msfroh @reta @jainankitk @backslasht @andrross
Thanks @ruai0511, I believe
Thanks @ruai0511 for this proposal. I don't see any concerns at all with the rule synchronization being eventually consistent. Hence, I don't necessarily view the
On primary node (assumes single primary shard):
@ruai0511 @kaushalmahi12 Regarding approach 2, which seems to be the recommended one: it looks complex and involves building a custom replication protocol, which feels overly complicated for this use case at the moment. I am more inclined towards a combination of approaches 1 and 4; having redundant data seems preferable to building our own replication protocol. Do we have an estimate of the expected storage size, considering the size per rule and the maximum number of rules? I don't imagine it exceeding a few megabytes. Also, regarding the con listed for approach 4:
I suppose this issue also applies to approach 2, correct? If one or more nodes fail to acknowledge, they would end up with stale rules.
Please describe the end goal of this project
Recently, we launched the WLM subfeature, i.e., multi-tenant search resiliency, which allows managing multi-tenant environments in OpenSearch. However, this feature still relies on external hints being sent along with each request via an HTTP header.
This becomes cumbersome for programmatic access and, without proper planning, can lead to unmanageable multi-tenant access. A more efficient solution would allow users to define rules that determine the appropriate tenant for certain types of requests (e.g., requests from a specific user or targeting a specific index). For this to work, these rules (both the index-backed and the in-memory copies) must be kept consistent and up to date across all nodes in the cluster, and the synchronization mechanism must be both consistent and efficient.
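For illustration only, here is a minimal sketch of what such a rule and its in-memory lookup could look like. The `Rule` and `InMemoryRuleStore` names, and the flat prefix scan used in place of the actual trie, are assumptions made for the sketch, not the real implementation:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical rule: maps a request attribute (e.g. username or index pattern)
// to the query group (tenant) that requests matching it should be routed to.
record Rule(String attribute, String valuePrefix, String queryGroupId) {}

// Hypothetical in-memory store; the proposal keeps rules in a trie keyed by
// attribute value, but a flat prefix scan keeps this sketch short.
class InMemoryRuleStore {
    private final Map<String, Rule> rulesById = new ConcurrentHashMap<>();

    void upsert(String ruleId, Rule rule) { rulesById.put(ruleId, rule); }

    void delete(String ruleId) { rulesById.remove(ruleId); }

    // Resolve the query group for a request attribute value, e.g. an index name.
    Optional<String> resolve(String attribute, String value) {
        return rulesById.values().stream()
                .filter(r -> r.attribute().equals(attribute)
                        && value.startsWith(r.valuePrefix()))
                .map(Rule::queryGroupId)
                .findFirst();
    }
}
```

Whatever the exact shape of the rule ends up being, every node needs the same view of this store for routing decisions to be consistent, which is what the synchronization approaches below address.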
Assumptions
Synchronization Approaches
Conclusion
After evaluating the synchronization approaches, we recommend adopting the Push to Sync method. This approach guarantees that as soon as a rule is updated, all nodes are notified immediately and update their local Trie, maintaining consistency across the cluster without delay. In addition, it generates minimal system load, since rule update events are rare.
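A rough sketch of the push model is shown below; the `RuleUpdateEvent`, `RuleTransport`, and `RuleUpdateBroadcaster` types are illustrative placeholders, not the actual OpenSearch transport API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical rule-update event pushed to every node after a successful write
// of the rule to the rules index.
record RuleUpdateEvent(String ruleId, String attribute, String value, String queryGroupId) {}

// Hypothetical transport abstraction; in OpenSearch this would be a transport
// action, shown here as a plain interface to keep the sketch generic.
interface RuleTransport {
    CompletableFuture<Void> send(String nodeId, RuleUpdateEvent event);
}

// Push to Sync: on a rule change, fan the event out to all nodes immediately
// so each node can update its local Trie.
class RuleUpdateBroadcaster {
    private final RuleTransport transport;
    private final List<String> allNodeIds;

    RuleUpdateBroadcaster(RuleTransport transport, List<String> allNodeIds) {
        this.transport = transport;
        this.allNodeIds = allNodeIds;
    }

    CompletableFuture<Void> broadcast(RuleUpdateEvent event) {
        // Fan out in parallel and wait for all acknowledgements; nodes that fail
        // to acknowledge would need a retry or catch-up path (not shown).
        CompletableFuture<?>[] acks = allNodeIds.stream()
                .map(nodeId -> transport.send(nodeId, event))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(acks);
    }
}
```

Since the rules are also persisted in the index, a node that misses a broadcast can recover its state from there rather than relying solely on the push path.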
While Refresh-Based Synchronization may be simpler to implement and gives users the freedom to control the refresh interval, it introduces a risk of synchronization delays and the unnecessary overhead of periodically refreshing rules (since rule updates will be infrequent). These delays could leave different nodes with temporarily outdated rules, which could lead to different nodes returning inconsistent results for similar queries. Since consistency is important for our use case, this approach is not ideal.
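For comparison, the refresh-based variant amounts to a periodic pull roughly like the following; again purely illustrative, and the `RuleRefresher` name, the loader, and the interval handling are assumptions:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Refresh-based synchronization: each node periodically reloads the full rule
// set from the rules index and rebuilds its local Trie, instead of reacting
// to pushed updates.
class RuleRefresher {
    record RuleDoc(String id, String attribute, String value, String queryGroupId) {}

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(Supplier<List<RuleDoc>> loadRulesFromIndex,
               Consumer<List<RuleDoc>> rebuildLocalTrie,
               long refreshIntervalSeconds) {
        // Any rule changed between two refreshes stays invisible to this node
        // until the next run, which is the staleness window noted above.
        scheduler.scheduleAtFixedRate(
                () -> rebuildLocalTrie.accept(loadRulesFromIndex.get()),
                0, refreshIntervalSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```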
In summary, Push to Sync is the optimal synchronization mechanism to meet our needs for consistency, low latency, and minimal system impact.
Supporting References
#16813
#16797
Issues
#16797
Related component
Search