[META] Making all copies of shards spread evenly across all Awareness Attribute #3367
Comments
Requesting input from the community, and also pinging @shwetathareja @dblock @reta @nknize for feedback on the above.
Thanks @gbbafna for opening this issue
Thanks @gbbafna, I clearly see the benefits, but there are some concerns as well. Primarily, the basic premise I have seen in many deployments is that clusters are quite dynamic by nature. With that:
[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/delayed-allocation.html
Thanks @Bukhtawar, @reta for your feedback.
Makes sense. We can break this into two parts and start with
Agreed. For search use cases, the replica count can be a multiple of the number of zone attribute values, so as to scale for reads. We can have
In case 2, the replica count will not even be an accepted parameter at the time of index creation.
Agreed.
Yes.
Taking full control of replica count management makes a lot of sense for ease of management. We can have
Summarizing the above points, I propose breaking the feature into two parts:
Thanks, @gbbafna, just curious how
The validation setting
The second part of the feature
A few questions:
What else is this used for?
Is this on indexing, rebalancing, some other operation or multiple?
How is the AZ count specified?
Are these live-reloaded settings, or do they require a cluster restart? Any new REST APIs?
This is used to distribute the shards across the AZs/racks. If there are 2 copies of a shard and 2 zones, each zone will have 1 copy of the shard.
This applies to operations that create or modify an index.
Assuming
The settings below illustrate the same for two awareness attributes.
These are dynamic cluster-level settings. The existing APIs for updating settings will be reused to modify them.
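To make the answers above concrete, here is a minimal Python sketch, assuming the settings are sent as the body of the existing `PUT _cluster/settings` API and that the AZ count is simply the number of values listed in the forced zone attribute. The zone and rack names, the `rack_id` attribute, and the `cluster.` prefix on the proposed balance flag are illustrative assumptions. The round-robin placement is a deliberate simplification, not how the OpenSearch allocator actually decides.

```python
import json
from collections import Counter

# Hypothetical zone and rack names; substitute your cluster's values.
ZONES = ["zone-a", "zone-b"]
RACKS = ["rack-1", "rack-2"]


def awareness_settings_body(zones: list[str], racks: list[str], balance: bool = True) -> dict:
    """Build a PUT _cluster/settings body for two awareness attributes."""
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": "zone,rack_id",
            "cluster.routing.allocation.awareness.force.zone.values": ",".join(zones),
            "cluster.routing.allocation.awareness.force.rack_id.values": ",".join(racks),
            # Flag proposed in this issue; the "cluster." prefix is assumed.
            "cluster.routing.allocation.awareness.balance": balance,
        }
    }


def zone_count(body: dict) -> int:
    """Assumed AZ count: number of values listed for the forced zone attribute."""
    values = body["persistent"]["cluster.routing.allocation.awareness.force.zone.values"]
    return len([v for v in values.split(",") if v])


def spread_copies(total_copies: int, zones: list[str]) -> dict[str, int]:
    """Simplified round-robin placement; returns the number of copies per zone."""
    counts = Counter(zones[i % len(zones)] for i in range(total_copies))
    return {zone: counts[zone] for zone in zones}


body = awareness_settings_body(ZONES, RACKS)
print(json.dumps(body, indent=2))
print(zone_count(body))         # 2
print(spread_copies(2, ZONES))  # {'zone-a': 1, 'zone-b': 1}
print(spread_copies(3, ZONES))  # {'zone-a': 2, 'zone-b': 1}
```

With 3 copies and 2 zones, one zone ends up holding two copies, which is the kind of imbalance this issue sets out to prevent.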
@gbbafna is this tracking 2.2? Can we add two labels to this? 1/ "roadmap" to highlight this improvement on the project roadmap; 2/ the version of OpenSearch this is targeting. Thanks!
Yes, we are tracking 2.2 for this. @Bukhtawar, can you please help with the same, as I don't have permissions?
Thanks! I see it now. Can we also open an issue in the docs repo to track any documentation updates that might need to happen for this?
Is your feature request related to a problem? Please describe.
In cloud HA deployments, customers usually deploy across multiple zones, and zone is usually configured as one of the `awareness.attributes`. However, nothing enforces that all copies of a shard are spread evenly across all zones. This can cause uneven distribution of shards and create shard hotspots. A single zone failure might also cause data loss and unavailability for a shard if its copies aren't evenly spread out.
Describe the solution you'd like
There are two possible approaches:
1. `routing.allocation.awareness.balance`, which is false by default. When true, we would validate that the total number of copies is always a multiple of the awareness attribute value count. If not, we will throw a validation exception. If there are multiple awareness attributes, the balance needs to ensure that every awareness attribute is equally balanced. For example, if there are 2 awareness attributes, zone and rack ID, each having 2 possible values, the total number of copies needs to be a multiple of 2.
2. `auto_balance_across_awareness_attribute`. If this is true, we would increase the total number of copies to a multiple of the AZ count. For instance, if there are 3 AZs and an index creation request comes with 7 replicas, OpenSearch will create 8 replicas to ensure that there are 9 copies in total.

Both solutions take effect only when `cluster.routing.allocation.awareness.attributes` and `cluster.routing.allocation.awareness.force.zone.values` are set. Otherwise, the setting has no effect.
Trade-offs
First approach: Plugins like ISM and CCR need to validate proactively when policies are created and updated. Otherwise, the actions/replication will fail silently at a later point in time. As new policies or index creation paths are added, we will need to keep adding the validation there for a good experience.
Second approach: Since the replica count is adjusted by OpenSearch, plugins and new index creation/modification paths don't need any handling, so this is very low maintenance. However, deviating from the parameter supplied in the API call may not be a good user experience.
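The two approaches weighed above differ only in what happens when the requested copy count does not divide evenly across the awareness attribute values. Below is a minimal Python sketch of both, plus the precondition they share. The function names are hypothetical, and using the least common multiple of the value counts when there are several awareness attributes is an assumption consistent with the zone/rack example, not something the proposal specifies.

```python
from math import lcm  # math.lcm accepts multiple arguments in Python 3.9+


def awareness_enforced(settings: dict) -> bool:
    """Both approaches apply only when attributes and forced zone values are set."""
    return bool(settings.get("cluster.routing.allocation.awareness.attributes")) and bool(
        settings.get("cluster.routing.allocation.awareness.force.zone.values")
    )


def required_multiple(value_counts: list[int]) -> int:
    """Assumption: copies must split evenly across every attribute, hence the LCM."""
    return lcm(*value_counts)


def validate_copies(replicas: int, value_counts: list[int]) -> None:
    """First approach: reject a replica count that cannot be spread evenly."""
    total = replicas + 1  # one primary plus its replicas
    multiple = required_multiple(value_counts)
    if total % multiple != 0:
        raise ValueError(f"total copies {total} is not a multiple of {multiple}")


def auto_balance_replicas(replicas: int, value_counts: list[int]) -> int:
    """Second approach: round total copies up to the next multiple, return replicas."""
    total = replicas + 1
    multiple = required_multiple(value_counts)
    balanced_total = -(-total // multiple) * multiple  # ceiling to a multiple
    return balanced_total - 1


settings = {
    "cluster.routing.allocation.awareness.attributes": "zone",
    "cluster.routing.allocation.awareness.force.zone.values": "zone-a,zone-b,zone-c",
}
if awareness_enforced(settings):
    print(auto_balance_replicas(7, [3]))  # 8 replicas, 9 total copies
    validate_copies(8, [3])               # passes: 9 copies over 3 zones
```

The validation path keeps the API contract strict but pushes checks into every caller such as ISM and CCR; the auto-balance path needs no caller changes but returns a different replica count than the one requested.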
User Experience
With `cluster.routing.allocation.awareness.attributes` and `cluster.routing.allocation.awareness.force.zone.values` set, and `routing.allocation.awareness.balance` enabled, the total number of copies needs to be a multiple of the number of possible values of the awareness attribute. If not, we will do one of the following:
Why it should be built
This is to ensure that the OpenSearch cluster remains well balanced as well as resilient to failures of a zone, rack, etc.
What will it take to execute?
Changes in OpenSearch as well as plugins to honor the new flag.