[RFC] Enforcing Replica Count Control in ISM Policy Creation Path #381

Closed
gbbafna opened this issue Jun 15, 2022 · 5 comments

@gbbafna
Contributor

gbbafna commented Jun 15, 2022

Is your feature request related to a problem?

This is related to opensearch-project/OpenSearch#3461

The OpenSearch change will make sure that the create-index and update-index-settings paths adhere to the proper replica count. If a request does not, the API call will fail with an exception.

Aim: Ensure that users create ISM policies that adhere to the proper replica count for their allocation attributes.

What solution would you like?

Validation in the policy creation path to make sure that the replica count is set appropriately.
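
A minimal sketch of what such a creation-time check could look like, written in Python for brevity (the plugin itself is Kotlin). It assumes the enforcement rule is that total copies (1 primary + replicas) must be a multiple of the number of awareness attribute values. That rule, the function name `find_replica_count_violations`, and the `awareness_value_count` parameter are illustrative assumptions, not the plugin's API. The policy layout follows the ISM `replica_count` action.

```python
from typing import Any


def find_replica_count_violations(
    policy: dict[str, Any], awareness_value_count: int
) -> list[str]:
    """Return one message per replica_count action that breaks the assumed rule."""
    violations: list[str] = []
    if awareness_value_count <= 1:
        return violations  # nothing to balance across, so every count passes
    for state in policy.get("states", []):
        for action in state.get("actions", []):
            config = action.get("replica_count")
            if config is None:
                continue  # not a replica_count action
            replicas = config["number_of_replicas"]
            total_copies = replicas + 1  # primary plus replicas
            if total_copies % awareness_value_count != 0:
                violations.append(
                    f"state [{state['name']}]: {total_copies} total copies is not "
                    f"a multiple of {awareness_value_count} awareness values"
                )
    return violations


# Hypothetical policy: "hot" keeps 2 replicas, "warm" drops to 1.
policy = {
    "states": [
        {"name": "hot", "actions": [{"replica_count": {"number_of_replicas": 2}}]},
        {"name": "warm", "actions": [{"replica_count": {"number_of_replicas": 1}}]},
    ]
}
violations = find_replica_count_violations(policy, awareness_value_count=3)
```

With three awareness values, only the `warm` state is reported, because 2 total copies cannot be spread evenly across 3 values.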

What alternatives have you considered?

An alternative is to do nothing at the time of ISM policy creation. This will lead to ReplicaCountAction failing at execution time. That will be a silent failure, and the user will need to check the ISM logs to figure out what is wrong with the policy and then modify it appropriately.


@downsrob
Contributor

As allocation attributes can change with time, ISM policies failing due to this setting seems unavoidable. When this task is picked up we need to make sure that we are able to uniquely identify these failures so they may be properly displayed to the user.
If I was managing my cluster and I was planning on adding a rack or area zone, I might want to modify or create new policies with the correct replica counts before rolling out the awareness attribute change. Validation at the time of policy creation/update would block users from this preemptive update. It still might be the best option, but we should consider this.

@bowenlan-amzn
Member

Validation in the policy creation path won't help with existing policies that set the replica count.


> An alternative is to do nothing at the time of ISM policy creation. This will lead to ReplicaCountAction failing at execution time. That will be a silent failure, and the user will need to check the ISM logs to figure out what is wrong with the policy and then modify it appropriately.

If the set-replica-count API call fails during execution, ISM should be able to catch the ValidationException here and surface it to the customer through the explain API.
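
A rough sketch of that idea, in Python for brevity. The `ValidationException` class here is a local stand-in, and `StepResult`, `attempt_replica_count`, and the message format are hypothetical names used only for illustration; the real step lives in the Kotlin plugin and may record failures differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ValidationException(Exception):
    """Stand-in for the exception the settings update is expected to raise."""


@dataclass
class StepResult:
    status: str  # "completed" or "failed"
    message: str  # text an explain-style API could return to the user
    cause: Optional[str] = None  # original validation error, if any


def attempt_replica_count(
    update_settings: Callable[[str, dict], None], index: str, replicas: int
) -> StepResult:
    """Apply the replica count and turn a validation failure into step metadata."""
    try:
        update_settings(index, {"index.number_of_replicas": replicas})
    except ValidationException as err:
        # Keep the reason with the step result instead of leaving it only in logs.
        return StepResult(
            status="failed",
            message=f"Failed to set number_of_replicas to {replicas} [index={index}]",
            cause=str(err),
        )
    return StepResult(
        status="completed",
        message=f"Set number_of_replicas to {replicas} [index={index}]",
    )


def rejecting_update(index: str, settings: dict) -> None:
    # Simulates a cluster that rejects the request under replica enforcement.
    raise ValidationException("total copies must be a multiple of 3")


result = attempt_replica_count(rejecting_update, "logs-000001", replicas=1)
```

Here `result.cause` carries the validation message, so a user inspecting the managed index would see why the step failed without reading the ISM logs.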


We are planning a validation mechanism that lets users check whether a policy is going to fail. This replica enforcement could be one of the validations we onboard, so users don't have to wait until the action is actually executed but can check preemptively.

@gbbafna
Contributor Author

gbbafna commented Jun 20, 2022

Thanks @downsrob and @bowenlan-amzn for the review.

> If I was managing my cluster and I was planning on adding a rack or area zone, I might want to modify or create new policies with the correct replica counts before rolling out the awareness attribute change. Validation at the time of policy creation/update would block users from this preemptive update. It still might be the best option, but we should consider this.

Agreed. To address that, we can add a force flag to skip this validation.
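
One way the force flag could behave, sketched in Python for brevity. The `force` parameter, `create_policy`, and `PolicyValidationError` are hypothetical; the thread does not specify where the flag would live or how skipped violations would be reported.

```python
class PolicyValidationError(Exception):
    """Raised when a policy fails validation and force is not set."""


def create_policy(policy_id: str, violations: list[str], force: bool = False) -> dict:
    """Reject a policy with violations unless the caller explicitly forces it."""
    if violations and not force:
        raise PolicyValidationError("; ".join(violations))
    # With force=True the policy is accepted and the violations become warnings,
    # which allows preemptive updates ahead of an awareness attribute change.
    return {"policy_id": policy_id, "created": True, "warnings": list(violations)}


violations = ["state [warm]: 2 total copies is not a multiple of 3 awareness values"]
forced = create_policy("logs-policy", violations, force=True)
```

Returning the skipped violations as warnings keeps the preemptive-update case workable while still telling the user that the policy would fail under the current allocation attributes.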

> Validation in the policy creation path won't help with existing policies that set the replica count.

Yes, that is the caveat that will be called out. Even the OpenSearch replica enforcement change applies only to newer indices and newer API calls.

> We are planning a validation mechanism that lets users check whether a policy is going to fail. This replica enforcement could be one of the validations we onboard, so users don't have to wait until the action is actually executed but can check preemptively.

This sounds great. The force flag will make sense here then. Is this actively being worked on? Replica enforcement in OpenSearch is already in progress and is expected to ship in a minor release. Do you think we can add it now and move it to the validation framework later, or vice versa?

@bowenlan-amzn
Member

> Yes, that is the caveat that will be called out. Even the OpenSearch replica enforcement change applies only to newer indices and newer API calls.

An existing policy that sets the replica count could be used to manage newer indices.
I'm not sure what the newer API calls are. In ISM, we make the replica change through an updateSettings call.

https://github.com/opensearch-project/index-management/blob/main/src/main/kotlin/org/opensearch/indexmanagement/indexstatemanagement/step/replicacount/AttemptReplicaCountStep.kt#L35-L36

For now, I suppose it could break after we make the enforcement change in the scenario above: an existing policy that sets the replica count to a number rejected by replica enforcement is used to manage newer indices. This is one thing we want to call out.
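
To make the scenario concrete, here is a small Python illustration of which replica counts an existing policy could keep using. It assumes, as a labeled guess about the enforcement rule, that total copies (1 primary + replicas) must be a multiple of the number of awareness attribute values; `accepted_replica_counts` is a hypothetical helper.

```python
def accepted_replica_counts(awareness_value_count: int, max_replicas: int) -> list[int]:
    """List replica counts whose total copies divide evenly across awareness values."""
    return [
        replicas
        for replicas in range(max_replicas + 1)
        if (replicas + 1) % awareness_value_count == 0
    ]


accepted = accepted_replica_counts(awareness_value_count=3, max_replicas=8)
print(accepted)  # [2, 5, 8]
```

Under that assumption, an existing policy that sets `number_of_replicas` to 1 would start failing on newer indices in a cluster with three awareness values, even though the policy itself never changed.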

> We are planning a validation mechanism that lets users check whether a policy is going to fail. This replica enforcement could be one of the validations we onboard, so users don't have to wait until the action is actually executed but can check preemptively.

> This sounds great. The force flag will make sense here then. Is this actively being worked on? Replica enforcement in OpenSearch is already in progress and is expected to ship in a minor release. Do you think we can add it now and move it to the validation framework later, or vice versa?

Our intern started working on this only recently. We don't have a release plan for this feature yet. As the project goes on, we can update the plan in our roadmap and link it here. I don't think it can be ready in time for the Replica Enforcement change in OpenSearch in the next minor release.


For the validation at policy creation time, we will discuss it with PMs and try to implement it along with the Replica Enforcement change.

@gbbafna
Contributor Author

gbbafna commented Jul 25, 2022

Hi @bowenlan-amzn,

> For the validation at policy creation time, we will discuss it with PMs and try to implement it along with the Replica Enforcement change.

I can take this up and build a minimal policy validation framework to run it. That will unblock the current use case, which is targeting the 2.2 release.

When the full validation framework arrives, this validation could then be merged into it. Does this work?
