Better handling of ResourceAlreadyExistsException for security index #46214

Open
bizybot opened this issue Sep 2, 2019 · 3 comments
Labels
>bug :Security/Security Security issues without another label Team:Security Meta label for security team

Comments

@bizybot
Contributor

bizybot commented Sep 2, 2019

When the security index does not yet exist and an operation is performed against it, we first prepare (create) the index and then execute the operation. If two requests arrive in parallel, both try to create the security index. One of them fails with ResourceAlreadyExistsException because the index has already been created, and it then immediately executes its operation on the assumption that the security index is available. That assumption may not hold: the index can exist without being available yet (not all primary shards are active; the security index usually has 1). The request then fails with an error that varies with the operation being performed.

Discuss if this needs to be fixed and what the resolution looks like.
Options:

  • Do nothing and let the operation fail. The resulting error message is not intuitive and does not tell the user why the operation failed.
  • Fail with a clear error message when ResourceAlreadyExistsException is thrown, instead of continuing with the operation.
  • Check that the security index is available before invoking the operation, retrying with exponential backoff, and fail with a clear error message once the retries are exhausted.
  • Any other alternatives.
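
To make the third option concrete, here is a minimal sketch of an availability check with exponential backoff. It is not Elasticsearch code: `IndexAvailabilityRetry`, `awaitAvailable`, and `IndexNotAvailableException` are hypothetical names, and the `isAvailable` supplier stands in for whatever check reports that the primary shard is active.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class IndexAvailabilityRetry {

    /** Thrown when the index did not become available within the retry budget. */
    static final class IndexNotAvailableException extends RuntimeException {
        IndexNotAvailableException(String message) {
            super(message);
        }
    }

    /**
     * Polls isAvailable up to maxAttempts times, doubling the delay between
     * checks. Returns the 1-based attempt on which the index was available.
     */
    static int awaitAvailable(BooleanSupplier isAvailable, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isAvailable.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up with a clear error.
                    Thread.currentThread().interrupt();
                    throw new IndexNotAvailableException(
                        "interrupted while waiting for the security index to become available");
                }
                delay *= 2; // exponential backoff
            }
        }
        throw new IndexNotAvailableException(
            "security index exists but was not available after " + maxAttempts + " checks");
    }
}
```

This version blocks the calling thread, which keeps the sketch short but would probably not be acceptable on a request-handling thread; a real change would more likely schedule the re-checks asynchronously.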
@bizybot bizybot added >bug :Security/Security Security issues without another label team-discuss labels Sep 2, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-security

@tvernum
Contributor

tvernum commented Feb 6, 2020

We discussed this some time ago, but didn't update the issue.

Part of the problem here is that the SecurityIndexManager has no handling for the state "index exists, but is still initialising".
So while this issue is more likely to come up following a ResourceAlreadyExistsException, it could technically happen in other code paths too.

More generally, the SecurityIndexManager has no strategy for the situation "we think there is probably a task running somewhere in the cluster that will get the security index into a ready state, but we don't know if/when it will complete".

We could build an internal queue of tasks to run once the security index is available, each with an expiry time (probably something like 5-10s, but that number is pulled from nowhere). We would execute those tasks when we get a cluster update that moves the security index into the "available" state, and use a periodic timer to call the failure handler with a timeout exception for any task that reaches its expiry.

That seems complex though, which is why we haven't done it. To date we've only seen reports of this from internal testing that accesses clusters in abnormal ways. We haven't had a report from a production cluster (which isn't to say it doesn't happen, just that we don't hear about it).
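
For reference, the queue described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: `PendingIndexTasks` and its methods do not exist in Elasticsearch, `onIndexAvailable` stands in for the cluster update callback, and `expire` stands in for the periodic timer. Time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names exist in Elasticsearch.
final class PendingIndexTasks {

    private static final class Pending {
        final Runnable task;
        final Consumer<Exception> onFailure;
        final long expiresAtMillis;

        Pending(Runnable task, Consumer<Exception> onFailure, long expiresAtMillis) {
            this.task = task;
            this.onFailure = onFailure;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final List<Pending> queue = new ArrayList<>();
    private boolean available = false;

    /** Runs the task immediately if the index is available, otherwise queues it with an expiry. */
    void submit(Runnable task, Consumer<Exception> onFailure, long nowMillis, long timeoutMillis) {
        synchronized (this) {
            if (!available) {
                queue.add(new Pending(task, onFailure, nowMillis + timeoutMillis));
                return;
            }
        }
        task.run();
    }

    /** Stand-in for the cluster update that moves the index into the "available" state. */
    void onIndexAvailable() {
        List<Pending> ready;
        synchronized (this) {
            available = true;
            ready = new ArrayList<>(queue);
            queue.clear();
        }
        // Run callbacks outside the lock so they cannot block other submitters.
        for (Pending p : ready) {
            p.task.run();
        }
    }

    /** Stand-in for the periodic timer: fails every queued task whose expiry has passed. */
    void expire(long nowMillis) {
        List<Pending> expired = new ArrayList<>();
        synchronized (this) {
            Iterator<Pending> it = queue.iterator();
            while (it.hasNext()) {
                Pending p = it.next();
                if (p.expiresAtMillis <= nowMillis) {
                    expired.add(p);
                    it.remove();
                }
            }
        }
        for (Pending p : expired) {
            p.onFailure.accept(new TimeoutException("security index did not become available in time"));
        }
    }
}
```

Even this reduced form needs a lock, a listener hook, and a timer, which gives a sense of the complexity mentioned above.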

@rjernst rjernst added the Team:Security Meta label for security team label May 4, 2020
@benwtrent
Member

This may be related to this issue: #65846

If there were a way to say "wait for primary shard or timeout", it could address this problem.
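
As a rough illustration of what "wait for primary shard or timeout" could mean for a caller, the sketch below uses a plain `CompletableFuture`. `PrimaryShardWaiter` and its methods are hypothetical and say nothing about how Elasticsearch would expose this; `copy` and `orTimeout` require Java 9 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not an Elasticsearch API.
final class PrimaryShardWaiter {

    // Completed once, when the primary shard is reported active.
    private final CompletableFuture<Void> primaryActive = new CompletableFuture<>();

    /** Stand-in for whatever signal reports that the primary shard is active. */
    void markPrimaryActive() {
        primaryActive.complete(null);
    }

    /**
     * Returns a future that completes when the primary is active, or fails with
     * a TimeoutException after the given timeout. copy() keeps one caller's
     * timeout from failing the shared future that other callers wait on.
     */
    CompletableFuture<Void> waitForPrimaryOrTimeout(long timeout, TimeUnit unit) {
        return primaryActive.copy().orTimeout(timeout, unit);
    }
}
```

A caller would chain its operation onto the returned future and report the timeout as the error, instead of running the operation against an index that is not ready.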

5 participants