Add Enrich index background task to cleanup old indices #43746
Conversation
Pinging @elastic/es-core-features
import org.elasticsearch.threadpool.ThreadPool;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;

public class EnrichPolicyMaintenanceService implements LocalNodeMasterListener {
Like you mentioned via another channel, it looks like we can just remove the unused enrich indices instead of the mark-then-delete strategy that this class is doing now. The EnrichPolicyLocks class now enforces that either the EnrichPolicyMaintenanceService or the EnrichPolicyExecutor holds an exclusive permit to modify the enrich indices for a particular enrich policy, and I think that is good enough. With that, this class can be simplified now.
That makes this a bit easier to reason about. I pushed 09b724e
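A minimal sketch, with JDK types only, of the simplified "delete directly" cleanup being discussed: an enrich index is removable when it is no longer linked to an existing policy, or when the enrich alias no longer points at it. The parameter names and the `.enrich-*` index names below are illustrative, not the actual fields or naming used by EnrichPolicyMaintenanceService.

```java
import java.util.HashSet;
import java.util.Set;

final class RemovableIndexSketch {

    static boolean isRemovable(String indexName,
                               String policyBaseName,
                               Set<String> existingPolicyNames,
                               Set<String> indicesPointedAtByEnrichAlias) {
        // Safe to delete when the owning policy no longer exists, or when the
        // policy's live alias has already rotated to a newer enrich index.
        return existingPolicyNames.contains(policyBaseName) == false
            || indicesPointedAtByEnrichAlias.contains(indexName) == false;
    }

    public static void main(String[] args) {
        Set<String> policies = new HashSet<>(Set.of("users"));
        Set<String> aliased = new HashSet<>(Set.of(".enrich-users-000002"));
        // Older generation: policy still exists but the alias moved on -> removable.
        System.out.println(isRemovable(".enrich-users-000001", "users", policies, aliased)); // true
        // Current generation: still aliased -> kept.
        System.out.println(isRemovable(".enrich-users-000002", "users", policies, aliased)); // false
    }
}
```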
martijnvg left a comment:
This looks good. I left a few comments.
}

private void execute() {
    logger.debug("triggering scheduled [enrich] maintenance task");
The cleanup period defaults to 15 minutes, so the likelihood that two cleanups run at the same time is small.
But maybe we can check whether cancellable is set, and if it is, not execute the cleanUpEnrichIndices() method?
We would also need to unset cancellable after old indices have been removed, or once it has been determined that no indices need to be deleted.
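One way to realize that suggestion, sketched with a plain AtomicBoolean and a synchronous Runnable rather than the PR's actual fields and async listeners: a compare-and-set guard makes sure only one cleanup pass runs at a time, and the flag is cleared once the pass finishes, whether indices were deleted or nothing needed to be done.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SingleFlightCleanupSketch {

    private final AtomicBoolean cleanupInProgress = new AtomicBoolean(false);

    void execute(Runnable cleanUpEnrichIndices) {
        // Skip this run entirely if a previous cleanup is still going.
        if (cleanupInProgress.compareAndSet(false, true) == false) {
            return;
        }
        try {
            cleanUpEnrichIndices.run();
        } finally {
            // "Unset" once old indices were removed, or none needed deleting.
            cleanupInProgress.set(false);
        }
    }
}
```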
import org.elasticsearch.threadpool.ThreadPool;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;

public class EnrichPolicyMaintenanceService implements LocalNodeMasterListener {
Maybe add a single node test that verifies that unused enrich indices do get removed?
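A rough skeleton of what such a test could look like; only the assertion flow is the point. The helper methods are hypothetical stand-ins for whatever test utilities the PR ends up with (for example an ESSingleNodeTestCase subclass calling the enrich APIs) and would need to be wired up to a real cluster.

```java
import java.util.List;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class EnrichMaintenanceSketchTests {

    @Test
    public void testUnusedEnrichIndicesAreRemoved() throws Exception {
        putPolicy("my-policy");     // register a policy over a source index
        executePolicy("my-policy"); // creates enrich index generation 1
        executePolicy("my-policy"); // creates generation 2, alias rotates to it

        triggerMaintenance();       // run the background cleanup once

        // Only the index the enrich alias currently points at should remain.
        assertEquals(1, enrichIndicesFor("my-policy").size());
    }

    // --- hypothetical helpers, not part of this PR ---
    private void putPolicy(String name) { throw new UnsupportedOperationException("wire to the put-policy API"); }
    private void executePolicy(String name) { throw new UnsupportedOperationException("wire to the execute-policy API"); }
    private void triggerMaintenance() { throw new UnsupportedOperationException("invoke the maintenance service"); }
    private List<String> enrichIndicesFor(String name) { throw new UnsupportedOperationException("list the policy's enrich indices"); }
}
```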
private void scheduleNext() {
    try {
        TimeValue waitTime = EnrichPlugin.ENRICH_CLEANUP_PERIOD.get(settings);
        cancellable = threadPool.schedule(this::execute, waitTime, ThreadPool.Names.GENERIC);
I think we need a field that indicates that this node is still the active master, and check it here before scheduling. Because as it stands, if offMaster() is invoked, there is still a chance that scheduleNext() gets invoked afterwards and sets a new value on the cancellable field?
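A sketch of that guard, using a plain volatile flag and a JDK scheduler instead of Elasticsearch's ThreadPool: offMaster() flips the flag and cancels the pending run, and scheduleNext() refuses to re-arm the timer once the node is no longer master, so a racing execute() cannot schedule a new run after the node steps down. Names and the 15-minute default are illustrative.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class MasterOnlySchedulerSketch {

    private final ScheduledExecutorService threadPool = Executors.newSingleThreadScheduledExecutor();
    private volatile boolean isMaster = false;
    private volatile ScheduledFuture<?> cancellable;

    void onMaster() {
        isMaster = true;
        scheduleNext();
    }

    void offMaster() {
        isMaster = false;
        ScheduledFuture<?> current = cancellable;
        if (current != null) {
            current.cancel(false);
        }
    }

    private void scheduleNext() {
        // Only re-arm the timer while this node is still the elected master.
        if (isMaster) {
            cancellable = threadPool.schedule(this::execute, 15, TimeUnit.MINUTES);
        }
    }

    private void execute() {
        try {
            // ... clean up unused enrich indices ...
        } finally {
            scheduleNext();
        }
    }
}
```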
jakelandis left a comment:
Looking good, just a couple of nits and readability comments.
@Override
public void onFailure(Exception e) {
    logger.error("Could not delete enrich indices that were marked for deletion", e);
nit: "marked for deletion" is no longer relevant
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

class EnrichPolicyLocks {
Can you add some Javadoc to this class? Or maybe just an adjustment of the naming. I am a bit confused:
- coordinationLock protects LockState
- is lockState() locking some state, or is it the state of a lock, or is it state protected by a lock? (I think the word "lock" is overused here)
- what does "safe" mean? (it seems to be an encapsulation leak via naming?)
EDIT: I understand what it is doing now, but I think naming or Javadoc would help a lot with readability.
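One possible shape for the requested documentation: Javadoc that spells out what the read/write lock coordinates, what a lock-state snapshot represents, and what "safe" is intended to convey. The field names follow the diff hunks quoted in this review; the LockState shape and the wording are assumptions, and the real class in the PR may differ.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Coordinates enrich policy executions with the background maintenance task.
 *
 * <p>Policy executions take the read lock plus a per-policy {@link Semaphore}
 * permit, so different policies can run concurrently but each policy runs at
 * most once at a time. The maintenance task briefly takes the write lock to
 * capture a consistent snapshot ({@code LockState}) of how many policies are
 * currently running and how many runs have ever started. If that snapshot
 * changes between the start and end of a cleanup pass, the pass is considered
 * unsafe and deletion is deferred to the next run.</p>
 */
class EnrichPolicyLocksSketch {
    private final ReentrantReadWriteLock coordinationLock = new ReentrantReadWriteLock();
    private final ConcurrentHashMap<String, Semaphore> policyLocks = new ConcurrentHashMap<>();
    private final AtomicLong policyRunCounter = new AtomicLong();

    /** Immutable snapshot of execution activity, used to detect concurrent runs. */
    record LockState(int runningPolicies, long totalRunsStarted) {}
}
```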
void releasePolicy(String policyName) {
    coordinationLock.readLock().lock();
    try {
        policyLocks.remove(policyName);
Should this release the semaphore instead of removing the policy from the map?
Since the semaphore is only acquired in a non-waiting fashion and is only really meant to ensure a first-come, first-served model, it's just discarded when the policy is completed, to keep the concurrent map tidy.
This also protects against any instruction-interleaving funny business that might occur:
- Thread 1: gets the semaphore from the map, is preempted by the thread scheduler
- Thread 2: releases the semaphore, removes it from the map, is preempted by the thread scheduler
- Thread 1: successfully acquires the dead semaphore, starts execution
- Background cleanup thread: starts cleanup, sees no new operations occurring on the locks, deletes Thread 1's work-in-progress index
Even if we were to remove from the map before releasing the semaphore in step 2, it leads to the same scenario. We could just recheck the map after acquiring the semaphore, but that seems less efficient/elegant than just discarding the used semaphore.
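For contrast, here is the as-is approach boiled down to plain JDK types: executions gate on a per-policy Semaphore obtained via computeIfAbsent and acquired only with tryAcquire, and releasePolicy discards the semaphore (removes the map entry) rather than releasing it. A sketch only; method names mirror the quoted diff, the rest is assumed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

final class DiscardingPolicyLockSketch {

    private final ConcurrentHashMap<String, Semaphore> policyLocks = new ConcurrentHashMap<>();

    /** Returns true if this caller won the right to run the policy. */
    boolean lockPolicy(String policyName) {
        Semaphore semaphore = policyLocks.computeIfAbsent(policyName, name -> new Semaphore(1));
        // Non-blocking: first come, first served; losers fail immediately.
        return semaphore.tryAcquire();
    }

    void releasePolicy(String policyName) {
        // Discard the used semaphore instead of releasing it, keeping the map tidy.
        policyLocks.remove(policyName);
    }
}
```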
My question implied that once in the map, the entry would never be removed (which I don't think would result in the issue you described). As-is works fine; I think it's just a matter of preference. So +1 either as-is or leaving it in the map and releasing it there.
That's a fair point; we could leave the semaphore in the map and remove it only if an enrich policy is ever deleted. We'll want to ensure that the policy is not currently being executed when performing a policy deletion, so the delete API will most likely have an instance of the locks object anyway. I think the simplicity of leaving it in the map will be worth doing when we make that change.
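And the alternative the thread converges on, again with plain JDK types: the semaphore stays in the map and is released on completion, and the entry is only removed when the enrich policy itself is deleted (at which point the delete path would also verify the policy is not currently executing). Names are illustrative, not the PR's actual code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

final class ReusablePolicyLockSketch {

    private final ConcurrentHashMap<String, Semaphore> policyLocks = new ConcurrentHashMap<>();

    boolean lockPolicy(String policyName) {
        return policyLocks.computeIfAbsent(policyName, name -> new Semaphore(1)).tryAcquire();
    }

    void releasePolicy(String policyName) {
        Semaphore semaphore = policyLocks.get(policyName);
        if (semaphore != null) {
            semaphore.release(); // keep the entry around for the next execution
        }
    }

    void onPolicyDeleted(String policyName) {
        // Only drop the entry when the policy itself goes away.
        policyLocks.remove(policyName);
    }
}
```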
if (coordinationLock.writeLock().tryLock()) {
    try {
        long revision = policyRunCounter.get();
        int currentPolicyExecutions = policyLocks.size();
nit: prefer mappingCount over size for CHM (however, it doesn't make a difference here)
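For readers unfamiliar with the nit: ConcurrentHashMap#mappingCount() returns a long and is the documented preference over size() for concurrent maps whose mapping count may exceed an int, though at this scale either works, as the reviewer notes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

final class MappingCountExample {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Semaphore> policyLocks = new ConcurrentHashMap<>();
        policyLocks.put("my-policy", new Semaphore(1));

        long preferred = policyLocks.mappingCount(); // preferred for CHM
        int size = policyLocks.size();               // fine at this scale
        System.out.println(preferred + " " + size);
    }
}
```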
martijnvg left a comment:
LGTM, assuming build is happy.
@elasticmachine run elasticsearch-ci/2

1 similar comment

@elasticmachine run elasticsearch-ci/2
This PR adds a background maintenance task that is scheduled on the master node only. An index is deleted if it is not linked to a policy or if the enrich alias is not currently pointing at it. Synchronization has been added to make sure that no policy executions are running at the time of cleanup; if any executions do occur, the cleanup is delayed until the next run.
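A hedged sketch of that synchronization using only JDK primitives: the cleanup captures a snapshot of execution activity before it resolves removable indices and deletes nothing if the snapshot has changed by the time it is ready to delete, deferring the work to the next scheduled run. Field and method names are illustrative, not the PR's actual ones, and the real implementation also tracks currently running executions.

```java
import java.util.concurrent.atomic.AtomicLong;

final class DeferredCleanupSketch {

    private final AtomicLong policyRunCounter = new AtomicLong();

    void onPolicyExecutionStarted() {
        policyRunCounter.incrementAndGet();
    }

    void cleanUpEnrichIndices(Runnable deleteRemovableIndices) {
        long before = policyRunCounter.get();

        // ... resolve which enrich indices are unused (see the predicate sketch above) ...

        if (policyRunCounter.get() == before) {
            deleteRemovableIndices.run();
        }
        // else: a policy execution started mid-pass; try again on the next run.
    }
}
```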