Conversation

Member

@jbaiera jbaiera commented Jun 28, 2019

This PR adds a background maintenance task that is scheduled on the master node only. Cleaning up enrich indices is a two-phase process: marking and deleting. We mark an index before deleting it to allow some time for any new indices to be rotated in and replace the old one. An index is marked for deletion if it is not linked to a policy or if the enrich alias is not currently pointing at it. Synchronization has been added to make sure that no policy executions are running at the time of cleanup; if any executions do occur, the marking process delays cleanup until the next run.
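In rough sketch form, the marking criteria look like this (class and method names here are illustrative, not the actual implementation in this PR):

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names do not match the actual implementation.
class EnrichIndexCleanupSketch {

    // An enrich index may be marked for cleanup only when no policy is linked
    // to it and the enrich alias for its policy is not currently pointing at it.
    static boolean isRemovable(String indexName,
                               Set<String> indicesLinkedToPolicies,
                               Map<String, String> aliasToCurrentIndex) {
        boolean linkedToPolicy = indicesLinkedToPolicies.contains(indexName);
        boolean aliasTarget = aliasToCurrentIndex.containsValue(indexName);
        return linkedToPolicy == false && aliasTarget == false;
    }
}
```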

@jbaiera jbaiera added the >enhancement and :Data Management/Ingest Node labels Jun 28, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

import org.elasticsearch.threadpool.ThreadPool;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;

public class EnrichPolicyMaintenanceService implements LocalNodeMasterListener {
Member

Like you mentioned via another channel, it looks like we can just remove the unused enrich indices instead of the mark-then-delete strategy that this class currently implements. The EnrichPolicyLocks class now enforces that either the EnrichPolicyMaintenanceService or the EnrichPolicyExecutor holds the exclusive permit to modify the enrich indices for a particular enrich policy, and I think that is good enough. With that, this class can now be simplified.

Member Author

That makes this a bit easier to reason about. I pushed 09b724e

@jbaiera jbaiera requested a review from martijnvg July 2, 2019 19:36
Member

@martijnvg martijnvg left a comment

This looks good. I left a few comments.

    }

    private void execute() {
        logger.debug("triggering scheduled [enrich] maintenance task");
Member

The cleanup period defaults to 15 minutes, so the likelihood that two cleanups run at the same time is small.

But maybe we can check if cancellable is set, and if it is, not execute the cleanUpEnrichIndices() method?

We would also need to unset cancellable after old indices have been removed or it has been determined that no indices need to be deleted.
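In sketch form, the guard might look something like this (using a separate flag rather than the cancellable field itself; names are illustrative, and the real cleanup is asynchronous rather than a plain method call):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the suggested guard, simplified to a synchronous call.
class MaintenanceGuardSketch {
    private final AtomicBoolean cleanupInProgress = new AtomicBoolean(false);

    void execute() {
        // Skip this run entirely if the previous cleanup is still in flight.
        if (cleanupInProgress.compareAndSet(false, true)) {
            try {
                cleanUpEnrichIndices();
            } finally {
                // Unset once old indices were removed or none needed deleting.
                cleanupInProgress.set(false);
            }
        }
    }

    private void cleanUpEnrichIndices() { /* find and delete unused enrich indices */ }
}
```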

import org.elasticsearch.threadpool.ThreadPool;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;

public class EnrichPolicyMaintenanceService implements LocalNodeMasterListener {
Member

Maybe add a single-node test that verifies that unused enrich indices do get removed?
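Something along these lines, perhaps (the enrich-specific helpers below are hypothetical placeholders, not real APIs):

```java
import org.elasticsearch.test.ESSingleNodeTestCase;

public class EnrichPolicyMaintenanceTests extends ESSingleNodeTestCase {

    public void testUnusedEnrichIndicesAreRemoved() throws Exception {
        String staleIndex = ".enrich-my-policy-1234";
        createEnrichIndex(staleIndex);  // hypothetical: an index no policy or alias references
        forceMaintenanceRun();          // hypothetical: trigger the maintenance task directly

        assertBusy(() -> assertFalse(
            client().admin().indices().prepareExists(staleIndex).get().isExists()));
    }

    private void createEnrichIndex(String name) { /* placeholder */ }
    private void forceMaintenanceRun() { /* placeholder */ }
}
```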

    private void scheduleNext() {
        try {
            TimeValue waitTime = EnrichPlugin.ENRICH_CLEANUP_PERIOD.get(settings);
            cancellable = threadPool.schedule(this::execute, waitTime, ThreadPool.Names.GENERIC);
Member

I think we need a field that indicates that this node is still the active master, and check it here before scheduling. Otherwise, if offMaster() is invoked, I think there is still a chance that scheduleNext() gets invoked and sets a new value on the cancellable field.
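In sketch form (illustrative names, simplified):

```java
// Illustrative sketch; not the actual fields in this PR.
class MasterAwareSchedulerSketch {
    private volatile boolean isMaster = false;

    void onMaster() {
        isMaster = true;
        scheduleNext();
    }

    void offMaster() {
        isMaster = false;
        // ... cancel the currently scheduled task ...
    }

    private void scheduleNext() {
        // Without this check, a run that finishes after offMaster() could
        // still schedule a new task and overwrite the cancellable field.
        if (isMaster) {
            // cancellable = threadPool.schedule(this::execute, waitTime, ThreadPool.Names.GENERIC);
        }
    }
}
```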

Contributor

@jakelandis jakelandis left a comment

Looking good, just a couple of nits and readability comments.


    @Override
    public void onFailure(Exception e) {
        logger.error("Could not delete enrich indices that were marked for deletion", e);
Contributor

nit: "marked for deletion" is no longer relevant


import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

class EnrichPolicyLocks {
Contributor

Can you add some javadoc to this class? Or maybe just adjust the naming. I am a bit confused:

  • coordinationLock protects LockState
  • is lockState() locking some state, or is it the state of a lock, or is it state protected by a lock? (I think the word "lock" is overused here)
  • what does "safe" mean? (it seems to be an encapsulation leak via naming?)

EDIT: I understand what it is doing now, but I think naming or javadoc would help a lot with readability.
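Something along these lines might help (wording is only a suggestion, based on my current reading of the class):

```java
/**
 * Coordinates operations on enrich policies and their backing indices.
 * Each policy name maps to a single-permit Semaphore in policyLocks, which
 * guarantees at most one concurrent execution per policy. The read/write
 * coordinationLock does not protect policy data itself: executions take the
 * read lock, while the maintenance service takes the write lock so it can
 * capture a consistent snapshot of in-flight executions (the "lock state")
 * and decide whether it is safe to delete unused enrich indices.
 */
class EnrichPolicyLocks {
    // ...
}
```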

    void releasePolicy(String policyName) {
        coordinationLock.readLock().lock();
        try {
            policyLocks.remove(policyName);
Contributor

Should this release the semaphore instead of removing the policy from the map?

Member Author

Since the semaphore is only acquired in a non-waiting fashion and is really only meant to enforce a first-come-first-served model, it's just discarded when the policy execution completes, to keep the concurrent map tidy.

This also protects against any instruction-interleaving funny business that might occur:

  • Thread 1: gets semaphore from map, preempted by thread scheduler
  • Thread 2: releases semaphore, removes semaphore from map, preempted by thread scheduler
  • Thread 1: successfully acquires dead semaphore, starts execution
  • Background cleanup thread: starts cleanup, sees no new operations occurring on locks, deletes Thread 1's work-in-progress index

Even if we were to remove from the map before releasing the semaphore in step 2, it leads to the same scenario. We could recheck the map after acquiring the semaphore, but that seems less efficient/elegant than just discarding the used semaphore.
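A condensed sketch of the acquire/discard approach (simplified from the actual class, which also coordinates with the read/write lock):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Simplified sketch; the real class also takes the coordination read lock.
class PolicyLockSketch {
    private final ConcurrentHashMap<String, Semaphore> policyLocks = new ConcurrentHashMap<>();

    void lockPolicy(String policyName) {
        Semaphore permit = policyLocks.computeIfAbsent(policyName, key -> new Semaphore(1));
        // Non-waiting acquire: first come, first served.
        if (permit.tryAcquire() == false) {
            throw new IllegalStateException("Policy [" + policyName + "] is already being executed");
        }
    }

    void releasePolicy(String policyName) {
        // Discard the semaphore without ever releasing its permit. A thread
        // holding a stale reference from the map can then never acquire a
        // "dead" semaphore, which avoids the interleaving described above.
        policyLocks.remove(policyName);
    }
}
```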

Contributor

My question implied that once in the map, the entry would never be removed (which I don't think would result in the issue you described). As-is works fine; I think it's just a matter of preference. So +1 to as-is, or to leaving it in the map and releasing it there.

Member Author

That's a fair point: we could leave the semaphore in the map and remove it only if an enrich policy is ever deleted. We'll want to ensure that a policy is not currently executing when performing a policy deletion, so the delete API will most likely have an instance of the locks object anyway. I think the simplicity of leaving it in the map will be worth it when we make that change.

        if (coordinationLock.writeLock().tryLock()) {
            try {
                long revision = policyRunCounter.get();
                int currentPolicyExecutions = policyLocks.size();
Contributor

nit: prefer mappingCount over size for CHM (however, it doesn't make a difference here)
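For reference (the difference only matters for maps with more than Integer.MAX_VALUE entries, hence the nit):

```java
// size() truncates the count to an int; mappingCount() returns a long and is
// the recommended accessor for ConcurrentHashMap.
long currentPolicyExecutions = policyLocks.mappingCount();
```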

Member

@martijnvg martijnvg left a comment

LGTM, assuming the build is happy.

@jbaiera jbaiera requested a review from jakelandis July 12, 2019 15:12
Member Author

jbaiera commented Jul 12, 2019

@elasticmachine run elasticsearch-ci/2

1 similar comment
Member Author

jbaiera commented Jul 15, 2019

@elasticmachine run elasticsearch-ci/2

@jbaiera jbaiera merged commit c7ba91b into elastic:enrich Jul 17, 2019
@jbaiera jbaiera deleted the enrich-background-cleanup branch July 17, 2019 16:57
jbaiera added a commit that referenced this pull request Jul 22, 2019
This PR adds a background maintenance task that is scheduled on the master node only.
An index is deleted if it is not linked to a policy or if the enrich alias is not
currently pointing at it. Synchronization has been added to make sure that no policy
executions are running at the time of cleanup; if any executions do occur, cleanup
is delayed until the next run.

Labels

:Data Management/Ingest Node, >non-issue
