Conversation

Member

@jbaiera jbaiera commented Jun 28, 2019

This PR adds a background maintenance task that is scheduled on the master node only. Cleaning up enrich indices is a two-phase process: marking and deleting. We mark an index before deleting it to allow some time for any new indices to be rotated in and replace the old one. An index is marked for deletion if it is not linked to a policy or if the enrich alias is not currently pointing at it. Synchronization has been added to make sure that no policy executions are running at the time of cleanup; if any executions do occur, the marking process delays cleanup until the next run.
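In rough sketch form, the marking criteria look like this (class and method names here are illustrative, not the actual implementation in this PR):

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names do not match the actual implementation.
class EnrichIndexCleanupSketch {

    // An enrich index may be marked for cleanup only when no policy is linked
    // to it and the enrich alias for its policy is not currently pointing at it.
    static boolean isRemovable(String indexName,
                               Set<String> indicesLinkedToPolicies,
                               Map<String, String> aliasToCurrentIndex) {
        boolean linkedToPolicy = indicesLinkedToPolicies.contains(indexName);
        boolean aliasTarget = aliasToCurrentIndex.containsValue(indexName);
        return linkedToPolicy == false && aliasTarget == false;
    }
}
```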

@jbaiera jbaiera added the >enhancement and :Data Management/Ingest Node labels Jun 28, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

import org.elasticsearch.threadpool.ThreadPool;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;

public class EnrichPolicyMaintenanceService implements LocalNodeMasterListener {
Member

Like you mentioned via another channel, it looks like we can just remove the unused enrich indices instead of the mark-then-delete strategy that this class currently implements. The EnrichPolicyLocks class now enforces that either the EnrichPolicyMaintenanceService or the EnrichPolicyExecutor holds the exclusive permit to modify the enrich indices for a particular enrich policy, and I think that is good enough. With that, this class can now be simplified.

Member Author

That makes this a bit easier to reason about. I pushed 09b724e

@jbaiera jbaiera requested a review from martijnvg July 2, 2019 19:36
Member

@martijnvg martijnvg left a comment

This looks good. I left a few comments.

    }

    private void execute() {
        logger.debug("triggering scheduled [enrich] maintenance task");
Member

The cleanup period defaults to 15 minutes, so the likelihood that two cleanups run at the same time is small.

But maybe we can check if cancellable is set, and if it is, not execute the cleanUpEnrichIndices() method?

We would also need to unset cancellable after old indices have been removed or it has been determined that no indices need to be deleted.
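In sketch form, the guard might look something like this (using a separate flag rather than the cancellable field itself; names are illustrative, and the real cleanup is asynchronous rather than a plain method call):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the suggested guard, simplified to a synchronous call.
class MaintenanceGuardSketch {
    private final AtomicBoolean cleanupInProgress = new AtomicBoolean(false);

    void execute() {
        // Skip this run entirely if the previous cleanup is still in flight.
        if (cleanupInProgress.compareAndSet(false, true)) {
            try {
                cleanUpEnrichIndices();
            } finally {
                // Unset once old indices were removed or none needed deleting.
                cleanupInProgress.set(false);
            }
        }
    }

    private void cleanUpEnrichIndices() { /* find and delete unused enrich indices */ }
}
```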

import org.elasticsearch.threadpool.ThreadPool;
import org.elasticsearch.xpack.core.enrich.EnrichPolicy;

public class EnrichPolicyMaintenanceService implements LocalNodeMasterListener {
Member

Maybe add a single-node test that verifies that unused enrich indices do get removed?
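Something along these lines, perhaps (the enrich-specific helpers below are hypothetical placeholders, not real APIs):

```java
import org.elasticsearch.test.ESSingleNodeTestCase;

public class EnrichPolicyMaintenanceTests extends ESSingleNodeTestCase {

    public void testUnusedEnrichIndicesAreRemoved() throws Exception {
        String staleIndex = ".enrich-my-policy-1234";
        createEnrichIndex(staleIndex);  // hypothetical: an index no policy or alias references
        forceMaintenanceRun();          // hypothetical: trigger the maintenance task directly

        assertBusy(() -> assertFalse(
            client().admin().indices().prepareExists(staleIndex).get().isExists()));
    }

    private void createEnrichIndex(String name) { /* placeholder */ }
    private void forceMaintenanceRun() { /* placeholder */ }
}
```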

    private void scheduleNext() {
        try {
            TimeValue waitTime = EnrichPlugin.ENRICH_CLEANUP_PERIOD.get(settings);
            cancellable = threadPool.schedule(this::execute, waitTime, ThreadPool.Names.GENERIC);
Member

I think we need a field that indicates that this node is still the active master, and check it here before scheduling. Otherwise, if offMaster() is invoked, I think there is still a chance that scheduleNext() gets invoked and sets a new value on the cancellable field.
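In sketch form (illustrative names, simplified):

```java
// Illustrative sketch; not the actual fields in this PR.
class MasterAwareSchedulerSketch {
    private volatile boolean isMaster = false;

    void onMaster() {
        isMaster = true;
        scheduleNext();
    }

    void offMaster() {
        isMaster = false;
        // ... cancel the currently scheduled task ...
    }

    private void scheduleNext() {
        // Without this check, a run that finishes after offMaster() could
        // still schedule a new task and overwrite the cancellable field.
        if (isMaster) {
            // cancellable = threadPool.schedule(this::execute, waitTime, ThreadPool.Names.GENERIC);
        }
    }
}
```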

Contributor

@jakelandis jakelandis left a comment

Looking good, just a couple of nits and readability comments.


    @Override
    public void onFailure(Exception e) {
        logger.error("Could not delete enrich indices that were marked for deletion", e);
Contributor

nit: "marked for deletion" is no longer relevant


import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

class EnrichPolicyLocks {
Contributor

Can you add some javadoc to this class? Or maybe just adjust the naming. I am a bit confused:

  • coordinationLock protects LockState
  • is lockState() locking some state, or is it the state of a lock, or is it state protected by a lock? (I think the word "lock" is overused here)
  • what does "safe" mean? (it seems to be an encapsulation leak via naming?)

EDIT: I understand what it is doing now, but I think naming or javadoc would help a lot with readability.
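Something along these lines might help (wording is only a suggestion, based on my current reading of the class):

```java
/**
 * Coordinates operations on enrich policies and their backing indices.
 * Each policy name maps to a single-permit Semaphore in policyLocks, which
 * guarantees at most one concurrent execution per policy. The read/write
 * coordinationLock does not protect policy data itself: executions take the
 * read lock, while the maintenance service takes the write lock so it can
 * capture a consistent snapshot of in-flight executions (the "lock state")
 * and decide whether it is safe to delete unused enrich indices.
 */
class EnrichPolicyLocks {
    // ...
}
```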

    void releasePolicy(String policyName) {
        coordinationLock.readLock().lock();
        try {
            policyLocks.remove(policyName);
Contributor

Should this release the semaphore instead of removing the policy from the map?

Member Author

Since the semaphore is only acquired in a non-waiting fashion and is really only meant to enforce a first-come-first-served model, it's just discarded when the policy execution completes, to keep the concurrent map tidy.

This also protects against any instruction-interleaving funny business that might occur:

  • Thread 1: gets semaphore from map, preempted by thread scheduler
  • Thread 2: releases semaphore, removes semaphore from map, preempted by thread scheduler
  • Thread 1: successfully acquires dead semaphore, starts execution
  • Background cleanup thread: starts cleanup, sees no new operations occurring on locks, deletes Thread 1's work-in-progress index

Even if we were to remove from the map before releasing the semaphore in step 2, it leads to the same scenario. We could recheck the map after acquiring the semaphore, but that seems less efficient/elegant than just discarding the used semaphore.
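A condensed sketch of the acquire/discard approach (simplified from the actual class, which also coordinates with the read/write lock):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Simplified sketch; the real class also takes the coordination read lock.
class PolicyLockSketch {
    private final ConcurrentHashMap<String, Semaphore> policyLocks = new ConcurrentHashMap<>();

    void lockPolicy(String policyName) {
        Semaphore permit = policyLocks.computeIfAbsent(policyName, key -> new Semaphore(1));
        // Non-waiting acquire: first come, first served.
        if (permit.tryAcquire() == false) {
            throw new IllegalStateException("Policy [" + policyName + "] is already being executed");
        }
    }

    void releasePolicy(String policyName) {
        // Discard the semaphore without ever releasing its permit. A thread
        // holding a stale reference from the map can then never acquire a
        // "dead" semaphore, which avoids the interleaving described above.
        policyLocks.remove(policyName);
    }
}
```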

Contributor

My question implied that once in the map, the entry would never be removed (which I don't think would result in the issue you described). As-is works fine; I think it's just a matter of preference. So +1 to as-is, or to leaving it in the map and releasing it there.

Member Author

That's a fair point: we could leave the semaphore in the map and remove it only if an enrich policy is ever deleted. We'll want to ensure that a policy is not currently executing when performing a policy deletion, so the delete API will most likely have an instance of the locks object anyway. I think the simplicity of leaving it in the map will be worth it when we make that change.

        if (coordinationLock.writeLock().tryLock()) {
            try {
                long revision = policyRunCounter.get();
                int currentPolicyExecutions = policyLocks.size();
Contributor

nit: prefer mappingCount over size for CHM (however, it doesn't make a difference here)
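For reference (the difference only matters for maps with more than Integer.MAX_VALUE entries, hence the nit):

```java
// size() truncates the count to an int; mappingCount() returns a long and is
// the recommended accessor for ConcurrentHashMap.
long currentPolicyExecutions = policyLocks.mappingCount();
```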

Member

@martijnvg martijnvg left a comment

LGTM, assuming the build is happy.

@jbaiera jbaiera requested a review from jakelandis July 12, 2019 15:12
Member Author

jbaiera commented Jul 12, 2019

@elasticmachine run elasticsearch-ci/2

1 similar comment
Member Author

jbaiera commented Jul 15, 2019

@elasticmachine run elasticsearch-ci/2

@jbaiera jbaiera merged commit c7ba91b into elastic:enrich Jul 17, 2019
@jbaiera jbaiera deleted the enrich-background-cleanup branch July 17, 2019 16:57
jbaiera added a commit that referenced this pull request Jul 22, 2019
This PR adds a background maintenance task that is scheduled on the master node only.
An index is deleted if it is not linked to a policy or if the enrich alias is not
currently pointing at it. Synchronization has been added to make sure that no policy
executions are running at the time of cleanup; if any executions do occur, cleanup
is delayed until the next run.

Labels

:Data Management/Ingest Node, >non-issue
