Adds tombstones to cluster state for index deletions #17265

abeyad · 2016-03-23T03:46:12Z

Previously, we determined index deletions by comparing the index
metadata of the current cluster state with that of the previous cluster
state: any index present in the previous state but missing from the
current one was treated as deleted. This led to a situation where a node
that went offline and rejoined the cluster could import dangling indices
that should have been deleted, because when a node rejoins, its previous
cluster state is not reliable.

This commit introduces the notion of index tombstones in the cluster
state, where we are explicit about which indices have been deleted.
In the case where the previous cluster state is not useful for index metadata
comparisons, a node now determines which indices are to be deleted based
on these tombstones in the cluster state. There is also functionality to
purge the tombstones after exceeding a certain amount.
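
As a rough illustration of the difference between the two approaches, here is a minimal sketch. It is not the code in this PR: the class `DeletionDetectionSketch`, the method `indicesToDelete`, the `previousReliable` flag, and the use of plain strings as index identifiers (standing in for name plus UUID) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names and types are illustrative, not Elasticsearch APIs.
final class DeletionDetectionSketch {

    /**
     * Returns the identifiers of indices a node should delete locally.
     *
     * @param previousIndices  indices in the node's previous cluster state
     * @param currentIndices   indices in the newly received cluster state
     * @param tombstones       indices explicitly recorded as deleted
     * @param previousReliable false when the node has just (re)joined and its
     *                         previous state cannot be trusted
     */
    static List<String> indicesToDelete(Set<String> previousIndices,
                                        Set<String> currentIndices,
                                        List<String> tombstones,
                                        boolean previousReliable) {
        Set<String> deleted = new LinkedHashSet<>();
        if (previousReliable) {
            // Metadata comparison: present before, missing now, so it was deleted.
            for (String index : previousIndices) {
                if (!currentIndices.contains(index)) {
                    deleted.add(index);
                }
            }
        } else {
            // The previous state is not useful, so trust the explicit tombstones.
            deleted.addAll(tombstones);
        }
        return new ArrayList<>(deleted);
    }
}
```

With an unreliable (for example, empty) previous state, the metadata comparison alone would find nothing to delete, which is how dangling indices could be re-imported; the tombstone branch still reports them.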

Closes #16358
Closes #17435

abeyad · 2016-03-23T03:48:34Z

@bleskes @ywelsch It's a WIP: I still have some functionality to implement and tests to write, but just in case you want to take a quick glance beforehand to see if it's on the right track.

bleskes · 2016-03-23T08:47:58Z

core/src/main/java/org/elasticsearch/cluster/ClusterChangedEvent.java

+            IndexTombstone tombstone = cursor.value;
+            // we should only try to delete indices that have tombstones added since
+            // the last time we processed cluster state
+            if (tombstone.getClusterVersion() > previousVersion) {


I think we should compare the previous tombstone and the current one and generate a delta. Don't try to be overly smart.
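
One way to read this suggestion is sketched below. It is only an illustration: `TombstoneDeltaSketch` and `addedTombstones` are hypothetical names, and tombstones are reduced to index UUID strings rather than the `IndexTombstone` objects in the diff.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "compare previous and current tombstones".
final class TombstoneDeltaSketch {

    /**
     * Returns the tombstones that appear in the current cluster state but not
     * in the previous one, preserving the order of the current list.
     */
    static List<String> addedTombstones(List<String> previous, List<String> current) {
        Set<String> seen = new HashSet<>(previous);
        List<String> added = new ArrayList<>();
        for (String uuid : current) {
            if (!seen.contains(uuid)) {
                added.add(uuid);
            }
        }
        return added;
    }
}
```

Compared with the `getClusterVersion() > previousVersion` check in the hunk above, a delta like this does not depend on a version number being stored with each tombstone.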

bleskes · 2016-03-23T08:51:48Z

Hi @abeyad. Thanks for picking it up. I think this can be done in a single tombstone class which is basically a queue of deleted Index objects. New entries are always added at the end. Trimming is always done at the beginning. Every time you add an entry, the class automatically captures the current time (both in millis and in nanos) and adds it to an internal key class. Internally we can assert semantics like "every index appears once". That class can also have methods to do trimming (both on time and size).

Does it make sense?
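
A minimal sketch of such a queue-backed class might look like the following. Everything here is an assumption for illustration (`TombstoneQueueSketch`, `Entry`, the method names); it records only a millisecond timestamp, not the nanosecond value mentioned above, and it uses index UUID strings in place of Index objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the PR.
final class TombstoneQueueSketch {

    /** One deleted index plus the time its deletion was recorded. */
    static final class Entry {
        final String indexUuid;
        final long deletedAtMillis;

        Entry(String indexUuid, long deletedAtMillis) {
            this.indexUuid = indexUuid;
            this.deletedAtMillis = deletedAtMillis;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final Set<String> uuids = new HashSet<>(); // enforces "every index appears once"

    /** Appends a tombstone stamped with the current wall-clock time. */
    void add(String indexUuid) {
        add(indexUuid, System.currentTimeMillis());
    }

    /** Same as add(String), with an explicit timestamp so the sketch is testable. */
    void add(String indexUuid, long nowMillis) {
        if (!uuids.add(indexUuid)) {
            throw new IllegalArgumentException("index already has a tombstone: " + indexUuid);
        }
        entries.addLast(new Entry(indexUuid, nowMillis)); // new entries go at the end
    }

    /** Drops the oldest entries until at most maxSize remain; returns how many were dropped. */
    int trimToSize(int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be >= 0");
        }
        int removed = 0;
        while (entries.size() > maxSize) {
            uuids.remove(entries.removeFirst().indexUuid); // trimming happens at the front
            removed++;
        }
        return removed;
    }

    /** Drops entries recorded strictly before cutoffMillis; assumes entries were added in time order. */
    int trimOlderThan(long cutoffMillis) {
        int removed = 0;
        while (!entries.isEmpty() && entries.peekFirst().deletedAtMillis < cutoffMillis) {
            uuids.remove(entries.removeFirst().indexUuid);
            removed++;
        }
        return removed;
    }

    /** Returns the tombstoned UUIDs from oldest to newest. */
    List<String> uuidsInOrder() {
        List<String> result = new ArrayList<>();
        for (Entry e : entries) {
            result.add(e.indexUuid);
        }
        return result;
    }
}
```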

abeyad · 2016-03-23T13:49:17Z

@bleskes ++ on queue of deleted objects for ease of insertion and trimming from the front. The map made it easier to assert "every index appears once" semantics, but I can separate the internal representation from what is serialized.

> Every time you add an entry, the class automatically captures the current time (both in millis and in nanos) and adds it to an internal key class.

I'm not clear on this. I figured we would need the current time on each entry (hence creating the IndexTombstone class to represent each entry). I'm not sure exactly what you mean by adding the current time to an internal key class.

s1monw · 2016-03-23T13:53:47Z

core/src/main/java/org/elasticsearch/cluster/metadata/IndexTombstone.java

+    private static final String INDEX_NAME_KEY = "indexName";
+    private static final String DELETE_DATE_KEY = "deleteDate";
+    private static final String CLUSTER_VERSION_KEY = "clusterVersion";
+    private static final ObjectParser<IndexTombstone.Builder, Void> TOMBSTONE_PARSER = new ObjectParser<>("indexTombstone");


s1monw · 2016-03-23T14:11:13Z

> Hi @abeyad. Thanks for picking it up. I think this can be done in a single tombstone class which is basically a queue of deleted Index objects. New entries are always added at the end. Trimming is always done at the beginning. Every time you add an entry, the class automatically captures the current time (both in millis and in nanos) and adds it to an internal key class. Internally we can assert semantics like "every index appears once". That class can also have methods to do trimming (both on time and size).

I think the current design is OK. It's really a value object and doesn't contain logic. It has the serialization and deserialization in there, which is good. It can also implement Comparable, which then takes the time into account. I also think we shouldn't mix the data structure that is on the cluster state with its representation.

Regarding a queue, I think we should just stick with a simple list that we sort once it's modified, and ensure in the ClusterState ctor that it is in fact sorted, but keep it simple.

I also think we might even go without pruning in the first PR and do the pruning as a follow-up? It can block a lot of good progress. There are a lot of open questions related to this, such as how long we keep these tombstones. I think we should try to keep them for as long as possible, but the question of how long is very hard to answer.
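
For comparison with the queue idea, the list-based alternative described here could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: `SortedTombstonesSketch`, `withAdded`, and `indexNames` are made-up names, and the constructor throws instead of using an assert so the check is active even when assertions are disabled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of "a simple list we sort once it's modified".
final class SortedTombstonesSketch {

    /** Plain value object; ordering takes only the delete date into account. */
    static final class Tombstone implements Comparable<Tombstone> {
        final String indexName;
        final long deleteDateMillis;

        Tombstone(String indexName, long deleteDateMillis) {
            this.indexName = indexName;
            this.deleteDateMillis = deleteDateMillis;
        }

        @Override
        public int compareTo(Tombstone other) {
            return Long.compare(deleteDateMillis, other.deleteDateMillis);
        }
    }

    private final List<Tombstone> tombstones;

    /** Verifies the list is sorted by delete date, then stores an immutable copy. */
    SortedTombstonesSketch(List<Tombstone> sorted) {
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i - 1).compareTo(sorted.get(i)) > 0) {
                throw new IllegalArgumentException("tombstones must be sorted by delete date");
            }
        }
        this.tombstones = Collections.unmodifiableList(new ArrayList<>(sorted));
    }

    /** Returns a new instance with the tombstone added and the list re-sorted. */
    SortedTombstonesSketch withAdded(Tombstone tombstone) {
        List<Tombstone> copy = new ArrayList<>(tombstones);
        copy.add(tombstone);
        Collections.sort(copy); // sort once after modification
        return new SortedTombstonesSketch(copy);
    }

    /** Returns the index names ordered from oldest to newest deletion. */
    List<String> indexNames() {
        List<String> names = new ArrayList<>();
        for (Tombstone t : tombstones) {
            names.add(t.indexName);
        }
        return names;
    }
}
```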

abeyad · 2016-03-28T16:17:10Z

core/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

+                    indicesService.deleteClosedIndex("closed index no longer part of the metadata", metaData, event.state());
+                } else {
+                    indexSettings = null;
+                }


@bleskes I had to change the logic here. When a node restarts, its previous state will not contain the index metadata for an index that was deleted while it was offline. So if the index metadata for a deleted index (one listed in the cluster state's tombstones) is not in the previous cluster state, I try to read it off disk. I had to introduce the MetaStateService as a dependency in this class in order to do that. If the index metadata cannot be read off disk either, the index was both created and deleted while the node was offline, so there is nothing to do.

I am unsure if this is the best approach, so would appreciate your feedback.

bleskes · 2016-03-29

Yeah, I see what you mean. If the node was restarting while the index was deleted, it may have no previous state. I think what you do is the right thing, but we can structure it in a slightly cleaner way:

    if (idxService != null) {
        // delete in memory index
        deleteIndex(index, "index no longer part of the metadata");
        // ackNodeIndexDeleted
    } else if (previousState.metaData().hasIndex(index)) { // <--- needs a variant that checks a uuid
        // deleted index which wasn't assigned to local node (the closed index is very misleading below)
        indicesService.deleteClosedIndex("closed index no longer part of the metadata", metaData, event.state());
        // ack the index deletion
    } else if (indicesService.canDeleteIndex(Index)) { // <-- which should also check for the folder's existence
        // load metadata from file and delete it
    }

wdyt?

abeyad · 2016-03-29

That makes sense, I will formulate the logic in that manner.

abeyad · 2016-03-29

The only issue with

    else if (indicesService.canDeleteIndex(Index))

is that canDeleteIndexContents requires the IndexSettings, which can't be created until the IndexMetaData is loaded, so all that logic will need to go in an else block.
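
The branching discussed in this thread can be reduced to a small decision function. The sketch below is hypothetical: `DeleteDecisionSketch`, the `Action` enum, and the three boolean inputs are stand-ins for the real checks (`idxService != null`, a UUID-aware `hasIndex`, and loading the metadata from disk), not code from the PR.

```java
// Hypothetical sketch of the three-way branch for a tombstoned index.
final class DeleteDecisionSketch {

    enum Action {
        DELETE_IN_MEMORY_INDEX,   // index service exists on this node
        DELETE_UNASSIGNED_INDEX,  // known from previous state, not assigned locally
        DELETE_FROM_DISK,         // only known from metadata found on disk
        NOTHING_TO_DO             // created and deleted while the node was offline
    }

    static Action decide(boolean hasLocalIndexService,
                         boolean inPreviousState,
                         boolean metadataOnDisk) {
        if (hasLocalIndexService) {
            return Action.DELETE_IN_MEMORY_INDEX;
        } else if (inPreviousState) {
            return Action.DELETE_UNASSIGNED_INDEX;
        } else if (metadataOnDisk) {
            // Final fallback: the metadata must be loaded before any
            // settings-based check can run, as noted above.
            return Action.DELETE_FROM_DISK;
        } else {
            return Action.NOTHING_TO_DO;
        }
    }
}
```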

abeyad · 2016-03-28T16:27:51Z

@bleskes @s1monw This PR is ready for the next round of review. I left two specific comments that need special attention, please, as I was unsure of the proper route to take:
https://github.com/elastic/elasticsearch/pull/17265/files#r57590545
https://github.com/elastic/elasticsearch/pull/17265/files#r57592054

jasontedor · 2016-04-25T15:52:58Z

core/src/main/java/org/elasticsearch/cluster/metadata/IndexGraveyard.java

+    private final List<Tombstone> tombstones;
+
+    private IndexGraveyard(final List<Tombstone> list) {
+        tombstones = Collections.unmodifiableList(list);


I think that we want a null check here?

abeyad · 2016-04-25

It's a private constructor, only called from the Builder. Would an assert be more appropriate?

jasontedor · 2016-04-25

> It's a private constructor, only called from the Builder. Would an assert be more appropriate?

An assert is fine.
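
For readers unfamiliar with the pattern being agreed on, a minimal sketch follows. `GraveyardCtorSketch` and its `Builder` are hypothetical and use strings instead of `Tombstone` objects; the point is only that a private constructor reachable solely through a builder can document its precondition with an assert rather than a runtime null check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of an assert in a private, builder-only constructor.
final class GraveyardCtorSketch {

    private final List<String> tombstones;

    private GraveyardCtorSketch(final List<String> list) {
        // Only the Builder calls this, so a null list would be a programming
        // error. The assert is checked only when assertions are enabled (-ea).
        assert list != null : "builder must supply a non-null list";
        tombstones = Collections.unmodifiableList(list);
    }

    List<String> tombstones() {
        return tombstones;
    }

    static final class Builder {
        private final List<String> tombstones = new ArrayList<>();

        Builder add(String tombstone) {
            tombstones.add(tombstone);
            return this;
        }

        GraveyardCtorSketch build() {
            // Copy so later builder changes cannot leak into the built object.
            return new GraveyardCtorSketch(new ArrayList<>(tombstones));
        }
    }
}
```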

jasontedor · 2016-04-25T16:27:12Z

@abeyad I think that the high-level concepts have been ironed out, but I left some feedback on coding details.

jasontedor · 2016-04-25T17:05:49Z

core/src/main/java/org/elasticsearch/cluster/metadata/IndexGraveyard.java

@@ -175,6 +177,7 @@ public IndexGraveyard readFrom(final StreamInput in) throws IOException {
    final public static class Builder {
        private List<Tombstone> tombstones;
        private int numPurged = -1;
+        private long currentTime = System.currentTimeMillis();


Can it be final?

jasontedor · 2016-04-25T17:09:55Z

LGTM. Great work @abeyad.

abeyad · 2016-04-25T17:12:00Z

@jasontedor thank you for all the valuable feedback!

clintongormley · 2016-05-02T14:02:05Z

@abeyad looks like these settings still need to be documented?

abeyad added >enhancement review v5.0.0-alpha1 labels Mar 23, 2016

abeyad force-pushed the feature/tombstone-deleted-indices branch from 494bf97 to 44d8f05 Compare March 23, 2016 03:51

bleskes reviewed Mar 23, 2016

s1monw reviewed Mar 23, 2016

abeyad force-pushed the feature/tombstone-deleted-indices branch 7 times, most recently from 8d610f6 to e563dce Compare March 28, 2016 16:02

abeyad changed the title ~~WIP: Adds tombstones to cluster state for index deletions~~ Adds tombstones to cluster state for index deletions Mar 28, 2016

abeyad force-pushed the feature/tombstone-deleted-indices branch from e563dce to a38e12e Compare March 28, 2016 16:11

abeyad reviewed Mar 28, 2016

abeyad force-pushed the feature/tombstone-deleted-indices branch from a38e12e to 0eacfbf Compare March 29, 2016 20:45

jasontedor reviewed Apr 25, 2016

abeyad force-pushed the feature/tombstone-deleted-indices branch from 52d8adc to c71e1b6 Compare April 25, 2016 17:01

Addresses code review comments

751f5a8

abeyad force-pushed the feature/tombstone-deleted-indices branch from c71e1b6 to 751f5a8 Compare April 25, 2016 17:03

jasontedor reviewed Apr 25, 2016

Made a variable final

7e5b4a9

abeyad closed this in d39eb2d Apr 25, 2016

abeyad mentioned this pull request Jul 28, 2016

Restarting a crashed master node can cause deleted indices to show up as unassigned shards #19658

Closed

clintongormley added :Distributed Indexing/Distributed and removed :Cluster labels Feb 13, 2018
