Snapshot of a searchable snapshot should be empty #66162

DaveCTurner · 2020-12-10T12:17:45Z

Today if you take a snapshot of a searchable snapshot index then we
treat it like a normal index and copy (any unchanged parts of) its
contents to the repository. This is often a complete copy, doubling the
snapshot storage required, since a searchable snapshot index typically
has a different name from the original index; it may also be that we are
taking a snapshot into a different repository. The content of a
searchable snapshot is already held in a snapshot, and its index
metadata indicates how to find this content, so it is wasteful to copy
anything new into the repository.

This commit adjusts things so that a searchable snapshot shard presents
itself to the snapshotter as if it contained no segments, and adjusts
things to account for the consequent mismatch at restore time.

Closes #66110

elasticmachine · 2020-12-10T12:17:49Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner · 2020-12-10T12:20:16Z

I can't decide whether I think this is an elegant solution or an awful hack :) Probably best to assume the latter when reviewing it, and think about ways this might come back to haunt us in the future.

DaveCTurner · 2020-12-10T12:24:36Z

...snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshots.java

+                            final Directory directory = engineConfig.getStore().directory();
+                            final String oldestSegmentsFile = Arrays.stream(directory.listAll())
+                                .filter(s -> s.startsWith(IndexFileNames.SEGMENTS + "_"))
+                                .min(Comparator.naturalOrder())


Oops, this will almost always work except for cases where we add another character to the encoded generation (think "1" < "10" < "2"). I'll address this.
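
To make the ordering problem concrete, here is a minimal sketch, not the code from this PR. It assumes only that Lucene encodes the generation suffix of a `segments_N` file in base 36; the class and method names (`SegmentsFileOrdering`, `generationOf`, `oldestByName`, `oldestByGeneration`) are hypothetical and the fix actually pushed may look different.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration; none of these names come from Elasticsearch or Lucene.
public class SegmentsFileOrdering {
    // Same prefix the reviewed snippet builds from IndexFileNames.SEGMENTS + "_".
    static final String PREFIX = "segments_";

    // Decodes the base-36 generation suffix, e.g. "segments_10" -> 36.
    static long generationOf(String fileName) {
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    // String ordering, as in the snippet under review.
    static Optional<String> oldestByName(List<String> files) {
        return files.stream()
            .filter(f -> f.startsWith(PREFIX))
            .min(Comparator.naturalOrder());
    }

    // Numeric ordering on the decoded generation.
    static Optional<String> oldestByGeneration(List<String> files) {
        return files.stream()
            .filter(f -> f.startsWith(PREFIX))
            .min(Comparator.comparingLong(SegmentsFileOrdering::generationOf));
    }

    public static void main(String[] args) {
        List<String> files = List.of("_0.cfs", "segments_10", "segments_2", "write.lock");
        System.out.println(oldestByName(files).get());       // segments_10
        System.out.println(oldestByGeneration(files).get()); // segments_2
    }
}
```

With a single-character generation the two orderings agree, which is why the string comparison "almost always" works; they diverge only once the encoded generation grows a second character.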

original-brownbear

I think if we want to go for a hack this is the shortest possible one pretty much :)
I'm +1 to this approach, it's somewhat similar to how we hack around things in the recovery and I like how it's pretty risk free by not changing anything of substance about the snapshot process.

Just one open question + tests seem to fail

server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SnapshotFiles.java

...hable-snapshots/src/main/java/org/elasticsearch/index/store/InMemoryNoOpCommitDirectory.java

...snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshots.java

original-brownbear

LGTM thanks David!

henningandersen

Did an initial read and have a couple of suggestions.

server/src/main/java/org/elasticsearch/repositories/blobstore/FileRestoreContext.java

henningandersen · 2020-12-10T13:13:26Z

...snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshots.java

+                    @Override
+                    public IndexCommitRef acquireIndexCommitForSnapshot() throws EngineException {
+                        store.incRef();
+                        return new IndexCommitRef(emptyIndexCommit, store::decRef);


If we can skip the restore completely (see previous comment), could we then instead return null here and also skip the snapshotting completely?

DaveCTurner

Not trivially, no. We want the snapshot of this shard to succeed (without having done much), so we can't just bail out, and we already use a null commit to indicate "find the latest commit" further down the line.
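
The pattern in the quoted diff, returning a real but empty commit while holding a reference on the store, can be sketched in isolation. This is an illustrative model only: `RefCountedStore`, `CommitRef`, and `acquireEmptyCommit` are hypothetical stand-ins, not the Elasticsearch `Store` or `IndexCommitRef` classes, and the "commit" here is reduced to a list of segment names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of "acquire an empty commit, release the store on close".
public class EmptyCommitRefSketch {

    // Stand-in for a ref-counted store; starts with one reference held by its owner.
    static final class RefCountedStore {
        private final AtomicInteger refs = new AtomicInteger(1);
        void incRef() { refs.incrementAndGet(); }
        void decRef() { refs.decrementAndGet(); }
        int refCount() { return refs.get(); }
    }

    // Stand-in for a commit reference: the commit's segments plus a release action.
    static final class CommitRef implements AutoCloseable {
        final List<String> segmentNames;
        private final Runnable onClose;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        CommitRef(List<String> segmentNames, Runnable onClose) {
            this.segmentNames = segmentNames;
            this.onClose = onClose;
        }

        @Override
        public void close() {
            // Run the release action at most once, even if close() is called twice.
            if (closed.compareAndSet(false, true)) {
                onClose.run();
            }
        }
    }

    // Returns a non-null commit with no segments, so the caller has nothing to copy
    // but still completes normally; null is avoided because it means something else.
    static CommitRef acquireEmptyCommit(RefCountedStore store) {
        store.incRef();
        return new CommitRef(List.of(), store::decRef);
    }

    public static void main(String[] args) {
        RefCountedStore store = new RefCountedStore();
        try (CommitRef ref = acquireEmptyCommit(store)) {
            System.out.println("segments to snapshot: " + ref.segmentNames.size());
        }
        System.out.println("store refs after release: " + store.refCount());
    }
}
```

The design point is that the snapshotter sees a successful, if trivial, shard snapshot, while the store stays open for as long as the commit reference is held.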

DaveCTurner · 2020-12-10T15:56:35Z

I'm not merging this quite yet because it doesn't account for frozen indices: in the frozen case, we don't use the ReadOnlyEngine with the customisations introduced here.

DaveCTurner · 2020-12-10T16:45:15Z

darn it, also doesn't account for closed indices, which we hard-code to have a NoOpEngine that supports proper snapshotting 😢

needs more work for the frozen/closed case

DaveCTurner · 2020-12-14T09:58:57Z

I have adjusted how this PR injects the searchable-snapshots-specific snapshotting behaviour to avoid having it depend on the engine implementation directly, since we cannot control this in the frozen/closed case.

I've elected to re-use the index.store.type setting for this customisation rather than to introduce a new setting.
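
As a rough sketch of what keying the behaviour on `index.store.type` buys, the lookup below depends only on index settings and never on which engine class backs the shard, so frozen and closed indices take the same path. Everything here is hypothetical: the `CommitSupplier` interface, the registry, and the store-type value `"snapshot"` are assumptions for illustration and are not taken from the PR's diff.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: choose snapshot behaviour from the store type, not the engine.
public class StoreTypeSnapshotHook {

    // Decides which segments a shard presents to the snapshotter.
    interface CommitSupplier {
        List<String> segmentsToSnapshot(List<String> segmentsInLatestCommit);
    }

    // Ordinary indices present their latest commit unchanged.
    static final CommitSupplier DEFAULT = segments -> segments;

    // Searchable snapshot shards present a commit with no segments.
    static final CommitSupplier EMPTY = segments -> List.of();

    // Registry keyed by the value of index.store.type ("snapshot" is an assumed value).
    static final Map<String, CommitSupplier> BY_STORE_TYPE = Map.of("snapshot", EMPTY);

    // Looks up the supplier from index settings alone; the engine is never consulted.
    static CommitSupplier forIndexSettings(Map<String, String> settings) {
        String storeType = settings.getOrDefault("index.store.type", "");
        return BY_STORE_TYPE.getOrDefault(storeType, DEFAULT);
    }

    public static void main(String[] args) {
        List<String> latest = List.of("_0", "_1");
        CommitSupplier mounted = forIndexSettings(Map.of("index.store.type", "snapshot"));
        CommitSupplier regular = forIndexSettings(Map.of());
        System.out.println(mounted.segmentsToSnapshot(latest).size()); // 0
        System.out.println(regular.segmentsToSnapshot(latest).size()); // 2
    }
}
```

Reusing an existing setting also means no new index metadata has to be written or migrated for indices that were mounted before the change.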

henningandersen

LGTM.

DaveCTurner · 2020-12-14T18:15:03Z

@elasticmachine please run elasticsearch-ci/2

Process 'Gradle Test Executor 400' finished with non-zero exit value 1

DaveCTurner added the >bug, :Distributed Coordination/Snapshot/Restore, v8.0.0, and v7.11.0 labels Dec 10, 2020

DaveCTurner requested review from tlrx and original-brownbear December 10, 2020 12:17

elasticmachine added the Team:Distributed label Dec 10, 2020

DaveCTurner requested a review from henningandersen December 10, 2020 12:17

DaveCTurner commented Dec 10, 2020

Sort the segments files properly

cc3403a

original-brownbear reviewed Dec 10, 2020

DaveCTurner added 4 commits December 10, 2020 12:49

Fix test that already tested this path

68971b5

Add comment on why we futz around with generations

a04c04c

Assert we don't overwrite the existing segments_N file

b0e3567

More fields and less toString in toString

63383c7

original-brownbear previously approved these changes Dec 10, 2020

Spotless

ea23c8d

henningandersen reviewed Dec 10, 2020

Extract method for constructing empty index commit

f0ef505

Add REST test for snapshotting a searchable snapshot

794e935

DaveCTurner added the WIP label Dec 10, 2020

DaveCTurner added 2 commits December 14, 2020 08:01

Merge branch 'master' into 2020-12-09-snapshotting-searchable-snapshots

9d9c000

Allow plugins to customise the snapshot commit independently of the engine impl

0018295

DaveCTurner requested a review from original-brownbear December 14, 2020 09:56

DaveCTurner requested a review from henningandersen December 14, 2020 09:56

henningandersen approved these changes Dec 14, 2020

Merge branch 'master' into 2020-12-09-snapshotting-searchable-snapshots

50eb788

DaveCTurner merged commit 69e5ea1 into elastic:master Dec 14, 2020

DaveCTurner removed the WIP label Feb 1, 2021

DaveCTurner deleted the 2020-12-09-snapshotting-searchable-snapshots branch February 1, 2021 09:23

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
