-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshot of a searchable snapshot should be empty #66162
Snapshot of a searchable snapshot should be empty #66162
Conversation
Today if you take a snapshot of a searchable snapshot index then we treat it like a normal index and copy (any unchanged parts of) its contents the the repository. This is often a complete copy, doubling the snapshot storage required, since a searchable snapshot index typically has a different name from the original index; it may also be that we are taking a snapshot into a different repository. The content of a searchable snapshot is already held in a snapshot, and its index metadata indicates how to find this content, so it is wasteful to copy anything new into the repository. This commit adjusts things so that a searchable snapshot shard presents itself to the snapshotter as if it contained no segments, and adjusts things to account for the consequent mismatch at restore time. Closes elastic#66110
Pinging @elastic/es-distributed (Team:Distributed) |
I can't decide whether I think this is an elegant solution or an awful hack :) Probably best to assume the latter when reviewing it, and think about ways this might come back to haunt us in the future. |
final Directory directory = engineConfig.getStore().directory(); | ||
final String oldestSegmentsFile = Arrays.stream(directory.listAll()) | ||
.filter(s -> s.startsWith(IndexFileNames.SEGMENTS + "_")) | ||
.min(Comparator.naturalOrder()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, this will almost always work except for cases where we add another character to the encoded generation (think "1" < "10" < "2"
). I'll address this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we want to go for a hack this is the shortest possible one pretty much :)
I'm +1 to this approach, it's somewhat similar to how we hack around things in the recovery and I like how it's pretty risk free by not changing anything of substance about the snapshot process.
Just one open question + tests seem to fail
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SnapshotFiles.java
Outdated
Show resolved
Hide resolved
...hable-snapshots/src/main/java/org/elasticsearch/index/store/InMemoryNoOpCommitDirectory.java
Show resolved
Hide resolved
...snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/SearchableSnapshots.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks David!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did an initial read and have a couple of suggestions.
server/src/main/java/org/elasticsearch/repositories/blobstore/FileRestoreContext.java
Show resolved
Hide resolved
@Override | ||
public IndexCommitRef acquireIndexCommitForSnapshot() throws EngineException { | ||
store.incRef(); | ||
return new IndexCommitRef(emptyIndexCommit, store::decRef); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can skip the restore completely (see previous comment), could we then instead return null here and also skip the snapshotting completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not trivially, no, we want the snapshot of this shard to succeed (without having done much) so we can't just bail out, and we use a null
commit to indicate "find the latest commit" further down the line.
I'm not merging this quite yet because it doesn't account for frozen indices: in the frozen case, we don't use the |
darn it, also doesn't account for closed indices, which we hard-code to have a |
needs more work for the frozen/closed case
I have adjusted how this PR injects the searchable-snapshots-specific snapshotting behaviour to avoid having it depend on the engine implementation directly, since we cannot control this in the frozen/closed case. I've elected to re-use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@elasticmachine please run elasticsearch-ci/2
|
Today if you take a snapshot of a searchable snapshot index then we treat it like a normal index and copy (any unchanged parts of) its contents the the repository. This is often a complete copy, doubling the snapshot storage required, since a searchable snapshot index typically has a different name from the original index; it may also be that we are taking a snapshot into a different repository. The content of a searchable snapshot is already held in a snapshot, and its index metadata indicates how to find this content, so it is wasteful to copy anything new into the repository. This commit adjusts things so that a searchable snapshot shard presents itself to the snapshotter as if it contained no segments, and adjusts things to account for the consequent mismatch at restore time. Closes #66110
* elastic/master: (33 commits) Add searchable snapshot cache folder to NodeEnvironment (elastic#66297) [DOCS] Add dynamic runtime fields to docs (elastic#66194) Add HDFS searchable snapshot integration (elastic#66185) Support canceling cross-clusters search requests (elastic#66206) Mute testCacheSurviveRestart (elastic#66289) Fix cat tasks api params in spec and handler (elastic#66272) Snapshot of a searchable snapshot should be empty (elastic#66162) [ML] DFA _explain API should not fail when none field is included (elastic#66281) Add action to decommission legacy monitoring cluster alerts (elastic#64373) move rollup_index param out of RollupActionConfig (elastic#66139) Improve FieldFetcher retrieval of fields (elastic#66160) Remove unsed fields in `RestAnalyzeAction` (elastic#66215) Simplify searchable snapshot CacheKey (elastic#66263) Autoscaling remove feature flags (elastic#65973) Improve searchable snapshot mount time (elastic#66198) [ML] Report cause when datafeed extraction encounters error (elastic#66167) Remove suggest reference in some API specs (elastic#66180) Fix warning when installing a plugin for different ESversion (elastic#66146) [ML] make `xpack.ml.max_ml_node_size` and `xpack.ml.use_auto_machine_memory_percent` dynamically settable (elastic#66132) [DOCS] Add `require_alias` to Bulk API (elastic#66259) ...
Today if you take a snapshot of a searchable snapshot index then we
treat it like a normal index and copy (any unchanged parts of) its
contents the the repository. This is often a complete copy, doubling the
snapshot storage required, since a searchable snapshot index typically
has a different name from the original index; it may also be that we are
taking a snapshot into a different repository. The content of a
searchable snapshot is already held in a snapshot, and its index
metadata indicates how to find this content, so it is wasteful to copy
anything new into the repository.
This commit adjusts things so that a searchable snapshot shard presents
itself to the snapshotter as if it contained no segments, and adjusts
things to account for the consequent mismatch at restore time.
Closes #66110