Retain history for peer recovery using leases #39133
Conversation
Today if soft deletes are enabled then we discard history below the global checkpoint, damaging our chances of being able to perform an operations-based peer recovery. This change has each shard copy obtain a history retention lease to retain history that is not included in its local safe commit, making an operations-based peer recovery much more likely.
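To make the lease mechanism concrete, here is a minimal, self-contained sketch (not the actual Elasticsearch implementation; the class and method names are made up for illustration) of the idea that history retention is bounded by the minimum sequence number retained by any copy's lease rather than by the global checkpoint:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: each shard copy holds a lease retaining all operations
// above the local checkpoint of its safe commit, so soft-delete cleanup is bounded
// by the leases rather than by the global checkpoint.
final class HistoryRetentionSketch {

    /** A retention lease: "keep every operation with seqNo >= retainingSeqNo". */
    record RetentionLease(String leaseId, long retainingSeqNo) {}

    private final Map<String, RetentionLease> leases = new ConcurrentHashMap<>();

    /** A copy asks to retain everything above the local checkpoint of its safe commit. */
    void addPeerRecoveryLease(String allocationId, long safeCommitLocalCheckpoint) {
        leases.put(allocationId, new RetentionLease(allocationId, safeCommitLocalCheckpoint + 1));
    }

    /** Soft-delete cleanup may only discard operations strictly below this sequence number. */
    long minRetainedSeqNo() {
        return leases.values().stream()
                .mapToLong(RetentionLease::retainingSeqNo)
                .min()
                .orElse(Long.MAX_VALUE); // no leases: peer recovery pins nothing
    }

    public static void main(String[] args) {
        HistoryRetentionSketch retention = new HistoryRetentionSketch();
        retention.addPeerRecoveryLease("copy-A", 95); // safe commit covers up to seqNo 95
        retention.addPeerRecoveryLease("copy-B", 42); // lagging copy, safe commit up to 42
        // Even if the global checkpoint is 100, everything from seqNo 43 upwards is kept,
        // so copy-B can still be the target of an operations-based recovery.
        System.out.println(retention.minRetainedSeqNo()); // 43
    }
}
```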
This PR is WIP and represents the current state of my proof-of-concept. I do not anticipate merging it as-is since it is too large, but I would like to gather feedback to be sure I'm heading in the right direction here. I would particularly like opinions about the changes. The goal is that we can perform an operations-based recovery for all "reasonable" shard copies C.

Reasonable shard copies comprise all the copies that are currently being tracked, as well as all the copies that "might be a recovery target": if the shard is not fully allocated, then any copy that has been tracked within the lease retention period also counts. We also require that history is eventually released: in a stable cluster, for every operation with seqno s below the MSN of a replication group, eventually there are no leases that retain s.
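Stated slightly more formally (my own paraphrase of the sentence above, assuming MSN denotes the maximum sequence number of the replication group and writing retains(l, s) for "lease l retains the operation with seqno s"):

```latex
% Hedged formalization of the eventual-release property:
% every operation below MSN is eventually retained by no lease.
\forall s < \mathrm{MSN}.\;\; \Diamond\, \bigl(\forall \ell \in \mathrm{Leases}.\;\; \lnot\, \mathrm{retains}(\ell, s)\bigr)
```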
There are a few conceptually-separate pieces combined into this one rather large change:
Follow-up work, not included here.
BWC issues: during a rolling upgrade, we may migrate a primary onto a new node without first establishing the appropriate leases. They can't be established before or during this promotion, so we must weaken the assertions so that they only apply to sufficiently-newly-created indices, and weaken our requirements for ops-based peer recovery to a "best effort" on older indices. We will still establish leases properly during peer recovery, and can establish them lazily on older indices, but they may not retain all the right history when first created.

Closed replicated indices issues: a closed index permits no replicated actions and should not need any history to be retained. We cannot replay history into a closed index, so all recoveries must be file-based, and there's no real need for leases. We cannot assert that all the copies of a replicated closed index have a corresponding lease. Maybe we should remove any peer-recovery retention leases when closing an index? Or maybe we should let them expire naturally.

Plan:
Open questions:
Pinging @elastic/es-distributed
Thanks @DaveCTurner. I have some points to discuss on the approach.
Hi @dnhatn
I don't fully understand what you mean here, although moving functionality around is definitely possible in principle. Are you suggesting getting/adding the peer's history retention lease via transport actions rather than by talking directly to the primary?
I think it's important that we don't expire the retention leases of active shards, even if something weird like a clock discontinuity occurs, but the leases corresponding to inactive shards are already treated like any other retention lease and expire after the configured retention period.
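As a rough sketch of the expiry rule described here (illustrative names and an assumed placeholder period, not the actual ReplicationTracker logic or the real setting value):

```java
import java.util.Map;
import java.util.Set;

// Illustrative only: leases held by currently-tracked (active) copies never expire,
// even across clock discontinuities; leases for untracked copies expire once they
// have not been renewed within the retention period.
final class LeaseExpirySketch {

    // Placeholder value; the real period comes from an index setting.
    static final long RETENTION_PERIOD_MILLIS = 12 * 60 * 60 * 1000L;

    static boolean isExpired(String allocationId,
                             long lastRenewedMillis,
                             long nowMillis,
                             Set<String> trackedAllocationIds) {
        if (trackedAllocationIds.contains(allocationId)) {
            return false; // never expire leases of active shard copies
        }
        return nowMillis - lastRenewedMillis > RETENTION_PERIOD_MILLIS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Set<String> tracked = Set.of("copy-A");
        Map<String, Long> lastRenewed = Map.of(
                "copy-A", now - 2 * RETENTION_PERIOD_MILLIS, // active: retained regardless of age
                "copy-B", now - 2 * RETENTION_PERIOD_MILLIS  // inactive and stale: expires
        );
        lastRenewed.forEach((id, renewed) ->
                System.out.println(id + " expired=" + isExpired(id, renewed, now, tracked)));
    }
}
```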
I think you're right (but I need to think about this more to be certain). However if I understand correctly this is not what we do today, right? On the positive side we already share the global checkpoint, so there would be no need for the periodic task introduced here. However we still need to retain history from the local checkpoint of the local shard copy's safe commit (I was hoping to be able to express history retention entirely in terms of leases) and I think we might want to do more initial coordination between source and target to establish whether the recovery will be file-based or operations-based before expending effort on replaying the local translog.
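A small sketch of the up-front coordination being discussed (names invented for the example; this is a simplification, not the actual recovery code): the source would only commit to an operations-based recovery if it still retains every operation the target is missing.

```java
// Illustrative decision only: operations-based recovery is worthwhile when the source
// retains every operation above the target's safe commit; otherwise copy files.
final class RecoveryPlanSketch {

    enum RecoveryMode { OPERATIONS_BASED, FILE_BASED }

    /**
     * @param targetSafeCommitLocalCheckpoint highest seqNo contained in the target's safe commit
     * @param sourceMinRetainedSeqNo          lowest seqNo the source still retains (min over its leases)
     */
    static RecoveryMode chooseRecoveryMode(long targetSafeCommitLocalCheckpoint,
                                           long sourceMinRetainedSeqNo) {
        long requiredStartingSeqNo = targetSafeCommitLocalCheckpoint + 1;
        return requiredStartingSeqNo >= sourceMinRetainedSeqNo
                ? RecoveryMode.OPERATIONS_BASED // every missing operation can be replayed
                : RecoveryMode.FILE_BASED;      // there is a history gap: segment files must be copied
    }

    public static void main(String[] args) {
        System.out.println(chooseRecoveryMode(95, 43)); // OPERATIONS_BASED
        System.out.println(chooseRecoveryMode(10, 43)); // FILE_BASED
    }
}
```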
I don't think so, because then there would be no peer recovery retention lease after a cluster-wide disaster (e.g. power outage) with the consequence that many peer recoveries after the outage would have to fall back to copying files.
Today this test catches an exception and asserts that its proximate cause has message `Random IOException` but occasionally this exception is wrapped two layers deep, causing the test to fail. This commit adjusts the test to look at the root cause of the exception instead.

```
  1> [2019-02-25T12:31:50,837][INFO ][o.e.s.SharedClusterSnapshotRestoreIT] [testSnapshotFileFailureDuringSnapshot] --> caught a top level exception, asserting what's expected
  1> org.elasticsearch.snapshots.SnapshotException: [test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] Snapshot could not be read
  1>     at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:212) ~[main/:?]
  1>     at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:135) ~[main/:?]
  1>     at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:54) ~[main/:?]
  1>     at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127) ~[main/:?]
  1>     at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:208) ~[main/:?]
  1>     at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[main/:?]
  1>     at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
  1>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
  1>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
  1> Caused by: org.elasticsearch.snapshots.SnapshotException: [test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] failed to get snapshots
  1>     at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:564) ~[main/:?]
  1>     at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:206) ~[main/:?]
  1>     ... 9 more
  1> Caused by: java.io.IOException: Random IOException
  1>     at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.maybeIOExceptionOrBlock(MockRepository.java:275) ~[test/:?]
  1>     at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.readBlob(MockRepository.java:317) ~[test/:?]
  1>     at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:101) ~[main/:?]
  1>     at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90) ~[main/:?]
  1>     at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:560) ~[main/:?]
  1>     at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:206) ~[main/:?]
  1>     ... 9 more

FAILURE 0.59s J0 | SharedClusterSnapshotRestoreIT.testSnapshotFileFailureDuringSnapshot <<< FAILURES!
   > Throwable #1: java.lang.AssertionError:
   > Expected: a string containing "Random IOException"
   >      but: was "[test-repo:test-snap/e-hn_pLGRmOo97ENEXdQMQ] failed to get snapshots"
   >     at __randomizedtesting.SeedInfo.seed([B73CA847D4B4F52D:884E042D2D899330]:0)
   >     at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >     at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.testSnapshotFileFailureDuringSnapshot(SharedClusterSnapshotRestoreIT.java:821)
   >     at java.lang.Thread.run(Thread.java:748)
```
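The fix described above amounts to walking to the root cause before matching the message. A hedged sketch of that pattern in plain JUnit/Hamcrest style (not the exact test code; `rootCause` is a helper written for this example):

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsString;

// Sketch of the assertion change: instead of matching the message of the immediate
// cause (which may be wrapped an arbitrary number of times), walk to the root cause.
final class RootCauseAssertionSketch {

    static Throwable rootCause(Throwable t) {
        Throwable root = t;
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        return root;
    }

    static void assertCausedByRandomIOException(Exception caught) {
        assertThat(rootCause(caught).getMessage(), containsString("Random IOException"));
    }
}
```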
…ause' into 2019-02-19-soft-deletes-history-retention-TMP
In a rolling upgrade from a version that pre-dates the introduction of peer-recovery retention leases, eventually the primary lands on an upgraded node. If this was a relocation or a promotion then it inherits the replicas of the previous primary without performing peer recoveries, so there may not be peer recovery retention leases for the other shard copies. This change addresses that by adding leases as needed on older indices.
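A rough sketch of that back-filling idea (illustrative names only, and the choice of retaining sequence number is my assumption, not necessarily what the change does): when the primary activates on an upgraded node, create a lease for any tracked copy that does not already have one.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: on primary activation, back-fill peer recovery retention leases
// for tracked copies that pre-date the feature. The retaining sequence number here
// (global checkpoint + 1) is a best-effort guess, which is why older indices may not
// retain all the right history when the leases are first created.
final class LeaseBackfillSketch {

    private final Map<String, Long> leases = new ConcurrentHashMap<>(); // allocationId -> retainingSeqNo

    void backfillMissingLeases(Set<String> trackedAllocationIds, long globalCheckpoint) {
        for (String allocationId : trackedAllocationIds) {
            leases.putIfAbsent(allocationId, globalCheckpoint + 1);
        }
    }
}
```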
Closing in favour of meta-issue #41536