-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add listener for background lease sync completion #39240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add listener for background lease sync completion #39240
Conversation
Today the background sync of retention leases is a fire-and-forget action, and therefore so is `IndexShard#syncRetentionLeases()`. However, in tests sometimes we want to sync the retention leases and then wait for its completion before continuing, and today this wait has to be a busy-wait. This change allows us to listen for completion instead.
|
Pinging @elastic/es-distributed |
|
This change is groundwork for the introduction of peer-recovery retention leases: if history retention is to be coordinated by the primary we must explicitly sync the leases to release history in a reasonable number of other tests. I.e. relates #39133. |
dnhatn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a question but this looks good.
|
|
||
| private void syncRetentionLeases() { | ||
| sync(IndexShard::syncRetentionLeases, "retention lease"); | ||
| sync(primary -> primary.syncRetentionLeases(wrap(() -> {})), "retention lease"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just use ActionListener.wrap without a static import?
| * @param listener the callback when the retention lease is successfully removed and synced to replicas | ||
| */ | ||
| public void removeRetentionLease(final String id, final ActionListener<ReplicationResponse> listener) { | ||
| public void removeRetentionLease(final String id, final ActionListener<Void> listener) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we are not really using the response but is there any reason for this change? Is it okay if we make both sync and backgroundSync return ReplicationResponse instead of Void?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Primarily because I think RetentionLeaseSyncer.EMPTY should notify the listener (the contract for a method that accepts an ActionListener is that it is eventually notified, which today we do not satisfy) and yet that implementation does not have a ReplicationResponse with which to notify it. I could make one up, but it felt wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move RetentionLeaseSyncer.EMPTY to tests and make it notify listeners with an empty ReplicationResponse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I reverted back to using a made-up ReplicationResponse in 50acab7; I think I'll leave RetentionLeaseSyncer.EMPTY where it is.
| } | ||
| futures.forEach(f -> { | ||
| try { | ||
| f.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe use actionGet.
| group.addRetentionLease(Integer.toString(i), randomNonNegativeLong(), "test-" + i, addLeaseFuture); | ||
| RetentionLeaseSyncAction.Request request = ((SyncRetentionLeasesResponse) addLeaseFuture.actionGet()).syncRequest; | ||
| final IndexShard newPrimary = randomFrom(group.getReplicas()); | ||
| final AtomicReference<RetentionLeases> latestRetentionLeasesOnNewPrimary = new AtomicReference<>(RetentionLeases.EMPTY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test becomes a bit more tangled since we change to a Void listener.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. I nearly backed out the whole idea because of this test. OTOH it has less casting after this change.
jasontedor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a change to production code to support testing. Sometimes that is the right thing. Yet, I am not following enough the thinking to make this change. Can you explain more why assert busy is not enough for these cases? Since in production these things are only going to eventually happen, I feel like our tests should reflect that rather than making them artificial with an actual sync point that would never exist in production code?
I am marking this as "Request changes" only to be clear that I would like to have a better understanding for this change before it is merged.
|
We have a number of tests that perform some other "happens eventually" actions in a synchronous fashion (refreshes/flushes) and then assert that history has been discarded or segments have been deleted. Although technically we're asserting that history has been discarded only if the user performed an explicit refresh/flush in those cases, what we're trying to ensure is really that history is discarded eventually even without that explicit action. I think this is a reasonable extrapolation, but if we're not happy with this then we should consider moving these tests to I am not a huge fan of |
|
@elasticmachine please run elasticsearch-ci/docbldesx |
dnhatn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thanks @DaveCTurner.
@jasontedor Are you okay with this change?
|
|
||
| // now ensure that the retention leases are the same | ||
| assertBusy(() -> { | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to call syncRetentionLeases or keep assertBusy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's sufficient not to do so because we are asserting that leases were not renewed, so checking the primary is enough. But it does not cause any harm, so I added an explicit sync in fc4459e.
|
@jasontedor it'd be good to resolve this one way or another - could you let me know your thoughts? |
|
@DaveCTurner I still feel how I felt when I left my initial comment. That is, I understand this cleans up some As I went back and read this thread again, and read the implementation, I saw a comment that I missed before.
I am not following this, can you elaborate? |
|
On reflection this isn't obviously necessary for peer recovery retention leases: I think I can use |
Today the background sync of retention leases is a fire-and-forget action, and
therefore so is
IndexShard#syncRetentionLeases(). However, in tests sometimeswe want to sync the retention leases and then wait for its completion before
continuing, and today this wait has to be a busy-wait. This change allows us to
listen for completion instead.