-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Close Index API] Add unique UUID to ClusterBlock #36775
[Close Index API] Add unique UUID to ClusterBlock #36775
Conversation
Pinging @elastic/es-distributed |
...java/org/elasticsearch/action/admin/indices/close/TransportVerifyShardBeforeCloseAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/block/ClusterBlocks.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/block/ClusterBlocks.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/block/ClusterBlocks.java
Outdated
Show resolved
Hide resolved
if (currentState.blocks().hasIndexBlock(index.getName(), INDEX_CLOSED_BLOCK) == false) { | ||
blocks.addIndexBlock(index.getName(), INDEX_CLOSED_BLOCK); | ||
} | ||
blocks.addIndexBlock(index.getName(), closingBlock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the index already has a close block? Do we keep on adding these blocks for every close request? Is there a risk here for the cluster state to become very large in case of many close requests?
Should we replace the current close block by a new one or perhaps limit the number of close blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum. I think that we should always keep the last index closed block only and replace the current close block by a new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have different options here, with different tradeoffs.
- when starting a close on an index that is not yet fully closed, replace the current closed block by a new one. That means that the possibly ongoing close action cannot successfully complete and the new one will be the one to successfully complete. This is problematic because concurrent calls to close will now make them fail.
- add a new closed block next to the already existing one. Opening will erase all closed blocks. That allows concurrent closes to successfully complete, but possibly adds many blocks to cluster state.
- reuse the existing closed block if there is already one. This combines the advantages of both solutions above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* reuse the existing closed block if there is already one. This combines the advantages of both solutions above.
It is indeed more elegant but if multiple close actions share the same unique block the actions become kind of dependant of each others, no? If the first response received is a failure (let's say, a shard is relocating and in INITIALIZING state) the index block should be removed, but we could have potential responses with the same block that arrive later and they won't be able to succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why the block carries a unique id is to make sure that you can distinguish the
add close block -> moved index state to closed
from the add close block -> open index (removes close block) -> add close block (from another call to close) -> moved index state to closed (from the original close block call)
situation.
If the first response received is a failure (let's say, a shard is relocating and in INITIALIZING state) the index block should be removed, but we could have potential responses with the same block that arrive later and they won't be able to succeed.
Things might be simpler if we were to never automatically clean up these blocks. In case where the block is put in place, the system should do everything it can to see the closing to completion, and not go back and reopen the index.
try { | ||
final IndexMetaData indexMetaData = metadata.getSafe(index); | ||
if (indexMetaData.getState() != IndexMetaData.State.CLOSE) { | ||
if (result.getValue().isAcknowledged()) { | ||
assert currentState.blocks().hasIndexBlock(index.getName(), closingBlock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we know that the close block has not been removed? An open command could have come in in the mean-time, removing the close block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the logic here to accommodate with keeping only the last index closed block.
An open command could have come in in the mean-time, removing the close block?
Opening indices only removes blocks and changes state of indices that are not OPEN so I think we're good and block should not be removed by an open command.
@ywelsch I've updated the code. Can you have another look please? |
if (indexMetaData.getState() != IndexMetaData.State.CLOSE) { | ||
if (result.getValue().isAcknowledged()) { | ||
logger.debug("closing index {} succeed, removing index routing table", index); | ||
if (acknowledged) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to make this logic easy to understand but we can maybe do better, any suggestion welcome.
I also think that if we go with always keeping a single block in cluster state we should compute the AcknowledgedResponse by diffing the current state and the updated state.
@ywelsch I updated the code according to our last discussion:
I also added a bit more tests. Let me know what you think! Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a few more comments. The logic looks better.
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataIndexStateService.java
Outdated
Show resolved
Hide resolved
final ClusterBlock closingBlock = blockedIndices.get(index); | ||
if (currentState.blocks().hasIndexBlock(index.getName(), closingBlock) == false) { | ||
logger.debug("closing index {} succeed but block has been removed in the mean time", index); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why has it succeeded here? The index is not closed and the block has disappeared, so it must have been reopened in the meanwhile. This should be counted as a failure, not a success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the message in the log is confusing. What succeeded here is the verification of shards using the block that has now disappeared. I changed to change the log message.
This should be counted as a failure, not a success.
I agree. I pushed a change that combines the results of the TransportVerifyShardBeforeCloseAction and the cluster state changes in order to determine if the current close action has indeed closed the index.
server/src/test/java/org/elasticsearch/indices/state/ReopenWhileClosingIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/indices/state/ReopenWhileClosingIT.java
Outdated
Show resolved
Hide resolved
assertAcked(client().admin().indices().prepareOpen(reopenedIndices.toArray(Strings.EMPTY_ARRAY))); | ||
|
||
releaseBlock.close(); | ||
closeIndexResponse.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this throw an exception / convey a failed closing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The close action should not be acknowledged so we can check that.
When we'll improve the Close Index API response we can decide if we want to throw an exception or report an error message. For now I think we should not change the API and just not ack the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can we assert that acked == false here then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been changed to assertFalse(closeIndexResponse.get().isAcknowledged());
, I think we're good
@ywelsch Thanks a lot! I've updated the code again. |
// Create a new index closed block | ||
indexBlock = createIndexClosedBlock(); | ||
indexBlock = createIndexClosingBlock(); | ||
assert Strings.hasLength(indexBlock.uuid()) : "Closing block should have a UUID"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assertion should also cover the case where indexBlock
was not previously null, i.e., rewrite this as:
} else {
if (indexBlock == null) {
indexBlock = createIndexClosingBlock();
}
assert Strings.hasLength(indexBlock.uuid()) : "Closing block should have a UUID";
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
if (result.getValue().isAcknowledged()) { | ||
if (updatedMetaData.getState() == IndexMetaData.State.CLOSE) { | ||
IndexMetaData previousMetaData = currentState.metaData().index(result.getKey()); | ||
if (previousMetaData != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how could this ever be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't, I tend to be too defensive on NPEs.
if (updatedMetaData.getState() == IndexMetaData.State.CLOSE) { | ||
IndexMetaData previousMetaData = currentState.metaData().index(result.getKey()); | ||
if (previousMetaData != null) { | ||
acknowledged = (previousMetaData.getState() == IndexMetaData.State.OPEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow this logic here. Wouldn't it be much simpler just to say. If the index is closed at this point, all is good. If not, something must have gone wrong, thereby acked == false.
This amounts to the following simple logic:
for (Map.Entry<Index, AcknowledgedResponse> result : results.entrySet()) {
IndexMetaData updatedMetaData = updatedState.metaData().index(result.getKey());
if (updatedMetaData != null && updatedMetaData.getState() != IndexMetaData.State.CLOSE) {
acknowledged = false;
break;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's do that.
As suggested in #36775, this pull request renames the following methods: ClusterBlocks.hasGlobalBlock(int) ClusterBlocks.hasGlobalBlock(RestStatus) ClusterBlocks.hasGlobalBlock(ClusterBlockLevel) to something that better reflects the property of the ClusterBlock that is searched for: ClusterBlocks.hasGlobalBlockWithId(int) ClusterBlocks.hasGlobalBlockWithStatus(RestStatus) ClusterBlocks.hasGlobalBlockWithLevel(ClusterBlockLevel)
As suggested in #36775, this pull request renames the following methods: ClusterBlocks.hasGlobalBlock(int) ClusterBlocks.hasGlobalBlock(RestStatus) ClusterBlocks.hasGlobalBlock(ClusterBlockLevel) to something that better reflects the property of the ClusterBlock that is searched for: ClusterBlocks.hasGlobalBlockWithId(int) ClusterBlocks.hasGlobalBlockWithStatus(RestStatus) ClusterBlocks.hasGlobalBlockWithLevel(ClusterBlockLevel)
Thanks @ywelsch for the review. I fixed conflicts and updated the code. Let me know if you have more comments or concerns. Thanks |
…of indices to reopen)
I pushed a fix for a test bug in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @ywelsch, and sorry for the numerous comments you had to make. |
This commit backports to 6.x of the Close Index API refactoring. It cherry-picks the following commits from master: 3ca885e [Close Index API] Add TransportShardCloseAction for pre-closing verifications (#36249) 8e5dd20 [Close Index API] Refactor MetaDataIndexStateService (#36354) 7372529 [Tests] Reduce randomization in CloseWhileRelocatingShardsIT (#36694) 103c4d4 [Close Index API] Mark unavailable shard copy as stale during verification (#36755) 1959388 [Close Index API] Propagate tasks ids between Freeze, Close and Verify(#36630) e149b08 [Close Index API] Add unique UUID to ClusterBlock (#36775) dc371ef [Tests] Fix ReopenWhileClosingIT with correct min num shards The following two commits were needed to adapt the change to 6.x: ef6ae69 [Close Index API] Adapt MetaDataIndexStateServiceTests after merge 21b7653 [Tests] Adapt CloseIndexIT tests for 6.x Related to #33888
Note: this pull request will be merged in the close-index-api-refactoring branch
This pull request add a unique id to cluster blocks, so that they can be uniquely identified if needed. This is important for the Close Index API where multiple concurrent closing requests can be executed at the same time. By adding a UUID to cluster block, we can generate unique "index closed" blocks that can be verified on shards and then checked again from the cluster state before closing the index.