Assertion failure in replica promotion for a closed remote translog enabled index while writing Integ tests #8485

Open
ashking94 opened this issue Jul 6, 2023 · 0 comments
Labels
enhancement Enhancement or improvement to existing feature or request Storage:Durability Issues and PRs related to the durability framework

Comments

@ashking94

Is your feature request related to a problem? Please describe.
While writing tests for the recovery flow as part of #8476, I discovered that a NoOpEngine is created when the index is closed. In this test, I create a remote translog enabled index with 0 replicas, index some docs, and then close the index. After that, I update the index settings to increase the replica count to 1. I then stop the node which has the

    private EngineFactory getEngineFactory(final IndexSettings idxSettings) {
        final IndexMetadata indexMetadata = idxSettings.getIndexMetadata();
        if (indexMetadata != null && indexMetadata.getState() == IndexMetadata.State.CLOSE) {
            // NoOpEngine takes precedence as long as the index is closed
            return NoOpEngine::new;
        }

During replica-to-primary promotion, the following method is invoked on the replica shard:

    public void updateShardState(
        final ShardRouting newRouting,
        final long newPrimaryTerm,
        final BiConsumer<IndexShard, ActionListener<ResyncTask>> primaryReplicaSyncer,
        final long applyingClusterStateVersion,
        final Set<String> inSyncAllocationIds,
        final IndexShardRoutingTable routingTable

At the end of this method, control reaches turnOffTranslogRetention():
https://github.com/opensearch-project/OpenSearch/blob/c1c23b42e335bb337c668c9cf9dccd8b71dfdbab/server/src/main/java/org/opensearch/index/shard/IndexShard.java#L748C4-L759

This ultimately hits the assertion below, which fails:

    translog.trimUnreferencedReaders();
    // refresh the translog stats
    translogStats = translog.stats();
    assert translog.currentFileGeneration() == translog.getMinFileGeneration() : "translog was not trimmed "
        + " current gen "
        + translog.currentFileGeneration()
        + " != min gen "
        + translog.getMinFileGeneration();
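
To make the failing invariant concrete, here is a minimal toy model of the check above. It is not OpenSearch code: ToyTranslog, checkTrimmed, and the trimmable flag are hypothetical names introduced only for illustration, and the flag merely stands in for whatever prevents trimming in the real scenario, which this issue has not yet identified.

```java
// Toy model only: NOT OpenSearch's Translog. All names here are hypothetical.
public class TranslogTrimInvariant {

    /** Minimal stand-in that tracks only translog generation numbers. */
    public static final class ToyTranslog {
        private final long currentGen;
        private long minGen;
        // Assumption: stands in for any condition that lets trimming drop old generations.
        private final boolean trimmable;

        public ToyTranslog(long minGen, long currentGen, boolean trimmable) {
            this.minGen = minGen;
            this.currentGen = currentGen;
            this.trimmable = trimmable;
        }

        /** Drops all generations older than the current one, if allowed. */
        public void trimUnreferencedReaders() {
            if (trimmable) {
                minGen = currentGen;
            }
        }

        public long currentFileGeneration() {
            return currentGen;
        }

        public long getMinFileGeneration() {
            return minGen;
        }
    }

    /**
     * Trims, then checks the same condition as the assertion in the issue.
     * Returns null when the invariant holds, otherwise the failure message.
     */
    public static String checkTrimmed(ToyTranslog translog) {
        translog.trimUnreferencedReaders();
        if (translog.currentFileGeneration() == translog.getMinFileGeneration()) {
            return null;
        }
        return "translog was not trimmed "
            + " current gen "
            + translog.currentFileGeneration()
            + " != min gen "
            + translog.getMinFileGeneration();
    }

    public static void main(String[] args) {
        // Trimming succeeds: the invariant holds and no message is produced.
        System.out.println(checkTrimmed(new ToyTranslog(1, 3, true)));
        // Trimming leaves older generations behind: the invariant is violated.
        System.out.println(checkTrimmed(new ToyTranslog(1, 3, false)));
    }
}
```

In this sketch the second call corresponds to the situation reported here: older generations are still present after trimUnreferencedReaders() returns, so the current and minimum generations differ and the assertion trips.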

Describe the solution you'd like
This needs more thought; no solution is proposed yet.


@ashking94 ashking94 added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 6, 2023
@anasalkouz anasalkouz added Storage:Durability Issues and PRs related to the durability framework and removed untriaged labels Jul 18, 2023