Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test] Fix SimpleBlocksIT testConcurrentAddBlock #123019

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

arteam
Copy link
Contributor

@arteam arteam commented Feb 20, 2025

Check that requests for adding an index block in parallel are acked. We already use the same pattern in the SimpleBlocksIT#testCloseOneMissingIndexIgnoreMissing test

Resolve #122324

@arteam arteam added >test Issues or PRs that are addressing/adding tests WIP :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Feb 20, 2025
@arteam arteam changed the title [test] Check that requests for adding an index block in parallel are acked [test] Fix SimpleBlocksIT testConcurrentAddBlock Feb 20, 2025
@arteam arteam removed the WIP label Feb 20, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing Meta label for Distributed Indexing team label Feb 20, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced this is the right fix. If the changes we made recently cause the index block to sometimes not be added, we need to fix that. Whether it is ack'ed seems irrelevant here.

I wonder how you arrived at the conclusion that this was the necessary fix?

Where you able to reproduce the issue? And did that reproduction no longer reproduce afterwards? If you could reproduce it, then how?

@arteam
Copy link
Contributor Author

arteam commented Feb 28, 2025

@henningandersen The issue is reproduced very easily locally if I check whether prepareAddBlock call is acked

check-ack-prepare-add-block

I've used the same pattern to for retrying of adding the block as in testCloseOneMissingIndexIgnoreMissing test.

@arteam
Copy link
Contributor Author

arteam commented Feb 28, 2025

Failing index block responses look something like this

Adding block response {"acknowledged":false,"shards_acknowledged":false,"indices":[{"name":"ypqeperjvj","failed_shards":[{"id":"0","failures":[{"shard":0,"index":"ypqeperjvj","status":"INTERNAL_SERVER_ERROR","reason":{"type":"illegal_state_exception","reason":"index shard [ypqeperjvj][0] has not applied block 8,MZC4iSBuSUaJw5XaURexLA,moving to block index write (api), blocks WRITE"}}]}]}]}
Adding block response {"acknowledged":false,"shards_acknowledged":false,"indices":[{"name":"ypqeperjvj","failed_shards":[{"id":"0","failures":[{"node":"CUC650bHQC-vIA3RZjxKhw","shard":0,"index":"ypqeperjvj","status":"INTERNAL_SERVER_ERROR","reason":{"type":"illegal_state_exception","reason":"index shard [ypqeperjvj][0] has not applied block 8,MZC4iSBuSUaJw5XaURexLA,moving to block index write (api), blocks WRITE"}}]}]}]}

@henningandersen
Copy link
Contributor

I see, thanks. I still think I'd prefer to fix the API to actually add the block. This seems to be an internal race condition more than an unsolvable problem. I can see 2 ways to do so:

  1. Wait for the first add block to be acknowledged (in MetadataIndexStateService).
  2. Wait for the right cluster state version to appear on the node holding the shard.

We should probably do so for both add-index and close-index, but can start with just add-index.

@arteam
Copy link
Contributor Author

arteam commented Feb 28, 2025

It looks like the exception that is thrown was introduced in c416f25 as part adding for project_id-aware based cluster block checks.

Scratch that, the check existed in the original PR #58094

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed Indexing Meta label for Distributed Indexing team >test Issues or PRs that are addressing/adding tests v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] SimpleBlocksIT testConcurrentAddBlock failing
3 participants