Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry point in time on other copy when possible #66713

Merged
merged 6 commits into from
Jan 9, 2021

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Dec 21, 2020

Relates #61062

@dnhatn dnhatn requested a review from jimczi January 6, 2021 17:45
@dnhatn dnhatn added :Search/Search Search-related issues that do not fall into other categories >enhancement v8.0.0 v7.12.0 labels Jan 6, 2021
@dnhatn dnhatn marked this pull request as ready for review January 6, 2021 17:46
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jan 6, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)


// TODO: Remove this constructor
public ShardSearchContextId(String sessionId, long id) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this constructor in a follow-up.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach.
I left one comment regarding the deletion of the PIT.
We should lookup all active replicas when removing the contexts.
I am also concerned by the fact that we always keep the latest failure for each shard.
Now that we add replicas in the context, we should ensure that we keep the original exception when we retry on a replica. Otherwise any exception could be hidden by the search context missing exception that the other replicas would send. Does that makes sense ?

@dnhatn
Copy link
Member Author

dnhatn commented Jan 7, 2021

@jimczi Thanks for reviewing.

We should lookup all active replicas when removing the contexts.

I think we handle this in https://github.com/elastic/elasticsearch/pull/66713/files#diff-5198131d51dc75f4b8b82d7cc9a3960b0955bbd50c7b6573bdc5a227f02d12cfR213 and https://github.com/elastic/elasticsearch/pull/66713/files#diff-c547f7891956be1109085c486b76f18044af4b96551b46ed3aba5cf82c044158R103

I am also concerned by the fact that we always keep the latest failure for each shard.
Now that we add replicas in the context, we should ensure that we keep the original exception when we retry on a replica. Otherwise any exception could be hidden by the search context missing exception that the other replicas would send.

I think it's implemented here:
https://github.com/elastic/elasticsearch/pull/66713/files#diff-96633e6da7d4c7696cf8d66d892f86fab7e407f42b45bb9d1df9b6b92b1270d5R484

@jimczi
Copy link
Contributor

jimczi commented Jan 7, 2021

I think we handle this in https://github.com/elastic/elasticsearch/pull/66713/files#diff-5198131d51dc75f4b8b82d7cc9a3960b0955bbd50c7b6573bdc5a227f02d12cfR213 and https://github.com/elastic/elasticsearch/pull/66713/files#diff-c547f7891956be1109085c486b76f18044af4b96551b46ed3aba5cf82c044158R103

Yep sorry I should have been more precise. If we delete the extra context after each search, we loose the protection that we set when the context is opened on the shard. Maybe that's not important for searchable snapshots so the workaround that you implemented is ok but I am not sure :(.

I think it's implemented here:
https://github.com/elastic/elasticsearch/pull/66713/files#diff-96633e6da7d4c7696cf8d66d892f86fab7e407f42b45bb9d1df9b6b92b1270d5R484

Nice, sorry I missed it.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking more about the options we have to take replicas into account and I think your solution is simpler and less intrusive. We can revisit in the future but that sounds enough for now especially knowing that it's a solution for frozen indices and searchable snapshots only. So +1 to merge this solution, sorry for the back and forth.

@dnhatn
Copy link
Member Author

dnhatn commented Jan 7, 2021

I indeed explored two options before choosing the current approach. I found the other option more compelling than the current one as it truly replaces unavailable contexts with the new ones. However, it's not easy to clean up those new contexts properly. I will merge this PR as is, and we can revisit this as you said. Thank you for the feedback.

@dnhatn dnhatn merged commit 59082c0 into elastic:master Jan 9, 2021
@dnhatn dnhatn deleted the retry-searcher-id branch January 9, 2021 14:44
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 10, 2021
dnhatn added a commit that referenced this pull request Jan 11, 2021
dnhatn added a commit that referenced this pull request Jan 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.12.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants