Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix weighted shard routing state across search requests #6004

Merged

Conversation

anshu1106
Copy link
Contributor

@anshu1106 anshu1106 commented Jan 25, 2023

Signed-off-by: Anshu Agarwal anshukag@amazon.com

Description

This PR fixes issue due to which state is not maintained across weighted shard routing search requests . The shuffler is moved twice in a call which is causing the issue. The PR adds code logic to prevent unwanted shuffler movement.

Issues Resolved

#6056

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines 79 to 80
final ShardShuffler shuffler;
final ShardShuffler shufflerForWeightedRouting;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a second shuffler or just creating another method the prevents the shuffle for onlyNodeSelectorActiveInitializingShardsIt should work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is with this line https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/cluster/routing/IndexShardRoutingTable.java#L587. This is moving the shuffler one more time ie in a three node cluster with weights (1,1,0) the second request also hits the same node since the shuffler moves by 2 in the first request.

Since we need onlyNodeSelectorActiveInitializingShardsIt to get shards belonging to node with weight zero. Introducing a method without shuffler will require a code duplication.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets put up a comment explaining the same.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

codecov-commenter commented Jan 25, 2023

Codecov Report

Merging #6004 (296a62e) into main (715ff72) will increase coverage by 0.05%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #6004      +/-   ##
============================================
+ Coverage     70.75%   70.81%   +0.05%     
- Complexity    58720    58738      +18     
============================================
  Files          4771     4771              
  Lines        280818   280819       +1     
  Branches      40568    40568              
============================================
+ Hits         198704   198860     +156     
+ Misses        65824    65630     -194     
- Partials      16290    16329      +39     
Impacted Files Coverage Δ
...search/cluster/routing/IndexShardRoutingTable.java 86.43% <100.00%> (-1.48%) ⬇️
...java/org/opensearch/client/indices/DataStream.java 0.00% <0.00%> (-76.09%) ⬇️
.../opensearch/client/indices/CloseIndexResponse.java 17.50% <0.00%> (-73.75%) ⬇️
...luster/routing/allocation/RoutingExplanations.java 41.37% <0.00%> (-58.63%) ⬇️
...ava/org/opensearch/action/NoSuchNodeException.java 0.00% <0.00%> (-50.00%) ⬇️
...ch/transport/ReceiveTimeoutTransportException.java 50.00% <0.00%> (-50.00%) ⬇️
.../admin/cluster/reroute/ClusterRerouteResponse.java 55.00% <0.00%> (-45.00%) ⬇️
.../org/opensearch/client/indices/AnalyzeRequest.java 31.00% <0.00%> (-42.00%) ⬇️
...java/org/opensearch/threadpool/ThreadPoolInfo.java 56.25% <0.00%> (-37.50%) ⬇️
...ndex/seqno/RetentionLeaseBackgroundSyncAction.java 37.50% <0.00%> (-37.50%) ⬇️
... and 474 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@anshu1106 anshu1106 force-pushed the wrr-state-across-requests branch from a5b2125 to 69d20ba Compare January 25, 2023 09:25
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testDeleteOperations
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWithAReadOnlyBlock

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR fixes issue due to which state is not maintained across weighted shard routing search requests .

Can you provide more detail about what is changing here (or link an issue that explains in detail what the problem is)? "state is not maintained" is the cause, but what is the actual issue? i.e. what is the behavior change that a user will as a result of this fix?

CHANGELOG.md Outdated
@@ -69,6 +69,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Fix 'org.apache.hc.core5.http.ParseException: Invalid protocol version' under JDK 16+ ([#4827](https://github.com/opensearch-project/OpenSearch/pull/4827))
- Fixed compression support for h2c protocol ([#4944](https://github.com/opensearch-project/OpenSearch/pull/4944))
- Support OpenSSL Provider with default Netty allocator ([#5460](https://github.com/opensearch-project/OpenSearch/pull/5460))
- Fix weighted shard routing state across search requests([#6004](https://github.com/opensearch-project/OpenSearch/pull/6004))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you intend to backport this to 2.x? If so, you'll need to put this in the [Unreleased 2.x] section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, moved the entry to 2.x section. Thanks for pointing this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR fixes issue due to which state is not maintained across weighted shard routing search requests .

Can you provide more detail about what is changing here (or link an issue that explains in detail what the problem is)? "state is not maintained" is the cause, but what is the actual issue? i.e. what is the behavior change that a user will as a result of this fix?

Created an issue with the details #6056

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHANGELOG entry should move to 2.x section

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@anshu1106 anshu1106 requested a review from andrross January 28, 2023 08:10
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testIndexReopenClose
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testDeleteOperations
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWithAReadOnlyBlock
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermark

Comment on lines 79 to 80
final ShardShuffler shuffler;
final ShardShuffler shufflerForWeightedRouting;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets put up a comment explaining the same.

* Test to validate that shard routing state is maintained across requests, requests are assigned to nodes
* according to assigned routing weights
*/
public void testWeightedRoutingShardStateWithDifferentWeights() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering how did we miss this regression since we had ITs covering same . If not , lets please add

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we have added a test now

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@Bukhtawar Bukhtawar dismissed andrross’s stale review January 29, 2023 16:59

Comments addressed

@Bukhtawar Bukhtawar merged commit cd860ec into opensearch-project:main Jan 29, 2023
@Bukhtawar Bukhtawar added the backport 2.x Backport to 2.x branch label Jan 29, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 29, 2023
* Fix maintaining state across search requests with weighted shard routing

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
(cherry picked from commit cd860ec)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross added a commit that referenced this pull request Feb 3, 2023
* Fix maintaining state across search requests with weighted shard routing


(cherry picked from commit cd860ec)

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants