Fix unclosed store references with SearchOnlyReplicaIT #16106

Merged
merged 1 commit into from
Oct 22, 2024

Conversation

mch2
Member

@mch2 mch2 commented Sep 27, 2024

Description

This PR fixes a bug in node-node, pull-based segment replication that left store references open. If the replica did not know the DiscoveryNode of its primary, the failure surfaced only after replication had started, by which point a store reference had already been taken and was never released. Push-based replication already handled this case by catching any error and closing the SegmentReplicationTarget, which holds the reference. This change moves the validation ahead of the construction of PrimaryShardReplicationSource, so that it runs before any target object is created in both the push and pull cases.
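
For readers unfamiliar with this kind of leak, the following is a minimal, self-contained sketch of the pattern described above. It is not the code from this PR: `RefCountedStore`, `ReplicationTargetSketch`, and `ValidateBeforeAcquire` are hypothetical stand-ins, and the primary's node is modeled as a plain `String` that is `null` when unknown. It only illustrates why validating before the reference-holding object is constructed avoids the unreleased reference.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted store; not the OpenSearch Store class. */
final class RefCountedStore {
    private final AtomicInteger refs = new AtomicInteger();

    void incRef() { refs.incrementAndGet(); }
    void decRef() { refs.decrementAndGet(); }
    int openRefs() { return refs.get(); }
}

/** Hypothetical target: takes a store reference on construction, releases it on close. */
final class ReplicationTargetSketch {
    private final RefCountedStore store;

    ReplicationTargetSketch(RefCountedStore store) {
        this.store = store;
        store.incRef();
    }

    /** Fails late if the source node is unknown (modeled here as null). */
    void start(String sourceNode) {
        if (sourceNode == null) {
            throw new IllegalStateException("source node is unknown");
        }
    }

    void close() { store.decRef(); }
}

final class ValidateBeforeAcquire {
    /** Late validation: the target already holds a ref when start() throws, and nothing closes it. */
    static void replicateLate(RefCountedStore store, String primaryNode) {
        ReplicationTargetSketch target = new ReplicationTargetSketch(store);
        target.start(primaryNode); // throws when primaryNode is null, skipping close()
        target.close();
    }

    /** Early validation: reject an unknown primary before any ref-holding object exists. */
    static void replicateEarly(RefCountedStore store, String primaryNode) {
        if (primaryNode == null) {
            throw new IllegalStateException("primary node is unknown");
        }
        ReplicationTargetSketch target = new ReplicationTargetSketch(store);
        target.start(primaryNode);
        target.close();
    }

    public static void main(String[] args) {
        RefCountedStore leaky = new RefCountedStore();
        try {
            replicateLate(leaky, null);
        } catch (IllegalStateException e) {
            // expected: the failure surfaces after the ref was taken
        }
        System.out.println("late validation, open refs: " + leaky.openRefs());   // 1

        RefCountedStore clean = new RefCountedStore();
        try {
            replicateEarly(clean, null);
        } catch (IllegalStateException e) {
            // expected: the failure surfaces before any ref was taken
        }
        System.out.println("early validation, open refs: " + clean.openRefs()); // 0
    }
}
```

Wrapping the late path in a catch-and-close would also release the reference, which is how the description says push-based replication already behaved. Validating first simply means there is nothing to clean up when the primary is unknown.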

Related Issues

Resolves #15812

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… is unknown.

This PR fixes a bug with node-node pull-based replication: if the replica does not know
the DiscoveryNode of its primary, we would fail only after constructing a
SegmentReplicationTarget that holds a store reference. The failure occurred only after
replication had started, because the source node was null, and the target was never cleaned up.
Push-based replication already handled this case by catching any error and closing the target.
This update ensures the validation is done before constructing our PrimaryShardReplicationSource,
and so before any target object is created, in both the push and pull cases.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
@mch2 mch2 added the backport 2.x Backport to 2.x branch label Sep 27, 2024
@github-actions github-actions bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run labels Sep 27, 2024
@mch2 mch2 added skip-changelog and removed >test-failure Test failure from CI, local build, etc. flaky-test Random test failure that succeeds on second run autocut labels Sep 27, 2024
@github-actions github-actions bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run labels Sep 27, 2024
Contributor

✅ Gradle check result for 9510a7b: SUCCESS

@mch2 mch2 changed the title Fix unclosed store references with node-node segrep when primary node is unknown Fix unclosed store references with SearchOnlyReplicaIT Sep 27, 2024
codecov bot commented Sep 27, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 71.96%. Comparing base (7caca26) to head (9510a7b).
Report is 121 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...ces/replication/PrimaryShardReplicationSource.java    0.00%     0 Missing and 2 partials ⚠️
...s/replication/SegmentReplicationSourceFactory.java    50.00%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16106      +/-   ##
============================================
- Coverage     72.04%   71.96%   -0.09%     
+ Complexity    64441    64434       -7     
============================================
  Files          5281     5281              
  Lines        301088   301094       +6     
  Branches      43500    43503       +3     
============================================
- Hits         216918   216673     -245     
- Misses        66438    66638     +200     
- Partials      17732    17783      +51     


@mch2
Member Author

mch2 commented Oct 22, 2024

Codecov complains because the asserts I've written aren't hit in unit tests. I think we can safely ignore this.

@mch2 mch2 merged commit 267c68e into opensearch-project:main Oct 22, 2024
68 of 71 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 22, 2024
… is unknown. (#16106)

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
(cherry picked from commit 267c68e)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@mch2 mch2 deleted the 15812 branch October 22, 2024 17:55
dbwiddis pushed a commit that referenced this pull request Oct 22, 2024
… is unknown. (#16106) (#16435)

(cherry picked from commit 267c68e)

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Labels
autocut backport 2.x Backport to 2.x branch flaky-test Random test failure that succeeds on second run skip-changelog >test-failure Test failure from CI, local build, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for SearchOnlyReplicaIT
2 participants