Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary refresh listeners from NRTReplicationReaderManager. #8859

Merged
merged 2 commits into from
Jul 26, 2023

Conversation

mch2
Copy link
Member

@mch2 mch2 commented Jul 25, 2023

Description

This change removes RefreshListeners used by IndexShard & InternalEngine to provide waitFor functionality from NRT replicas. These listeners were previously registered onto NRT replicas only to be force released on the next scheduled refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh, resolving the deadlock in #8858.

Added test to ensure refreshListeners are not registered. This is also covered by existing tests testDropPrimaryDuringReplication and testPrimaryRelocation.

Related Issues

Resolves #8858
related #8859

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

This change removes RefreshListeners used by InternalEngine to provide waitFor functionality.
These listeners were previously registered onto NRT replicas only to be force released on the next refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh.

Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation

@codecov
Copy link

codecov bot commented Jul 25, 2023

Codecov Report

Merging #8859 (d5971fd) into main (5d78de6) will increase coverage by 0.06%.
The diff coverage is 74.19%.

@@             Coverage Diff              @@
##               main    #8859      +/-   ##
============================================
+ Coverage     70.98%   71.05%   +0.06%     
- Complexity    57166    57181      +15     
============================================
  Files          4757     4757              
  Lines        269823   269822       -1     
  Branches      39474    39477       +3     
============================================
+ Hits         191536   191709     +173     
+ Misses        62158    61964     -194     
- Partials      16129    16149      +20     
Files Changed Coverage Δ
.../opensearch/index/engine/NRTReplicationEngine.java 79.47% <0.00%> (+1.69%) ⬆️
...org/opensearch/index/seqno/ReplicationTracker.java 68.36% <74.07%> (-0.91%) ⬇️
...arch/index/engine/NRTReplicationReaderManager.java 96.96% <100.00%> (ø)
...in/java/org/opensearch/index/shard/IndexShard.java 69.57% <100.00%> (-0.17%) ⬇️
.../replication/checkpoint/ReplicationCheckpoint.java 87.71% <100.00%> (+2.00%) ⬆️

... and 445 files with indirect coverage changes

…tPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. Wtih the IMMEDIATE refresh policy a blocking refresh is performed that increase the time required for the primary to block operations for relocation.  On my machine this change reduces the test time with max docs from 1m to 5-6s.

Signed-off-by: Marc Handalian <handalm@amazon.com>
@mch2
Copy link
Member Author

mch2 commented Jul 25, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation

This test is failing with a timeout over 1m for relocation to execute. I've made a change to reduce the amount of docs ingested with testPrimaryRelocation and testPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. With the IMMEDIATE refresh policy a blocking refresh is performed that acquires a primary operation permit and increases the time required for the primary to block operations for relocation. On my machine this change reduces the test time with max docs from 1m to 5-6s.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationIT.testScrollCreatedOnReplica
      1 org.opensearch.index.ShardIndexingPressureSettingsIT.testShardIndexingPressureEnforcedEnabledDisabledSetting

@mch2
Copy link
Member Author

mch2 commented Jul 25, 2023

#7556

@mch2 mch2 merged commit 4319f2b into opensearch-project:main Jul 26, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 26, 2023
#8859)

* Remove unnecessary refresh listeners from NRTReplicationReaderManager.

This change removes RefreshListeners used by InternalEngine to provide waitFor functionality.
These listeners were previously registered onto NRT replicas only to be force released on the next refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Reduce the amount of docs ingested with testPrimaryRelocation and testPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. Wtih the IMMEDIATE refresh policy a blocking refresh is performed that increase the time required for the primary to block operations for relocation.  On my machine this change reduces the test time with max docs from 1m to 5-6s.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
(cherry picked from commit 4319f2b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mch2 pushed a commit that referenced this pull request Jul 26, 2023
#8859) (#8894)

* Remove unnecessary refresh listeners from NRTReplicationReaderManager.

This change removes RefreshListeners used by InternalEngine to provide waitFor functionality.
These listeners were previously registered onto NRT replicas only to be force released on the next refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh.



* Reduce the amount of docs ingested with testPrimaryRelocation and testPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. Wtih the IMMEDIATE refresh policy a blocking refresh is performed that increase the time required for the primary to block operations for relocation.  On my machine this change reduces the test time with max docs from 1m to 5-6s.



---------


(cherry picked from commit 4319f2b)

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/OpenSearch that referenced this pull request Jul 27, 2023
opensearch-project#8859)

* Remove unnecessary refresh listeners from NRTReplicationReaderManager.

This change removes RefreshListeners used by InternalEngine to provide waitFor functionality.
These listeners were previously registered onto NRT replicas only to be force released on the next refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Reduce the amount of docs ingested with testPrimaryRelocation and testPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. Wtih the IMMEDIATE refresh policy a blocking refresh is performed that increase the time required for the primary to block operations for relocation.  On my machine this change reduces the test time with max docs from 1m to 5-6s.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
baba-devv pushed a commit to baba-devv/OpenSearch that referenced this pull request Jul 29, 2023
opensearch-project#8859)

* Remove unnecessary refresh listeners from NRTReplicationReaderManager.

This change removes RefreshListeners used by InternalEngine to provide waitFor functionality.
These listeners were previously registered onto NRT replicas only to be force released on the next refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Reduce the amount of docs ingested with testPrimaryRelocation and testPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. Wtih the IMMEDIATE refresh policy a blocking refresh is performed that increase the time required for the primary to block operations for relocation.  On my machine this change reduces the test time with max docs from 1m to 5-6s.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
opensearch-project#8859)

* Remove unnecessary refresh listeners from NRTReplicationReaderManager.

This change removes RefreshListeners used by InternalEngine to provide waitFor functionality.
These listeners were previously registered onto NRT replicas only to be force released on the next refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Reduce the amount of docs ingested with testPrimaryRelocation and testPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. Wtih the IMMEDIATE refresh policy a blocking refresh is performed that increase the time required for the primary to block operations for relocation.  On my machine this change reduces the test time with max docs from 1m to 5-6s.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
opensearch-project#8859)

* Remove unnecessary refresh listeners from NRTReplicationReaderManager.

This change removes RefreshListeners used by InternalEngine to provide waitFor functionality.
These listeners were previously registered onto NRT replicas only to be force released on the next refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Reduce the amount of docs ingested with testPrimaryRelocation and testPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. Wtih the IMMEDIATE refresh policy a blocking refresh is performed that increase the time required for the primary to block operations for relocation.  On my machine this change reduces the test time with max docs from 1m to 5-6s.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
opensearch-project#8859)

* Remove unnecessary refresh listeners from NRTReplicationReaderManager.

This change removes RefreshListeners used by InternalEngine to provide waitFor functionality.
These listeners were previously registered onto NRT replicas only to be force released on the next refresh cycle without actually refreshing the reader.

This change also removes the unnecessary blocking refresh from NRTReaderManager because we no longer have conflicting refresh invocations from scheduledRefresh.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Reduce the amount of docs ingested with testPrimaryRelocation and testPrimaryRelocationWithSegRepFailure.

These tests were ingesting 100-1k docs and randomly selecting a refresh policy. Wtih the IMMEDIATE refresh policy a blocking refresh is performed that increase the time required for the primary to block operations for relocation.  On my machine this change reduces the test time with max docs from 1m to 5-6s.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Test SegmentReplicationRelocationIT.testPrimaryRelocation is flaky.
4 participants