[Remote Store] Avoid indexing documents as stale in non primary mode #8578

Merged 1 commit on Aug 3, 2023

@gbbafna gbbafna commented Jul 10, 2023

Description

  1. Added a few more tests for failover and relocation with remote store.
  2. Added a remote translog unit-test framework. SegmentReplicationWithRemoteIndexShardTests now uses remote translog.
  3. Avoid indexing already-indexed documents as stale in non-primary mode for SegRep-based indices.
  4. Added a unit test that checks no duplicate documents share the same sequence number.
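
To make items 3 and 4 concrete, here is a toy model of the idea, not the InternalEngine change itself. It assumes an already-indexed document can be recognised by its sequence number, which the description above does not spell out. `ReplaySketch`, `ToyEngine`, `Op`, `applyAsNonPrimary` and `hasUniqueSeqNos` are hypothetical names made up for this sketch and are not OpenSearch classes or methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these types exist in OpenSearch. */
final class ReplaySketch {

    /** A replayed operation: sequence number plus document id. */
    static final class Op {
        final long seqNo;
        final String docId;

        Op(long seqNo, String docId) {
            this.seqNo = seqNo;
            this.docId = docId;
        }
    }

    /** Hypothetical stand-in for an engine that remembers processed sequence numbers. */
    static final class ToyEngine {
        private final Set<Long> processedSeqNos = new HashSet<>();
        private final List<Op> indexed = new ArrayList<>();

        /** Records a seq_no as already present, e.g. because it arrived via copied segments. */
        void markProcessed(long seqNo) {
            processedSeqNos.add(seqNo);
        }

        /** Applies an op in non-primary mode; returns false when the op is skipped. */
        boolean applyAsNonPrimary(Op op) {
            // Set.add returns false if the seq_no was seen before, so the op is skipped
            // instead of being written a second time.
            if (!processedSeqNos.add(op.seqNo)) {
                return false;
            }
            indexed.add(op);
            return true;
        }

        List<Op> indexed() {
            return Collections.unmodifiableList(indexed);
        }
    }

    /** Returns true when no two ops share a sequence number (the property item 4 tests). */
    static boolean hasUniqueSeqNos(List<Op> ops) {
        Set<Long> seen = new HashSet<>();
        for (Op op : ops) {
            if (!seen.add(op.seqNo)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine();
        engine.markProcessed(0);
        engine.markProcessed(1);

        List<Op> translog = new ArrayList<>();
        translog.add(new Op(0, "a"));
        translog.add(new Op(1, "b"));
        translog.add(new Op(2, "c"));

        int applied = 0;
        for (Op op : translog) {
            if (engine.applyAsNonPrimary(op)) {
                applied++;
            }
        }
        System.out.println("applied=" + applied + " unique=" + hasUniqueSeqNos(engine.indexed()));
    }
}
```

In this sketch only the operation with seq_no 2 is applied during replay; the two operations that were already marked as processed are skipped, so the indexed list never holds two entries with the same sequence number.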

Testing

Ran RemoteIndexPrimaryRelocationIT 10 times locally to check that it is not flaky. Running it 100 more times.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • [] Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@gbbafna gbbafna marked this pull request as ready for review July 10, 2023 13:46
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: 384374d
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: a9354f1
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

@gbbafna gbbafna marked this pull request as draft July 10, 2023 15:36
@gbbafna
Collaborator Author

gbbafna commented Jul 10, 2023

RemoteIndexPrimaryRelocationIT is flaky.

  2> REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT" -Dtests.seed=E83A62EA35325978 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-IN -Dtests.timezone=Asia/Kolkata -Druntime.java=20
  2> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=79, name=opensearch[node_t3][generic][T#3], state=RUNNABLE, group=TGRP-RemoteIndexPrimaryRelocationIT]

        Caused by:
        java.lang.AssertionError: local checkpoint tracker is not updated seq_no=2652 id=id
            at __randomizedtesting.SeedInfo.seed([E83A62EA35325978]:0)
            at org.opensearch.index.engine.InternalEngine.compareOpToLuceneDocBasedOnSeqNo(InternalEngine.java:727)
            at org.opensearch.index.engine.InternalEngine.planIndexingAsNonPrimary(InternalEngine.java:1008)
            at org.opensearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1023)
            at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:883)
            at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1073)
            at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1018)
            at org.opensearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:2151)
            at org.opensearch.index.shard.IndexShard.runTranslogRecovery(IndexShard.java:2206)
            at org.opensearch.index.shard.IndexShard.lambda$resetEngineToGlobalCheckpoint$45(IndexShard.java:4546)
            at org.opensearch.index.translog.InternalTranslogManager.recoverFromTranslogInternal(InternalTranslogManager.java:144)
            at org.opensearch.index.translog.InternalTranslogManager.recoverFromTranslog(InternalTranslogManager.java:126)
            at org.opensearch.index.shard.IndexShard.resetEngineToGlobalCheckpoint(IndexShard.java:4562)
            at org.opensearch.index.shard.IndexShard.lambda$resetToWriteableEngine$12(IndexShard.java:1852)
            at org.opensearch.index.shard.IndexShardOperationPermits.blockOperations(IndexShardOperationPermits.java:122)
            at org.opensearch.index.shard.IndexShard.resetToWriteableEngine(IndexShard.java:1852)
            at org.opensearch.indices.replication.SegmentReplicationTargetService$ForceSyncTransportRequestHandler$1.onReplicationDone(SegmentReplicationTargetService.java:510)
            at org.opensearch.indices.replication.SegmentReplicationTargetService$SegmentReplicationListener.onDone(SegmentReplicationTargetService.java:397)
            at org.opensearch.indices.replication.common.ReplicationTarget.markAsDone(ReplicationTarget.java:146)
            at org.opensearch.indices.replication.common.ReplicationCollection.markAsDone(ReplicationCollection.java:201)
            at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onResponse(SegmentReplicationTargetService.java:438)
            at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onResponse(SegmentReplicationTargetService.java:435)
            at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$startReplication$4(SegmentReplicationTarget.java:171)
            at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80)
            at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126)
            at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
            at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341)
            at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120)
            at org.opensearch.common.util.concurrent.ListenableFuture.addListener(Li
Tests with failures:

@github-actions
Contributor

github-actions bot commented Aug 1, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=cluster.allocation_explain/10_basic/cluster shard allocation explanation test with empty request}

@codecov

codecov bot commented Aug 1, 2023

Codecov Report

Merging #8578 (a4d1623) into main (d916f9c) will decrease coverage by 0.05%.
Report is 2 commits behind head on main.
The diff coverage is 37.83%.

@@             Coverage Diff              @@
##               main    #8578      +/-   ##
============================================
- Coverage     70.97%   70.93%   -0.05%     
- Complexity    57195    57243      +48     
============================================
  Files          4765     4765              
  Lines        270334   270357      +23     
  Branches      39538    39543       +5     
============================================
- Hits         191879   191777     -102     
- Misses        62282    62427     +145     
+ Partials      16173    16153      -20     
Files Changed                                          Coverage Δ
...main/java/org/opensearch/common/network/Cidrs.java  77.77% <ø> (ø)
...a/org/opensearch/common/network/InetAddresses.java  94.83% <ø> (ø)
.../org/opensearch/common/network/NetworkAddress.java  100.00% <ø> (ø)
...earch/common/transport/NetworkExceptionHelper.java  37.50% <ø> (ø)
...va/org/opensearch/common/transport/PortsRange.java  87.09% <ø> (ø)
...h/core/common/transport/BoundTransportAddress.java  91.17% <ø> (ø)
...search/core/common/transport/TransportAddress.java  79.48% <ø> (ø)
...rg/opensearch/core/transport/TransportMessage.java  100.00% <ø> (ø)
...g/opensearch/core/transport/TransportResponse.java  85.71% <ø> (ø)
...iscovery/azure/classic/AzureSeedHostsProvider.java  0.00% <ø> (ø)
... and 98 more

... and 448 files with indirect coverage changes

@sachinpkale
Member

Changes look good. Please resolve the conflicts.

replay

Enabling Remote Translog for Unit Tests

Add more IT for Relocation and Failover for Remote Store

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
@opensearch-trigger-bot
Contributor

Compatibility status:


> Task :checkCompatibility
Checking compatibility for: https://github.com/opensearch-project/reporting.git with ref: main
Incompatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git]
Compatible components: [https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]

BUILD SUCCESSFUL in 27m 44s

@sachinpkale
Member

codecov/patch is failing even after unit tests are added. Bypassing the check.

@sachinpkale sachinpkale merged commit e55dade into opensearch-project:main Aug 3, 2023
@sachinpkale sachinpkale added the backport 2.x Backport to 2.x branch label Aug 3, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-8578-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e55dadef57809a7fc04dac5ff4d43873beeefadc
# Push it to GitHub
git push --set-upstream origin backport/backport-8578-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-8578-to-2.x.

@gbbafna gbbafna deleted the failover-test branch August 3, 2023 11:37
VachaShah pushed a commit to VachaShah/OpenSearch that referenced this pull request Aug 3, 2023
gbbafna added a commit to gbbafna/OpenSearch that referenced this pull request Aug 4, 2023
gbbafna added a commit to gbbafna/OpenSearch that referenced this pull request Aug 4, 2023
gbbafna added a commit to gbbafna/OpenSearch that referenced this pull request Aug 7, 2023
kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…replay (opensearch-project#8578)

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…replay (opensearch-project#8578)

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…replay (opensearch-project#8578)

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels: backport 2.x (Backport to 2.x branch), skip-changelog