[Segment Replication] Add tests for Segment replication feature #921

Open
dreamer-89 opened this issue May 26, 2023 · 3 comments
@dreamer-89
Member

dreamer-89 commented May 26, 2023

Recently, the k-NN plugin integration was found to be broken on version 2.7.0 (core issue: opensearch-project/OpenSearch#7781). This was caused by two bugs in core:

  1. Codec names were used before the replica was allowed to copy files from the primary.
  2. The replica used the default codec name, which is the current version of the Lucene codec (e.g. Lucene95 today).

Because of (2) above, replication events from the primary were blocked for kNN indices, since the primary's Lucene codec name (KNNXXXCodec) differs from the replica's (LuceneXXX). As a result, the replica shard remained unassigned forever.

As part of this issue, we need to add tests so that these issues are caught before release:

  1. Integ Tests
  2. Bwc Tests

Background on core issue

Segment replication events fail for plugins using custom codecs (e.g. kNN). The failure prevents the replica shard from being allocated, so the replica shard remains unassigned indefinitely.

During peer recovery, a forced segment sync is performed for segment replication enabled indices to bring the shard up to date with the primary. Recently, we added a fix to prevent segment replication events between primary and replica when they use different codec implementations. This is problematic because we fetched the replica shard's default codec rather than the one in the engine config.
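To make the mismatch concrete, here is a minimal, self-contained sketch of the comparison described above. It is an illustration only and not the actual core code: `CodecCheckSketch`, `Shard`, `canReplicateBuggy`, and `canReplicateFixed` are hypothetical names, and the codec strings are the placeholders used earlier in this issue.

```java
import java.util.Objects;

// Hypothetical, simplified model of the codec check; none of these names are OpenSearch APIs.
public class CodecCheckSketch {

    /** Stand-in for a shard: the codec set in its engine config and the default codec. */
    public static final class Shard {
        final String engineConfigCodec;
        final String defaultCodec;

        public Shard(String engineConfigCodec, String defaultCodec) {
            this.engineConfigCodec = engineConfigCodec;
            this.defaultCodec = defaultCodec;
        }
    }

    /** Buggy variant: compares the primary's codec with the replica's default codec. */
    public static boolean canReplicateBuggy(Shard primary, Shard replica) {
        return Objects.equals(primary.engineConfigCodec, replica.defaultCodec);
    }

    /** Intended variant: compares the codecs taken from each shard's engine config. */
    public static boolean canReplicateFixed(Shard primary, Shard replica) {
        return Objects.equals(primary.engineConfigCodec, replica.engineConfigCodec);
    }

    public static void main(String[] args) {
        // Both shards belong to a kNN index, so both engine configs carry the custom codec.
        Shard primary = new Shard("KNNXXXCodec", "LuceneXXX");
        Shard replica = new Shard("KNNXXXCodec", "LuceneXXX");

        System.out.println("buggy check allows replication: " + canReplicateBuggy(primary, replica));
        System.out.println("fixed check allows replication: " + canReplicateFixed(primary, replica));
    }
}
```

With a custom codec on both shards, the buggy comparison rejects replication while the engine-config comparison allows it, which is the kind of regression an integ test on a kNN index with a replica should catch.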

@dreamer-89 dreamer-89 added bug Something isn't working untriaged and removed bug Something isn't working labels May 26, 2023
@navneet1v
Collaborator

@dreamer-89 can you provide some details like what is the expected solution?

@navneet1v navneet1v added bug Something isn't working and removed untriaged labels May 30, 2023
@dreamer-89 dreamer-89 changed the title from "[BUG] [Segment Replication] Segment replication is broken for plugins using custom codecs" to "[Segment Replication] Add tests for Segment replication feature" May 30, 2023
@dreamer-89 dreamer-89 removed the bug Something isn't working label May 30, 2023
@dreamer-89
Member Author

@dreamer-89 can you provide some details like what is the expected solution?

Thanks @navneet1v for checking on this. The issue description wasn't correct (it was copied from the core issue). I updated this issue to focus on adding tests, which is actionable in this repo.

@dreamer-89
Member Author

Opened a PR to add basic integ test: #927

I see that, by default, tests use a one-node cluster, which is not useful for segment replication. So I need some opinions from the repo owners on whether to structure this into a separate test class tied to a different Gradle task, or whether the newly added test can be run in a multi-node setup. Tagging @navneet1v @naveentatikonda for visibility.
