Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] Merge back to main tracking issue. #2355

Closed
9 of 11 tasks
Tracked by #2194
mch2 opened this issue Mar 4, 2022 · 1 comment
Closed
9 of 11 tasks
Tracked by #2194

[Segment Replication] Merge back to main tracking issue. #2355

mch2 opened this issue Mar 4, 2022 · 1 comment

Comments

@mch2
Copy link
Member

mch2 commented Mar 4, 2022

There are significant gaps in unit & integration testing of the feature because this was not prioritized with the poc. This issue will track issues to fill these gaps.

This is a list of how I think we could cut our feature branch to start merging back into main. Each one of these should have unit coverage.

Misc changes:

Self review. These are some Ideas after looking at our diff here - https://github.com/opensearch-project/OpenSearch/compare/feature/segment-replication that I think might be worth doing as we merge back into main:

  • IndicesClusterStateService - Refactor ShardRoutingReplicationListener and ShardRoutingRecoveryListener to a single listener class, these are identical.
  • Store & MultiFileWriter - conditional fsyncs. Do nothing for now on the merge. We want this as a separate optional sergep setting, and they should be enabled with segrep on, so logic is different than what’s in feature branch right now.
@mch2 mch2 added bug Something isn't working untriaged labels Mar 4, 2022
@mch2 mch2 removed bug Something isn't working untriaged labels Mar 7, 2022
@mch2 mch2 changed the title [Segment Replication] Fix Test coverage tracking issue. [Segment Replication] Merge back to main tracking issue. Apr 14, 2022
kartg added a commit to kartg/OpenSearch that referenced this issue Apr 18, 2022
This is a part of the process of merging our feature branch - feature/segment-replication - back into main by re-PRing our changes from the feature branch.
GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. This change makes the superclass less generic to increase code convergence.

The breakdown of the plan to merge segment-replication to main is detailed in opensearch-project#2355
Segment replication design proposal - opensearch-project#2229

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
dblock pushed a commit that referenced this issue Apr 20, 2022
* Refactoring GatedAutoCloseable to AutoCloseableRefCounted

This is a part of the process of merging our feature branch - feature/segment-replication - back into main by re-PRing our changes from the feature branch.
GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. This change makes the superclass less generic to increase code convergence.

The breakdown of the plan to merge segment-replication to main is detailed in #2355
Segment replication design proposal - #2229

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Minor refactoring in RecoveryState

This change makes two minor updates to RecoveryState -
1. The readRecoveryState API is removed because it can be replaced by an invocation of the constructor
2. The class members of the Timer inner class are changed to private, and accesses are only through the public APIs

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Update RecoveryTargetTests to test Timer subclasses deterministically

This change removes the use of RandomBoolean in testing the Timer classes and creates a dedicated unit test for each. The common test logic is shared via a private method.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Move the RecoveryState.Timer class to a top-level class

This will eventually be reused across both replication use-cases - peer recovery and segment replication.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Further update of timer tests in RecoveryTargetTests

Removes a non-deterministic code path around stopping the timer, and avoids assertThat (deprecated)

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Rename to ReplicationTimer

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Remove RecoveryTargetTests assert on a running timer

Trying to serialize and deserialize a running Timer instance, and then checking for equality leads to flaky test failures when the ser/deser takes time.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
opensearch-trigger-bot bot pushed a commit that referenced this issue Apr 20, 2022
* Refactoring GatedAutoCloseable to AutoCloseableRefCounted

This is a part of the process of merging our feature branch - feature/segment-replication - back into main by re-PRing our changes from the feature branch.
GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. This change makes the superclass less generic to increase code convergence.

The breakdown of the plan to merge segment-replication to main is detailed in #2355
Segment replication design proposal - #2229

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Minor refactoring in RecoveryState

This change makes two minor updates to RecoveryState -
1. The readRecoveryState API is removed because it can be replaced by an invocation of the constructor
2. The class members of the Timer inner class are changed to private, and accesses are only through the public APIs

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Update RecoveryTargetTests to test Timer subclasses deterministically

This change removes the use of RandomBoolean in testing the Timer classes and creates a dedicated unit test for each. The common test logic is shared via a private method.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Move the RecoveryState.Timer class to a top-level class

This will eventually be reused across both replication use-cases - peer recovery and segment replication.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Further update of timer tests in RecoveryTargetTests

Removes a non-deterministic code path around stopping the timer, and avoids assertThat (deprecated)

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Rename to ReplicationTimer

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Remove RecoveryTargetTests assert on a running timer

Trying to serialize and deserialize a running Timer instance, and then checking for equality leads to flaky test failures when the ser/deser takes time.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
(cherry picked from commit c7c410a)
kartg added a commit that referenced this issue Apr 21, 2022
…#3014)

* Refactoring GatedAutoCloseable to AutoCloseableRefCounted

This is a part of the process of merging our feature branch - feature/segment-replication - back into main by re-PRing our changes from the feature branch.
GatedAutoCloseable currently wraps a subclass of RefCounted. Segment replication adds another subclass, but this also wraps RefCounted. Both subclasses have the same shutdown hook - decRef. This change makes the superclass less generic to increase code convergence.

The breakdown of the plan to merge segment-replication to main is detailed in #2355
Segment replication design proposal - #2229

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Minor refactoring in RecoveryState

This change makes two minor updates to RecoveryState -
1. The readRecoveryState API is removed because it can be replaced by an invocation of the constructor
2. The class members of the Timer inner class are changed to private, and accesses are only through the public APIs

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Update RecoveryTargetTests to test Timer subclasses deterministically

This change removes the use of RandomBoolean in testing the Timer classes and creates a dedicated unit test for each. The common test logic is shared via a private method.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Move the RecoveryState.Timer class to a top-level class

This will eventually be reused across both replication use-cases - peer recovery and segment replication.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Further update of timer tests in RecoveryTargetTests

Removes a non-deterministic code path around stopping the timer, and avoids assertThat (deprecated)

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Rename to ReplicationTimer

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

* Remove RecoveryTargetTests assert on a running timer

Trying to serialize and deserialize a running Timer instance, and then checking for equality leads to flaky test failures when the ser/deser takes time.

Signed-off-by: Kartik Ganesh <gkart@amazon.com>
(cherry picked from commit c7c410a)

Co-authored-by: Kartik Ganesh <gkart@amazon.com>
@mch2
Copy link
Member Author

mch2 commented Jul 20, 2022

Closing bc merge is complete, recovery path updates is not a merge. Tracked in meta issue.

@mch2 mch2 closed this as completed Jul 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant