-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Introduce primaryTerm and inSynAllocationIds in IndexRoutingTable #20194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Introduce primaryTerm and inSynAllocationIds in IndexRoutingTable #20194
Conversation
WalkthroughIndexRoutingTable gained public primary term and in‑sync allocation ID fields with accessors, builder support, validation, and conditional serialization (version guarded). RoutingTableIncrementalDiff was updated to pass new constructor parameters (placeholder TODOs added). Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (1)
970-988:build()should infernumberOfShardsfromshards.size()if not explicitly set.The builder's
numberOfShardsmust matchshards.size()for the validation and loop logic to work correctly. However, several code paths (e.g.,readFrom()) don't callsetNumberOfShards(). Consider inferring it from the actual shards map:public IndexRoutingTable build() { + // Infer numberOfShards from shards if not explicitly set + if (numberOfShards <= 0) { + numberOfShards = shards.size(); + } + if (primaryTerms == null) { initializePrimaryTerms(); } else if (primaryTerms.length != numberOfShards) {This ensures consistency between the shards map and the new fields.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java(12 hunks)server/src/main/java/org/opensearch/cluster/routing/RoutingTableIncrementalDiff.java(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (21, macos-15)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, macos-15)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: Validate
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: verify-changelog
- GitHub Check: Analyze (java)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: gradle-check
🔇 Additional comments (2)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (2)
161-183: Validation logic looks correct and appropriately lenient.The validation only runs when fields are populated, which aligns with the PR objective of incremental adoption. The size/length checks against
indexMetadata.getNumberOfShards()are appropriate.
995-1004: Pretty print additions look good with appropriate null checks.
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/RoutingTableIncrementalDiff.java
Show resolved
Hide resolved
|
❌ Gradle check result for 61b2c5c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
|
||
| private final Map<Integer, Set<String>> inSyncAllocationIds; | ||
|
|
||
| IndexRoutingTable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is a Public API, we need to maintain the constructor compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have reverted the original constructor and added a new constructor for additional fields.
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
| * Note: since we increment the term every time a shard is assigned, the term for any operational shard (i.e., a shard | ||
| * that can be indexed into) is larger than 0. See {@link IndexMetadataUpdater#applyChanges}. | ||
| **/ | ||
| public long getPrimaryTerm(int shardId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maintain parity with the existing getter in IndexMetadata?
public long primaryTerm(int shardId) {
return this.primaryTerms[shardId];
}
Also, since we are allowing null values in validate, can we have null checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a specific reason to maintain this parity. Getters of other IndexRoutingTable class variables are prefixed with get, hence kept the getter naming convention within the class.
Also, since we are allowing null values in validate, can we have null checks?
In the core the values will be null only for Remote downloads which is not addressed in scope of this CR. Added check for NPE.
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
|
|
||
| public void writeVerifiableTo(BufferedChecksumStreamOutput out) throws IOException { | ||
| index.writeTo(out); | ||
| // TODO: checksum calculation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, why do we need checksum calculation while writing here, isn't that auto calculated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The writeVerifiableTo is used by remote checksum calculation and data order needs to be maintained for checksum to match. Since HashMap and sets don't maintain any specific order, a custom logic is required to ensure the order is maintained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that is the case. The underlying method is already sorting map keys for deterministic order and the existing classes too do not have any custom logic, please check if we are missing anything.
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
| } | ||
|
|
||
| public Builder setNumberOfShards(int numberOfShards) { | ||
| this.numberOfShards = numberOfShards; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we even require this member variable? Its easy to miss out on setting this in ser/de. Besides, this anyway doesn't get set from IndexMetadata, so it seems like we will end up asserting the same value which we set using array's length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be set from IndexMetadata. Variable numberOfShards is required to configure the primaryTerms and in-sync allocation Ids size. I have handled the ser/de bug.
Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
61b2c5c to
b5fc89a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (1)
489-510: Fix serialization of V_3_4_0+ fields to prevent stream corruption and NullPointerExceptionTwo related issues with the new fields' serialization:
writeTovsreadFromasymmetry:writeToconditionally skips the new fields whenprimaryTerms != null && inSyncAllocationIds != null(line 524), butreadFromalways reads them unconditionally for V_3_4_0+ (lines 498-506). If an instance with null fields (created via the 2-arg constructor) is serialized, the reader still expects the fields, corrupting the stream layout.NullPointerException in
writeVerifiableTo: Line 539 unconditionally callsout.writeVLongArray(primaryTerms)without null checks. The underlyingStreamOutput.writeVLongArrayiterates directly over the array, so nullprimaryTerms(present in instances created via the 2-arg constructor orRoutingTableIncrementalDiff) will throwNullPointerException.Enforce non-null invariants by initializing both fields to empty structures (empty array and empty map) and always serializing them for V_3_4_0+. This ensures
readFromandwriteTo/writeVerifiableTostay in lockstep and eliminates the NPE risk.
🧹 Nitpick comments (2)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (2)
105-133: Consider defensive copies forprimaryTermsandinSyncAllocationIdsin constructorThe new constructor stores the provided
primaryTermsarray and the values ininSyncAllocationIdsdirectly (only the map itself is wrapped), so callers can mutate the internal state after construction.To keep
IndexRoutingTableeffectively immutable, consider cloning the array and deep-copying/wrapping the sets, similar to whatBuilder.build()does:- this.primaryTerms = primaryTerms; - this.inSyncAllocationIds = inSyncAllocationIds != null ? Collections.unmodifiableMap(inSyncAllocationIds) : Collections.emptyMap(); + this.primaryTerms = primaryTerms != null ? primaryTerms.clone() : null; + if (inSyncAllocationIds != null) { + Map<Integer, Set<String>> tmp = new HashMap<>(inSyncAllocationIds.size()); + for (Map.Entry<Integer, Set<String>> e : inSyncAllocationIds.entrySet()) { + tmp.put(e.getKey(), Collections.unmodifiableSet(new HashSet<>(e.getValue()))); + } + this.inSyncAllocationIds = Collections.unmodifiableMap(tmp); + } else { + this.inSyncAllocationIds = Collections.emptyMap(); + }
1030-1042: Pretty print of new metadata is useful; consider stable ordering for in-sync IDsIncluding
primaryTermsandinSyncAllocationIdsinprettyPrint()is very helpful for debugging. If you care about stable human‑readable output, you may want to iterateinSyncAllocationIdsby sorted shard ID (e.g., vianew TreeSet<>(inSyncAllocationIds.keySet())) rather than relying on map iteration order, but that’s purely cosmetic.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java(12 hunks)server/src/main/java/org/opensearch/cluster/routing/RoutingTableIncrementalDiff.java(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/src/main/java/org/opensearch/cluster/routing/RoutingTableIncrementalDiff.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: detect-breaking-change
- GitHub Check: Analyze (java)
🔇 Additional comments (3)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (3)
35-72: Imports for versioning/serialization look consistentThe added imports (
Version,DiffableUtils,IndexMetadataUpdater,SequenceNumbers,Arrays,Objects,TreeSet) are all used and appropriate for the new routing metadata and serialization logic. No issues here.
166-188: Lenient validation for new fields matches stated designThe additional checks on
primaryTermslength andinSyncAllocationIdssize, gated on “populated” (non‑null/non‑empty), align with the intent that these fields are optional/unpopulated in this PR and only validated when present. No functional issues noticed here.
451-487: Equality, hashCode, and toString integration for new fields looks goodIncluding
primaryTermsandinSyncAllocationIdsinequals,hashCode, andtoStringviaArrays.*andObjects.*is null‑safe and keeps identity semantics aligned with the new metadata. This is consistent and looks correct.
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Show resolved
Hide resolved
| private final Index index; | ||
| private final Map<Integer, IndexShardRoutingTable> shards = new HashMap<>(); | ||
| private long[] primaryTerms = null; | ||
| private final Map<Integer, Set<String>> inSyncAllocationIds; | ||
| private int numberOfShards; | ||
|
|
||
| public Builder(Index index) { | ||
| this.index = index; | ||
| this.inSyncAllocationIds = new HashMap<>(); | ||
| } | ||
|
|
||
| public Builder setPrimaryTerms(final long[] primaryTerms) { | ||
| this.primaryTerms = primaryTerms.clone(); | ||
| return this; | ||
| } | ||
|
|
||
| public Builder setPrimaryTerm(int shardId, long primaryTerm) { | ||
| if (primaryTerms == null) { | ||
| initializePrimaryTerms(); | ||
| } | ||
| this.primaryTerms[shardId] = primaryTerm; | ||
| return this; | ||
| } | ||
|
|
||
| public long getPrimaryTerm(int shardId) { | ||
| if (primaryTerms == null) { | ||
| initializePrimaryTerms(); | ||
| } | ||
| return this.primaryTerms[shardId]; | ||
| } | ||
|
|
||
| private void initializePrimaryTerms() { | ||
| assert primaryTerms == null; | ||
| if (numberOfShards < 0) { | ||
| throw new IllegalStateException("you must set the number of shards before setting/reading primary terms"); | ||
| } | ||
| primaryTerms = new long[numberOfShards]; | ||
| Arrays.fill(primaryTerms, SequenceNumbers.UNASSIGNED_PRIMARY_TERM); | ||
| } | ||
|
|
||
| public Builder setInSyncAllocationIds(int shardId, Set<String> allocationIds) { | ||
| inSyncAllocationIds.put(shardId, new HashSet<>(allocationIds)); | ||
| return this; | ||
| } | ||
|
|
||
| public Builder setInSyncAllocationIds(Map<Integer, Set<String>> inSyncAllocationIds) { | ||
| this.inSyncAllocationIds.putAll(inSyncAllocationIds); | ||
| return this; | ||
| } | ||
|
|
||
| public Set<String> getInSyncAllocationIds(int shardId) { | ||
| return inSyncAllocationIds.get(shardId); | ||
| } | ||
|
|
||
| public Builder setNumberOfShards(int numberOfShards) { | ||
| this.numberOfShards = numberOfShards; | ||
| return this; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Builder numberOfShards / initializePrimaryTerms contract is easy to misuse
The builder changes are generally good, but there’s a subtle footgun:
numberOfShardsdefaults to0.initializePrimaryTerms()only throws whennumberOfShards < 0, then allocatesnew long[numberOfShards].setPrimaryTerm()andgetPrimaryTerm()callinitializePrimaryTerms()whenprimaryTerms == null.
If a caller forgets to call setNumberOfShards() before using these methods (or before build() when primaryTerms is still null), initializePrimaryTerms() will happily create a long[0], and subsequent access like this.primaryTerms[shardId] will fail with ArrayIndexOutOfBoundsException.
Given this is a public builder API, I’d recommend tightening the contract:
- Either infer
numberOfShardsfromshards.size()when it hasn’t been set, or - Treat “unset” explicitly and fail fast.
For example:
- private int numberOfShards;
+ private int numberOfShards; // 0 means "not explicitly set"
@@
private void initializePrimaryTerms() {
assert primaryTerms == null;
- if (numberOfShards < 0) {
- throw new IllegalStateException("you must set the number of shards before setting/reading primary terms");
- }
- primaryTerms = new long[numberOfShards];
+ int shardsCount = numberOfShards > 0 ? numberOfShards : shards.size();
+ if (shardsCount <= 0) {
+ throw new IllegalStateException("you must set the number of shards before setting/reading primary terms");
+ }
+ numberOfShards = shardsCount;
+ primaryTerms = new long[numberOfShards];
Arrays.fill(primaryTerms, SequenceNumbers.UNASSIGNED_PRIMARY_TERM);
}
@@
public IndexRoutingTable build() {
- if (primaryTerms == null) {
- initializePrimaryTerms();
- } else if (primaryTerms.length != numberOfShards) {
+ if (primaryTerms == null) {
+ initializePrimaryTerms();
+ } else if (numberOfShards > 0 && primaryTerms.length != numberOfShards) {
throw new IllegalStateException(
"primaryTerms length is [" + primaryTerms.length + "] but should be equal to number of shards [" + numberOfShards + "]"
);
}This keeps existing builder usages working (deriving from shards.size() when explicit information is missing) while still enforcing a consistent array length once numberOfShards is known.
Also applies to: 591-598, 600-617, 1009-1026
🤖 Prompt for AI Agents
In server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
around lines 560-617, the Builder currently defaults numberOfShards to 0 so
initializePrimaryTerms() can silently create a zero-length primaryTerms array
causing ArrayIndexOutOfBounds later; change initializePrimaryTerms() to treat
"unset" explicitly by inferring numberOfShards from shards.size() when
numberOfShards is not set (or <= 0), and if that still yields 0 throw an
IllegalStateException requiring numberOfShards to be set; also validate
setNumberOfShards to require a positive value (or refuse changes after
primaryTerms is initialized) so primaryTerms length is always consistent with
the expected shard count.
|
❌ Gradle check result for b5fc89a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
|
||
| private void initializePrimaryTerms() { | ||
| assert primaryTerms == null; | ||
| if (numberOfShards < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the check be <= 0? Is there any case when numberOfShards can be 0?
Description
First PR for #20062: Introduce primaryTerms and inSyncAllocationIds fields in IndexRoutingTable to make IndexRoutingTable the source of truth for these fields instead of IndexMetadata.
Changes
Not Included
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
[ ] API changes companion pull request created, if applicable.[ ] Public documentation issue/PR created, if applicable.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.