@SwethaGuptha
Contributor

@SwethaGuptha SwethaGuptha commented Dec 9, 2025

Description

First PR for #20062: Introduce primaryTerms and inSyncAllocationIds fields in IndexRoutingTable to make IndexRoutingTable the source of truth for these fields instead of IndexMetadata.

Changes

  • Added primaryTerms and inSyncAllocationIds fields
  • Added getters, builder methods, serialization (version-gated with Version.V_3_4_0), validation, equals/hashCode/toString
  • Validation is lenient (only validates if fields are populated)
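
The version gate mentioned in the list above can be illustrated with a minimal, self-contained sketch. It uses plain java.io streams and an integer wire version rather than OpenSearch's stream and Version classes; VersionGatedCodec, V_NEW_FIELDS, and the wire layout are hypothetical names and assumptions for illustration, not the PR's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a version-gated write/read pair.
final class VersionGatedCodec {
    // Assumed wire version at which the new field appears (stands in for Version.V_3_4_0).
    static final int V_NEW_FIELDS = 30400;

    // Always writes the index name; writes primary terms only for peers that understand them.
    static byte[] write(String indexName, long[] primaryTerms, int peerVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(indexName);
            if (peerVersion >= V_NEW_FIELDS) {
                out.writeInt(primaryTerms.length);
                for (long term : primaryTerms) {
                    out.writeLong(term);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors write(): reads the new field under exactly the same version condition.
    static long[] readPrimaryTerms(byte[] data, int peerVersion) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            in.readUTF(); // index name, ignored in this sketch
            if (peerVersion < V_NEW_FIELDS) {
                return new long[0]; // older peers never sent the field
            }
            long[] terms = new long[in.readInt()];
            for (int i = 0; i < terms.length; i++) {
                terms[i] = in.readLong();
            }
            return terms;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is that the writer and the reader test the same version condition, so a stream produced for an older peer never contains bytes that the peer does not expect.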

Not Included

  • Remote publication cluster diff serialization changes.
  • Fields remain unpopulated (null/empty by default). A future PR will populate these fields from IndexMetadata and migrate read locations to use IndexRoutingTable as the source of truth.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [Y] Functionality includes testing - Existing test suite.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Refactor
    • Cluster routing now tracks and exposes primary terms and in-sync allocation IDs to improve shard-state visibility and management.
    • Validation and serialization updated for compatibility with version 3.4.0 and later, and diagnostic output now includes the new routing metadata.

@SwethaGuptha SwethaGuptha requested a review from a team as a code owner December 9, 2025 04:24
@SwethaGuptha SwethaGuptha changed the title from "Introduce primaryTerm and inSynAllocationIdsK in IndexRoutingTable" to "Introduce primaryTerm and inSynAllocationIds in IndexRoutingTable" Dec 9, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 9, 2025

Walkthrough

IndexRoutingTable gained public primary term and in‑sync allocation ID fields with accessors, builder support, validation, and conditional serialization (version guarded). RoutingTableIncrementalDiff was updated to pass new constructor parameters (placeholder TODOs added).

Changes

Cohort / File(s) Summary
Primary term & in-sync fields
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Added primaryTerms (long[]) and inSyncAllocationIds (Map<Integer, Set<String>>); new getters (getPrimaryTerm(int), getPrimaryTerms(), getInSyncAllocationIds(int), getInSyncAllocationIds()); included fields in equals(), hashCode(), and toString(); prettyPrint extended.
Builder & validation
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
Builder extended with fields and methods to set/initialize primary terms and in-sync allocation IDs (setPrimaryTerms, setPrimaryTerm, initializePrimaryTerms, setInSyncAllocationIds, getters); build() validates lengths match shard count and fills missing in-sync sets with immutable empty sets.
Serialization & versioning
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
readFrom, writeTo, and writeVerifiableTo updated to conditionally serialize/deserialize the new fields for Version.V_3_4_0 and later; imports added for utilities used in serialization.
RoutingTable incremental diff
server/src/main/java/org/opensearch/cluster/routing/RoutingTableIncrementalDiff.java
Updated IndexRoutingTableIncrementalDiff.apply() to pass two additional nullable parameters (null, null) into the IndexRoutingTable constructor; added TODO comments about remote diff handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

  • Verify all constructor call sites and any other callers handle the additional nullable parameters.
  • Inspect serialization guards for Version.V_3_4_0 to ensure backward/forward compatibility.
  • Review Builder initialization/validation paths for null handling and shard-count consistency.
  • Check equals/hashCode/toString changes for correctness and contract preservation.
  • Confirm immutability and defensive copying for returned in-sync allocation ID collections.

Poem

🐇 I nibble bits of routing lore,
Primary terms and IDs galore,
Builders bake them, guarded by time,
Shards align in ordered rhyme —
A tiny hop for code, hooray! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.34% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately reflects the main change: introducing primaryTerms and inSyncAllocationIds in IndexRoutingTable, which matches the actual file modifications and PR objectives.
Description check | ✅ Passed | The description adequately covers the change scope, lists the modifications, references the related issue, and indicates what is not included in this PR. However, the Related Issues section is incomplete with a placeholder.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (1)

970-988: build() should infer numberOfShards from shards.size() if not explicitly set.

The builder's numberOfShards must match shards.size() for the validation and loop logic to work correctly. However, several code paths (e.g., readFrom()) don't call setNumberOfShards(). Consider inferring it from the actual shards map:

         public IndexRoutingTable build() {
+            // Infer numberOfShards from shards if not explicitly set
+            if (numberOfShards <= 0) {
+                numberOfShards = shards.size();
+            }
+
             if (primaryTerms == null) {
                 initializePrimaryTerms();
             } else if (primaryTerms.length != numberOfShards) {

This ensures consistency between the shards map and the new fields.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9f8d381 and 61b2c5c.

📒 Files selected for processing (2)
  • server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (12 hunks)
  • server/src/main/java/org/opensearch/cluster/routing/RoutingTableIncrementalDiff.java (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (21, macos-15)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, macos-15)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: Validate
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: verify-changelog
  • GitHub Check: Analyze (java)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: gradle-check
🔇 Additional comments (2)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (2)

161-183: Validation logic looks correct and appropriately lenient.

The validation only runs when fields are populated, which aligns with the PR objective of incremental adoption. The size/length checks against indexMetadata.getNumberOfShards() are appropriate.


995-1004: Pretty print additions look good with appropriate null checks.

@github-actions
Contributor

github-actions bot commented Dec 9, 2025

❌ Gradle check result for 61b2c5c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


private final Map<Integer, Set<String>> inSyncAllocationIds;

IndexRoutingTable(
Contributor

This class is a Public API, we need to maintain the constructor compatibility.

Contributor Author

Thanks, I have reverted the original constructor and added a new constructor for additional fields.

* Note: since we increment the term every time a shard is assigned, the term for any operational shard (i.e., a shard
* that can be indexed into) is larger than 0. See {@link IndexMetadataUpdater#applyChanges}.
**/
public long getPrimaryTerm(int shardId) {
Contributor

Should we maintain parity with the existing getter in IndexMetadata?

    public long primaryTerm(int shardId) {
        return this.primaryTerms[shardId];
    }

Also, since we are allowing null values in validate, can we have null checks?

Contributor Author

@SwethaGuptha SwethaGuptha Dec 15, 2025

I don't see a specific reason to maintain this parity. The getters for the other IndexRoutingTable fields are prefixed with get, so I kept that naming convention within the class.

Also, since we are allowing null values in validate, can we have null checks?

In core, the values will be null only for remote downloads, which are not in scope for this CR. I have added a null check to avoid an NPE.
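
A null guard of the kind discussed here could look roughly like the following. This is a minimal sketch, not the PR's implementation: PrimaryTermsView is a hypothetical class, and returning a local UNASSIGNED sentinel for an unpopulated array is an assumption (the real code might instead throw or use SequenceNumbers.UNASSIGNED_PRIMARY_TERM).

```java
// Hypothetical wrapper around a possibly-null primary terms array.
final class PrimaryTermsView {
    // Local placeholder sentinel; an assumption for this sketch only.
    static final long UNASSIGNED = 0L;

    private final long[] primaryTerms; // null means "not populated yet"

    PrimaryTermsView(long[] primaryTerms) {
        this.primaryTerms = primaryTerms == null ? null : primaryTerms.clone();
    }

    long getPrimaryTerm(int shardId) {
        if (primaryTerms == null) {
            // Unpopulated state: report the sentinel instead of throwing a NullPointerException.
            return UNASSIGNED;
        }
        if (shardId < 0 || shardId >= primaryTerms.length) {
            throw new IllegalArgumentException("shard id [" + shardId + "] out of range");
        }
        return primaryTerms[shardId];
    }
}
```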


public void writeVerifiableTo(BufferedChecksumStreamOutput out) throws IOException {
index.writeTo(out);
// TODO: checksum calculation
Contributor

For my understanding, why do we need a checksum calculation while writing here? Isn't that calculated automatically?

Contributor Author

writeVerifiableTo is used by the remote checksum calculation, and the data order needs to be preserved for the checksums to match. Since HashMap and Set don't maintain any specific order, custom logic is required to keep the order stable.

Contributor

I don't think that is the case. The underlying method already sorts map keys for deterministic order, and the existing classes do not have any custom logic either. Please check whether we are missing anything.
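
The ordering concern in this thread can be illustrated independently of OpenSearch: a checksum over map and set contents is reproducible only if the entries are fed to it in a fixed order. The following is a minimal sketch using java.util.zip.CRC32 over sorted copies; OrderedChecksum and its byte layout are hypothetical, and it does not reflect how BufferedChecksumStreamOutput is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.CRC32;

// Hypothetical helper: checksum that does not depend on map or set iteration order.
final class OrderedChecksum {
    static long checksum(Map<Integer, Set<String>> inSyncAllocationIds) {
        CRC32 crc = new CRC32();
        // TreeMap iterates shard ids in ascending order regardless of the source map type.
        for (Map.Entry<Integer, Set<String>> entry : new TreeMap<>(inSyncAllocationIds).entrySet()) {
            crc.update(Integer.toString(entry.getKey()).getBytes(StandardCharsets.UTF_8));
            crc.update(':');
            // TreeSet iterates allocation ids in natural (lexicographic) order.
            for (String allocationId : new TreeSet<>(entry.getValue())) {
                crc.update(allocationId.getBytes(StandardCharsets.UTF_8));
                crc.update(',');
            }
            crc.update(';');
        }
        return crc.getValue();
    }
}
```

Two maps with the same contents but different insertion orders produce the same value here, which is the property a verifiable write path needs, whether the sorting happens in the caller or in the stream itself.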

}

public Builder setNumberOfShards(int numberOfShards) {
this.numberOfShards = numberOfShards;
Contributor

Do we even require this member variable? It's easy to miss setting this in ser/de. Besides, it doesn't get set from IndexMetadata anyway, so it seems we will end up asserting against the same value that we set from the array's length.

Contributor Author

This will be set from IndexMetadata. The numberOfShards variable is required to size the primaryTerms array and the in-sync allocation IDs map. I have fixed the ser/de bug.

Signed-off-by: Swetha Guptha <gupthasg@amazon.com>
@SwethaGuptha SwethaGuptha force-pushed the routing-allocation-metadata branch from 61b2c5c to b5fc89a Compare December 15, 2025 08:44
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (1)

489-510: Fix serialization of V_3_4_0+ fields to prevent stream corruption and NullPointerException

Two related issues with the new fields' serialization:

  1. writeTo vs readFrom asymmetry: writeTo writes the new fields only when primaryTerms != null && inSyncAllocationIds != null (line 524), but readFrom always reads them unconditionally for V_3_4_0+ (lines 498-506). If an instance with null fields (created via the 2-arg constructor) is serialized, the writer skips the fields while the reader still expects them, corrupting the stream layout.

  2. NullPointerException in writeVerifiableTo: Line 539 unconditionally calls out.writeVLongArray(primaryTerms) without null checks. The underlying StreamOutput.writeVLongArray iterates directly over the array, so null primaryTerms (present in instances created via the 2-arg constructor or RoutingTableIncrementalDiff) will throw NullPointerException.

Enforce non-null invariants by initializing both fields to empty structures (empty array and empty map) and always serializing them for V_3_4_0+. This ensures readFrom and writeTo/writeVerifiableTo stay in lockstep and eliminates the NPE risk.
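
The recommendation above (normalize to empty structures and always serialize) can be sketched as follows. This is an illustrative example only: RoutingFields is a hypothetical holder, it uses plain java.io streams instead of StreamOutput/StreamInput, and the trailing shardCount field exists solely to show that later fields are still read at the right offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical holder; not the PR's IndexRoutingTable.
final class RoutingFields {
    final long[] primaryTerms; // never null after construction
    final int shardCount;      // written after the new field, so layout bugs would corrupt it

    RoutingFields(long[] primaryTerms, int shardCount) {
        // Normalize "unpopulated" to an empty array instead of keeping null.
        this.primaryTerms = primaryTerms == null ? new long[0] : primaryTerms.clone();
        this.shardCount = shardCount;
    }

    byte[] write() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            // Always written, so the reader never has to guess whether the field is present.
            out.writeInt(primaryTerms.length);
            for (long term : primaryTerms) {
                out.writeLong(term);
            }
            out.writeInt(shardCount);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static RoutingFields read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long[] terms = new long[in.readInt()];
            for (int i = 0; i < terms.length; i++) {
                terms[i] = in.readLong();
            }
            return new RoutingFields(terms, in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```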

🧹 Nitpick comments (2)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (2)

105-133: Consider defensive copies for primaryTerms and inSyncAllocationIds in constructor

The new constructor stores the provided primaryTerms array and the values in inSyncAllocationIds directly (only the map itself is wrapped), so callers can mutate the internal state after construction.

To keep IndexRoutingTable effectively immutable, consider cloning the array and deep-copying/wrapping the sets, similar to what Builder.build() does:

-        this.primaryTerms = primaryTerms;
-        this.inSyncAllocationIds = inSyncAllocationIds != null ? Collections.unmodifiableMap(inSyncAllocationIds) : Collections.emptyMap();
+        this.primaryTerms = primaryTerms != null ? primaryTerms.clone() : null;
+        if (inSyncAllocationIds != null) {
+            Map<Integer, Set<String>> tmp = new HashMap<>(inSyncAllocationIds.size());
+            for (Map.Entry<Integer, Set<String>> e : inSyncAllocationIds.entrySet()) {
+                tmp.put(e.getKey(), Collections.unmodifiableSet(new HashSet<>(e.getValue())));
+            }
+            this.inSyncAllocationIds = Collections.unmodifiableMap(tmp);
+        } else {
+            this.inSyncAllocationIds = Collections.emptyMap();
+        }

1030-1042: Pretty print of new metadata is useful; consider stable ordering for in-sync IDs

Including primaryTerms and inSyncAllocationIds in prettyPrint() is very helpful for debugging. If you care about stable human‑readable output, you may want to iterate inSyncAllocationIds by sorted shard ID (e.g., via new TreeSet<>(inSyncAllocationIds.keySet())) rather than relying on map iteration order, but that’s purely cosmetic.
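
As a rough illustration of the sorted iteration suggested above, the sketch below prints shard IDs and allocation IDs in ascending order. InSyncPrettyPrinter and its output format are hypothetical and unrelated to the actual prettyPrint() format; it also assumes the map contains no null sets.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper producing stable, human-readable output.
final class InSyncPrettyPrinter {
    static String prettyPrint(Map<Integer, Set<String>> inSyncAllocationIds) {
        StringBuilder sb = new StringBuilder();
        // Sort shard ids so the output does not depend on the map implementation.
        for (Integer shardId : new TreeSet<>(inSyncAllocationIds.keySet())) {
            sb.append("shard[").append(shardId).append("]: ")
                // Sort allocation ids as well; assumes the set is non-null.
                .append(new TreeSet<>(inSyncAllocationIds.get(shardId)))
                .append('\n');
        }
        return sb.toString();
    }
}
```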

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61b2c5c and b5fc89a.

📒 Files selected for processing (2)
  • server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (12 hunks)
  • server/src/main/java/org/opensearch/cluster/routing/RoutingTableIncrementalDiff.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/src/main/java/org/opensearch/cluster/routing/RoutingTableIncrementalDiff.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Analyze (java)
🔇 Additional comments (3)
server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java (3)

35-72: Imports for versioning/serialization look consistent

The added imports (Version, DiffableUtils, IndexMetadataUpdater, SequenceNumbers, Arrays, Objects, TreeSet) are all used and appropriate for the new routing metadata and serialization logic. No issues here.


166-188: Lenient validation for new fields matches stated design

The additional checks on primaryTerms length and inSyncAllocationIds size, gated on “populated” (non‑null/non‑empty), align with the intent that these fields are optional/unpopulated in this PR and only validated when present. No functional issues noticed here.


451-487: Equality, hashCode, and toString integration for new fields looks good

Including primaryTerms and inSyncAllocationIds in equals, hashCode, and toString via Arrays.* and Objects.* is null‑safe and keeps identity semantics aligned with the new metadata. This is consistent and looks correct.
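
The reason Arrays.* matters for the long[] field is that Objects.equals on two arrays compares references, not contents. A minimal sketch of a null-safe equals/hashCode pair follows; RoutingMetadataKey is a hypothetical class used only to show the pattern.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class holding the two new fields.
final class RoutingMetadataKey {
    private final long[] primaryTerms;                           // may be null
    private final Map<Integer, Set<String>> inSyncAllocationIds; // may be null

    RoutingMetadataKey(long[] primaryTerms, Map<Integer, Set<String>> inSyncAllocationIds) {
        this.primaryTerms = primaryTerms;
        this.inSyncAllocationIds = inSyncAllocationIds;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof RoutingMetadataKey)) {
            return false;
        }
        RoutingMetadataKey other = (RoutingMetadataKey) o;
        // Arrays.equals compares element by element and treats two nulls as equal.
        return Arrays.equals(primaryTerms, other.primaryTerms)
            && Objects.equals(inSyncAllocationIds, other.inSyncAllocationIds);
    }

    @Override
    public int hashCode() {
        // Arrays.hashCode(null) is 0, so this stays null-safe and consistent with equals.
        return 31 * Arrays.hashCode(primaryTerms) + Objects.hashCode(inSyncAllocationIds);
    }
}
```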

Comment on lines 560 to 617
    private final Index index;
    private final Map<Integer, IndexShardRoutingTable> shards = new HashMap<>();
    private long[] primaryTerms = null;
    private final Map<Integer, Set<String>> inSyncAllocationIds;
    private int numberOfShards;

    public Builder(Index index) {
        this.index = index;
        this.inSyncAllocationIds = new HashMap<>();
    }

    public Builder setPrimaryTerms(final long[] primaryTerms) {
        this.primaryTerms = primaryTerms.clone();
        return this;
    }

    public Builder setPrimaryTerm(int shardId, long primaryTerm) {
        if (primaryTerms == null) {
            initializePrimaryTerms();
        }
        this.primaryTerms[shardId] = primaryTerm;
        return this;
    }

    public long getPrimaryTerm(int shardId) {
        if (primaryTerms == null) {
            initializePrimaryTerms();
        }
        return this.primaryTerms[shardId];
    }

    private void initializePrimaryTerms() {
        assert primaryTerms == null;
        if (numberOfShards < 0) {
            throw new IllegalStateException("you must set the number of shards before setting/reading primary terms");
        }
        primaryTerms = new long[numberOfShards];
        Arrays.fill(primaryTerms, SequenceNumbers.UNASSIGNED_PRIMARY_TERM);
    }

    public Builder setInSyncAllocationIds(int shardId, Set<String> allocationIds) {
        inSyncAllocationIds.put(shardId, new HashSet<>(allocationIds));
        return this;
    }

    public Builder setInSyncAllocationIds(Map<Integer, Set<String>> inSyncAllocationIds) {
        this.inSyncAllocationIds.putAll(inSyncAllocationIds);
        return this;
    }

    public Set<String> getInSyncAllocationIds(int shardId) {
        return inSyncAllocationIds.get(shardId);
    }

    public Builder setNumberOfShards(int numberOfShards) {
        this.numberOfShards = numberOfShards;
        return this;
    }
Contributor

🛠️ Refactor suggestion | 🟠 Major

Builder numberOfShards / initializePrimaryTerms contract is easy to misuse

The builder changes are generally good, but there’s a subtle footgun:

  • numberOfShards defaults to 0.
  • initializePrimaryTerms() only throws when numberOfShards < 0, then allocates new long[numberOfShards].
  • setPrimaryTerm() and getPrimaryTerm() call initializePrimaryTerms() when primaryTerms == null.

If a caller forgets to call setNumberOfShards() before using these methods (or before build() when primaryTerms is still null), initializePrimaryTerms() will happily create a long[0], and subsequent access like this.primaryTerms[shardId] will fail with ArrayIndexOutOfBoundsException.

Given this is a public builder API, I’d recommend tightening the contract:

  • Either infer numberOfShards from shards.size() when it hasn’t been set, or
  • Treat “unset” explicitly and fail fast.

For example:

-        private int numberOfShards;
+        private int numberOfShards; // 0 means "not explicitly set"
@@
         private void initializePrimaryTerms() {
             assert primaryTerms == null;
-            if (numberOfShards < 0) {
-                throw new IllegalStateException("you must set the number of shards before setting/reading primary terms");
-            }
-            primaryTerms = new long[numberOfShards];
+            int shardsCount = numberOfShards > 0 ? numberOfShards : shards.size();
+            if (shardsCount <= 0) {
+                throw new IllegalStateException("you must set the number of shards before setting/reading primary terms");
+            }
+            numberOfShards = shardsCount;
+            primaryTerms = new long[numberOfShards];
             Arrays.fill(primaryTerms, SequenceNumbers.UNASSIGNED_PRIMARY_TERM);
         }
@@
         public IndexRoutingTable build() {
-            if (primaryTerms == null) {
-                initializePrimaryTerms();
-            } else if (primaryTerms.length != numberOfShards) {
+            if (primaryTerms == null) {
+                initializePrimaryTerms();
+            } else if (numberOfShards > 0 && primaryTerms.length != numberOfShards) {
                 throw new IllegalStateException(
                     "primaryTerms length is [" + primaryTerms.length + "] but should be equal to number of shards [" + numberOfShards + "]"
                 );
             }

This keeps existing builder usages working (deriving from shards.size() when explicit information is missing) while still enforcing a consistent array length once numberOfShards is known.

Also applies to: 591-598, 600-617, 1009-1026

🤖 Prompt for AI Agents
In server/src/main/java/org/opensearch/cluster/routing/IndexRoutingTable.java
around lines 560-617, the Builder currently defaults numberOfShards to 0 so
initializePrimaryTerms() can silently create a zero-length primaryTerms array
causing ArrayIndexOutOfBounds later; change initializePrimaryTerms() to treat
"unset" explicitly by inferring numberOfShards from shards.size() when
numberOfShards is not set (or <= 0), and if that still yields 0 throw an
IllegalStateException requiring numberOfShards to be set; also validate
setNumberOfShards to require a positive value (or refuse changes after
primaryTerms is initialized) so primaryTerms length is always consistent with
the expected shard count.

@github-actions
Contributor

❌ Gradle check result for b5fc89a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


private void initializePrimaryTerms() {
assert primaryTerms == null;
if (numberOfShards < 0) {
Contributor

Shouldn't the check be <= 0? Is there any case when numberOfShards can be 0?
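
To make the difference between the two checks concrete, here is a minimal sketch that isolates the guard. PrimaryTermsInit, initLenient, and initStrict are hypothetical names; the array fill with the unassigned sentinel is omitted because it does not affect the behavior shown.

```java
// Hypothetical isolation of the guard in initializePrimaryTerms().
final class PrimaryTermsInit {
    // Mirrors the "< 0" check: an unset (zero) shard count slips through.
    static long[] initLenient(int numberOfShards) {
        if (numberOfShards < 0) {
            throw new IllegalStateException("you must set the number of shards before setting/reading primary terms");
        }
        return new long[numberOfShards];
    }

    // Variant with "<= 0": an unset shard count fails fast with a clear message.
    static long[] initStrict(int numberOfShards) {
        if (numberOfShards <= 0) {
            throw new IllegalStateException("you must set the number of shards before setting/reading primary terms");
        }
        return new long[numberOfShards];
    }
}
```

With the lenient check, a builder whose shard count was never set ends up with a zero-length array, and the first indexed access fails later with ArrayIndexOutOfBoundsException instead of the intended IllegalStateException.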

2 participants