Closed
Labels
Indexing:Replication, bug
Description
Describe the bug
The segment replication lag metric appears to be incorrect. In tests with both push-based and pull-based indexing, the replication lag (segments.segment_replication.max_replication_lag from the node stats API) catches up soon after indexing stops. However, the reported lag value is very high and does not match bytes behind or the other segrep metrics.
Converting the replication lag from milliseconds shows values on the order of days in the following graph.

Verify/confirm if this is a bug.
Related component
Indexing:Replication
To Reproduce
- Set up a cluster running segment replication with a remote store (GCS). OpenSearch 3.x is used (latest main branch, 3.1.0 unreleased).
- Record the segment replication metrics by calling the node stats API (segments.segment_replication.max_replication_lag metric).
- Check whether the metric is consistent with the other segrep metrics (such as segments.segment_replication.max_bytes_behind).
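The comparison in the steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the report: it assumes the node stats response has already been fetched and parsed into a dict, that the metrics sit under nodes.<node_id>.indices.segments.segment_replication, and that both values are plain numbers (lag in milliseconds, as described in this issue). The helper names and the sample payload are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def segrep_metrics(node_stats):
    """Map node id -> (max_replication_lag, max_bytes_behind).

    Assumes the layout nodes.<id>.indices.segments.segment_replication;
    missing fields come back as None rather than raising.
    """
    result = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        segrep = (
            node.get("indices", {})
            .get("segments", {})
            .get("segment_replication", {})
        )
        result[node_id] = (
            segrep.get("max_replication_lag"),
            segrep.get("max_bytes_behind"),
        )
    return result


def ms_to_days(lag_ms):
    """Convert a lag in milliseconds to days."""
    return lag_ms / MS_PER_DAY


# Hypothetical payload, trimmed to the fields of interest (not real output).
sample = {
    "nodes": {
        "node-1": {
            "indices": {
                "segments": {
                    "segment_replication": {
                        "max_replication_lag": 172_800_000,
                        "max_bytes_behind": 0,
                    }
                }
            }
        }
    }
}

for node_id, (lag_ms, bytes_behind) in segrep_metrics(sample).items():
    print(node_id, f"lag={ms_to_days(lag_ms):.1f} days", f"bytes_behind={bytes_behind}")
```

Printing the two values side by side per node makes a mismatch like the one reported here easy to spot.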
Expected behavior
The replication lag metric should be consistent with bytes behind and the other segrep metrics.
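One way to make this expectation checkable is a simple heuristic, sketched below under stated assumptions: a replica that reports zero bytes behind should not also report a large lag. The function name and the 60-second threshold are hypothetical choices for illustration, not values taken from OpenSearch.

```python
def looks_inconsistent(lag_ms, bytes_behind, max_idle_lag_ms=60_000):
    """Return True when a caught-up replica still reports a large lag.

    Heuristic only: max_idle_lag_ms is an arbitrary, assumed threshold.
    """
    return bytes_behind == 0 and lag_ms > max_idle_lag_ms


# A lag of two days with nothing left to replicate is flagged.
print(looks_inconsistent(172_800_000, 0))
# A small lag, or a replica that is genuinely behind, is not.
print(looks_inconsistent(500, 0))
print(looks_inconsistent(172_800_000, 1024))
```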
Additional Details