cordinator(ticdc): Fix Puller Resolved TS Lag Calculation and Deprecate current_ts Field in Stats (#11624) #11645
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is an automated cherry-pick of #11624
What problem does this PR solve?
Issue Number: close #11561
What is changed and how it works?
Summary:
This PR addresses a critical issue in the Puller Resolved TS Lag metric, where the lag was incorrectly displayed as -55 years. The root cause of the problem was the improper initialization of the
CurrentTs
field in theStats
structure withintable.pb.go
. This led to a situation whereCurrentTs
remained zero whileResolvedTs
was correctly populated, resulting in a negative lag calculation that erroneously reflected a time difference of 55 years (from 1970 to the current year).Detailed Explanation:
Issue Background:
CurrentTs
andResolvedTs
from theStats
structure.NewReplicationSet
function did not correctly initializeCurrentTs
, leading to incorrect lag values when metrics were collected.Design Intent of
current_ts
:current_ts
field was initially designed to record the time at the processor side, which would then be used by the coordinator for lag calculation. This design was meant to avoid including the transmission delay and processing time on the coordinator side, as these could introduce inaccuracies in the lag calculation.current_ts
field may introduce a slight inaccuracy in the lag calculation, as the "current time" will now be fetched directly by the coordinator rather than from the processor. This could result in a minor error due to transmission delays and processing time. However, this inaccuracy is negligible for practical purposes, as the delay is typically within milliseconds and does not significantly impact the metric, which is measured in seconds.pb
) structure, it added extra overhead to every network transmission by carrying an additional value. It also introduced more complexity into the code, requiring maintenance of this field, which was part of the issue that led to the incorrect metric calculations in this case.Root Cause:
CurrentTs
from theStats
structure was flawed because this timestamp was not reliably up-to-date.CurrentTs
stored in theStats
structure.Solution Implemented:
CurrentTs
field within the codebase, particularly in metric calculations, have been updated to reflect this change.Deprecation of
current_ts
Field:current_ts
field in theStats
structure, defined incdc/processor/tablepb/table.proto
, has been marked as deprecated.Stats
structure is used for data transmission between the Coordinator and Processor, wherecurrent_ts
was unnecessarily included in the state for each Table Span.current_ts
within the project, ensuring that the Coordinator and Processor no longer rely on this outdated field.Impact:
This PR ensures accurate lag calculation in the Puller Resolved TS Lag metric and removes the outdated
current_ts
field, improving both performance and maintainability by reducing unnecessary transmission overhead and field maintenance. While this change may introduce a slight inaccuracy in the lag calculation due to the use of coordinator-side timestamps, this impact is minimal and well within acceptable thresholds.Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No. The deprecation of the current_ts field in the Stats structure is backward-compatible because the field is not actively used elsewhere in the project beyond the affected metrics. The existing field remains in the protocol buffer definition but is marked as deprecated, so it won’t break compatibility with older versions that still use the field.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note