kv: Fix a race in catchup scan completion that may send an error. #77696
cc @cockroachdb/cdc
Specifically, we should prioritize sending a checkpoint event that contains a non-zero timestamp after a catch-up scan completes and before the `REASON_SLOW_CONSUMER` error is returned. Ensuring this is going to be a little tricky, because a non-zero checkpoint timestamp depends on the completion of the `initResolvedTSScan`.
Making guarantees here seems difficult due to the shared resolved timestamp tracker: it is shared across registrations, so it keeps advancing even after the stream of events to an overflowing registration has been broken. An option that avoids this issue is to allow each individual registration to perform its own `initResolvedTSScan`.
I think the original premise for this issue was moderately incorrect. We only return a `REASON_SLOW_CONSUMER` error after the events already sitting in the registration's buffer, including any buffered checkpoint, have been published. However, the later realization that an empty checkpoint is insufficient to ensure forward progress remains. There's no guarantee that any non-empty checkpoint will land in the buffer before it reaches capacity. I have a prototype that implements the "allow each individual registration to perform its own initResolvedTSScan" approach, but it is moderately complex. I'd be interested in determining how frequently we hit this case of only publishing an empty checkpoint before returning a `REASON_SLOW_CONSUMER` error.
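To make the idea concrete, here is a minimal sketch of the per-registration approach. None of the identifiers below come from `pkg/kv/kvserver/rangefeed`; `registration`, `computeOwnResolvedTS`, and `disconnectSlowConsumer` are invented stand-ins for the notion that an overflowing registration runs its own resolved timestamp scan and publishes a non-empty checkpoint before the slow-consumer error is surfaced. This is an illustration, not the actual prototype.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Timestamp stands in for hlc.Timestamp in this sketch.
type Timestamp int64

// IsEmpty reports whether the timestamp is the zero value.
func (t Timestamp) IsEmpty() bool { return t == 0 }

// errBufferCapacityExceeded models the error behind REASON_SLOW_CONSUMER.
var errBufferCapacityExceeded = errors.New("buffer capacity exceeded (REASON_SLOW_CONSUMER)")

// registration models a single rangefeed registration that can compute its
// own resolved timestamp, independent of the shared processor-wide tracker.
type registration struct {
	publishedNonEmptyCheckpoint bool
	computeOwnResolvedTS        func(ctx context.Context) (Timestamp, error)
	publishCheckpoint           func(ts Timestamp)
}

// disconnectSlowConsumer is invoked once the registration's buffer overflows.
// Before surfacing the error, it makes a best-effort attempt to publish a
// non-empty checkpoint so the client's frontier can advance on retry.
func (r *registration) disconnectSlowConsumer(ctx context.Context) error {
	if !r.publishedNonEmptyCheckpoint {
		if ts, err := r.computeOwnResolvedTS(ctx); err == nil && !ts.IsEmpty() {
			r.publishCheckpoint(ts)
			r.publishedNonEmptyCheckpoint = true
		}
	}
	return errBufferCapacityExceeded
}

func main() {
	r := &registration{
		computeOwnResolvedTS: func(context.Context) (Timestamp, error) { return Timestamp(42), nil },
		publishCheckpoint:    func(ts Timestamp) { fmt.Println("checkpoint published @", ts) },
	}
	fmt.Println(r.disconnectSlowConsumer(context.Background()))
}
```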
No, I don't think we have that. What we know is that the catch-up scan started a long time in the past, and that the data included large JSONB blobs, so the consumer was not able to consume quickly.
#77724 prototypes a fix for this issue. As I mentioned in the PR, I don't think we should try to merge it right away. Instead, I think we should add instrumentation (logging or a metric) to help us determine whether this is a real problem in these cases. There is a race here, but it is less severe than previously believed, so it's not clear whether this is actually leading to starvation or not.
Customer enabled vlogging in distsender, and we appear to be hitting slow consumer errors very frequently.
@nvanbenschoten just want to make sure you saw the comment above -- do you think we should resurrect that PR?
@miretskiy thanks for letting me know. If we are seeing the theorized starvation in practice then I do think it's worthwhile to resurrect the PR. In the logs, did we repeatedly see the `REASON_SLOW_CONSUMER` error on the same ranges? I can take this as an action item for v23.1. Do we need it sooner?
I think this kind of test would be worthwhile to demonstrate this behavior. It would also force us to express the relationship between catch-up scan throughput per range, new write throughput per range, and rangefeed-level buffering per range. It would be nice to be able to say that a rangefeed (with retry loop) will always be able to eventually catch up and establish a stable connection if `catch_up_scan_rate > new_write_rate`.
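For intuition on that relationship, here is a back-of-the-envelope model (not derived from CockroachDB internals; the rates and backlog are made-up assumptions): each retry scans the accumulated backlog at the catch-up rate while new writes keep arriving, so the backlog shrinks by the factor `new_write_rate / catch_up_scan_rate` per attempt and converges exactly when `catch_up_scan_rate > new_write_rate`.

```go
package main

import "fmt"

func main() {
	const (
		catchUpScanRate = 2000.0 // events/sec the catch-up scan can stream (assumed)
		newWriteRate    = 500.0  // events/sec of new writes on the range (assumed)
	)
	backlog := 1_000_000.0 // events behind at the first attempt (assumed)
	for attempt := 1; backlog >= 1 && attempt <= 10; attempt++ {
		scanSeconds := backlog / catchUpScanRate
		// Writes that arrive while this attempt's catch-up scan is running
		// become the backlog for the next attempt.
		backlog = newWriteRate * scanSeconds
		fmt.Printf("attempt %d: remaining backlog %.0f events\n", attempt, backlog)
	}
}
```

With the ratio inverted (writes arriving faster than the scan can stream them), the backlog grows on every attempt, which is the starvation scenario this issue is worried about.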
…MER error

Fixes cockroachdb#77696.

This commit ensures progress of per-range rangefeeds in the presence of large catch-up scans by guaranteeing that a non-empty checkpoint is published before returning a REASON_SLOW_CONSUMER error to the client. This ensures that the client doesn't spin in `DistSender` on the same catch-up span without advancing its frontier. It does so by having rangefeed registrations perform an ad-hoc resolved timestamp computation in cases where the registration's buffer hits a memory limit before it succeeds in publishing a non-empty checkpoint.

In doing so, we can make a loose guarantee (assuming timely closed timestamp progression) that a rangefeed with a client-side retry loop will always be able to catch up and converge towards a stable connection as long as its rate of consumption is greater than the rate of production on the table. In other words, if `catch_up_scan_rate > new_write_rate`, the retry loop will make forward progress and eventually stop hitting REASON_SLOW_CONSUMER errors.

A nearly viable alternative to this ad-hoc scan is to ensure that the processor-wide resolved timestamp tracker publishes at least one non-zero checkpoint on each registration before the registration is allowed to fail. This runs into the issues described in cockroachdb#77696 (comment). Specifically, because this tracker is shared with other registrations, it continues to advance even after the stream of events has been broken to an overflowing registration. That means that a checkpoint computed after the registration has overflowed cannot be published without violating the ordering contract of rangefeed checkpoints ("all prior events have been seen"). The checkpoint published after the initial catch-up scan needs to be coherent with the state of the range that the catch-up scan saw.

This change should be followed up with system-level testing that exercises a changefeed's ability to be unpaused after a long amount of time on a table with a high rate of writes. That is an example of the kind of situation that this change aims to improve.

Release justification: None. Wait on this.

Release note (enterprise change): Changefeeds are now guaranteed to make forward progress while performing a large catch-up scan on a table with a high rate of writes.
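To illustrate why the non-empty checkpoint matters on the client side, here is a simplified, hypothetical model of the retry loop. This is not `DistSender`'s actual code; `runAttempt`, `attemptResult`, and the timestamps are invented for illustration. The point is that each attempt restarts the catch-up scan from the frontier, so an error preceded only by an empty checkpoint restarts from the same timestamp indefinitely, while a non-empty checkpoint moves the restart point forward.

```go
package main

import (
	"errors"
	"fmt"
)

// attemptResult captures what the client saw before the stream errored out:
// the highest checkpoint timestamp (0 if only empty checkpoints arrived) and
// the terminal error.
type attemptResult struct {
	checkpointTS int64
	err          error
}

func main() {
	frontier := int64(100) // timestamp the next catch-up scan starts from (assumed units)

	// runAttempt stands in for one server-side rangefeed attempt that ends in
	// a slow-consumer error. With the fix, it publishes a checkpoint ahead of
	// the start timestamp first; without the fix, checkpointTS would stay 0.
	runAttempt := func(startTS int64) attemptResult {
		return attemptResult{
			checkpointTS: startTS + 50,
			err:          errors.New("REASON_SLOW_CONSUMER"),
		}
	}

	for attempt := 1; attempt <= 3; attempt++ {
		res := runAttempt(frontier)
		if res.checkpointTS > frontier {
			frontier = res.checkpointTS // forward progress across retries
		}
		fmt.Printf("attempt %d ended with %v; next catch-up scan starts at ts=%d\n",
			attempt, res.err, frontier)
	}
}
```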
As noticed by @nvanbenschoten, it is possible that while the catch-up scan is running,
the buffer for raft events may overfill, and in that case, when the catch-up scan completes,
we may send a buffer-overflow error instead of a checkpoint event. We should prioritize sending
the checkpoint event.
CRDB-8666
Jira issue: CRDB-13720
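As a rough illustration of the ordering described above (hypothetical constants and names; nothing here is the real rangefeed processor), the sketch below buffers raft events while a catch-up scan is notionally running and shows how an overflow observed during the scan turns into an error instead of a checkpoint once the scan completes.

```go
package main

import "fmt"

const bufferCapacity = 4 // assumed, kept tiny for illustration

func main() {
	buffered, overflowed := 0, false

	// While the catch-up scan is still running, new raft events are queued in
	// the registration's buffer for delivery after the scan.
	for event := 0; event < 10; event++ {
		if buffered == bufferCapacity {
			overflowed = true // the race: the buffer fills before the scan finishes
			break
		}
		buffered++
	}

	// Catch-up scan completes. Without prioritizing the checkpoint, the
	// overflow wins and the client sees an error instead of a checkpoint.
	if overflowed {
		fmt.Println("send buffer-overflow error (REASON_SLOW_CONSUMER), no checkpoint")
	} else {
		fmt.Println("send checkpoint event")
	}
}
```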