changefeedccl: Improve aggregator -> coordinator flush performance #114133

@miretskiy

Description

See https://cockroachlabs.slack.com/archives/C05VAJ5H3QS/p1699528303796609 for discussion.

At very large scales (150k replicas/node), changefeed aggregators can still keep up (with mux rangefeed).
However, when it's time for aggregators to flush their progress to change frontier, they all tend
to flush at the same time:

[Screenshot 2023-11-09 at 15:20:32: all aggregators flushing at the same time]

This makes sense: when the aggregators keep up, their frontiers advance to the same timestamp
at the same time, so they all want to flush at roughly the same time. We should add a bit of jitter to each aggregator's flush schedule.

In addition, when an aggregator flushes its span frontier to the coordinator, the flush happens on the "main" processor goroutine
(the tick() function). That's suboptimal for two reasons:

  1. We fall ever so slightly behind consuming from the blocking buffer. Effective throughput drops while the flush is pending, and the flush might be pending for some time since the span frontier might be large (i.e. we might be sending a multi-MB message to the coordinator).
  2. The aggregator processor is also single-threaded, so when all aggregators flush at roughly the same time, they experience "head of queue" blocking. It is possible (perhaps even likely) that flushes on the last few aggregators are blocked for so long that the rangefeeds on those aggregators disconnect with a SLOW_CONSUMER error (i.e. we blocked long enough to fill up the blocking buffer), which causes more expensive and unnecessary catchup scans.

The fix on the aggregator side is quite simple: address (or at least mitigate) the second issue by adding jitter to the flush logic.
The first issue should be addressed by flushing on a dedicated goroutine. The span frontier we use is thread-safe, so that part is okay, but this might be complicated by how distsql processors work.
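A minimal sketch of moving the flush off the processor goroutine, coalescing to the latest frontier so tick() never blocks on a slow send. All names here are hypothetical, and the real implementation would also have to respect distsql processor lifecycle; this only illustrates the hand-off pattern:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// frontierFlusher sends frontier snapshots to the coordinator from a
// dedicated goroutine, so the processor's tick() loop never blocks on a
// potentially multi-MB send.
type frontierFlusher struct {
	mu      sync.Mutex
	pending []byte        // latest encoded frontier awaiting send
	notify  chan struct{} // capacity-1 wakeup for the sender goroutine
}

// newFrontierFlusher starts the sender goroutine; send may block.
func newFrontierFlusher(send func([]byte)) *frontierFlusher {
	f := &frontierFlusher{notify: make(chan struct{}, 1)}
	go func() {
		for range f.notify {
			f.mu.Lock()
			payload := f.pending
			f.pending = nil
			f.mu.Unlock()
			if payload != nil {
				send(payload) // slow send; tick() keeps draining meanwhile
			}
		}
	}()
	return f
}

// enqueue is called from the processor goroutine; it records the newest
// frontier and returns immediately. If a flush is already scheduled, the
// new snapshot simply replaces the pending one (coalescing).
func (f *frontierFlusher) enqueue(frontier []byte) {
	f.mu.Lock()
	f.pending = frontier
	f.mu.Unlock()
	select {
	case f.notify <- struct{}{}:
	default: // sender already notified
	}
}

func main() {
	done := make(chan struct{})
	f := newFrontierFlusher(func(p []byte) {
		time.Sleep(10 * time.Millisecond) // simulate a slow coordinator send
		fmt.Printf("flushed %d bytes\n", len(p))
		close(done)
	})
	f.enqueue([]byte("frontier@ts1"))
	<-done
}
```

Coalescing matters here: if the coordinator is slow, the aggregator keeps only the most recent frontier rather than queueing multiple large messages.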

Jira issue: CRDB-33363

Metadata

    Labels

    A-cdc: Change Data Capture
    C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
    O-23.2-scale-testing: issues found during 23.2 scale testing
    O-testcluster: Issues found or occurred on a test cluster, i.e. a long-running internal cluster
    T-cdc
