Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cdc: panics in nextFlushWithJitter when min_checkpoint_frequency is zero #125312

Closed
wenyihu6 opened this issue Jun 7, 2024 · 3 comments · Fixed by #125317
Closed

cdc: panics in nextFlushWithJitter when min_checkpoint_frequency is zero #125312

wenyihu6 opened this issue Jun 7, 2024 · 3 comments · Fixed by #125317
Assignees
Labels
A-cdc Change Data Capture backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-cdc

Comments

@wenyihu6
Copy link
Contributor

wenyihu6 commented Jun 7, 2024

Describe the problem

In 24.1, we introduced changefeed.aggregator.flush_jitter in #114161. Having zero min_checkpoint_frequency could cause panics here.

More details in https://github.com/cockroachlabs/support/issues/2984.

Jira issue: CRDB-39380

@wenyihu6 wenyihu6 added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jun 7, 2024
Copy link

blathers-crl bot commented Jun 7, 2024

Hi @wenyihu6, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@wenyihu6 wenyihu6 added the A-cdc Change Data Capture label Jun 7, 2024
@wenyihu6 wenyihu6 self-assigned this Jun 7, 2024
@blathers-crl blathers-crl bot added the T-cdc label Jun 7, 2024
Copy link

blathers-crl bot commented Jun 7, 2024

cc @cockroachdb/cdc

@wenyihu6 wenyihu6 added O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 backport-23.2.x Flags PRs that need to be backported to 23.2. backport-23.1.x Flags PRs that need to be backported to 23.1 labels Jun 7, 2024
@wenyihu6
Copy link
Contributor Author

wenyihu6 commented Jun 7, 2024

The problematic pr was also backported to 23.1 and 23.2, but the setting was disabled by default.

wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 7, 2024
…int_frequency is set to 0

Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: cockroachdb#125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 7, 2024
Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: cockroachdb#125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 7, 2024
Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: cockroachdb#125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 10, 2024
Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: cockroachdb#125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.
craig bot pushed a commit that referenced this issue Jun 10, 2024
125226: tenantcostclient: convert tenant cost client to use token terminology r=andy-kimball a=andy-kimball

Convert the tenant cost client  to use token terminology rather than RU terminology. We are generalizing the tenant cost code to support vCPUs as an alternate measure of cost, and want more generic code that is agnostic to the particular unit of cost.

This PR is purely mechanical and kept separate in order to reduce the size of later reviews.

Epic: CC-28471

Release note: None

125317: changefeedccl: fix panics when min_checkpoint_frequency is 0 r=rharding6373 a=wenyihu6

Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: #125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.

125350: drt: add date to roachtest_ops.log entries r=vidit-bhat a=vidit-bhat

Previously, roachtest_ops.log entries show the current timestamp in hh:mm:ss without the timezone. This makes correlating logs quite tricky. This patch add date and timezone to the logs.

Fixes: #125304
Release note: None

Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Wenyi Hu <wenyi@cockroachlabs.com>
Co-authored-by: Vidit Bhat <vidit.bhat@cockroachlabs.com>
@craig craig bot closed this as completed in f28b826 Jun 10, 2024
blathers-crl bot pushed a commit that referenced this issue Jun 10, 2024
Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: #125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 11, 2024
Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: cockroachdb#125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 11, 2024
Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: cockroachdb#125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 14, 2024
Previously, we introduced the cluster setting changefeed.aggregator.flush_jitter
to apply some jitter to the aggregator flush frequency. This was done to reduce
the load on the coordinator by spreading out aggregator flush times. However,
that PR did not account for the case where min_checkpoint_frequency is set to
zero, which could cause panics due to rand.Int63(0). This patch fixes the issue
by adding a check for when the frequency is zero, ensuring that jitter is not
applied in such cases.

Fixes: cockroachdb#125312

Release note (bug fix): Fixed a bug in v24.1, v23.2, and v23.1 where using
changefeed.aggregator.flush_jitter with min_checkpoint_frequency set to zero
could cause panics.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cdc Change Data Capture backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-cdc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant