When enable-table-across-nodes = true and write-key-threshold is larger than 0, there is a risk of span loss, causing the changefeed to get stuck. #11675

Open
asddongmen opened this issue Oct 21, 2024 · 0 comments

What did you do?

  1. Create a changefeed with a Kafka sink using the config below, which enables the split table feature:
[scheduler]
# Allocate tables to multiple TiCDC nodes for replication on a per-Region basis.
# Note: This configuration item only takes effect on Kafka changefeeds and is not supported on MySQL changefeeds.
# The value is "false" by default. Set it to "true" to enable this feature.
enable-table-across-nodes = true
# When `enable-table-across-nodes` is enabled, there are two allocation modes:
# 1. Allocate tables based on the number of Regions, so that each TiCDC node handles roughly the same number of Regions. If the number of Regions for a table exceeds the value of `region-threshold`, the table will be allocated to multiple nodes for replication. The default value of `region-threshold` is 10000.
region-threshold = 10000
# 2. Allocate tables based on the write traffic, so that each TiCDC node handles roughly the same number of modified rows. Only when the number of modified rows per minute in a table exceeds the value of `write-key-threshold`, will this allocation take effect.
write-key-threshold = 30000
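
To make the two thresholds concrete, here is a minimal sketch of the decision the config comments above describe. It is only a reading of those comments, not TiCDC's scheduler code: `SchedulerConfig`, `split_modes`, and the sample traffic numbers are hypothetical, and treating `write-key-threshold = 0` as "write-based splitting disabled" is an assumption taken from the issue title.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchedulerConfig:
    """Hypothetical mirror of the [scheduler] section shown above."""
    enable_table_across_nodes: bool = False
    region_threshold: int = 10000
    # Assumption: 0 means splitting by write traffic is switched off.
    write_key_threshold: int = 0


def split_modes(cfg: SchedulerConfig, region_count: int,
                modified_rows_per_minute: int) -> List[str]:
    """Return which allocation modes would apply to one table."""
    if not cfg.enable_table_across_nodes:
        return []
    modes = []
    # Mode 1: the table has more Regions than region-threshold.
    if region_count > cfg.region_threshold:
        modes.append("region")
    # Mode 2: the table's write rate exceeds write-key-threshold.
    if (cfg.write_key_threshold > 0
            and modified_rows_per_minute > cfg.write_key_threshold):
        modes.append("write")
    return modes


# The config from this report, with made-up traffic figures.
cfg = SchedulerConfig(enable_table_across_nodes=True,
                      region_threshold=10000,
                      write_key_threshold=30000)
print(split_modes(cfg, region_count=500, modified_rows_per_minute=45000))
```

With this config, a table that stays under `region-threshold` can still be split once its write rate passes 30000 modified rows per minute, which is the path this report is about.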

What did you expect to see?

The changefeed does not get stuck.

What did you see instead?

The changefeed got stuck, and the logs below were found:

[2024/10/18 15:24:32.592 +00:00] [WARN] [replication_manager.go:659] ["schedulerv3: cannot advance checkpoint since missing span"] [namespace=default] [changefeed=changefeed-1411531] [tableSpanFound=true] [tableSpanStartFound=true] [tableSpanEndFound=false] [tableHasHole=false] [tableID=138]

[2024/10/18 14:55:00.581 +00:00] [INFO] [splitter_write.go:85] ["schedulerv3: split span by written keys"] [namespace=default] [changefeed=mask] [span={table_id:138,start_key:7480000000000000ff8a5f720000000000fa,end_key:7480000000000000ff8a5f730000000000fa}] [perSpanRegionCounts="[1,50000,33500,50000,9617,50000,50000,50000,50000,50000,50000,50000,50000,32537,50000,50000,30116,50000,50000,50000,45276,50000,17686,50000,14928,50000,18032,50000,14914,50000,14923,50000,37043,50000,50000,50000,50000,50000,34708,44376,50000,50000,50000,47335,50000,50000,30725,50000,48248,46353,50000]"] [weights="[1152299,103933,339943,119560,333110,220436,266746,191390,132069,268528,249089,140660,268489,390461,136621,179723,324607,123837,256286,244359,374346,134260,448703,136930,359210,142134,326213,149371,360444,183211,409529,104769,317823,234724,157903,313222,229400,214791,317821,318785,164366,317753,262510,338641,129014,197387,382302,158424,317822,376275,173449]"] [spans=51] [totalCaptures=5] [writeKeyThreshold=30000] [spanRegionLimit=50000]

After the split operation, the end_key: 7480000000000000ff8a5f730000000000fa never appeared in the logs; under normal circumstances, it should have been printed in the scheduler's logs (add table).
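
The flag combination in the warning (start found, end not found, no hole) is what one would expect if the last span of the split went missing. The sketch below reproduces that combination under stated assumptions: `check_coverage` and `can_advance` are hypothetical helpers named after the log fields, not the code in `replication_manager.go`, and the one-byte keys are placeholders rather than real table keys.

```python
from typing import Dict, List, Tuple

Span = Tuple[bytes, bytes]  # (start_key, end_key), end key exclusive


def check_coverage(table_start: bytes, table_end: bytes,
                   spans: List[Span]) -> Dict[str, bool]:
    """Report whether the known spans cover [table_start, table_end)."""
    ordered = sorted(spans)
    found = bool(ordered)
    return {
        "tableSpanFound": found,
        "tableSpanStartFound": found and ordered[0][0] == table_start,
        "tableSpanEndFound": found and ordered[-1][1] == table_end,
        # Any neighbouring pair that does not join exactly (gap or
        # overlap) is reported as a hole in this simplified check.
        "tableHasHole": any(a[1] != b[0]
                            for a, b in zip(ordered, ordered[1:])),
    }


def can_advance(flags: Dict[str, bool]) -> bool:
    """The checkpoint may advance only when the whole table is covered."""
    return (flags["tableSpanFound"]
            and flags["tableSpanStartFound"]
            and flags["tableSpanEndFound"]
            and not flags["tableHasHole"])


# Placeholder keys: the table span [a, d) split into three sub-spans.
table_start, table_end = b"a", b"d"
all_spans = [(b"a", b"b"), (b"b", c := b"c"), (c, b"d")]
lost_tail = all_spans[:-1]  # the span ending at the table's end_key is lost

flags = check_coverage(table_start, table_end, lost_tail)
print(flags, can_advance(flags))
```

In the lost-tail case the remaining spans are contiguous, so no hole is reported, yet the table's end key is never reached and the checkpoint cannot advance.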

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

Upstream TiKV version (execute tikv-server --version):

(paste TiKV version here)

TiCDC version (execute cdc version):

v7.5.3

asddongmen changed the title on Oct 21, 2024, from "When enabling the split table feature, there may be a risk of span loss, causing the changefeed stucks." to "When set enable-table-across-nodes = true, there may be a risk of span loss, causing the changefeed stucks."

asddongmen changed the title on Oct 21, 2024, from "When set enable-table-across-nodes = true, there may be a risk of span loss, causing the changefeed stucks." to "When set enable-table-across-nodes = true and write-key-threshold large than 0, there may be a risk of span loss, causing the changefeed stucks."