diff --git a/ticdc/ticdc-changefeed-config.md b/ticdc/ticdc-changefeed-config.md index 934c07474fd75..f3963e79a011d 100644 --- a/ticdc/ticdc-changefeed-config.md +++ b/ticdc/ticdc-changefeed-config.md @@ -36,345 +36,577 @@ Info: {"upstream_id":7178706266519722477,"namespace":"default","id":"simple-repl This section introduces the configuration of a replication task. +### `memory-quota` + +- Specifies the memory quota (in bytes) that the sink manager can use in the capture server. If the value is exceeded, the overused part is recycled by the Go runtime. +- Default value: `1073741824` (1 GiB) + +### `case-sensitive` + +- Specifies whether the database and table names in the configuration file are case-sensitive. Starting from v6.5.6, v7.1.3, and v7.5.0, the default value changes from `true` to `false`. +- This configuration item affects configurations related to filter and sink. +- Default value: `false` + +### `enable-sync-point` New in v6.3.0 + +- Specifies whether to enable the Syncpoint feature, which is supported starting from v6.3.0. +- Starting from v6.4.0, only the changefeed with the `SYSTEM_VARIABLES_ADMIN` or `SUPER` privilege can use the TiCDC Syncpoint feature. +- This configuration item only takes effect if the downstream is TiDB. +- Default value: `false` + +### `sync-point-interval` + +- Specifies the interval at which Syncpoint aligns the upstream and downstream snapshots. +- This configuration item only takes effect if the downstream is TiDB. +- The format is `"h m s"`. For example, `"1h30m30s"`. +- Default value: `"10m"` +- Minimum value: `"30s"` + +### `sync-point-retention` + +- Specifies how long the data is retained by Syncpoint in the downstream table. When this duration is exceeded, the data is cleaned up. +- This configuration item only takes effect if the downstream is TiDB. +- The format is `"h m s"`. For example, `"24h30m30s"`.
+- Default value: `"24h"` + +### `sql-mode` New in v6.5.6, v7.1.3, and v7.5.0 + +- Specifies the [SQL mode](/sql-mode.md) used when parsing DDL statements. Multiple modes are separated by commas. +- Default value: `"ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"`, which is the same as the default SQL mode of TiDB + +### `bdr-mode` + +- To set up BDR (bidirectional replication) clusters using TiCDC, set this parameter to `true` and set the TiDB clusters to BDR mode. For more information, see [Bidirectional Replication](/ticdc/ticdc-bidirectional-replication.md#bidirectional-replication). +- Default value: `false`, indicating that BDR mode is not enabled + +### `changefeed-error-stuck-duration` + +- Specifies the duration for which the changefeed is allowed to automatically retry when internal errors or exceptions occur. +- The changefeed enters the failed state if internal errors or exceptions occur in the changefeed and persist longer than the duration set by this parameter. +- When the changefeed is in the failed state, you need to restart the changefeed manually for recovery. +- The format is `"h m s"`. For example, `"1h30m30s"`. +- Default value: `"30m"` + +### mounter + +#### `worker-num` + +- Specifies the number of threads with which the mounter decodes KV data. +- Default value: `16` + +### filter + +#### `ignore-txn-start-ts` + +- Ignores transactions with the specified `start_ts` values. For example, `ignore-txn-start-ts = [1, 2]` ignores transactions whose `start_ts` is `1` or `2`. + + + +#### `rules` + +- Specifies the filter rules. For example, `rules = ['*.*', '!test.*']` replicates all tables except tables in the `test` database. For more information, see [Syntax](/table-filter.md#syntax). + + + +#### filter.event-filters + +For more information, see [Event filter rules](/ticdc/ticdc-filter.md#event-filter-rules). + +##### `matcher` + +- `matcher` is an allow list. `matcher = ["test.worker"]` means this rule only applies to the `worker` table in the `test` database.
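+
+For illustration, a complete event filter rule combining the options in this section might look as follows. The table name and filter values are the same example values used in this section, so adjust them to match your own workload:
+
+```toml
+[[filter.event-filters]]
+matcher = ["test.worker"]                   # Apply this rule only to the worker table in the test database
+ignore-event = ["insert"]                   # Ignore INSERT events
+ignore-delete-value-expr = "name = 'john'"  # Ignore DELETE statements that match this condition
+```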
+ +##### `ignore-event` + +- `ignore-event = ["insert"]` ignores `INSERT` events. +- `ignore-event = ["drop table", "delete"]` ignores the `DROP TABLE` DDL events and the `DELETE` DML events. Note that when a value in the clustered index column is updated in TiDB, TiCDC splits an `UPDATE` event into `DELETE` and `INSERT` events. TiCDC cannot identify such events as `UPDATE` events and thus cannot correctly filter out such events. + +##### `ignore-sql` + +- `ignore-sql = ["^drop", "add column"]` ignores DDLs that start with `DROP` or contain `ADD COLUMN`. + +##### `ignore-delete-value-expr` + +- `ignore-delete-value-expr = "name = 'john'"` ignores `DELETE` DMLs that contain the condition `name = 'john'`. + +##### `ignore-insert-value-expr` + +- `ignore-insert-value-expr = "id >= 100"` ignores `INSERT` DMLs that contain the condition `id >= 100`. + +##### `ignore-update-old-value-expr` + +- `ignore-update-old-value-expr = "age < 18"` ignores `UPDATE` DMLs whose old value contains `age < 18`. + +##### `ignore-update-new-value-expr` + +- `ignore-update-new-value-expr = "gender = 'male'"` ignores `UPDATE` DMLs whose new value contains `gender = 'male'`. + +### scheduler + +#### `enable-table-across-nodes` + +- Allocates tables to multiple TiCDC nodes for replication on a per-Region basis. +- This configuration item only takes effect on Kafka changefeeds and is not supported on MySQL changefeeds. +- When `enable-table-across-nodes` is enabled, there are two allocation modes: + + 1. Allocate tables based on the number of Regions, so that each TiCDC node handles roughly the same number of Regions. If the number of Regions for a table exceeds the value of [`region-threshold`](#region-threshold), the table will be allocated to multiple nodes for replication. The default value of `region-threshold` is `10000`. + 2. Allocate tables based on the write traffic, so that each TiCDC node handles roughly the same number of modified rows.
This allocation mode takes effect only when the number of modified rows per minute in a table exceeds the value of [`write-key-threshold`](#write-key-threshold). + + You only need to configure one of the two modes. If both `region-threshold` and `write-key-threshold` are configured, TiCDC prioritizes the traffic allocation mode, namely `write-key-threshold`. + +- Default value: `false`. Set it to `true` to enable this feature. + +#### `region-threshold` + +- Specifies the Region count threshold used when tables are allocated based on the number of Regions: a table whose Region count exceeds this value is allocated to multiple nodes for replication. +- Default value: `10000` + +#### `write-key-threshold` + +- Specifies the threshold of modified rows per minute above which the traffic-based allocation mode takes effect. +- Default value: `0`, which means that the traffic allocation mode is not used by default + +### sink + + + +#### `dispatchers` + +- For the sink of MQ type, you can use dispatchers to configure the event dispatcher. +- Starting from v6.1.0, TiDB supports two types of event dispatchers: partition and topic. +- The matching syntax of matcher is the same as the filter rule syntax. +- This configuration item only takes effect if the downstream is MQ. +- When the downstream MQ is Pulsar, if the routing rule for `partition` is not specified as any of `ts`, `index-value`, `table`, or `default`, each Pulsar message will be routed using the string you set as the key. For example, if you specify the routing rule for a matcher as the string `code`, then all Pulsar messages that match that matcher will be routed with `code` as the key. + +#### `column-selectors` New in v7.5.0 + +- Selects specific columns for replication. This configuration item only takes effect when the downstream is Kafka. + +#### `protocol` + +- Specifies the protocol format used for encoding messages. +- This configuration item only takes effect if the downstream is Kafka, Pulsar, or a storage service. +- When the downstream is Kafka, the protocol can be canal-json, avro, debezium, open-protocol, or simple. +- When the downstream is Pulsar, the protocol can only be canal-json. +- When the downstream is a storage service, the protocol can only be canal-json or csv.
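+
+Putting the MQ sink options together, a minimal sketch of a `[sink]` section might look as follows. The matchers and topic expressions are illustrative placeholders taken from the sample values in this document:
+
+```toml
+[sink]
+# Route matched tables to a topic, partitioned by index value or by table name.
+dispatchers = [
+    {matcher = ['test1.*', 'test2.*'], topic = "Topic expression 1", partition = "index-value"},
+    {matcher = ['test1.*', 'test5.*'], topic = "Topic expression 3", partition = "table"},
+]
+# Encode messages using the canal-json protocol.
+protocol = "canal-json"
+```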
+ + + +#### `delete-only-output-handle-key-columns` New in v7.2.0 + +- Specifies the output of `DELETE` events. This parameter is valid only for the canal-json and open-protocol protocols. +- This parameter is incompatible with `force-replicate`. If both this parameter and `force-replicate` are set to `true`, TiCDC reports an error when creating a changefeed. +- The Avro protocol is not controlled by this parameter and always outputs only the primary key columns or unique index columns. +- The CSV protocol is not controlled by this parameter and always outputs all columns. +- Default value: `false`, which means outputting all columns +- When you set it to `true`, only primary key columns or unique index columns are output. + +#### `schema-registry` + +- Specifies the schema registry URL. +- This configuration item only takes effect if the downstream is MQ. + + + +#### `encoder-concurrency` + +- Specifies the number of encoder threads used when encoding data. +- This configuration item only takes effect if the downstream is MQ. +- Default value: `32` + +#### `enable-kafka-sink-v2` + +> **Warning:** +> +> This configuration is an experimental feature. It is not recommended to use it in production environments. + +- Specifies whether to enable kafka-sink-v2, which uses the kafka-go sink library. +- This configuration item only takes effect if the downstream is MQ. +- Default value: `false` + +#### `only-output-updated-columns` New in v7.1.0 + +- Specifies whether to output only the updated columns. +- This configuration item only applies to MQ downstreams using the open-protocol or canal-json protocol. +- Default value: `false` + + + +#### `terminator` + +- This configuration item is only used when you replicate data to storage sinks and can be ignored when replicating data to MQ or MySQL sinks. +- Specifies the row terminator, used for separating two data change events.
+- Default value: `""`, which means `\r\n` is used + +#### `date-separator` + +- Specifies the date separator type used in the file directory. For more information, see [Data change records](/ticdc/ticdc-sink-to-cloud-storage.md#data-change-records). +- This configuration item only takes effect if the downstream is a storage service. +- Default value: `day`, which means separating files by day +- Value options: `none`, `year`, `month`, `day` + +#### `enable-partition-separator` + +- Controls whether to use partitions as the separation string. +- This configuration item only takes effect if the downstream is a storage service. +- Default value: `true`, which means that partitions in a table are stored in separate directories +- It is recommended that you keep the value as `true` to avoid potential data loss in downstream partitioned tables [#8581](https://github.com/pingcap/tiflow/issues/8581). For usage examples, see [Data change records](/ticdc/ticdc-sink-to-cloud-storage.md#data-change-records). + +#### `debezium-disable-schema` + +- Controls whether to disable the output of schema information. +- Default value: `false`, which means enabling the output of schema information +- This parameter only takes effect when the sink type is MQ and the output protocol is Debezium. + +#### sink.csv New in v6.5.0 + +Starting from v6.5.0, TiCDC supports saving data changes to storage services in CSV format. Ignore the following configurations if you replicate data to MQ or MySQL sinks. + +##### `delimiter` + +- Specifies the character used to separate fields in the CSV file. The value must be an ASCII character. +- Default value: `,` + +##### `quote` + +- Specifies the quotation character used to surround fields in the CSV file. If the value is empty, no quotation is used. +- Default value: `"` + +##### `null` + +- Specifies the character displayed when a CSV column is NULL. 
+- Default value: `\N` + +##### `include-commit-ts` + +- Controls whether to include commit-ts in CSV rows. +- Default value: `false` + +##### `binary-encoding-method` + +- Specifies the encoding method of binary data. +- Default value: `base64` +- Value options: `base64`, `hex` + +##### `output-handle-key` + +- Controls whether to output handle key information. This configuration parameter is for internal implementation only, so it is not recommended to set it. +- Default value: `false` + +##### `output-old-value` + +- Controls whether to output the value before the row data changes. +- When it is enabled (setting it to `true`), the `UPDATE` event will output two rows of data: the first row is a `DELETE` event that outputs the data before the change; the second row is an `INSERT` event that outputs the changed data. +- When it is enabled, the `"is-update"` column will be added before the column with data changes. This added column is used to identify whether the data change of the current row comes from the `UPDATE` event or the original `INSERT` or `DELETE` event. If the data change of the current row comes from the `UPDATE` event, the value of the `"is-update"` column is `true`. Otherwise, it is `false`. +- Default value: `false` + +Starting from v8.0.0, TiCDC supports the Simple message encoding protocol. The following are the configuration parameters for the Simple protocol. For more information about the protocol, see [TiCDC Simple Protocol](/ticdc/ticdc-simple-protocol.md). + +The following configuration parameters control the sending behavior of bootstrap messages. + +#### `send-bootstrap-interval-in-sec` + +- Controls the time interval for sending bootstrap messages. +- Default value: `120`, which means that a bootstrap message is sent every 120 seconds for each table +- Unit: seconds + +#### `send-bootstrap-in-msg-count` + +- Controls the message-count interval for sending bootstrap messages.
+- Default value: `10000`, which means that a bootstrap message is sent for every 10000 row change messages for each table +- If you want to disable the sending of bootstrap messages, set both [`send-bootstrap-interval-in-sec`](#send-bootstrap-interval-in-sec) and `send-bootstrap-in-msg-count` to `0`. + +#### `send-bootstrap-to-all-partition` + +- Controls whether to send bootstrap messages to all partitions. +- Setting it to `false` means bootstrap messages are sent to only the first partition of the corresponding table topic. +- Default value: `true`, which means that bootstrap messages are sent to all partitions of the corresponding table topic + +#### sink.kafka-config.codec-config + +##### `encoding-format` + +- Controls the encoding format of the Simple protocol messages. Currently, the Simple protocol message supports `json` and `avro` encoding formats. +- Default value: `json` +- Value options: `json`, `avro` + +#### sink.open + +##### `output-old-value` + +- Controls whether to output the value before the row data changes. When it is disabled, the `UPDATE` event does not output the "p" field. +- Default value: `true` + +#### sink.debezium + +##### `output-old-value` + +- Controls whether to output the value before the row data changes. When it is disabled, the `UPDATE` event does not output the "before" field. +- Default value: `true` + +### consistent + +Specifies the replication consistency configurations for a changefeed when using the redo log. For more information, see [Eventually consistent replication in disaster scenarios](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios). + +> **Note:** +> +> The consistency-related configuration items only take effect when the downstream is a database and the redo log feature is enabled. + +#### `level` + +- The data consistency level. `"none"` means that the redo log is disabled.
+- Default value: `"none"` +- Value options: `"none"`, `"eventual"` + +#### `max-log-size` + +- The max redo log size. +- Default value: `64` +- Unit: MiB + +#### `flush-interval` + +- The flush interval for redo log. +- Default value: `2000` +- Unit: milliseconds + +#### `storage` + +- The storage URI of the redo log. +- Default value: `""` + +#### `use-file-backend` + +- Specifies whether to store the redo log in a local file. +- Default value: `false` + +#### `encoding-worker-num` + +- The number of encoding and decoding workers in the redo module. +- Default value: `16` + +#### `flush-worker-num` + +- The number of flushing workers in the redo module. +- Default value: `8` + +#### `compression` + +- The behavior to compress redo log files. +- Default value: `""`, which means no compression +- Value options: `""`, `"lz4"` + +#### `flush-concurrency` + +- The concurrency for uploading a single redo file. +- Default value: `1`, which means concurrency is disabled + +### integrity + +#### `integrity-check-level` + +- Controls whether to enable the checksum validation for single-row data. +- Default value: `"none"`, which means to disable the feature +- Value options: `"none"`, `"correctness"` + +#### `corruption-handle-level` + +- Specifies the log level of the changefeed when the checksum validation for single-row data fails. +- Default value: `"warn"` +- Value options: `"warn"`, `"error"` + +### sink.kafka-config + +The following configuration items only take effect when the downstream is Kafka. + +#### `sasl-mechanism` + +- Specifies the mechanism of Kafka SASL authentication. +- Default value: `""`, indicating that SASL authentication is not used + + + +#### `sasl-oauth-client-id` + +- Specifies the client-id in the Kafka SASL OAUTHBEARER authentication. This parameter is required when the OAUTHBEARER authentication is used. +- Default value: `""` + +#### `sasl-oauth-client-secret` + +- Specifies the client-secret in the Kafka SASL OAUTHBEARER authentication. 
This parameter is required when the OAUTHBEARER authentication is used. +- Default value: `""` + +#### `sasl-oauth-token-url` + +- Specifies the token-url in the Kafka SASL OAUTHBEARER authentication to obtain the token. This parameter is required when the OAUTHBEARER authentication is used. +- Default value: `""` + +#### `sasl-oauth-scopes` + +- Specifies the scopes in the Kafka SASL OAUTHBEARER authentication. This parameter is optional when the OAUTHBEARER authentication is used. +- Default value: `""` + +#### `sasl-oauth-grant-type` + +- Specifies the grant-type in the Kafka SASL OAUTHBEARER authentication. This parameter is optional when the OAUTHBEARER authentication is used. +- Default value: `"client_credentials"` + +#### `sasl-oauth-audience` + +- Specifies the audience in the Kafka SASL OAUTHBEARER authentication. This parameter is optional when the OAUTHBEARER authentication is used. +- Default value: `""` + + + +#### `output-raw-change-event` + +- Controls whether to output the original data change event. For more information, see [Control whether to split primary or unique key `UPDATE` events](/ticdc/ticdc-split-update-behavior.md#control-whether-to-split-primary-or-unique-key-update-events). +- Default value: `false` + +### sink.kafka-config.glue-schema-registry-config + +The following configuration is only required when using Avro as the protocol and AWS Glue Schema Registry: + ```toml -# Specifies the memory quota (in bytes) that can be used in the capture server by the sink manager. -# If the value is exceeded, the overused part will be recycled by the go runtime. -# The default value is `1073741824` (1 GB). -# memory-quota = 1073741824 - -# Specifies whether the database names and tables in the configuration file are case-sensitive. -# Starting from v6.5.6, v7.1.3, and v7.5.0, the default value changes from true to false. -# This configuration item affects configurations related to filter and sink. 
-case-sensitive = false - -# Specifies whether to enable the Syncpoint feature, which is supported since v6.3.0 and is disabled by default. -# Since v6.4.0, only the changefeed with the SYSTEM_VARIABLES_ADMIN or SUPER privilege can use the TiCDC Syncpoint feature. -# Note: This configuration item only takes effect if the downstream is TiDB. -# enable-sync-point = false - -# Specifies the interval at which Syncpoint aligns the upstream and downstream snapshots. -# The format is in h m s. For example, "1h30m30s". -# The default value is "10m" and the minimum value is "30s". -# Note: This configuration item only takes effect if the downstream is TiDB. -# sync-point-interval = "5m" - -# Specifies how long the data is retained by Syncpoint in the downstream table. When this duration is exceeded, the data is cleaned up. -# The format is in h m s. For example, "24h30m30s". -# The default value is "24h". -# Note: This configuration item only takes effect if the downstream is TiDB. -# sync-point-retention = "1h" - -# Starting from v6.5.6, v7.1.3, and v7.5.0, this configuration item specifies the SQL mode used when parsing DDL statements. Multiple modes are separated by commas. -# The default value is the same as the default SQL mode of TiDB. -# sql-mode = "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION" - -# The duration for which the changefeed is allowed to automatically retry when internal errors or exceptions occur. The default value is 30 minutes. -# The changefeed enters the failed state if internal errors or exceptions occur in the changefeed and persist longer than the duration set by this parameter. -# When the changefeed is in the failed state, you need to restart the changefeed manually for recovery. -# The format of this parameter is "h m s", for example, "1h30m30s". 
-changefeed-error-stuck-duration = "30m" - -# The default value is false, indicating that bi-directional replication (BDR) mode is not enabled. -# To set up BDR clusters using TiCDC, modify this parameter to `true` and set the TiDB clusters to BDR mode. -# For more information, see https://docs.pingcap.com/tidb/stable/ticdc-bidirectional-replication. -# bdr-mode = false - -[mounter] -# The number of threads with which the mounter decodes KV data. The default value is 16. -# worker-num = 16 - -[filter] -# Ignores the transaction of specified start_ts. -# ignore-txn-start-ts = [1, 2] - -# Filter rules. -# Filter syntax: . -rules = ['*.*', '!test.*'] - -# Event filter rules. -# The detailed syntax is described in -# The first event filter rule. -# [[filter.event-filters]] -# matcher = ["test.worker"] # matcher is an allow list, which means this rule only applies to the worker table in the test database. -# ignore-event = ["insert"] # Ignore insert events. -# ignore-sql = ["^drop", "add column"] # Ignore DDLs that start with "drop" or contain "add column". -# ignore-delete-value-expr = "name = 'john'" # Ignore delete DMLs that contain the condition "name = 'john'". -# ignore-insert-value-expr = "id >= 100" # Ignore insert DMLs that contain the condition "id >= 100". -# ignore-update-old-value-expr = "age < 18" # Ignore update DMLs whose old value contains "age < 18". -# ignore-update-new-value-expr = "gender = 'male'" # Ignore update DMLs whose new value contains "gender = 'male'". - -# The second event filter rule. -# matcher = ["test.fruit"] # matcher is an allow list, which means this rule only applies to the fruit table in the test database. -# ignore-event = ["drop table", "delete"] # Ignore the `drop table` DDL events and the `delete` DML events. Note that when a value in the clustered index column is updated in TiDB, TiCDC splits an `UPDATE` event into `DELETE` and `INSERT` events. 
TiCDC cannot identify such events as `UPDATE` events and thus cannot correctly filter out such events. -# ignore-sql = ["^drop table", "alter table"] # Ignore DDL statements that start with `drop table` or contain `alter table`. -# ignore-insert-value-expr = "price > 1000 and origin = 'no where'" # Ignore insert DMLs that contain the conditions "price > 1000" and "origin = 'no where'". - -[scheduler] -# Allocate tables to multiple TiCDC nodes for replication on a per-Region basis. -# Note: This configuration item only takes effect on Kafka changefeeds and is not supported on MySQL changefeeds. -# The value is "false" by default. Set it to "true" to enable this feature. -enable-table-across-nodes = false -# When `enable-table-across-nodes` is enabled, there are two allocation modes: -# 1. Allocate tables based on the number of Regions, so that each TiCDC node handles roughly the same number of Regions. If the number of Regions for a table exceeds the value of `region-threshold`, the table will be allocated to multiple nodes for replication. The default value of `region-threshold` is 10000. -# region-threshold = 10000 -# 2. Allocate tables based on the write traffic, so that each TiCDC node handles roughly the same number of modified rows. Only when the number of modified rows per minute in a table exceeds the value of `write-key-threshold`, will this allocation take effect. -# write-key-threshold = 30000 -# Note: -# * The default value of `write-key-threshold` is 0, which means that the traffic allocation mode is not used by default. -# * You only need to configure one of the two modes. If both `region-threshold` and `write-key-threshold` are configured, TiCDC prioritizes the traffic allocation mode, namely `write-key-threshold`. - -[sink] -############ MQ sink configuration items ############ -# For the sink of MQ type, you can use dispatchers to configure the event dispatcher. -# Since v6.1.0, TiDB supports two types of event dispatchers: partition and topic. 
For more information, see . -# The matching syntax of matcher is the same as the filter rule syntax. For details about the matcher rules, see <>. -# Note: This configuration item only takes effect if the downstream is MQ. -# Note: When the downstream MQ is Pulsar, if the routing rule for `partition` is not specified as any of `ts`, `index-value`, `table`, or `default`, each Pulsar message will be routed using the string you set as the key. -# For example, if you specify the routing rule for a matcher as the string `code`, then all Pulsar messages that match that matcher will be routed with `code` as the key. -# dispatchers = [ -# {matcher = ['test1.*', 'test2.*'], topic = "Topic expression 1", partition = "index-value"}, -# {matcher = ['test3.*', 'test4.*'], topic = "Topic expression 2", partition = "index-value", index = "index1"}, -# {matcher = ['test1.*', 'test5.*'], topic = "Topic expression 3", partition = "table"}, -# {matcher = ['test6.*'], partition = "columns", columns = "['a', 'b']"} -# {matcher = ['test7.*'], partition = "ts"} -# ] - -# column-selectors is introduced in v7.5.0 and only takes effect when the downstream is Kafka. -# column-selectors is used to select specific columns for replication. -# column-selectors = [ -# {matcher = ['test.t1'], columns = ['a', 'b']}, -# {matcher = ['test.*'], columns = ["*", "!b"]}, -# {matcher = ['test1.t1'], columns = ['column*', '!column1']}, -# {matcher = ['test3.t'], columns = ["column?", "!column1"]}, -# ] - -# The protocol configuration item specifies the protocol format used for encoding messages. -# When the downstream is Kafka, the protocol can be canal-json, avro, debezium, open-protocol, or simple. -# When the downstream is Pulsar, the protocol can only be canal-json. -# When the downstream is a storage service, the protocol can only be canal-json or csv. -# Note: This configuration item only takes effect if the downstream is Kafka, Pulsar, or a storage service. 
-# protocol = "canal-json" - -# Starting from v7.2.0, the `delete-only-output-handle-key-columns` parameter specifies the output of DELETE events. This parameter is valid only for canal-json and open-protocol protocols. -# This parameter is incompatible with `force-replicate`. If both this parameter and `force-replicate` is set to `true`, TiCDC reports an error when creating a changefeed. -# The default value is false, which means outputting all columns. When you set it to true, only primary key columns or unique index columns are output. -# The Avro protocol is not controlled by this parameter and always outputs only the primary key columns or unique index columns. -# The CSV protocol is not controlled by this parameter and always outputs all columns. -delete-only-output-handle-key-columns = false - -# Schema registry URL. -# Note: This configuration item only takes effect if the downstream is MQ. -# schema-registry = "http://localhost:80801/subjects/{subject-name}/versions/{version-number}/schema" - -# Specifies the number of encoder threads used when encoding data. -# Note: This configuration item only takes effect if the downstream is MQ. -# The default value is 32. -# encoder-concurrency = 32 - -# Specifies whether to enable kafka-sink-v2 that uses the kafka-go sink library. -# Note: This configuration item is experimental, and only takes effect if the downstream is MQ. -# The default value is false. -# enable-kafka-sink-v2 = false - -# Starting from v7.1.0, this configuration item specifies whether to only output the updated columns. -# Note: This configuration item only applies to the MQ downstream using the open-protocol and canal-json. -# The default value is false. -# only-output-updated-columns = false - -############ Storage sink configuration items ############ -# The following three configuration items are only used when you replicate data to storage sinks and can be ignored when replicating data to MQ or MySQL sinks. 
-# Row terminator, used for separating two data change events. The default value is an empty string, which means "\r\n" is used. -# terminator = '' -# Date separator type used in the file directory. Value options are `none`, `year`, `month`, and `day`. `day` is the default value and means separating files by day. For more information, see . -# Note: This configuration item only takes effect if the downstream is a storage service. -date-separator = 'day' -# Whether to use partitions as the separation string. The default value is true, which means that partitions in a table are stored in separate directories. It is recommended that you keep the value as `true` to avoid potential data loss in downstream partitioned tables . For usage examples, see . -# Note: This configuration item only takes effect if the downstream is a storage service. -enable-partition-separator = true - -# Controls whether to disable the output of schema information. The default value is false, which means enabling the output of schema information. -# Note: This parameter only takes effect when the sink type is MQ and the output protocol is Debezium. -debezium-disable-schema = false - -# Since v6.5.0, TiCDC supports saving data changes to storage services in CSV format. Ignore the following configurations if you replicate data to MQ or MySQL sinks. -# [sink.csv] -# The character used to separate fields in the CSV file. The value must be an ASCII character and defaults to `,`. -# delimiter = ',' -# The quotation character used to surround fields in the CSV file. The default value is `"`. If the value is empty, no quotation is used. -# quote = '"' -# The character displayed when a CSV column is null. The default value is `\N`. -# null = '\N' -# Whether to include commit-ts in CSV rows. The default value is false. -# include-commit-ts = false -# The encoding method of binary data, which can be 'base64' or 'hex'. The default value is 'base64'. 
-# binary-encoding-method = 'base64' -# Whether to output handle key information. The default value is false. -# This configuration parameter is for internal implementation only, so it is not recommended to set it. -# output-handle-key = false -# Whether to output the value before the row data changes. The default value is false. -# When it is enabled, the UPDATE event will output two rows of data: the first row is a DELETE event that outputs the data before the change; the second row is an INSERT event that outputs the changed data. -# When it is enabled (setting it to true), the "is-update" column will be added before the column with data changes. This added column is used to identify whether the data change of the current row comes from the UPDATE event or the original INSERT/DELETE event. -# If the data change of the current row comes from the UPDATE event, the value of the "is-update" column is true. Otherwise it is false. -# output-old-value = false - -# Starting from v8.0.0, TiCDC supports the Simple message encoding protocol. The following are the configuration parameters for the Simple protocol. -# For more information about the protocol, see . -# The following configuration parameters control the sending behavior of bootstrap messages. -# send-bootstrap-interval-in-sec controls the time interval for sending bootstrap messages, in seconds. -# The default value is 120 seconds, which means that a bootstrap message is sent every 120 seconds for each table. -# send-bootstrap-interval-in-sec = 120 - -# send-bootstrap-in-msg-count controls the message interval for sending bootstrap, in message count. -# The default value is 10000, which means that a bootstrap message is sent every 10000 row changed messages for each table. -# send-bootstrap-in-msg-count = 10000 -# Note: If you want to disable the sending of bootstrap messages, set both send-bootstrap-interval-in-sec and send-bootstrap-in-msg-count to 0. 
-
-# send-bootstrap-to-all-partition controls whether to send bootstrap messages to all partitions.
-# The default value is true, which means that bootstrap messages are sent to all partitions of the corresponding table topic.
-# Setting it to false means bootstrap messages are sent to only the first partition of the corresponding table topic.
-# send-bootstrap-to-all-partition = true
-
-[sink.kafka-config.codec-config]
-# encoding-format controls the encoding format of the Simple protocol messages. Currently, the Simple protocol message supports "json" and "avro" encoding formats.
-# The default value is "json".
-# encoding-format = "json"
-
-[sink.open]
-# Whether to output the value before the row data changes. The default value is true. When it is disabled, the UPDATE event does not output the "p" field.
-# output-old-value = true
-
-[sink.debezium]
-# Whether to output the value before the row data changes. The default value is true. When it is disabled, the UPDATE event does not output the "before" field.
-# output-old-value = true
-
-# Specifies the replication consistency configurations for a changefeed when using the redo log. For more information, see https://docs.pingcap.com/tidb/stable/ticdc-sink-to-mysql#eventually-consistent-replication-in-disaster-scenarios.
-# Note: The consistency-related configuration items only take effect when the downstream is a database and the redo log feature is enabled.
-[consistent]
-# The data consistency level. Available options are "none" and "eventual". "none" means that the redo log is disabled.
-# The default value is "none".
-level = "none"
-# The max redo log size in MB.
-# The default value is 64.
-max-log-size = 64
-# The flush interval for redo log. The default value is 2000 milliseconds.
-flush-interval = 2000
-# The storage URI of the redo log.
-# The default value is empty.
-storage = ""
-# Specifies whether to store the redo log in a local file.
-# The default value is false.
-use-file-backend = false
-# The number of encoding and decoding workers in the redo module.
-# The default value is 16.
-encoding-worker-num = 16
-# The number of flushing workers in the redo module.
-# The default value is 8.
-flush-worker-num = 8
-# The behavior to compress redo log files.
-# Available options are "" and "lz4". The default value is "", which means no compression.
-compression = ""
-# The concurrency for uploading a single redo file.
-# The default value is 1, which means concurrency is disabled.
-flush-concurrency = 1
-
-[integrity]
-# Whether to enable the checksum validation for single-row data. The default value is "none", which means to disable the feature. Value options are "none" and "correctness".
-integrity-check-level = "none"
-# Specifies the log level of the Changefeed when the checksum validation for single-row data fails. The default value is "warn". Value options are "warn" and "error".
-corruption-handle-level = "warn"
-
-# The following configuration items only take effect when the downstream is Kafka.
-[sink.kafka-config]
-# The mechanism of Kafka SASL authentication. The default value is empty, indicating that SASL authentication is not used.
-sasl-mechanism = "OAUTHBEARER"
-# The client-id in the Kafka SASL OAUTHBEARER authentication. The default value is empty. This parameter is required when the OAUTHBEARER authentication is used.
-sasl-oauth-client-id = "producer-kafka"
-# The client-secret in the Kafka SASL OAUTHBEARER authentication. The default value is empty. This parameter is required when the OAUTHBEARER authentication is used.
-sasl-oauth-client-secret = "cHJvZHVjZXIta2Fma2E="
-# The token-url in the Kafka SASL OAUTHBEARER authentication to obtain the token. The default value is empty. This parameter is required when the OAUTHBEARER authentication is used.
-sasl-oauth-token-url = "http://127.0.0.1:4444/oauth2/token"
-# The scopes in the Kafka SASL OAUTHBEARER authentication. The default value is empty. This parameter is optional when the OAUTHBEARER authentication is used.
-sasl-oauth-scopes = ["producer.kafka", "consumer.kafka"]
-# The grant-type in the Kafka SASL OAUTHBEARER authentication. The default value is "client_credentials". This parameter is optional when the OAUTHBEARER authentication is used.
-sasl-oauth-grant-type = "client_credentials"
-# The audience in the Kafka SASL OAUTHBEARER authentication. The default value is empty. This parameter is optional when the OAUTHBEARER authentication is used.
-sasl-oauth-audience = "kafka"
-
-# The following configuration item controls whether to output the original data change event. The default value is false. For more information, see https://docs.pingcap.com/tidb/dev/ticdc-split-update-behavior#control-whether-to-split-primary-or-unique-key-update-events.
-# output-raw-change-event = false
-
-# The following configuration is only required when using Avro as the protocol and AWS Glue Schema Registry:
-# Please refer to the section "Integrate TiCDC with AWS Glue Schema Registry" in the document "Sync Data to Kafka": https://docs.pingcap.com/tidb/dev/ticdc-sink-to-kafka#integrate-ticdc-with-aws-glue-schema-registry
-# [sink.kafka-config.glue-schema-registry-config]
-# region="us-west-1"
-# registry-name="ticdc-test"
-# access-key="xxxx"
-# secret-access-key="xxxx"
-# token="xxxx"
-
-# The following parameters take effect only when the downstream is Pulsar.
-[sink.pulsar-config]
-# Authentication on the Pulsar server is done using a token. Specify the value of the token.
-authentication-token = "xxxxxxxxxxxxx"
-# When you use a token for Pulsar server authentication, specify the path to the file where the token is located.
-token-from-file="/data/pulsar/token-file.txt"
-# Pulsar uses the basic account and password to authenticate the identity. Specify the account.
-basic-user-name="root"
-# Pulsar uses the basic account and password to authenticate the identity. Specify the password.
-basic-password="password"
-# The certificate path for Pulsar TLS encrypted authentication.
-auth-tls-certificate-path="/data/pulsar/certificate"
-# The private key path for Pulsar TLS encrypted authentication.
-auth-tls-private-key-path="/data/pulsar/certificate.key"
-# Path to trusted certificate file of the Pulsar TLS encrypted authentication.
-tls-trust-certs-file-path="/data/pulsar/tls-trust-certs-file"
-# Pulsar oauth2 issuer-url. For more information, see the Pulsar website: https://pulsar.apache.org/docs/2.10.x/client-libraries-go/#tls-encryption-and-authentication
-oauth2.oauth2-issuer-url="https://xxxx.auth0.com"
-# Pulsar oauth2 audience
-oauth2.oauth2-audience="https://xxxx.auth0.com/api/v2/"
-# Pulsar oauth2 private-key
-oauth2.oauth2-private-key="/data/pulsar/privateKey"
-# Pulsar oauth2 client-id
-oauth2.oauth2-client-id="0Xx...Yyxeny"
-# Pulsar oauth2 oauth2-scope
-oauth2.oauth2-scope="xxxx"
-# The number of cached Pulsar producers in TiCDC. The value is 10240 by default. Each Pulsar producer corresponds to one topic. If the number of topics you need to replicate is larger than the default value, you need to increase the number.
-pulsar-producer-cache-size=10240
-# Pulsar data compression method. No compression is used by default. Optional values are "lz4", "zlib", and "zstd".
-compression-type=""
-# The timeout for the Pulsar client to establish a TCP connection with the server. The value is 5 seconds by default.
-connection-timeout=5
-# The timeout for Pulsar clients to initiate operations such as creating and subscribing to a topic. The value is 30 seconds by default.
-operation-timeout=30
-# The maximum number of messages in a single batch for a Pulsar producer to send. The value is 1000 by default.
-batching-max-messages=1000
-# The interval at which Pulsar producer messages are saved for batching. The value is 10 milliseconds by default.
-batching-max-publish-delay=10
-# The timeout for a Pulsar producer to send a message. The value is 30 seconds by default.
-send-timeout=30
-
-# The following configuration item controls whether to output the original data change event. The default value is false. For more information, see https://docs.pingcap.com/tidb/dev/ticdc-split-update-behavior#control-whether-to-split-primary-or-unique-key-update-events.
-# output-raw-change-event = false
-
-[sink.cloud-storage-config]
-# The concurrency for saving data changes to the downstream cloud storage.
-# The default value is 16.
-worker-count = 16
-# The interval for saving data changes to the downstream cloud storage.
-# The default value is "2s".
-flush-interval = "2s"
-# A data change file is saved to the cloud storage when the number of bytes in this file exceeds `file-size`.
-# The default value is 67108864 (this is, 64 MiB).
-file-size = 67108864
-# The duration to retain files, which takes effect only when `date-separator` is configured as `day`. Assume that `file-expiration-days = 1` and `file-cleanup-cron-spec = "0 0 0 * * *"`, then TiCDC performs daily cleanup at 00:00:00 for files saved beyond 24 hours. For example, at 00:00:00 on 2023/12/02, TiCDC cleans up files generated before 2023/12/01, while files generated on 2023/12/01 remain unaffected.
-# The default value is 0, which means file cleanup is disabled.
-file-expiration-days = 0
-# The running cycle of the scheduled cleanup task, compatible with the crontab configuration, with a format of ` `
-# The default value is "0 0 2 * * *", which means that the cleanup task is executed every day at 2 AM.
-file-cleanup-cron-spec = "0 0 2 * * *"
-# The concurrency for uploading a single file.
-# The default value is 1, which means concurrency is disabled.
-flush-concurrency = 1
-# The following configuration item controls whether to output the original data change event. The default value is false. For more information, see https://docs.pingcap.com/tidb/dev/ticdc-split-update-behavior#control-whether-to-split-primary-or-unique-key-update-events.
-output-raw-change-event = false
+region="us-west-1"
+registry-name="ticdc-test"
+access-key="xxxx"
+secret-access-key="xxxx"
+token="xxxx"
 ```
+
+For more information, see [Integrate TiCDC with AWS Glue Schema Registry](/ticdc/ticdc-sink-to-kafka.md#integrate-ticdc-with-aws-glue-schema-registry).
+
+### sink.pulsar-config
+
+The following parameters take effect only when the downstream is Pulsar.
+
+#### `authentication-token`
+
+- Authentication on the Pulsar server is done using a token. Specify the value of the token.
+
+#### `token-from-file`
+
+- When you use a token for Pulsar server authentication, specify the path to the file where the token is located.
+
+#### `basic-user-name`
+
+- Pulsar uses the basic account and password to authenticate the identity. Specify the account.
+
+#### `basic-password`
+
+- Pulsar uses the basic account and password to authenticate the identity. Specify the password.
+
+#### `auth-tls-certificate-path`
+
+- Specifies the certificate path for Pulsar TLS encrypted authentication.
+
+#### `auth-tls-private-key-path`
+
+- Specifies the private key path for Pulsar TLS encrypted authentication.
+
+#### `tls-trust-certs-file-path`
+
+- Specifies the path to the trusted certificate file for Pulsar TLS encrypted authentication.
+
+#### `oauth2.oauth2-issuer-url`
+
+- Specifies the Pulsar OAuth2 issuer URL.
+- For more information, see the [Pulsar documentation](https://pulsar.apache.org/docs/2.10.x/client-libraries-go/#tls-encryption-and-authentication).
+
+#### `oauth2.oauth2-audience`
+
+- Specifies the Pulsar OAuth2 audience.
+- For more information, see the [Pulsar documentation](https://pulsar.apache.org/docs/2.10.x/client-libraries-go/#tls-encryption-and-authentication).
+
+#### `oauth2.oauth2-private-key`
+
+- Specifies the Pulsar OAuth2 private key.
+- For more information, see the [Pulsar documentation](https://pulsar.apache.org/docs/2.10.x/client-libraries-go/#tls-encryption-and-authentication).
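+
+As a sketch of how the Pulsar authentication parameters described above fit together, the following changefeed configuration fragment enables token-based authentication under `[sink.pulsar-config]`. All values are placeholders taken from the examples in this document; specify either the token value or the token file path, depending on how you store the token:
+
+```toml
+[sink.pulsar-config]
+# Token-based authentication: the token value itself...
+authentication-token = "xxxxxxxxxxxxx"
+# ...or the path to a file containing the token.
+# token-from-file = "/data/pulsar/token-file.txt"
+
+# Alternatively, basic account and password authentication:
+# basic-user-name = "root"
+# basic-password = "password"
+```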
+
+#### `oauth2.oauth2-client-id`
+
+- Specifies the Pulsar OAuth2 client ID.
+- For more information, see the [Pulsar documentation](https://pulsar.apache.org/docs/2.10.x/client-libraries-go/#tls-encryption-and-authentication).
+
+#### `oauth2.oauth2-scope`
+
+- Specifies the Pulsar OAuth2 scope.
+- For more information, see the [Pulsar documentation](https://pulsar.apache.org/docs/2.10.x/client-libraries-go/#tls-encryption-and-authentication).
+
+#### `pulsar-producer-cache-size`
+
+- Specifies the number of cached Pulsar producers in TiCDC. Each Pulsar producer corresponds to one topic. If the number of topics you need to replicate is larger than the default value, you need to increase the number.
+- Default value: `10240`
+
+#### `compression-type`
+
+- Specifies the Pulsar data compression method.
+- Default value: `""`, which means no compression is used
+- Value options: `"lz4"`, `"zlib"`, `"zstd"`
+
+#### `connection-timeout`
+
+- The timeout for the Pulsar client to establish a TCP connection with the server.
+- Default value: `5` (seconds)
+
+#### `operation-timeout`
+
+- The timeout for Pulsar clients to initiate operations such as creating and subscribing to a topic.
+- Default value: `30` (seconds)
+
+#### `batching-max-messages`
+
+- The maximum number of messages in a single batch for a Pulsar producer to send.
+- Default value: `1000`
+
+#### `batching-max-publish-delay`
+
+- The interval at which Pulsar producer messages are saved for batching.
+- Default value: `10` (milliseconds)
+
+#### `send-timeout`
+
+- The timeout for a Pulsar producer to send a message.
+- Default value: `30` (seconds)
+
+#### `output-raw-change-event`
+
+- Controls whether to output the original data change event. For more information, see [Control whether to split primary or unique key `UPDATE` events](/ticdc/ticdc-split-update-behavior.md#control-whether-to-split-primary-or-unique-key-update-events).
+- Default value: `false`
+
+### sink.cloud-storage-config
+
+#### `worker-count`
+
+- The concurrency for saving data changes to the downstream cloud storage.
+- Default value: `16`
+
+#### `flush-interval`
+
+- The interval for saving data changes to the downstream cloud storage.
+- Default value: `"2s"`
+
+#### `file-size`
+
+- A data change file is saved to the cloud storage when the number of bytes in this file exceeds `file-size`.
+- Default value: `67108864`, that is, 64 MiB
+
+#### `file-expiration-days`
+
+- The duration to retain files, which takes effect only when `date-separator` is configured as `day`.
+- Default value: `0`, which means file cleanup is disabled
+- For example, if `file-expiration-days = 1` and `file-cleanup-cron-spec = "0 0 0 * * *"`, TiCDC performs daily cleanup at 00:00:00 for files saved longer than 24 hours. At 00:00:00 on 2023/12/02, TiCDC cleans up files generated before 2023/12/01, while files generated on 2023/12/01 remain unaffected.
+
+#### `file-cleanup-cron-spec`
+
+- The running cycle of the scheduled cleanup task, compatible with the crontab configuration.
+- The format is ` `
+- Default value: `"0 0 2 * * *"`, which means that the cleanup task is executed every day at 2 AM
+
+#### `flush-concurrency`
+
+- The concurrency for uploading a single file.
+- Default value: `1`, which means concurrency is disabled
+
+#### `output-raw-change-event`
+
+- Controls whether to output the original data change event. For more information, see [Control whether to split primary or unique key `UPDATE` events](/ticdc/ticdc-split-update-behavior.md#control-whether-to-split-primary-or-unique-key-update-events).
+- Default value: `false`
\ No newline at end of file
diff --git a/ticdc/ticdc-server-config.md b/ticdc/ticdc-server-config.md
index 23b7506014af6..b21dc707eef8b 100644
--- a/ticdc/ticdc-server-config.md
+++ b/ticdc/ticdc-server-config.md
@@ -28,63 +28,151 @@ The following are descriptions of options available in a `cdc server` command:
 
 ## `cdc server` configuration file parameters
 
-The following describes the configuration file specified by the `config` option in the `cdc server` command:
-
-```toml
-# The configuration method of the following parameters is the same as that of CLI parameters, but the CLI parameters have higher priorities.
-addr = "127.0.0.1:8300"
-advertise-addr = ""
-log-file = ""
-log-level = "info"
-data-dir = ""
-gc-ttl = 86400 # 24 h
-tz = "System"
-cluster-id = "default"
-# This parameter specifies the maximum memory threshold (in bytes) for tuning GOGC: Setting a smaller threshold increases the GC frequency. Setting a larger threshold reduces GC frequency and consumes more memory resources for the TiCDC process. Once the memory usage exceeds this threshold, GOGC Tuner stops working. The default value is 0, indicating that GOGC Tuner is disabled.
-gc-tuner-memory-threshold = 0
-
-[security]
-  ca-path = ""
-  cert-path = ""
-  key-path = ""
-  # This parameter controls whether to enable the TLS client authentication. The default value is false.
-  mtls = false
-  # This parameter controls whether to use username and password for client authentication. The default value is false.
-  client-user-required = false
-  # This parameter lists the usernames that are allowed for client authentication. Authentication requests with usernames not in this list will be rejected. The default value is null.
-  client-allowed-user = ["username_1", "username_2"]
-
-# The session duration between TiCDC and etcd services, measured in seconds. This parameter is optional and its default value is 10.
-capture-session-ttl = 10 # 10s
-
-# The interval at which the Owner module in the TiCDC cluster attempts to push the replication progress. This parameter is optional and its default value is `50000000` nanoseconds (that is, 50 milliseconds). You can configure this parameter in two ways: specifying only the number (for example, configuring it as `40000000` represents 40000000 nanoseconds, which is 40 milliseconds), or specifying both the number and unit (for example, directly configuring it as `40ms`).
-owner-flush-interval = 50000000 # 50 ms
-
-# The interval at which the Processor module in the TiCDC cluster attempts to push the replication progress. This parameter is optional and its default value is `50000000` nanoseconds (that is, 50 milliseconds). The configuration method of this parameter is the same as that of `owner-flush-interval`.
-processor-flush-interval = 50000000 # 50 ms
-
-# [log]
-# # The output location for internal error logs of the zap log module. This parameter is optional and its default value is "stderr".
-# error-output = "stderr"
-# [log.file]
-# # The maximum size of a single log file, measured in MiB. This parameter is optional and its default value is 300.
-# max-size = 300 # 300 MiB
-# # The maximum number of days to retain log files. This parameter is optional and its default value is `0`, indicating never to delete.
-# max-days = 0
-# # The number of log files to retain. This parameter is optional and its default value is `0`, indicating to keep all log files.
-# max-backups = 0
-
-#[sorter]
-# The size of the shared pebble block cache in the Sorter module for the 8 pebble DBs started by default, measured in MiB. The default value is 128.
-# cache-size-in-mb = 128
-# The directory where sorter files are stored relative to the data directory (`data-dir`). This parameter is optional and its default value is "/tmp/sorter".
-# sorter-dir = "/tmp/sorter"
-
-# [kv-client]
-# The number of threads that can be used in a single Region worker. This parameter is optional and its default value is 8.
-# worker-concurrent = 8
-# The number of threads in the shared thread pool of TiCDC, mainly used for processing KV events. This parameter is optional and its default value is 0, indicating that the default pool size is twice the number of CPU cores.
-# worker-pool-size = 0
-# The retry duration of Region connections. This parameter is optional and its default value is `60000000000` nanoseconds (that is, 1 minute). You can configure this parameter in two ways: specifying only the number (for example, configuring it as `50000000` represents 50000000 nanoseconds, which is 50 milliseconds), or specifying both the number and unit (for example, directly configuring it as `50ms`).
-# region-retry-duration = 60000000000
-```
+The following describes the configuration file specified by the `config` option in the `cdc server` command. You can find the default configuration file in [`pkg/cmd/util/ticdc.toml`](https://github.com/pingcap/tiflow/blob/master/pkg/cmd/util/ticdc.toml).
+
+
+
+### `addr`
+
+- Example: `"127.0.0.1:8300"`
+
+### `advertise-addr`
+
+- Example: `""`
+
+### `log-file`
+
+- Example: `""`
+
+### `log-level`
+
+- Example: `"info"`
+
+### `data-dir`
+
+- Example: `""`
+
+### `gc-ttl`
+
+- Example: `86400` (24h)
+
+### `tz`
+
+- Example: `"System"`
+
+### `cluster-id`
+
+- Example: `"default"`
+
+### `gc-tuner-memory-threshold`
+
+- Specifies the maximum memory threshold for tuning GOGC. Setting a smaller threshold increases the GC frequency. Setting a larger threshold reduces GC frequency and consumes more memory resources for the TiCDC process. Once the memory usage exceeds this threshold, GOGC Tuner stops working.
+- Default value: `0`, indicating that GOGC Tuner is disabled
+- Unit: Bytes
+
+### security
+
+#### `ca-path`
+
+- Example: `""`
+
+#### `cert-path`
+
+- Example: `""`
+
+#### `key-path`
+
+- Example: `""`
+
+#### `mtls`
+
+- Controls whether to enable the TLS client authentication.
+- Default value: `false`
+
+#### `client-user-required`
+
+- Controls whether to use username and password for client authentication.
+- Default value: `false`
+
+#### `client-allowed-user`
+
+- Lists the usernames that are allowed for client authentication. Authentication requests with usernames not in this list will be rejected.
+- Default value: `null`
+
+
+
+### `capture-session-ttl`
+
+- Specifies the session duration between TiCDC and etcd services. This parameter is optional.
+- Default value: `10`
+- Unit: Seconds
+
+### `owner-flush-interval`
+
+- Specifies the interval at which the Owner module in the TiCDC cluster attempts to push the replication progress. This parameter is optional.
+- You can configure this parameter in two ways: specifying only the number (for example, `40000000` represents 40000000 nanoseconds, which is 40 milliseconds), or specifying both the number and unit (for example, `40ms`).
+- Default value: `50000000`, that is, 50 milliseconds
+
+### `processor-flush-interval`
+
+- Specifies the interval at which the Processor module in the TiCDC cluster attempts to push the replication progress. This parameter is optional.
+- The configuration method of this parameter is the same as that of `owner-flush-interval`.
+- Default value: `50000000`, that is, 50 milliseconds
+
+### log
+
+#### `error-output`
+
+- Specifies the output location for internal error logs of the zap log module. This parameter is optional.
+- Default value: `"stderr"`
+
+#### log.file
+
+##### `max-size`
+
+- Specifies the maximum size of a single log file. This parameter is optional.
+- Default value: `300`
+- Unit: MiB
+
+##### `max-days`
+
+- Specifies the maximum number of days to retain log files. This parameter is optional.
+- Default value: `0`, indicating never to delete
+
+##### `max-backups`
+
+- Specifies the number of log files to retain. This parameter is optional.
+- Default value: `0`, indicating to keep all log files
+
+### sorter
+
+#### `cache-size-in-mb`
+
+- Specifies the size of the shared pebble block cache in the Sorter module for the 8 pebble DBs started by default.
+- Default value: `128`
+- Unit: MiB
+
+#### `sorter-dir`
+
+- Specifies the directory where sorter files are stored relative to the data directory (`data-dir`). This parameter is optional.
+- Default value: `"/tmp/sorter"`
+
+### kv-client
+
+#### `worker-concurrent`
+
+- Specifies the number of threads that can be used in a single Region worker. This parameter is optional.
+- Default value: `8`
+
+#### `worker-pool-size`
+
+- Specifies the number of threads in the shared thread pool of TiCDC, mainly used for processing KV events. This parameter is optional.
+- Default value: `0`, indicating that the default pool size is twice the number of CPU cores
+
+#### `region-retry-duration`
+
+- Specifies the retry duration of Region connections. This parameter is optional.
+- You can configure this parameter in two ways:
+    - Specify only the number, for example, `50000000` represents 50000000 nanoseconds (50 milliseconds)
+    - Specify both the number and the unit, for example, `50ms`
+- Default value: `60000000000` (1 minute)
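+
+To tie the sections above together, the following is a minimal sketch of a `cdc server` configuration file passed via the `config` option. All values shown are the defaults documented above, so every line could equally be omitted; it only illustrates where top-level keys and the `[log.file]` and `[kv-client]` tables go:
+
+```toml
+# Top-level parameters mirror the CLI options; CLI options take priority.
+addr = "127.0.0.1:8300"
+gc-ttl = 86400
+
+[log.file]
+# Maximum size of a single log file, in MiB.
+max-size = 300
+
+[kv-client]
+worker-concurrent = 8
+# Durations accept either a plain number in nanoseconds or a number with a unit.
+region-retry-duration = "1m"
+```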