Please refer to the Config File for your reference. It containes the below resources definition.
- TiDB Cluster
- Kafka cluster
- Postgres DB
OhMyTiUP$./bin/aws tidb2kafka2pg Run commands for syncing the data to pg from tidb through kafka Usage: aws [command] Available Commands: deploy Deploy an Kafka Cluster on aws list List all clusters or cluster of aurora db destroy Destroy a specified cluster perf-prepare perf performance test preparation perf perf performance test perf-clean clean perf performance test Flags: -h, --help help for tidb2kafka2pg Global Flags: --aws-access-key-id string The access key used to operate against AWS resource --aws-region string The default aws region --aws-secret-access-key string The secret access key used to operate against AWS resource -c, --concurrency int max number of parallel tasks allowed (default 5) --identity-file string The identity file for natijve ssh --ssh string (EXPERIMENTAL) The executor type: 'builtin', 'system', 'none'. --ssh-timeout uint Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5) --ssh-user string The user for native ssh --wait-timeout uint Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 120) -y, --yes Skip all confirmations and assumes 'yes' Use "aws help [command]" for more information about a command. OhMyTiUP$./bin/aws tidb2kafka2pg deploy avrotest embed/examples/aws/aws-nodes-tidb2kafka2pg.yaml ... ... Please refer to below screenshot for output reference
OhMyTiUP$./bin/aws tidb2kafka2pg list avrotest
OhMyTiUP$./bin/aws tidb2kafka2pg perf prepare avrotest --data-type INT --partitions 3 --num-of-records 100000 --bytes-of-record 1024 --ssh-user admin --identity-file /home/pi/.ssh/jay-us-east-01.pem
Parameter | Comment |
---|---|
partitions | Number of partitions definition in the kafka which push data into multiple partitions for parallel test |
num-of-records | Number of records to be inserted into TiDB for data sync test |
bytes-of-record | Number of bytes for each record |
ssh-user | The user to access to the workstation |
identity-file | The ssh priviate key |
Run the test to insert the data into TiDB to check the whole data flow.
Column Name | Comment |
---|---|
Count | The number of records inserted into TiDB |
DB QPS | QPS to insert into TiDB |
TiDB 2 PG Latency | Average endpoint to endpoint latency |
QPS from TiDB to PG | The QPS to sync the data from TiDB to PG |
TiDB mapping to Postgres: https://www.convert-in.com/mysql-to-postgres-types-mapping.htm
TiDB Data Type | OK/NG | PG Data Type | Comment |
---|---|---|---|
BOOL | OK | Boolean | Need to add [Cast] transformation config |
TINYINT | OK | SmallInt | |
SMALLINT | OK | INT | |
MEDIUMINT | OK | INT | |
INT | OK | INT | |
BIGINT | OK | BIGINT | |
BIGINT UNSIGNED | OK | BIGINT | |
TINYBLOB | NG | BYTEA | Succeeded to sync. But from TiDB the value is [This is the test] while postgres keep the data like [\x54686973206973207468652074657374] |
BLOB | NG | BYTEA | Same problem as TINYBLOB |
MEDIUMBLOB | NG | BYTEA | Same problem as TINYBLOB |
LONGBLOB | NG | BYTEA | Same problem as TINYBLOB |
BINARY | NG | BYTEA | MySQL: “t” Postgres: “\x74” |
VARBINARY(256) | NG | BYTEA | MySQL: “This is the barbinary message” Postgres: “\x54686973206973207468652062617262696e617279206d657373616765” |
TINYTEXT | OK | TEXT | |
TEXT | OK | TEXT | |
MEDIUMTEXT | OK | TEXT | |
LONGTEXT | OK | TEXT | |
CHAR | OK | char | |
VARCHAR(255) | OK | VARCHAR(255) | |
FLOAT | NG | NEMERIC | [10.123457] <-> [10.1234569549561] |
DOUBLE | OK | NUMERIC | |
DATETIME | OK | TIMESTAMP | Need transformation from TiDB DATETIME to PG timestamp |
DATE | OK | DATE | |
TIMESTAMP | OK | TIMESTAMP | Need to add [timestamp] transformation |
TIME | OK | TIME | |
YEAR | OK | SMALLINT | |
BIT | NG | Todo. No idea how to sync so far | |
JSON | OK | JSON | |
ENUM(‘a’, ‘b’, ‘c’) | OK | varchar | |
OK | ENUM | ||
SET(‘a’, ‘b’, ‘c’) | OK | ENUM | pg: CREATE TYPE TEST_SET AS ENUM (‘a’, ‘b’, ‘c’); CREATE TABLE TEST02(col01 int primary key, col02 TEST_SET default null) |
DECIMAL | OK | DECIMAL |
Please refer to BIT TRANSFORMATION
The data extracted from TiDB is the format like <value01,value02> while the enum[] requires the format like <{value01,value02}>
{ "subject": "test_test01-value", "version": 1, "id": 1, "schema": "{\"type\":\"record\",\"name\":\"test01\",\"namespace\":\"default.test\",\"fields\":[{\"name\":\"pk_col\",\"type\":{\"type\":\"long\",\"connect.parameters\":{\"tidb_type\":\"BIGINT\"}}},{\"name\":\"t_set\",\"type\":[\"null\",{\"type\":\"string\",\"connect.parameters\":{\"allowed\":\"a,b,c\",\"tidb_type\":\"SET\"}}],\"default\":null},{\"name\":\"tidb_timestamp\",\"type\":[{\"type\":\"string\",\"connect.parameters\":{\"tidb_type\":\"TIMESTAMP\"}},\"null\"],\"default\":\"CURRENT_TIMESTAMP\"}]}" }
- The new version seeems not to support the TiCDC command. The pd configuration is decommissioned while the server(cdc) is supported.
Please take a look of the documentation. https://docs.pingcap.com/tidb/dev/manage-ticdc
cdc cli changefeed list [flags] Flags: -a, --all List all replication tasks(including removed and finished) -h, --help help for list Global Flags: --ca string CA certificate path for TLS connection to CDC server --cert string Certificate path for TLS connection to CDC server -i, --interact Run cdc cli with readline --key string Private key path for TLS connection to CDC server --log-level string log level (etc: debug|info|warn|error) (default "warn") --pd string PD address, use ',' to separate multiple PDs, Parameter --pd is deprecated, please use parameter --server instead. --server string CDC server address
OhMyTiUP$ ./bin/aws tidb2kafka2pg perf clean avrotest --ssh-user admin --identity-file /home/pi/.ssh/jay-us-east-01.pem
OhMyTiUP$ ./bin/aws tidb2kafka2pg perf prepare avrotest --data-type TINYBLOB --partitions 3 --num-of-records 100000 --bytes-of-record 1024 --ssh-user admin --identity-file /home/pi/.ssh/jay-us-east-01.pem
OhMyTiUP$ ./bin/aws tidb2kafka2pg perf run avrotest --num-of-records 1000 --ssh-user admin --identity-file /home/pi/.ssh/jay-us-east-01.pem Count DB QPS TiDB 2 PG Latency TiDB 2 PG QPS ----- ------ ----------------- ------------- 1000 500 12 78
.ohmytiup/config
aws: public_ssh_key: The aws public key name private_ssh_key: The local private ssh key os_username: The default user name of the os
- Open the default config file and read the data
- Show prompt to confim the value
- Write the data back to the config file
Config file -> Show all config -> start process Default config value -> Show all config -> process Decide the default value according to the regions Go through all prompts -> Show all config -> process