Skip to content

Latest commit

 

History

History
201 lines (179 loc) · 13.9 KB

tidb2kafka2pg.org

File metadata and controls

201 lines (179 loc) · 13.9 KB

Architecture

./png/tidb2kafka2pg/tidb2kafka2pg.png

Deployment

Please refer to the Config File for your reference. It containes the below resources definition.

  • TiDB Cluster
  • Kafka cluster
  • Postgres DB

./png/tidb2kafka2pg/01.01.png ./png/tidb2kafka2pg/01.02.png

Deployment

OhMyTiUP$./bin/aws tidb2kafka2pg 
Run commands for syncing the data to pg from tidb through kafka

Usage:
  aws [command]

Available Commands:
  deploy       Deploy an Kafka Cluster on aws
  list         List all clusters or cluster of aurora db
  destroy      Destroy a specified cluster
  perf-prepare perf performance test preparation
  perf         perf performance test
  perf-clean   clean perf performance test

Flags:
  -h, --help   help for tidb2kafka2pg

Global Flags:
      --aws-access-key-id string       The access key used to operate against AWS resource
      --aws-region string              The default aws region 
      --aws-secret-access-key string   The secret access key used to operate against AWS resource
  -c, --concurrency int                max number of parallel tasks allowed (default 5)
      --identity-file string           The identity file for natijve ssh
      --ssh string                     (EXPERIMENTAL) The executor type: 'builtin', 'system', 'none'.
      --ssh-timeout uint               Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
      --ssh-user string                The user for native ssh
      --wait-timeout uint              Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 120)
  -y, --yes                            Skip all confirmations and assumes 'yes'

Use "aws help [command]" for more information about a command.

OhMyTiUP$./bin/aws tidb2kafka2pg deploy avrotest embed/examples/aws/aws-nodes-tidb2kafka2pg.yaml
... ...
Please refer to below screenshot for output reference

./png/tidb2kafka2pg/02.01.png ./png/tidb2kafka2pg/02.02.png

Resource confirmation

OhMyTiUP$./bin/aws tidb2kafka2pg list avrotest

./png/tidb2kafka2pg/03.01.png ./png/tidb2kafka2pg/03.02.png ./png/tidb2kafka2pg/03.03.png

Prepare kafka sync resource and tables

OhMyTiUP$./bin/aws tidb2kafka2pg perf prepare avrotest --data-type INT --partitions 3 --num-of-records 100000 --bytes-of-record 1024 --ssh-user admin --identity-file /home/pi/.ssh/jay-us-east-01.pem
ParameterComment
partitionsNumber of partitions definition in the kafka which push data into multiple partitions for parallel test
num-of-recordsNumber of records to be inserted into TiDB for data sync test
bytes-of-recordNumber of bytes for each record
ssh-userThe user to access to the workstation
identity-fileThe ssh priviate key

./png/tidb2kafka2pg/04.png

Run the data sync

Run the test to insert the data into TiDB to check the whole data flow.

Column NameComment
CountThe number of records inserted into TiDB
DB QPSQPS to insert into TiDB
TiDB 2 PG LatencyAverage endpoint to endpoint latency
QPS from TiDB to PGThe QPS to sync the data from TiDB to PG

./png/tidb2kafka2pg/05.png

Clean up the resources

./png/tidb2kafka2pg/06.01.png

Todo

Data type sync verification

TiDB mapping to Postgres: https://www.convert-in.com/mysql-to-postgres-types-mapping.htm

TiDB Data TypeOK/NGPG Data TypeComment
BOOLOKBooleanNeed to add [Cast] transformation config
TINYINTOKSmallInt
SMALLINTOKINT
MEDIUMINTOKINT
INTOKINT
BIGINTOKBIGINT
BIGINT UNSIGNEDOKBIGINT
TINYBLOBNGBYTEASucceeded to sync. But from TiDB the value is [This is the test] while postgres keep the data like [\x54686973206973207468652074657374]
BLOBNGBYTEASame problem as TINYBLOB
MEDIUMBLOBNGBYTEASame problem as TINYBLOB
LONGBLOBNGBYTEASame problem as TINYBLOB
BINARYNGBYTEAMySQL: “t” Postgres: “\x74”
VARBINARY(256)NGBYTEAMySQL: “This is the barbinary message” Postgres: “\x54686973206973207468652062617262696e617279206d657373616765”
TINYTEXTOKTEXT
TEXTOKTEXT
MEDIUMTEXTOKTEXT
LONGTEXTOKTEXT
CHAROKchar
VARCHAR(255)OKVARCHAR(255)
FLOATNGNEMERIC[10.123457] <-> [10.1234569549561]
DOUBLEOKNUMERIC
DATETIMEOKTIMESTAMPNeed transformation from TiDB DATETIME to PG timestamp
DATEOKDATE
TIMESTAMPOKTIMESTAMPNeed to add [timestamp] transformation
TIMEOKTIME
YEAROKSMALLINT
BITNGTodo. No idea how to sync so far
JSONOKJSON
ENUM(‘a’, ‘b’, ‘c’)OKvarchar
OKENUM
SET(‘a’, ‘b’, ‘c’)OKENUMpg: CREATE TYPE TEST_SET AS ENUM (‘a’, ‘b’, ‘c’); CREATE TABLE TEST02(col01 int primary key, col02 TEST_SET default null)
DECIMALOKDECIMAL

Please refer to BIT TRANSFORMATION

SET

The data extracted from TiDB is the format like <value01,value02> while the enum[] requires the format like <{value01,value02}>

{
  "subject": "test_test01-value",
  "version": 1,
  "id": 1,
  "schema": "{\"type\":\"record\",\"name\":\"test01\",\"namespace\":\"default.test\",\"fields\":[{\"name\":\"pk_col\",\"type\":{\"type\":\"long\",\"connect.parameters\":{\"tidb_type\":\"BIGINT\"}}},{\"name\":\"t_set\",\"type\":[\"null\",{\"type\":\"string\",\"connect.parameters\":{\"allowed\":\"a,b,c\",\"tidb_type\":\"SET\"}}],\"default\":null},{\"name\":\"tidb_timestamp\",\"type\":[{\"type\":\"string\",\"connect.parameters\":{\"tidb_type\":\"TIMESTAMP\"}},\"null\"],\"default\":\"CURRENT_TIMESTAMP\"}]}"
}

Summary

Issues

  • The new version seeems not to support the TiCDC command. The pd configuration is decommissioned while the server(cdc) is supported. Please take a look of the documentation. https://docs.pingcap.com/tidb/dev/manage-ticdc
      cdc cli changefeed list [flags]
    
    Flags:
      -a, --all    List all replication tasks(including removed and finished)
      -h, --help   help for list
    
    Global Flags:
          --ca string          CA certificate path for TLS connection to CDC server
          --cert string        Certificate path for TLS connection to CDC server
      -i, --interact           Run cdc cli with readline
          --key string         Private key path for TLS connection to CDC server
          --log-level string   log level (etc: debug|info|warn|error) (default "warn")
          --pd string          PD address, use ',' to separate multiple PDs, Parameter --pd is deprecated, please use parameter --server instead.
          --server string      CDC server address
    
        

How to run the test case

Clean the resources

OhMyTiUP$ ./bin/aws tidb2kafka2pg perf clean avrotest --ssh-user admin --identity-file /home/pi/.ssh/jay-us-east-01.pem

Prepare the resources

OhMyTiUP$ ./bin/aws tidb2kafka2pg perf prepare avrotest --data-type TINYBLOB --partitions 3 --num-of-records 100000 --bytes-of-record 1024 --ssh-user admin --identity-file /home/pi/.ssh/jay-us-east-01.pem

Run the sysbench

OhMyTiUP$ ./bin/aws tidb2kafka2pg perf run avrotest --num-of-records 1000 --ssh-user admin --identity-file /home/pi/.ssh/jay-us-east-01.pem
Count  DB QPS  TiDB 2 PG Latency  TiDB 2 PG QPS
-----  ------  -----------------  -------------
1000   500     12                 78

Todo

aws init

.ohmytiup/config

aws:
  public_ssh_key: The aws public key name
  private_ssh_key: The local private ssh key
  os_username: The default user name of the os

Implementation

  • Open the default config file and read the data
  • Show prompt to confim the value
  • Write the data back to the config file

Added all the default data

Config file -> Show all config -> start process Default config value -> Show all config -> process Decide the default value according to the regions Go through all prompts -> Show all config -> process