Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected slow query during GC running after stop 1 tikv-server #898

Closed
crazycs520 opened this issue Jul 20, 2023 · 1 comment
Closed

Comments

@crazycs520
Copy link
Contributor

crazycs520 commented Jul 20, 2023

reproduce

TiDB cluster is cross-zone deployed, which version is v6.5.3, the topology is following:

instance zone
pd-1 az1
pd-2 az2
pd-3 az3
tidb-1 az1
tidb-2 az2
tidb-3 az3
tikv-1a az1
tikv-1b az1
tikv-2a az2
tikv-2b az2
tikv-3a az3
tikv-ab az3
  1. run the following stale-read load in background:
sysbench --config-file=sysbench.conf mussel_minimal.lua --threads=20 --read_staleness=-5 run
  1. Stop 1 tikv which zone is same with TiDB GC leader. Suppose tidb-1 is TiDB GC leader, then stop tikv-1a instance:
tiup cluster stop cs-az3 -N ${tikv-1a-address} --evict-leaders -y
  1. Wait for the next GC to run and observe the QPS and slow queries of the TiDB GC leader instance, you will see metrics like following:
image

And you can also see some slow querie plan like following, attention the information: cop_task: {num: 1, max: 10.1s, proc_keys: 16, rpc_num: 2:

id                              task            estRows operator info                                                       actRows execution info                                                                                                                                                                                                                                  memory  disk
Sort_10                         root            1.18    sbtest.t1.sk                                                        16      time:10.1s, loops:2                                                                                                                                                                                                                             6.98 KB 0 Bytes
└─Selection_13                  root            1.18    le(Column#6, 5)                                                     16      time:10.1s, loops:2                                                                                                                                                                                                                             6.73 KB N/A
  └─Shuffle_18                  root            1.47    execution info: concurrency:1, data sources:[TableReader_16]        16      time:10.1s, loops:3, ShuffleConcurrency:1                                                                                                                                                                                                       N/A     N/A
    └─Window_14                 root            1.47    row_number()->Column#6 over(partition by ...)                       16      time:10.1s, loops:2                                                                                                                                                                                                                             N/A     N/A
      └─Sort_17                 root            58.96   sbtest.t1.pk, sbtest.t1.sk, sbtest.t1.ts:desc                       16      time:10.1s, loops:2                                                                                                                                                                                                                             6.61 KB 0 Bytes
        └─ShuffleReceiver_21    root            58.96                                                                       16      time:10.1s, loops:2                                                                                                                                                                                                                             N/A     N/A
          └─TableReader_16      root            58.96   data:TableRangeScan_15                                              16      time:10.1s, loops:2, cop_task: {num: 1, max: 10.1s, proc_keys: 16, rpc_num: 2, rpc_time: 10s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}, backoff{tikvRPC: 65ms}                                                                      5.69 KB N/A
            └─TableRangeScan_15 cop[tikv]       58.96   table:t1, range:[0x30...), keep order:false                         16      tikv_task:{time:1ms, loops:1}, scan_detail: {total_process_keys: 16, total_process_keys_size: 5937, total_keys: 17, get_snapshot_time: 9.35µs, rocksdb: {delete_skipped_count: 3, key_skipped_count: 33, block: {cache_hit_count: 10}}} N/A     N/A
@crazycs520
Copy link
Contributor Author

Fixed in #909

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant