DDL progress can be blocked due to high concurrency #30400
Labels: affects-4.0, affects-5.0, affects-5.1, affects-5.2, affects-5.3, severity/major, sig/sql-infra, type/bug
Bug Report
1. Minimal reproduce step (Required)
The original scenario uses sysbench to create 10,000 tables through a load balancer. In a cluster with more than 10 TiDB instances, the problem is easy to reproduce.
When the transaction that allocates DDL job IDs keeps rolling back because of write conflicts (for example, more than 100 times), an error is sent back from another goroutine. However, this error is not handled properly. To reproduce it locally, inject a failpoint:
    make failpoint-enable
    make
    GO_FAILPOINTS="github.com/pingcap/tidb/ddl/mockAddBatchDDLJobsErr=return(true)" ./bin/tidb-server
The connection leaks: the session that issued the DDL statement never gets a response.
Fortunately, this does not affect DDL/DML statements from other sessions.
2. What did you expect to see? (Required)
Query OK, 1 row affected (0.06 sec)
3. What did you see instead (Required)
The statement hangs indefinitely.
4. What is your TiDB version? (Required)