-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add index status is always running after inject pd leader io delay 500ms lasts for 2mins #50435
Labels
affects-7.5
affects-7.6
component/ddl
This issue is related to DDL of TiDB.
severity/major
type/bug
The issue is confirmed as a bug.
Comments
/severity major |
ti-chi-bot
bot
added
severity/major
may-affects-5.4
This bug maybe affects 5.4.x versions.
may-affects-6.1
may-affects-6.5
may-affects-7.1
may-affects-7.5
labels
Jan 15, 2024
backend ctx wrongly closed when executor_manager met error, then planner can't get backend ctx for the next step |
13 tasks
tangenta
added
affects-7.5
and removed
may-affects-5.4
This bug maybe affects 5.4.x versions.
may-affects-6.1
may-affects-6.5
may-affects-7.1
may-affects-7.5
labels
Jan 16, 2024
13 tasks
/component ddl |
13 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
affects-7.5
affects-7.6
component/ddl
This issue is related to DDL of TiDB.
severity/major
type/bug
The issue is confirmed as a bug.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
1、tidb_enable_dist_task='on' and enable global sort
2、add index
3、inject pd leader io delay 500ms lasts for 2mins
[2024/01/12 17:19:09.227 +08:00] [INFO] [add_index.go:78] ["will add index"] [db=sysbench_64_7000w] [table=sbtest1] [addIndexColumnName=c] [addIndexName=index_test_1705051149227]
[2024/01/12 17:19:09.227 +08:00] [INFO] [util.go:74] ["alter table sbtest1 add index index_test_1705051149227 (c)"]
[2024/01/12 17:20:15.057 +08:00] [INFO] [chaos.go:65] ["Run chaos"]
[2024/01/12 17:22:15.074 +08:00] [INFO] [chaos.go:94] ["Clean chaos"]
[2024/01/12 17:27:15.234 +08:00] [INFO] [chaos.go:82] ["Run chaos success"]
[2024/01/12 17:29:15.234 +08:00] [INFO] [chaos.go:94] ["Clean chaos"]
[2024/01/12 17:34:15.459 +08:00] [INFO] [chaos.go:82] ["Run chaos success"]
[2024/01/12 17:36:15.460 +08:00] [INFO] [chaos.go:94] ["Clean chaos"]
[2024/01/12 17:41:15.604 +08:00] [INFO] [chaos.go:65] ["Run chaos"]
[2024/01/12 17:43:14.778 +08:00] [INFO] [chaos.go:94] ["Clean chaos"]
2. What did you expect to see? (Required)
add index can success
3. What did you see instead (Required)
add index status is always running after inject pd leader io delay 500ms lasts for 2mins
the status of ddl job is not synced after 1h0m0s (now: 2024-01-12 18:19:18, jobId: 546, job type: add index /* ingest cloud /, state: running)
operatorLogs:
[2024-01-12 17:19:09] ###### start adding index
alter table sbtest1 add index index_test_1705051149227 (c)
[2024-01-12 17:19:09] ###### wait for ddl job finish
[2024-01-12 18:19:18] ###### wait for ddl job finish timeout(1h0m0s)
select job_id, job_type, state from information_schema.ddl_jobs where query = 'alter table sbtest1 add index index_test_1705051149227 (c)'
jobId: 546, job type: add index / ingest cloud */, state: running
tidb logs:
[2024/01/12 17:20:56.877 +08:00] [ERROR] [txn.go:602] ["active transaction fail"] [error="[tikv:9001]PD server timeout: "]
[2024/01/12 17:20:56.877 +08:00] [INFO] [tidb.go:286] ["rollbackTxn called due to ddl/autocommit failure"]
[2024/01/12 17:20:56.877 +08:00] [WARN] [session.go:2251] ["run statement failed"] [conn=2940214166] [session_alias=] [schemaVersion=712] [error="[tikv:9001]PD server timeout: "] [session="{\n "currDBName": "sysbench_64_7000w",\n "id": 2940214166,\n "preparedStmtCount": 578,\n "status": 2,\n "strictMode": true,\n "user": {\n "Username": "root",\n "Hostname": "10.200.75.94",\n "CurrentUser": false,\n "AuthUsername": "root",\n "AuthHostname": "%",\n "AuthPlugin": "mysql_native_password"\n }\n}"]
[2024/01/12 17:20:57.186 +08:00] [ERROR] [manager.go:442] ["task manager met error"] [error="[tikv:9001]PD server timeout: "] [stack="github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*Manager).logErr\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor/manager.go:442\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*Manager).handleExecutableTask\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor/manager.go:391\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(*Manager).handleExecutableTasks.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor/manager.go:257\ngithub.com/pingcap/tidb/pkg/resourcemanager/pool/spool.(*Pool).run.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/resourcemanager/pool/spool/spool.go:144"]
[2024/01/12 17:20:57.186 +08:00] [INFO] [pd_service_discovery.go:871]
[2024/01/12 17:20:57.187 +08:00] [INFO] [backend_mgr.go:190] ["close one backend for DDL job"] ["job ID"=546] ["current memory usage"=0] ["max memory quota"=8589934592]
[2024/01/12 17:20:57.288 +08:00] [ERROR] [pd_service_discovery.go:534] ["[pd] failed to update member"] [urls="[http://tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-5520324-1-724.svc:2379,http://tc-pd-1.tc-pd-peer.endless-ha-test-add-index-tps-5520324-1-724.svc:2379,http://tc-pd-2.tc-pd-peer.endless-ha-test-add-index-tps-5520324-1-724.svc:2379]"] [error="[PD:client:ErrClientGetMember]get member failed"]
[2024/01/12 17:20:57.288 +08:00] [ERROR] [tso_dispatcher.go:498] ["[tso] getTS error after processing requests"] [dc-location=global] [stream-addr=http://tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-5520324-1-724.svc:2379/] [error="[PD:client:ErrClientGetTSO]get TSO failed, %v: rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster"]
[2024/01/12 17:20:57.288 +08:00] [WARN] [backoff.go:179] ["pdRPC backoffer.maxSleep 30000ms is exceeded, errors:\nget timestamp failed: rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster at 2024-01-12T17:20:50.69242944+08:00\nget timestamp failed: rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster at 2024-01-12T17:20:52.446763495+08:00\nget timestamp failed: rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster at 2024-01-12T17:20:55.023437291+08:00\ntotal-backoff-times: 15, backoff-detail: pdRPC:15, maxBackoffTimeExceeded: true, maxExcludedTimeExceeded: false\nlongest sleep type: pdRPC, time: 31737ms"]
[2024/01/12 17:20:57.288 +08:00] [ERROR] [txn.go:602] ["active transaction fail"] [error="[tikv:9001]PD server timeout: "]
[2024/01/12 17:20:57.288 +08:00] [INFO] [tidb.go:286] ["rollbackTxn called due to ddl/autocommit failure"]
[2024/01/12 17:20:57.289 +08:00] [INFO] [pd_service_discovery.go:871] ["[pd] cannot update member from this address"] [address=http://tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-5520324-1-724.svc:2379/] [error="[PD:client:ErrClientGetLeader]get leader failed, leader address doesn't exist"]
[2024/01/12 17:20:57.289 +08:00] [WARN] [session.go:2251] ["run statement failed"] [schemaVersion=712] [error="[tikv:9001]PD server timeout: "] [session="{\n "currDBName": "",\n "id": 0,\n "status": 2,\n "strictMode": true,\n "user": null\n}"]
[2024/01/12 17:20:57.289 +08:00] [WARN] [nodes.go:134] ["get managed nodes met error"] [error="[tikv:9001]PD server timeout: "]
tidb full logs (testbed: endless-ha-test-add-index-tps-5520324-1-724):
tidb-0.tar.gz
tidb-1-2024-01-12T20-47-25.tar.gz
4. What is your TiDB version? (Required)
./tidb-server -V
Release Version: v7.6.0-alpha
Edition: Community
Git Commit Hash: 1ad36eb
Git Branch: heads/refs/tags/v7.6.0-alpha
UTC Build Time: 2024-01-11 12:34:00
GoVersion: go1.21.5
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2024-01-12T17:12:07.701+0800
The text was updated successfully, but these errors were encountered: