
truncate table with many partitions with tiflash replica may encounter write conflict and retry #42940

Closed
lcwangchao opened this issue Apr 11, 2023 · 3 comments · Fixed by #42957

Comments

@lcwangchao (Collaborator)

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. create a table with many partitions
  2. set the table with tiflash replica
  3. truncate the table frequently
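For example, the steps above might look like the following in SQL (a sketch; the table name, column definitions, and partition count are illustrative, not from the report):

```sql
-- 1. create a table with many partitions
CREATE TABLE t (id INT, v INT)
    PARTITION BY HASH (id) PARTITIONS 1024;

-- 2. set the table with a tiflash replica
ALTER TABLE t SET TIFLASH REPLICA 1;

-- 3. truncate the table repeatedly
TRUNCATE TABLE t;
TRUNCATE TABLE t;
```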

2. What did you expect to see? (Required)

3. What did you see instead (Required)

Sometimes you can see errors like the following in the log:

[2023/04/10 18:37:05.125 +08:00] [INFO] [job_table.go:289] ["[ddl] handle ddl job failed"] [error="[kv:9007]Write conflict, txnStartTS=440696311416356887, conflictStartTS=440696311678500883, conflictCommitTS=440696311678500884, key=[]byte{0x6d, 0x4e, 0x65, 0x78, 0x74, 0x47, 0x6c, 0x6f, 0x62, 0xff, 0x61, 0x6c, 0x49, 0x44, 0x0, 0x0, 0x0, 0x0, 0xfb, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x73}, originalKey=6d4e657874476c6f62ff616c494400000000fb0000000000000073, primary={metaKey=true, key=DB:2, field=Table:1014}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3130ff3134000000000000f9, reason=Optimistic [try again later]"] [job="ID:1015, Type:truncate table, State:done, SchemaState:public, SchemaID:2, TableID:712, RowCount:0, ArgLen:2, start time: 2023-04-10 18:36:54.081 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2023/04/10 18:37:05.127 +08:00] [INFO] [ddl_worker.go:944] ["[ddl] run DDL job"] [worker="worker 1, tp general"] [job="ID:1015, Type:truncate table, State:queueing, SchemaState:none, SchemaID:2, TableID:712, RowCount:0, ArgLen:0, start time: 2023-04-10 18:36:54.081 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]

Because of the write conflict, the DDL job retries, and the user has to wait longer for the DDL to finish.

4. What is your TiDB version? (Required)

master

@lcwangchao lcwangchao added type/bug The issue is confirmed as a bug. sig/sql-infra SIG: SQL Infra severity/major labels Apr 11, 2023
@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 labels Apr 11, 2023
@lcwangchao lcwangchao added affects-6.0 affects-6.1 affects-6.2 affects-6.3 affects-6.4 affects-6.5 affects-6.6 affects-7.0 and removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-6.1 may-affects-6.5 labels Apr 11, 2023
@lcwangchao (Collaborator, Author) commented Apr 11, 2023

This is because we allocate new partition IDs while the DDL job is running inside a transaction:

tidb/ddl/partition.go

Lines 3324 to 3337 in 8eb580e

func truncateTableByReassignPartitionIDs(t *meta.Meta, tblInfo *model.TableInfo) error {
	newDefs := make([]model.PartitionDefinition, 0, len(tblInfo.Partition.Definitions))
	for _, def := range tblInfo.Partition.Definitions {
		pid, err := t.GenGlobalID()
		if err != nil {
			return errors.Trace(err)
		}
		newDef := def
		newDef.ID = pid
		newDefs = append(newDefs, newDef)
	}
	tblInfo.Partition.Definitions = newDefs
	return nil
}

We also allocate job IDs from the same auto-increment meta key, so when there are many DDL jobs, the conflict probability increases.

The reason there are so many DDL jobs is that after a truncate, TiFlash sends an "update tiflash replica status" DDL request to update the TiFlash meta for each partition, so when there are many partitions there will be a lot of jobs.

@bb7133 (Member) commented Apr 11, 2023

Yeah... makes sense

But this is not about correctness, am I right?

@mjonss mjonss self-assigned this Apr 11, 2023
@lcwangchao (Collaborator, Author)

> Yeah... makes sense
>
> But this is not about correctness, am I right?

Yes, it is not about correctness; it is only performance related.
