
Dropped tables can not be physically removed if syncAllSchema is triggered frequently #9227

Closed
JaySon-Huang opened this issue Jul 12, 2024 · 3 comments


JaySon-Huang commented Jul 12, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Run the following SQLs repeatedly

CREATE TABLE if not exists test_range_1 (id INT, name VARCHAR(20))
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (10),
    PARTITION p1 VALUES LESS THAN (20),
    PARTITION p2 VALUES LESS THAN (30),
    PARTITION p3 VALUES LESS THAN (40),
    PARTITION p4 VALUES LESS THAN (50)
);

CREATE TABLE if not exists test_swap_1 (id INT,name VARCHAR(20));
ALTER TABLE test_range_1 EXCHANGE PARTITION p0 WITH TABLE test_swap_1;
DROP TABLE test_swap_1;

2. What did you expect to see? (Required)

The dropped table can be successfully removed from the TiFlash instances.

3. What did you see instead (Required)

The dropped tables' IStorage instances remain in TiFlash even after the gc_safepoint exceeds the drop timepoint + gc_lifetime.
More and more ".sql" files of the dropped tables remain on the TiFlash instances' disks, and TiFlash's sync-all-schema becomes slower and slower.
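
For context, a tombstoned table becomes eligible for physical removal only after the gc_safepoint passes its tombstone timestamp. Below is a minimal sketch of that condition; the function and parameter names are illustrative only, not the actual TiFlash API:

    #include <cstdint>

    // Illustrative sketch: a tombstoned table can be physically removed only after
    // the GC safepoint has advanced past its tombstone timestamp. If syncAllSchema
    // keeps overwriting tombstone_ts with a fresh TSO, this condition is never met,
    // so the IStorage instance and its ".sql" file stay on disk.
    bool canPhysicallyRemove(uint64_t tombstone_ts, uint64_t gc_safepoint)
    {
        return tombstone_ts != 0 && tombstone_ts < gc_safepoint;
    }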

4. What is your TiFlash version? (Required)

v6.5.10

@JaySon-Huang (Contributor, Author)

In the previous versions:

  1. Repeatedly execute the given SQLs. The rapidly swapped and dropped table_id leads to warnings like "miss table" or "miss partition table in TiKV" raised with "Errors::DDL::StaleSchema"

    /// step 3 change partition of the partition table to non partition table
    table_info = getter.getTableInfo(npt_db_info->id, pt_partition_id);
    if (table_info == nullptr)
    {
        LOG_WARNING(log, "Execute exchange partition, the table info of partition can not get from TiKV, npt_database_id={} partition_id={}", npt_database_id, pt_partition_id);
        throw TiFlashException(fmt::format("miss partition table in TiKV, may have been dropped, physical_table_id={}", pt_partition_id), Errors::DDL::StaleSchema);
    }

  2. When TiFlash meets an exception in tryLoadSchemaDiffs, it falls back to loadAllSchema to fetch all table info from TiKV.

    Int64 version_after_load_diff = 0;
    if (version_after_load_diff = tryLoadSchemaDiffs(getter, cur_version, version, context, ks_log); version_after_load_diff == -1)
    {
        GET_METRIC(tiflash_schema_apply_count, type_full).Increment();
        version_after_load_diff = loadAllSchema(getter, version, context);
    }

  3. In SchemaBuilder::syncAllSchema, for each table that no longer exists in TiKV but still exists in the TiFlash instance, TiFlash executes applyDropPhysicalTable, which updates the "tombstone_ts" of the table in its ".sql" file.
    Even if the table is already "tombstone", the function updates the "tombstone_ts" with a newer TSO from PD. This keeps the IStorage instance alive even after the gc_safepoint exceeds the drop timepoint + gc_lifetime (see the sketch after the code below).

    /// Drop all unmapped tables.
    auto storage_map = tmt_context.getStorages().getAllStorage();
    for (auto it = storage_map.begin(); it != storage_map.end(); it++)
    {
        auto table_info = it->second->getTableInfo();
        if (table_info.keyspace_id != keyspace_id)
        {
            continue;
        }
        if (table_set.count(table_info.id) == 0)
        {
            applyDropPhysicalTable(it->second->getDatabaseName(), table_info.id);
            LOG_INFO(log, "Table {}.{} dropped during sync all schemas", it->second->getDatabaseName(), name_mapper.debugTableName(table_info));
        }
    }

    template <typename Getter, typename NameMapper>
    void SchemaBuilder<Getter, NameMapper>::applyDropPhysicalTable(const String & db_name, TableID table_id)
    {
        auto & tmt_context = context.getTMTContext();
        auto storage = tmt_context.getStorages().get(keyspace_id, table_id);
        if (storage == nullptr)
        {
            LOG_DEBUG(log, "table {} does not exist.", table_id);
            return;
        }
        GET_METRIC(tiflash_schema_internal_ddl_count, type_drop_table).Increment();
        LOG_INFO(log, "Tombstoning table {}.{}", db_name, name_mapper.debugTableName(storage->getTableInfo()));
        const UInt64 tombstone_ts = PDClientHelper::getTSO(tmt_context.getPDClient(), PDClientHelper::get_tso_maxtime);
        AlterCommands commands;
        {
            AlterCommand command;
            command.type = AlterCommand::TOMBSTONE;
            // We don't try to get a precise time that TiDB drops this table.
            // We use a more relaxing GC strategy:
            // 1. Use current timestamp, which is after TiDB's drop time, to be the tombstone of this table;
            // 2. Use the same GC safe point as TiDB.
            // In such way our table will be GC-ed later than TiDB, which is safe and correct.
            command.tombstone = tombstone_ts;
            commands.emplace_back(std::move(command));
        }
        auto alter_lock = storage->lockForAlter(getThreadNameAndID());
        storage->alterFromTiDB(alter_lock, commands, db_name, storage->getTableInfo(), name_mapper, context);
        LOG_INFO(log, "Tombstoned table {}.{}, tombstone={}", db_name, name_mapper.debugTableName(storage->getTableInfo()), tombstone_ts);
    }
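
    A minimal sketch of one possible guard, assuming the storage exposes an isTombstone() accessor (the accessor and its placement right after the nullptr check are assumptions for illustration, not necessarily the actual fix):

    // Sketch only: if the table is already tombstoned, keep the original tombstone_ts
    // so that it can eventually fall behind the gc_safepoint and the table gets
    // physically removed. NOTE: storage->isTombstone() is assumed for illustration.
    if (storage->isTombstone())
    {
        LOG_DEBUG(log, "table {} is already tombstoned, skip updating tombstone_ts", table_id);
        return;
    }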

@seiya-annie

/report customer

@JaySon-Huang (Contributor, Author)

Closing, as this will be fixed in the next patch versions of release-6.5 and release-7.1. v7.5 and later are not affected.
