
Analyze is too slow when running DML in a million-table scenario #57631

Closed
lilinghai opened this issue Nov 22, 2024 · 1 comment · Fixed by #57756
Assignees
Labels
affects-8.5 This bug affects the 8.5.x(LTS) versions. severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@lilinghai
Contributor

lilinghai commented Nov 22, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Create about 100k partitioned tables, each with 10k rows:
CREATE TABLE `sbtest1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `k` int NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  `ec1` varchar(40) DEFAULT NULL,
  `ec2` varchar(40) DEFAULT NULL,
  `ec3` varchar(40) DEFAULT NULL,
  `ec4` varchar(40) DEFAULT NULL,
  `ec5` varchar(40) DEFAULT NULL,
  `ec6` varchar(40) DEFAULT NULL,
  `ec7` varchar(40) DEFAULT NULL,
  `ec8` varchar(40) DEFAULT NULL,
  `ec9` varchar(40) DEFAULT NULL,
  `ec10` varchar(40) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `k_2` (`k`),
  KEY `ek1` (`ec1`(30)),
  KEY `ek2` (`ec2`(30)),
  KEY `ek3` (`ec3`(30)),
  KEY `ek4` (`ec4`(30))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=1087002
PARTITION BY RANGE (`id`)
(PARTITION `p1` VALUES LESS THAN (1001),
 PARTITION `p2` VALUES LESS THAN (2001),
 PARTITION `p3` VALUES LESS THAN (3001),
 PARTITION `p4` VALUES LESS THAN (4001),
 PARTITION `p5` VALUES LESS THAN (5001),
 PARTITION `p6` VALUES LESS THAN (6001),
 PARTITION `p7` VALUES LESS THAN (7001),
 PARTITION `p8` VALUES LESS THAN (8001),
 PARTITION `p9` VALUES LESS THAN (9001),
 PARTITION `p10` VALUES LESS THAN (MAXVALUE))
  2. Run DML on some of the tables.
  3. Execute ANALYZE TABLE. Without a DML workload it takes about 3 seconds; with DML running it takes over 20 minutes:
mysql> analyze table sbtest100.sbtest1;
Query OK, 0 rows affected, 10 warnings (21 min 48.62 sec)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

master

Release Version: v8.5.0
Edition: Community
Git Commit Hash: eb871f8
Git Branch: heads/refs/tags/v8.5.0
UTC Build Time: 2024-11-22 03:01:24
GoVersion: go1.23.3
Race Enabled: false
Check Table Before Drop: false
Store: tikv

@lilinghai lilinghai added the type/bug The issue is confirmed as a bug. label Nov 22, 2024
@jebter jebter added the affects-8.5 This bug affects the 8.5.x(LTS) versions. label Nov 22, 2024
@Rustin170506
Member

Rustin170506 commented Nov 26, 2024

During the analysis process, stats are expected to be collected from TiKV and stored in TiDB. Additionally, the stats cache needs to be updated to ensure it reflects the latest stats flushed to TiDB. This guarantees that users can execute SQL queries on the table immediately after the stats collection is completed.

For this issue, the problem originates from the stats cache updating process.

As you can tell from here:

"SELECT version, table_id, modify_count, count, snapshot from mysql.stats_meta where version > %? order by version",

TiDB queries all updated stats metadata from the system table and processes the rows one by one. In this case, with more than 100k tables updated at once, the query becomes a bulk read, and the row-by-row processing makes it even slower. As a result, analyzing a single table takes a very long time: it gets stuck at the final step of updating the stats cache.

To address this problem, we can do two things to reduce the duration.

We can limit the number of tables processed at a time. In this case, throughput isn’t a major concern, and a delay of a few minutes is acceptable. Adding a LIMIT clause to control how many tables are handled per batch can solve the issue without requiring significant changes in version 8.5. Additionally, we could use multiple threads to process these tables concurrently—for instance, leveraging half the number of CPU cores. This would help speed up the process and improve efficiency.

The reason we need to load the latest stats into the cache is to ensure that the most relevant stats are immediately available after the analysis. We don’t want to compromise this guarantee. However, we can improve the performance of this process by updating only the stats for tables that have been modified, rather than reloading stats for all tables. This approach will significantly enhance performance.
For example, if we analyze tables t1 and t2 using ANALYZE TABLE t1, t2, we can call the Update function with t1ID and t2ID. This way, we only update the stats for these two tables, significantly improving the efficiency of the process.
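A minimal sketch of that targeted update, assuming a simplified cache: `statsCache` and its `Update` method here are illustrative stand-ins, not TiDB's real types, and the stats are reduced to a row count per table ID.

```go
package main

import "fmt"

// statsCache is a toy stand-in for the in-memory stats cache,
// mapping tableID to a row count.
type statsCache struct {
	stats map[int64]int64
}

// Update refreshes only the given table IDs when any are supplied;
// with no IDs it falls back to refreshing every table, which is what
// the old code path effectively did.
func (c *statsCache) Update(latest map[int64]int64, tableIDs ...int64) {
	if len(tableIDs) == 0 {
		for id, cnt := range latest {
			c.stats[id] = cnt
		}
		return
	}
	for _, id := range tableIDs {
		if cnt, ok := latest[id]; ok {
			c.stats[id] = cnt
		}
	}
}

func main() {
	cache := &statsCache{stats: map[int64]int64{1: 100, 2: 200, 3: 300}}
	latest := map[int64]int64{1: 150, 2: 250, 3: 999}
	// ANALYZE TABLE t1, t2 -> refresh only t1ID=1 and t2ID=2.
	cache.Update(latest, 1, 2)
	fmt.Println(cache.stats[1], cache.stats[2], cache.stats[3])
	// prints: 150 250 300 (table 3 keeps its old stats)
}
```

Passing the analyzed table IDs explicitly keeps the post-ANALYZE guarantee for those tables while skipping the expensive refresh of every other table's stats.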

I have verified the short-term solution in this cluster, and it worked very well.
Before: [screenshot: analyze duration before the fix]

After: [screenshot: analyze duration after the fix]

Check my changes at #57638.
