Data inconsistency when tpcc check #39266

Closed
lilinghai opened this issue Nov 21, 2022 · 7 comments · Fixed by #39547
Labels
affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. affects-6.0 affects-6.1 affects-6.2 affects-6.3 affects-6.4 severity/critical sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@lilinghai
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Prepare TPC-C data with 100 warehouses, set a TiFlash replica, and run the workload for some time. Then run the following SQL:

SELECT
    T.o_ol_cnt,T.order_line_count
FROM
    (
        SELECT
            o_ol_cnt,
            order_line_count
        FROM
            orders
            LEFT JOIN (
                SELECT
                    ol_w_id,
                    ol_d_id,
                    ol_o_id,
                    count(*) order_line_count
                FROM
                    order_line
                GROUP BY
                    ol_w_id,
                    ol_d_id,
                    ol_o_id
                ORDER by
                    ol_w_id,
                    ol_d_id,
                    ol_o_id
            ) AS order_line ON orders.o_w_id = order_line.ol_w_id
            AND orders.o_d_id = order_line.ol_d_id
            AND orders.o_id = order_line.ol_o_id
        WHERE
            orders.o_w_id = 7
    ) AS T
WHERE
    T.o_ol_cnt != T.order_line_count

The plan is

explain analyze SELECT T.o_ol_cnt,T.order_line_count FROM (SELECT o_ol_cnt, order_line_count FROM orders LEFT JOIN (SELECT ol_w_id, ol_d_id, ol_o_id, count(*) order_line_count FROM order_line GROUP BY ol_w_id, ol_d_id, ol_o_id ORDER by ol_w_id, ol_d_id, ol_o_id) AS order_line ON orders.o_w_id = order_line.ol_w_id AND orders.o_d_id = order_line.ol_d_id AND orders.o_id = order_line.ol_o_id WHERE orders.o_w_id = 7) AS T WHERE T.o_ol_cnt != T.order_line_count;
+---------------------------------+-----------+---------+--------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                              | estRows   | actRows | task         | access object    | execution info                                                                                                                                                                                                                                                                                                        | operator info                                                                                                                                                                                                                                                                                                               | memory   | disk |
+---------------------------------+-----------+---------+--------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| IndexJoin_21                    | 31121.95  | 2       | root         |                  | time:1.09s, loops:2, inner:{total:2.49s, concurrency:5, task:9, construct:54.1ms, fetch:2.43s, build:9.55ms}, probe:40.2ms                                                                                                                                                                                            | inner join, inner:TableReader_17, outer key:tpcc.order_line.ol_d_id, tpcc.order_line.ol_o_id, inner key:tpcc.orders.o_d_id, tpcc.orders.o_id, equal cond:eq(tpcc.order_line.ol_d_id, tpcc.orders.o_d_id), eq(tpcc.order_line.ol_o_id, tpcc.orders.o_id), other cond:ne(tpcc.orders.o_ol_cnt, Column#19)                     | 8.13 MB  | N/A  |
| ├─StreamAgg_92(Build)           | 31.12     | 31257   | root         |                  | time:75.8ms, loops:36                                                                                                                                                                                                                                                                                                 | group by:tpcc.order_line.ol_d_id, tpcc.order_line.ol_o_id, tpcc.order_line.ol_w_id, funcs:count(Column#44)->Column#19, funcs:firstrow(tpcc.order_line.ol_o_id)->tpcc.order_line.ol_o_id, funcs:firstrow(tpcc.order_line.ol_d_id)->tpcc.order_line.ol_d_id, funcs:firstrow(tpcc.order_line.ol_w_id)->tpcc.order_line.ol_w_id | 33.1 KB  | N/A  |
| │ └─TableReader_93              | 31.12     | 31257   | root         |                  | time:53.4ms, loops:33, cop_task: {num: 2, max: 53.2ms, min: 30.5ms, avg: 41.9ms, p95: 53.2ms, rpc_num: 2, rpc_time: 83.7ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                                                                                      | data:StreamAgg_88                                                                                                                                                                                                                                                                                                           | 978.5 KB | N/A  |
| │   └─StreamAgg_88              | 31.12     | 31257   | cop[tiflash] |                  | tiflash_task:{proc max:49.7ms, min:28.7ms, avg: 39.2ms, p80:49.7ms, p95:49.7ms, iters:2, tasks:2, threads:2}                                                                                                                                                                                                          | group by:tpcc.order_line.ol_d_id, tpcc.order_line.ol_o_id, tpcc.order_line.ol_w_id, funcs:count(1)->Column#44                                                                                                                                                                                                               | N/A      | N/A  |
| │     └─TableRangeScan_79       | 311221.15 | 312602  | cop[tiflash] | table:order_line | tiflash_task:{proc max:45.7ms, min:25.7ms, avg: 35.7ms, p80:45.7ms, p95:45.7ms, iters:6, tasks:2, threads:2}                                                                                                                                                                                                          | range:[7,7], keep order:true                                                                                                                                                                                                                                                                                                | N/A      | N/A  |
| └─TableReader_17(Probe)         | 0.31      | 31256   | root         |                  | time:2.38s, loops:46, cop_task: {num: 31, max: 181.6ms, min: 1.03ms, avg: 76.3ms, p95: 181.5ms, max_proc_keys: 5088, p95_proc_keys: 3096, tot_proc: 1.96s, tot_wait: 23ms, rpc_num: 31, rpc_time: 2.37s, copr_cache_hit_ratio: 0.26, distsql_concurrency: 15}                                                         | data:Selection_16                                                                                                                                                                                                                                                                                                           | N/A      | N/A  |
|   └─Selection_16                | 0.31      | 31256   | cop[tikv]    |                  | tikv_task:{proc max:166ms, min:4ms, avg: 69.6ms, p80:129ms, p95:164ms, iters:139, tasks:31}, scan_detail: {total_process_keys: 29464, total_process_keys_size: 2291461, total_keys: 59882, get_snapshot_time: 1.03ms, rocksdb: {delete_skipped_count: 8, key_skipped_count: 30409, block: {cache_hit_count: 265363}}} | eq(tpcc.orders.o_w_id, 7), not(isnull(tpcc.orders.o_ol_cnt))                                                                                                                                                                                                                                                                | N/A      | N/A  |
|     └─TableRangeScan_15         | 31.12     | 31256   | cop[tikv]    | table:orders     | tikv_task:{proc max:165ms, min:4ms, avg: 69.5ms, p80:129ms, p95:164ms, iters:139, tasks:31}                                                                                                                                                                                                                           | range: decided by [eq(tpcc.orders.o_d_id, tpcc.order_line.ol_d_id) eq(tpcc.orders.o_id, tpcc.order_line.ol_o_id) eq(tpcc.orders.o_w_id, 7)], keep order:false                                                                                                                                                               | N/A      | N/A  |
+---------------------------------+-----------+---------+--------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
8 rows in set (1.18 sec)

2. What did you expect to see? (Required)

empty set

3. What did you see instead (Required)

+----------+------------------+
| o_ol_cnt | order_line_count |
+----------+------------------+
|        8 |                7 |
|        8 |                1 |
+----------+------------------+

4. What is your TiDB version? (Required)

Release Version: v6.5.0-alpha
Edition: Community
Git Commit Hash: 3bcd5a8
Git Branch: heads/refs/tags/v6.5.0-alpha
UTC Build Time: 2022-11-19 11:13:51
GoVersion: go1.19.2
Race Enabled: false
TiKV Min Version: 6.2.0-alpha
Check Table Before Drop: false
Store: tikv

@lilinghai lilinghai added the type/bug The issue is confirmed as a bug. label Nov 21, 2022
@gengliqi
Contributor

Is the result correct when using only the TiKV replica?

@lilinghai
Contributor Author

Yes. The result is correct when reading from only TiKV or only TiFlash, as controlled by the variable tidb_isolation_read_engines.

@jebter

jebter commented Nov 22, 2022

@breezewish Please add troubleshooting information

@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.0 may-affects-6.1 may-affects-6.2 may-affects-6.3 may-affects-6.4 labels Nov 22, 2022
@breezewish
Member

A minimal reproduce SQL is:

SELECT
    ol_w_id,
    ol_d_id,
    ol_o_id,
    count(*) order_line_count
FROM
    order_line
WHERE
    ol_w_id = 7
GROUP BY
    ol_w_id,
    ol_d_id,
    ol_o_id
ORDER by
    ol_w_id,
    ol_d_id,
    ol_o_id;

With the isolation read engine set to TiKV, the result set contains this row, which is correct:

|       7 |       8 |    1080 |                8 |

With the TiFlash coprocessor, the same row is returned with an incorrect count:

|       7 |       8 |    1080 |                7 |
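To find such rows without scanning the output by eye, the two result sets can be diffed by group key. This is a minimal sketch, not tooling from this issue: `diff_counts` is a hypothetical helper, and the rows are hard-coded from the two lines above rather than fetched from a cluster.

```python
def diff_counts(reference, candidate):
    """Return {key: (reference_count, candidate_count)} for keys whose counts differ.

    Keys present on only one side are reported with None for the missing count.
    """
    keys = reference.keys() | candidate.keys()
    return {
        k: (reference.get(k), candidate.get(k))
        for k in sorted(keys)
        if reference.get(k) != candidate.get(k)
    }

# Keys are (ol_w_id, ol_d_id, ol_o_id); values are order_line_count.
tikv_rows = {(7, 8, 1080): 8}      # row seen when reading from TiKV
tiflash_rows = {(7, 8, 1080): 7}   # row seen with the TiFlash coprocessor

mismatches = diff_counts(tikv_rows, tiflash_rows)
print(mismatches)
```

In practice the two dictionaries would be filled from the same query run once per engine.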

@jebter jebter added the sig/execution SIG execution label Nov 22, 2022
@chrysan chrysan assigned chrysan and Reminiscent and unassigned chrysan Nov 24, 2022
@Reminiscent
Contributor

I think the plan is correct here, so I am removing the planner label. If there are other possible problems that may be related to the planner, feel free to ping me.

@Reminiscent Reminiscent removed their assignment Nov 24, 2022
@Reminiscent Reminiscent removed the sig/planner SIG: Planner label Nov 24, 2022
@xzhangxian1008
Contributor

The optimizer should not push StreamAgg down to TiFlash when the plan contains a group-by key, because TiFlash does not implement the related function for StreamAgg. The optimizer should follow this agreement, and TiFlash may need to raise an error when it receives an inappropriate StreamAgg.
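For readers unfamiliar with the distinction, the sketch below illustrates in general terms why a stream aggregation depends on its input arriving in group-key order, while a hash aggregation does not. It is a toy model, not TiDB or TiFlash code: `stream_count`, `hash_count`, and the sample rows are hypothetical, and the interleaved ordering is an assumption chosen to mirror the 7 and 1 split reported above, not a claim about how TiFlash actually returned the rows.

```python
from collections import Counter
from itertools import groupby


def stream_count(rows):
    """Stream-style count(*): emit one group per run of consecutive equal keys.

    Correct only if rows arrive sorted (or at least clustered) by key.
    """
    return [(key, sum(1 for _ in run)) for key, run in groupby(rows)]


def hash_count(rows):
    """Hash-style count(*): accumulate per key, independent of input order."""
    return sorted(Counter(rows).items())


# Hypothetical (ol_w_id, ol_d_id, ol_o_id) keys.
target = (7, 8, 1080)
other = (7, 8, 1081)

ordered = [target] * 8 + [other] * 2
interleaved = [target] * 7 + [other] * 2 + [target]  # one row arrives late

print(stream_count(ordered))      # one group per key
print(stream_count(interleaved))  # the target key is split into two groups
print(hash_count(interleaved))    # order does not matter
```

On the interleaved input, the stream-style count reports the target key twice, with counts 7 and 1, while the hash-style count still reports 8. Any operator that assumes ordered input therefore needs that ordering guaranteed by whatever feeds it.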

@Reminiscent Reminiscent added the sig/planner SIG: Planner label Nov 29, 2022
@Reminiscent Reminiscent assigned fixdb and unassigned xzhangxian1008 Nov 29, 2022
@fixdb
Contributor

fixdb commented Nov 29, 2022

I can reproduce the plan with the following query:

create table foo(a int, b int, c int, d int, primary key(a,b,c,d));
alter table foo set tiflash replica 1;
insert into foo values(1,2,3,1),(1,2,3,6),(1,2,3,5),(1,2,3,2),(1,2,3,4),(1,2,3,7),(1,2,3,3),(1,2,3,0);
set @@tidb_allow_mpp=off;
explain select a,b,c,count(*) from foo  group by a,b,c order by a,b,c;
+------------------------------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows | task         | access object | operator info                                                                                                                                                                                         |
+------------------------------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_33                | 6.40    | root         |               | test.foo.a, test.foo.b, test.foo.c, Column#5                                                                                                                                                          |
| └─StreamAgg_38               | 6.40    | root         |               | group by:test.foo.a, test.foo.b, test.foo.c, funcs:count(Column#18)->Column#5, funcs:firstrow(test.foo.a)->test.foo.a, funcs:firstrow(test.foo.b)->test.foo.b, funcs:firstrow(test.foo.c)->test.foo.c |
|   └─TableReader_39           | 6.40    | root         |               | data:StreamAgg_34                                                                                                                                                                                     |
|     └─StreamAgg_34           | 6.40    | cop[tiflash] |               | group by:test.foo.a, test.foo.b, test.foo.c, funcs:count(1)->Column#18                                                                                                                                |
|       └─TableFullScan_24     | 8.00    | cop[tiflash] | table:foo     | keep order:true, stats:pseudo                                                                                                                                                                         |
+------------------------------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

We will disable the generation of StreamAgg on TiFlash.

@winoros winoros added affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. affects-6.0 affects-6.1 affects-6.2 affects-6.3 affects-6.4 and removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-6.0 may-affects-6.1 may-affects-6.2 may-affects-6.3 may-affects-6.4 labels Dec 1, 2022