Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"DB::Exception: Fail to serialize response" thrown while running Q5 on TPC-H 100G without analyzing table #3215

Closed
JaySon-Huang opened this issue Oct 14, 2021 · 1 comment
Assignees
Labels
severity/major type/bug The issue is confirmed as a bug.

Comments

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented Oct 14, 2021

Steps to reproduce:

  1. Use br to restore tpc-h 100G on a cluster with only 1 tiflash instance https://docs.google.com/document/d/1dSSoEX1oxwxj1EXN4AEWTlI1rHt7EWkda8O_4FwMbjU/edit#
  2. Run Q5[1] on tpc-h 100G without analyzing table
  3. Q5 end up with an error message like ERROR 1105 (HY000) at line 1: other error for mpp stream: DB::Exception: Exchange receiver meet error : DB::Exception: Exchange receiver meet error : DB::Exception: Exchange receiver meet error : DB::Exception: Fail to serialize response, response size: 2480071798
> explain select /*+ read_from_storage(tiflash[customer,orders,lineitem,supplier,nation,region]) */         n_name,         sum(l_extendedprice * (1 - l_discount)) as revenue from         customer,         orders,         lineitem,         supplier,         nation,         region where         c_custkey = o_custkey         and l_orderkey = o_orderkey         and l_suppkey = s_suppkey         and c_nationkey = s_nationkey         and s_nationkey = n_nationkey         and n_regionkey = r_regionkey         and r_name = 'MIDDLE EAST'         and o_orderdate >= '1994-01-01'         and o_orderdate < date_add('1994-01-01', interval '1' year) group by         n_name order by         revenue desc;
+----------------------------------------------------------------------------+----------+-------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| id                                                                         | estRows  | task              | access object  | operator info                                                                                                                                 |
+----------------------------------------------------------------------------+----------+-------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Sort_24                                                                    | 12.50    | root              |                | Column#49:desc                                                                                                                                |
| └─Projection_26                                                            | 12.50    | root              |                | tpch_100.nation.n_name, Column#49                                                                                                             |
|   └─TableReader_109                                                        | 12.50    | root              |                | data:ExchangeSender_108                                                                                                                       |
|     └─ExchangeSender_108                                                   | 12.50    | batchCop[tiflash] |                | ExchangeType: PassThrough                                                                                                                     |
|       └─Projection_104                                                     | 12.50    | batchCop[tiflash] |                | Column#49, tpch_100.nation.n_name                                                                                                             |
|         └─HashAgg_105                                                      | 12.50    | batchCop[tiflash] |                | group by:tpch_100.nation.n_name, funcs:sum(Column#50)->Column#49, funcs:firstrow(tpch_100.nation.n_name)->tpch_100.nation.n_name              |
|           └─ExchangeReceiver_107                                           | 12.50    | batchCop[tiflash] |                |                                                                                                                                               |
|             └─ExchangeSender_106                                           | 12.50    | batchCop[tiflash] |                | ExchangeType: HashPartition, Hash Cols: [name: tpch_100.nation.n_name, collate: N/A]                                                          |
|               └─HashAgg_30                                                 | 12.50    | batchCop[tiflash] |                | group by:Column#55, funcs:sum(Column#54)->Column#50                                                                                           |
|                 └─Projection_114                                           | 30.52    | batchCop[tiflash] |                | mul(tpch_100.lineitem.l_extendedprice, minus(1, tpch_100.lineitem.l_discount))->Column#54, tpch_100.nation.n_name                             |
|                   └─Projection_103                                         | 30.52    | batchCop[tiflash] |                | tpch_100.lineitem.l_extendedprice, tpch_100.lineitem.l_discount, tpch_100.nation.n_name                                                       |
|                     └─HashJoin_97                                          | 30.52    | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.supplier.s_suppkey, tpch_100.lineitem.l_suppkey) eq(tpch_100.orders.o_orderkey, tpch_100.lineitem.l_orderkey)] |
|                       ├─ExchangeReceiver_56(Build)                         | 24.41    | batchCop[tiflash] |                |                                                                                                                                               |
|                       │ └─ExchangeSender_55                                | 24.41    | batchCop[tiflash] |                | ExchangeType: Broadcast                                                                                                                       |
|                       │   └─HashJoin_36                                    | 24.41    | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.customer.c_custkey, tpch_100.orders.o_custkey)]                                                                |
|                       │     ├─ExchangeReceiver_52(Build)                   | 19.53    | batchCop[tiflash] |                |                                                                                                                                               |
|                       │     │ └─ExchangeSender_51                          | 19.53    | batchCop[tiflash] |                | ExchangeType: Broadcast                                                                                                                       |
|                       │     │   └─HashJoin_37                              | 19.53    | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.supplier.s_nationkey, tpch_100.customer.c_nationkey)]                                                          |
|                       │     │     ├─ExchangeReceiver_49(Build)             | 15.62    | batchCop[tiflash] |                |                                                                                                                                               |
|                       │     │     │ └─ExchangeSender_48                    | 15.62    | batchCop[tiflash] |                | ExchangeType: Broadcast                                                                                                                       |
|                       │     │     │   └─HashJoin_38                        | 15.62    | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.nation.n_nationkey, tpch_100.supplier.s_nationkey)]                                                            |
|                       │     │     │     ├─ExchangeReceiver_46(Build)       | 12.50    | batchCop[tiflash] |                |                                                                                                                                               |
|                       │     │     │     │ └─ExchangeSender_45              | 12.50    | batchCop[tiflash] |                | ExchangeType: Broadcast                                                                                                                       |
|                       │     │     │     │   └─HashJoin_39                  | 12.50    | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.region.r_regionkey, tpch_100.nation.n_regionkey)]                                                              |
|                       │     │     │     │     ├─ExchangeReceiver_43(Build) | 10.00    | batchCop[tiflash] |                |                                                                                                                                               |
|                       │     │     │     │     │ └─ExchangeSender_42        | 10.00    | batchCop[tiflash] |                | ExchangeType: Broadcast                                                                                                                       |
|                       │     │     │     │     │   └─Selection_41           | 10.00    | batchCop[tiflash] |                | eq(tpch_100.region.r_name, "MIDDLE EAST")                                                                                                     |
|                       │     │     │     │     │     └─TableFullScan_40     | 10000.00 | batchCop[tiflash] | table:region   | keep order:false, stats:pseudo                                                                                                                |
|                       │     │     │     │     └─TableFullScan_44(Probe)    | 10000.00 | batchCop[tiflash] | table:nation   | keep order:false, stats:pseudo                                                                                                                |
|                       │     │     │     └─TableFullScan_47(Probe)          | 10000.00 | batchCop[tiflash] | table:supplier | keep order:false, stats:pseudo                                                                                                                |
|                       │     │     └─TableFullScan_50(Probe)                | 10000.00 | batchCop[tiflash] | table:customer | keep order:false, stats:pseudo                                                                                                                |
|                       │     └─Selection_54(Probe)                          | 250.00   | batchCop[tiflash] |                | ge(tpch_100.orders.o_orderdate, 1994-01-01 00:00:00.000000), lt(tpch_100.orders.o_orderdate, 1995-01-01)                                      |
|                       │       └─TableFullScan_53                           | 10000.00 | batchCop[tiflash] | table:orders   | keep order:false, stats:pseudo                                                                                                                |
|                       └─TableFullScan_57(Probe)                            | 10000.00 | batchCop[tiflash] | table:lineitem | keep order:false, stats:pseudo                                                                                                                |
+----------------------------------------------------------------------------+----------+-------------------+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
> SHOW STATS_META;
+----------+------------+----------------+---------------------+--------------+-----------+
| Db_name  | Table_name | Partition_name | Update_time         | Modify_count | Row_count |
+----------+------------+----------------+---------------------+--------------+-----------+
| tpch_100 | region     |                | 2021-10-13 17:18:49 | 0            | 0         |
| tpch_100 | supplier   |                | 2021-10-13 17:18:49 | 0            | 0         |
| tpch_100 | customer   |                | 2021-10-13 17:18:49 | 0            | 0         |
| tpch_100 | nation     |                | 2021-10-13 17:18:49 | 0            | 0         |
| tpch_100 | orders     |                | 2021-10-13 17:18:49 | 0            | 0         |
| tpch_100 | lineitem   |                | 2021-10-13 17:18:50 | 0            | 0         |
| tpch_100 | part       |                | 2021-10-13 17:18:50 | 0            | 0         |
| tpch_100 | partsupp   |                | 2021-10-13 17:18:50 | 0            | 0         |
+----------+------------+----------------+---------------------+--------------+-----------+

After analyzing all tables on tpch_100, it can run successfully

> explain select /*+ read_from_storage(tiflash[customer,orders,lineitem,supplier,nation,region]) */         n_name,         sum(l_extendedprice * (1 - l_discount)) as revenue from         customer,         orders,         lineitem,         supplier,         nation,         region where         c_custkey = o_custkey         and l_orderkey = o_orderkey         and l_suppkey = s_suppkey         and c_nationkey = s_nationkey         and s_nationkey = n_nationkey         and n_regionkey = r_regionkey         and r_name = 'MIDDLE EAST'         and o_orderdate >= '1994-01-01'         and o_orderdate < date_add('1994-01-01', interval '1' year) group by         n_name order by         revenue desc;
+----------------------------------------------------------------------------+--------------+-------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id                                                                         | estRows      | task              | access object  | operator info                                                                                                                                   |
+----------------------------------------------------------------------------+--------------+-------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| Sort_24                                                                    | 5.00         | root              |                | Column#49:desc                                                                                                                                  |
| └─Projection_26                                                            | 5.00         | root              |                | tpch_100.nation.n_name, Column#49                                                                                                               |
|   └─TableReader_116                                                        | 5.00         | root              |                | data:ExchangeSender_115                                                                                                                         |
|     └─ExchangeSender_115                                                   | 5.00         | batchCop[tiflash] |                | ExchangeType: PassThrough                                                                                                                       |
|       └─Projection_111                                                     | 5.00         | batchCop[tiflash] |                | Column#49, tpch_100.nation.n_name                                                                                                               |
|         └─HashAgg_112                                                      | 5.00         | batchCop[tiflash] |                | group by:tpch_100.nation.n_name, funcs:sum(Column#50)->Column#49, funcs:firstrow(tpch_100.nation.n_name)->tpch_100.nation.n_name                |
|           └─ExchangeReceiver_114                                           | 5.00         | batchCop[tiflash] |                |                                                                                                                                                 |
|             └─ExchangeSender_113                                           | 5.00         | batchCop[tiflash] |                | ExchangeType: HashPartition, Hash Cols: [name: tpch_100.nation.n_name, collate: N/A]                                                            |
|               └─HashAgg_30                                                 | 5.00         | batchCop[tiflash] |                | group by:Column#55, funcs:sum(Column#54)->Column#50                                                                                             |
|                 └─Projection_121                                           | 22988812.50  | batchCop[tiflash] |                | mul(tpch_100.lineitem.l_extendedprice, minus(1, tpch_100.lineitem.l_discount))->Column#54, tpch_100.nation.n_name                               |
|                   └─Projection_110                                         | 22988812.50  | batchCop[tiflash] |                | tpch_100.lineitem.l_extendedprice, tpch_100.lineitem.l_discount, tpch_100.nation.n_name                                                         |
|                     └─HashJoin_104                                         | 22988812.50  | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.supplier.s_nationkey, tpch_100.customer.c_nationkey) eq(tpch_100.orders.o_custkey, tpch_100.customer.c_custkey)] |
|                       ├─ExchangeReceiver_63(Build)                         | 15000000.00  | batchCop[tiflash] |                |                                                                                                                                                 |
|                       │ └─ExchangeSender_62                                | 15000000.00  | batchCop[tiflash] |                | ExchangeType: HashPartition, Hash Cols: [name: tpch_100.customer.c_nationkey, collate: N/A], [name: tpch_100.customer.c_custkey, collate: N/A]  |
|                       │   └─TableFullScan_61                               | 15000000.00  | batchCop[tiflash] | table:customer | keep order:false                                                                                                                                |
|                       └─ExchangeReceiver_60(Probe)                         | 22988812.50  | batchCop[tiflash] |                |                                                                                                                                                 |
|                         └─ExchangeSender_59                                | 22988812.50  | batchCop[tiflash] |                | ExchangeType: HashPartition, Hash Cols: [name: tpch_100.supplier.s_nationkey, collate: N/A], [name: tpch_100.orders.o_custkey, collate: N/A]    |
|                           └─HashJoin_36                                    | 22988812.50  | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.lineitem.l_orderkey, tpch_100.orders.o_orderkey)]                                                                |
|                             ├─ExchangeReceiver_58(Build)                   | 22988812.50  | batchCop[tiflash] |                |                                                                                                                                                 |
|                             │ └─ExchangeSender_57                          | 22988812.50  | batchCop[tiflash] |                | ExchangeType: HashPartition, Hash Cols: [name: tpch_100.orders.o_orderkey, collate: N/A]                                                        |
|                             │   └─Selection_56                             | 22988812.50  | batchCop[tiflash] |                | ge(tpch_100.orders.o_orderdate, 1994-01-01 00:00:00.000000), lt(tpch_100.orders.o_orderdate, 1995-01-01)                                        |
|                             │     └─TableFullScan_55                       | 150000000.00 | batchCop[tiflash] | table:orders   | keep order:false                                                                                                                                |
|                             └─ExchangeReceiver_54(Probe)                   | 120169087.65 | batchCop[tiflash] |                |                                                                                                                                                 |
|                               └─ExchangeSender_53                          | 120169087.65 | batchCop[tiflash] |                | ExchangeType: HashPartition, Hash Cols: [name: tpch_100.lineitem.l_orderkey, collate: N/A]                                                      |
|                                 └─HashJoin_37                              | 120169087.65 | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.supplier.s_suppkey, tpch_100.lineitem.l_suppkey)]                                                                |
|                                   ├─ExchangeReceiver_49(Build)             | 200000.00    | batchCop[tiflash] |                |                                                                                                                                                 |
|                                   │ └─ExchangeSender_48                    | 200000.00    | batchCop[tiflash] |                | ExchangeType: HashPartition, Hash Cols: [name: tpch_100.supplier.s_suppkey, collate: N/A]                                                       |
|                                   │   └─HashJoin_38                        | 200000.00    | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.nation.n_nationkey, tpch_100.supplier.s_nationkey)]                                                              |
|                                   │     ├─ExchangeReceiver_46(Build)       | 5.00         | batchCop[tiflash] |                |                                                                                                                                                 |
|                                   │     │ └─ExchangeSender_45              | 5.00         | batchCop[tiflash] |                | ExchangeType: Broadcast                                                                                                                         |
|                                   │     │   └─HashJoin_39                  | 5.00         | batchCop[tiflash] |                | inner join, equal:[eq(tpch_100.region.r_regionkey, tpch_100.nation.n_regionkey)]                                                                |
|                                   │     │     ├─ExchangeReceiver_43(Build) | 1.00         | batchCop[tiflash] |                |                                                                                                                                                 |
|                                   │     │     │ └─ExchangeSender_42        | 1.00         | batchCop[tiflash] |                | ExchangeType: Broadcast                                                                                                                         |
|                                   │     │     │   └─Selection_41           | 1.00         | batchCop[tiflash] |                | eq(tpch_100.region.r_name, "MIDDLE EAST")                                                                                                       |
|                                   │     │     │     └─TableFullScan_40     | 5.00         | batchCop[tiflash] | table:region   | keep order:false                                                                                                                                |
|                                   │     │     └─TableFullScan_44(Probe)    | 25.00        | batchCop[tiflash] | table:nation   | keep order:false                                                                                                                                |
|                                   │     └─TableFullScan_47(Probe)          | 1000000.00   | batchCop[tiflash] | table:supplier | keep order:false                                                                                                                                |
|                                   └─ExchangeReceiver_52(Probe)             | 600037902.00 | batchCop[tiflash] |                |                                                                                                                                                 |
|                                     └─ExchangeSender_51                    | 600037902.00 | batchCop[tiflash] |                | ExchangeType: HashPartition, Hash Cols: [name: tpch_100.lineitem.l_suppkey, collate: N/A]                                                       |
|                                       └─TableFullScan_50                   | 600037902.00 | batchCop[tiflash] | table:lineitem | keep order:false                                                                                                                                |
+----------------------------------------------------------------------------+--------------+-------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
> show stats_meta;
+----------+------------+----------------+---------------------+--------------+-----------+
| Db_name  | Table_name | Partition_name | Update_time         | Modify_count | Row_count |
+----------+------------+----------------+---------------------+--------------+-----------+
| tpch_100 | region     |                | 2021-10-14 13:59:19 | 0            | 5         |
| tpch_100 | supplier   |                | 2021-10-14 13:59:31 | 0            | 1000000   |
| tpch_100 | customer   |                | 2021-10-14 13:59:44 | 0            | 15000000  |
| tpch_100 | nation     |                | 2021-10-14 13:59:50 | 0            | 25        |
| tpch_100 | orders     |                | 2021-10-14 14:01:39 | 0            | 150000000 |
| tpch_100 | lineitem   |                | 2021-10-14 14:05:16 | 0            | 600037902 |
| tpch_100 | part       |                | 2021-10-14 14:00:03 | 0            | 20000000  |
| tpch_100 | partsupp   |                | 2021-10-14 14:00:25 | 0            | 80000000  |
+----------+------------+----------------+---------------------+--------------+-----------+

[1] TPC-H, Query 5: select /*+ read_from_storage(tiflash[customer,orders,lineitem,supplier,nation,region]) */ n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'MIDDLE EAST' and o_orderdate >= '1994-01-01' and o_orderdate < date_add('1994-01-01', interval '1' year) group by n_name order by revenue desc;

@JaySon-Huang JaySon-Huang added the type/bug The issue is confirmed as a bug. label Oct 14, 2021
@windtalker
Copy link
Contributor

It is caused by 2 known issues

  1. TiDB use lazy lode mode to load the stats info, so after restart TiDB server, the plan maybe not the optimal plan before the stats info is fully loaded.
  2. TiFlash's join operator might generate huge block that is unable to serialize by GRPC.(The block returned by InputStream should obey the max_block_size constraint #3436)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

3 participants