Commit 2ea5621
[SPARK-49489][SQL][HIVE] HMS client respects
### What changes were proposed in this pull request?
Partly port HIVE-26633 for Spark HMS client - respect `hive.thrift.client.max.message.size` if present and the value is positive.
> Thrift client configuration for max message size. 0 or -1 will use the default defined in the Thrift library. The upper limit is 2147483648 bytes (or 2gb).
Note: it's a Hive configuration, I follow the convention to not document on the Spark side.
### Why are the changes needed?
1. THRIFT-5237 (0.14.0) changes the max thrift message size from 2GiB to 100MiB
2. HIVE-25098 (4.0.0) upgrades Thrift from 0.13.0 to 0.14.1
3. HIVE-25996 (2.3.10) backports HIVE-25098 to branch-2.3
4. HIVE-26633 (4.0.0) introduces `hive.thrift.client.max.message.size`
5. SPARK-47018 (4.0.0) upgrades Hive from 2.3.9 to 2.3.10
Thus, Spark's HMS client does not respect `hive.thrift.client.max.message.size` and has a fixed max thrift message size 100MiB, users may hit the "MaxMessageSize reached" exception on accessing Hive tables with a large number of partitions.
See discussion in apache#46468 (comment)
### Does this PR introduce _any_ user-facing change?
No, it tackles the regression introduced by an unreleased change, namely SPARK-47018. The added code only takes effect when the user configures `hive.thrift.client.max.message.size` explicitly.
### How was this patch tested?
This must be tested manually, as the current Spark UT does not cover the remote HMS cases.
I constructed a test case in a testing Hadoop cluster with a remote HMS.
Firstly, create a table with a large number of partitions.
```
$ spark-sql --num-executors=6 --executor-cores=4 --executor-memory=1g \
--conf spark.hive.exec.dynamic.partition.mode=nonstrict \
--conf spark.hive.exec.max.dynamic.partitions=1000000
spark-sql (default)> CREATE TABLE p PARTITIONED BY (year, month, day) STORED AS PARQUET AS
SELECT /*+ REPARTITION(200) */ * FROM (
(SELECT CAST(id AS STRING) AS year FROM range(2000, 2100)) JOIN
(SELECT CAST(id AS STRING) AS month FROM range(1, 13)) JOIN
(SELECT CAST(id AS STRING) AS day FROM range(1, 31)) JOIN
(SELECT 'this is some data' AS data)
);
```
Then try to tune `hive.thrift.client.max.message.size` and run a query that would trigger `getPartitions` thrift call. For example, when set to `1kb`, it throws `TTransportException: MaxMessageSize reached`, and the exception disappears after boosting the value.
```
$ spark-sql --conf spark.hive.thrift.client.max.message.size=1kb
spark-sql (default)> SHOW PARTITIONS p;
...
2025-02-20 15:18:49 WARN RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. listPartitionNames
org.apache.thrift.transport.TTransportException: MaxMessageSize reached
at org.apache.thrift.transport.TEndpointTransport.checkReadBytesAvailable(TEndpointTransport.java:81) ~[libthrift-0.16.0.jar:0.16.0]
at org.apache.thrift.protocol.TProtocol.checkReadBytesAvailable(TProtocol.java:67) ~[libthrift-0.16.0.jar:0.16.0]
at org.apache.thrift.protocol.TBinaryProtocol.readListBegin(TBinaryProtocol.java:297) ~[libthrift-0.16.0.jar:0.16.0]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_names_result$get_partition_names_resultStandardScheme.read(ThriftHiveMetastore.java) ~[hive-metastore-2.3.10.jar:2.3.10]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_names_result$get_partition_names_resultStandardScheme.read(ThriftHiveMetastore.java) ~[hive-metastore-2.3.10.jar:2.3.10]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partition_names_result.read(ThriftHiveMetastore.java) ~[hive-metastore-2.3.10.jar:2.3.10]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88) ~[libthrift-0.16.0.jar:0.16.0]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition_names(ThriftHiveMetastore.java:2458) ~[hive-metastore-2.3.10.jar:2.3.10]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition_names(ThriftHiveMetastore.java:2443) ~[hive-metastore-2.3.10.jar:2.3.10]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionNames(HiveMetaStoreClient.java:1487) ~[hive-metastore-2.3.10.jar:2.3.10]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.base/java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173) ~[hive-metastore-2.3.10.jar:2.3.10]
at jdk.proxy2/jdk.proxy2.$Proxy54.listPartitionNames(Unknown Source) ~[?:?]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.base/java.lang.reflect.Method.invoke(Method.java:569) ~[?:?]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2349) ~[hive-metastore-2.3.10.jar:2.3.10]
at jdk.proxy2/jdk.proxy2.$Proxy54.listPartitionNames(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionNames(Hive.java:2461) ~[hive-exec-2.3.10-core.jar:2.3.10]
at org.apache.spark.sql.hive.client.Shim_v2_0.getPartitionNames(HiveShim.scala:976) ~[spark-hive_2.13-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
...
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes apache#50022 from pan3793/SPARK-49489.
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: yangjie01 <yangjie01@baidu.com>hive.thrift.client.maxmessage.size
1 parent e66e20a commit 2ea5621
File tree
3 files changed
+91
-12
lines changed- sql
- hive-thriftserver/src/test/scala/org/apache/spark/sql/hive
- hive/src/main/scala/org/apache/spark/sql/hive
- client
3 files changed
+91
-12
lines changedLines changed: 1 addition & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
| 37 | + | |
37 | 38 | | |
38 | 39 | | |
39 | 40 | | |
| |||
Lines changed: 13 additions & 8 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| 25 | + | |
25 | 26 | | |
26 | 27 | | |
27 | 28 | | |
| |||
81 | 82 | | |
82 | 83 | | |
83 | 84 | | |
84 | | - | |
85 | | - | |
86 | | - | |
87 | | - | |
88 | | - | |
89 | | - | |
90 | | - | |
91 | | - | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
92 | 97 | | |
93 | 98 | | |
94 | 99 | | |
| |||
Lines changed: 77 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
| 22 | + | |
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| 27 | + | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| |||
33 | 34 | | |
34 | 35 | | |
35 | 36 | | |
36 | | - | |
| 37 | + | |
37 | 38 | | |
38 | 39 | | |
39 | 40 | | |
| |||
43 | 44 | | |
44 | 45 | | |
45 | 46 | | |
| 47 | + | |
46 | 48 | | |
| 49 | + | |
47 | 50 | | |
48 | 51 | | |
49 | 52 | | |
| |||
1407 | 1410 | | |
1408 | 1411 | | |
1409 | 1412 | | |
1410 | | - | |
| 1413 | + | |
1411 | 1414 | | |
1412 | 1415 | | |
1413 | 1416 | | |
1414 | | - | |
| 1417 | + | |
1415 | 1418 | | |
1416 | 1419 | | |
1417 | 1420 | | |
| 1421 | + | |
| 1422 | + | |
| 1423 | + | |
| 1424 | + | |
| 1425 | + | |
| 1426 | + | |
| 1427 | + | |
| 1428 | + | |
| 1429 | + | |
| 1430 | + | |
| 1431 | + | |
| 1432 | + | |
| 1433 | + | |
| 1434 | + | |
| 1435 | + | |
| 1436 | + | |
| 1437 | + | |
| 1438 | + | |
| 1439 | + | |
| 1440 | + | |
| 1441 | + | |
| 1442 | + | |
| 1443 | + | |
| 1444 | + | |
| 1445 | + | |
| 1446 | + | |
| 1447 | + | |
| 1448 | + | |
| 1449 | + | |
| 1450 | + | |
| 1451 | + | |
| 1452 | + | |
| 1453 | + | |
| 1454 | + | |
| 1455 | + | |
| 1456 | + | |
| 1457 | + | |
| 1458 | + | |
| 1459 | + | |
| 1460 | + | |
| 1461 | + | |
| 1462 | + | |
| 1463 | + | |
| 1464 | + | |
| 1465 | + | |
| 1466 | + | |
| 1467 | + | |
| 1468 | + | |
| 1469 | + | |
| 1470 | + | |
| 1471 | + | |
| 1472 | + | |
| 1473 | + | |
| 1474 | + | |
| 1475 | + | |
| 1476 | + | |
| 1477 | + | |
| 1478 | + | |
| 1479 | + | |
| 1480 | + | |
| 1481 | + | |
| 1482 | + | |
| 1483 | + | |
| 1484 | + | |
| 1485 | + | |
| 1486 | + | |
| 1487 | + | |
| 1488 | + | |
| 1489 | + | |
| 1490 | + | |
1418 | 1491 | | |
1419 | 1492 | | |
0 commit comments