[SPARK-49827][SQL] Fetching all partitions from hive metastore in batches #48337
What changes were proposed in this pull request?
When a predicate is missing in getPartitionsByFilter and the call falls back to fetching all partitions, the request is broken into smaller chunks.
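As a rough illustration of the chunking idea, here is a minimal sketch in Scala. It is not the code from this patch: `PartitionBatching`, `fetchInBatches`, and the `fetch` callback are hypothetical names, and the sketch assumes that partition names are listed first and then resolved in groups of `batchSize`, with a non-positive size meaning a single unbatched call.

```scala
object PartitionBatching {
  // Hypothetical stand-in for a metastore call that resolves partition
  // names to partition objects (plain strings here, to stay self-contained).
  type FetchByNames = Seq[String] => Seq[String]

  /**
   * Fetch partitions for `names` in chunks of at most `batchSize` names.
   * Assumption: a non-positive `batchSize` disables batching, so everything
   * is requested in one call.
   */
  def fetchInBatches(
      names: Seq[String],
      batchSize: Int,
      fetch: FetchByNames): Seq[String] = {
    if (batchSize <= 0) {
      fetch(names)
    } else {
      // grouped() yields consecutive slices; toVector forces every call eagerly.
      names.grouped(batchSize).flatMap(fetch).toVector
    }
  }
}
```

With 25 partition names and a batch size of 10, this sketch issues three smaller requests (10, 10, and 5 names) instead of one large one, which keeps each response well under the size of the full metadata payload.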
Why are the changes needed?
This change addresses heavy load on the Hive metastore (HMS). When a table has a huge number of partitions (~600,000), the metadata size exceeds the 2 GB limit on the Thrift server buffer. As a result, requests fail with socket timeouts and HMS can also crash with an OOM error. The patch tries to replicate the behaviour of HIVE-27505.
Does this PR introduce any user-facing change?
Yes
To enable batching, users should set the following parameters:
spark.sql.hive.metastore.batchSize = 1000 (batching is disabled by default)
spark.sql.metastore.partition.batch.retry.count = 3
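To make the role of a retry count concrete, the following is a minimal sketch in Scala of retrying a single batch request. It is an assumption-laden illustration, not the patch's logic: `BatchRetry` and `withRetries` are hypothetical names, and the sketch assumes the count means "additional attempts after the first failure", which the description above does not specify.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object BatchRetry {
  /**
   * Run `op`; on a non-fatal failure, retry up to `maxRetries` more times.
   * Assumption: the retry count excludes the initial attempt.
   */
  @tailrec
  def withRetries[T](maxRetries: Int)(op: () => T): T = {
    Try(op()) match {
      case Success(value) => value
      case Failure(_) if maxRetries > 0 => withRetries(maxRetries - 1)(op)
      case Failure(error) => throw error // retries exhausted: surface the last error
    }
  }
}
```

Under these assumptions, a batch fetch wrapped as `BatchRetry.withRetries(3)(() => fetchBatch())`, where `fetchBatch` is a placeholder for the actual request, would be attempted at most four times before the error propagates.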
How was this patch tested?
Tested in a local environment, with the following performance:
With batch size = 1
24/09/28 18:11:21 INFO Shim_v2_3: Fetching all partitions completed in 717 ms
With batch size = -1
24/09/28 18:14:16 INFO Shim_v2_3: Fetching all partitions completed in 51 ms.
With batch size = 10
24/09/28 18:16:20 INFO Shim_v2_3: Fetching all partitions completed in 115 ms.
Was this patch authored or co-authored using generative AI tooling?
No