Skip to content

Conversation

@Madhukar525722
Copy link

@Madhukar525722 Madhukar525722 commented Oct 3, 2024

What changes were proposed in this pull request?

When there is any predicate missing in getPartitionsbyFilter and it tries to fetch all the partitions, the request is broken into smaller chunks as:

  1. Retrieve the names of all partitions using getPartitionNames
  2. Divide the partition names list into smaller batches.
  3. Fetch the partitions using their names with function getPartitionsByNames.
  4. If the fetching fails, it reduces the batch size by 2 and looks for lesser number of partitions till the maxRetries hit

Why are the changes needed?

The change is to address the issue of heavy load on HMS, when there are huge number of partitions(~600,000), the metadata size exceeds the 2Gb limit on the thrift server buffer size. Hence we get socket time out and HMS crashes with OOM as well. Tried to replicate same behaviour as HIVE-27505

Does this PR introduce any user-facing change?

Yes
To enable batching they should be using parameters as:
spark.sql.hive.metastore.batchSize = 1000 , by default it is disabled
spark.sql.metastore.partition.batch.retry.count = 3

How was this patch tested?

Tested in local environment with following performance
With batch size = 1
24/09/28 18:11:21 INFO Shim_v2_3: Fetching all partitions completed in 717 ms

With batch size = -1
24/09/28 18:14:16 INFO Shim_v2_3: Fetching all partitions completed in 51 ms.

With batch size = 10
24/09/28 18:16:20 INFO Shim_v2_3: Fetching all partitions completed in 115 ms.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Oct 3, 2024
@HyukjinKwon HyukjinKwon changed the title [SPARK-49827][SQL] Fetching all partitions from hive metastore in bat… [SPARK-49827][SQL] Fetching all partitions from hive metastore in batches Oct 4, 2024
@Madhukar525722 Madhukar525722 force-pushed the HMS4 branch 2 times, most recently from 9ce516a to 42836d8 Compare October 4, 2024 06:00
@mridulm
Copy link
Contributor

mridulm commented Oct 5, 2024

+CC @shardulm94

@Madhukar525722
Copy link
Author

Gentle reminder @mridulm @pan3793 @HyukjinKwon @shardulm94

@Madhukar525722
Copy link
Author

Madhukar525722 commented Nov 1, 2024

Gentle ping @mridulm @pan3793 @HyukjinKwon @shardulm94 . Please review the change

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants