-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Is your feature request related to a problem or challenge?
This is a follow-up to: #15700
Part of #15271
#15700 introduces a comprehensive approach for external sorts: if it's not possible to do the merge for all spills in one pass, it will merge as many spills as possible, do a re-spill, and iterate until we can produce the final output at once.
Note it is also use estimations for the memory accounting, under extreme edge cases this estimation can be inaccurate, causing the actual memory usage larger than the configured memory limit.
To prevent such unintentional cases and to enable performance tuning (merging a smaller number of batches at once is more cache-friendly/faster), we can introduce a additional configuration option:
max_merge_degree: Maximum number of streams to merge during re-spills in external sorts
The original context: #15700 (comment)
Describe the solution you'd like
This previous attempt can be used as the reference for the solution: #15610
It has to be re-structured to build upon #15700
Describe alternatives you've considered
No response
Additional context
No response