Limit the max merge degree during re-spill in external sort #16908

@2010YOUY01

Is your feature request related to a problem or challenge?

This is a follow-up to: #15700
Part of #15271

#15700 introduces a comprehensive approach for external sorts: if it is not possible to merge all spills in one pass, it merges as many spills as possible, re-spills the result, and iterates until the final output can be produced in a single merge.
Note that it also uses estimates for memory accounting. In extreme edge cases these estimates can be inaccurate, causing actual memory usage to exceed the configured memory limit.

To prevent such unintended cases and to enable performance tuning (merging a smaller number of streams at once is more cache-friendly and can be faster), we can introduce an additional configuration option:
max_merge_degree: Maximum number of streams to merge during re-spills in external sorts
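
To illustrate the intended behavior, here is a minimal sketch of a multi-pass merge capped by `max_merge_degree`. It is not DataFusion's implementation: the function names `merge_runs` and `merge_with_max_degree` are hypothetical, in-memory `Vec<i32>` runs stand in for spill files, and pushing the merged run back onto the list stands in for a re-spill.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// K-way merge of already-sorted runs using a min-heap.
/// Each heap entry is (value, run index, position within that run).
fn merge_runs(runs: Vec<Vec<i32>>) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }
    let total: usize = runs.iter().map(|r| r.len()).sum();
    let mut out = Vec::with_capacity(total);
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        out.push(value);
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}

/// Hypothetical multi-pass merge: never merges more than `max_merge_degree`
/// runs at once. Returns the sorted output and the number of intermediate
/// merges (each of which would be a re-spill in a real external sort).
fn merge_with_max_degree(
    mut spills: Vec<Vec<i32>>,
    max_merge_degree: usize,
) -> (Vec<i32>, usize) {
    assert!(
        max_merge_degree >= 2,
        "need at least two streams per merge to make progress"
    );
    let mut respills = 0;
    while spills.len() > max_merge_degree {
        // Merge the first `max_merge_degree` runs and keep the rest untouched.
        let rest = spills.split_off(max_merge_degree);
        let merged = merge_runs(spills);
        spills = rest;
        spills.push(merged); // stand-in for writing the merged run back to disk
        respills += 1;
    }
    // At most `max_merge_degree` runs remain, so the final pass fits the cap.
    (merge_runs(spills), respills)
}
```

Each intermediate pass reduces the number of runs by `max_merge_degree - 1`, so a smaller degree bounds the number of streams held open per merge at the cost of more re-spill passes.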

The original context: #15700 (comment)

Describe the solution you'd like

This previous attempt can be used as a reference for the solution: #15610
It would need to be restructured to build on #15700.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

enhancement: New feature or request
