Refactor SortExec's buffered batches for better code readability #15372

@2010YOUY01

Is your feature request related to a problem or challenge?

References: #15355 (comment), #15355 (comment)

The high-level execution logic of SortExec is described in https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/sorts/sort.rs. Currently, a single field, in_mem_batches, represents the buffered data at every stage: depending on the point in execution, it holds either unordered input batches or globally sorted batches:

in_mem_batches: Vec<RecordBatch>,

This approach is hard to understand and error-prone, because nothing in the type tells the reader which of the two states the buffer is in. See the referenced discussion for ideas on how to improve it.
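
To illustrate the kind of refactor this could mean, here is a minimal sketch that makes the two states explicit in the type. It is not taken from the referenced discussion or from the DataFusion codebase: `BufferedBatches`, its variants and methods are hypothetical names, and `Batch` is a stand-in for `RecordBatch` so the example compiles without dependencies.

```rust
/// Stand-in for arrow's `RecordBatch` (assumption: a batch is just a list of keys).
#[derive(Debug, Clone, PartialEq)]
struct Batch(Vec<i32>);

/// Hypothetical replacement for the single `in_mem_batches: Vec<RecordBatch>` field.
/// The variant records which state the buffered data is in.
#[derive(Debug)]
enum BufferedBatches {
    /// Input batches in arrival order; nothing has been sorted yet.
    Unsorted(Vec<Batch>),
    /// Batches that together form one globally sorted run.
    Sorted(Vec<Batch>),
}

impl BufferedBatches {
    fn new() -> Self {
        BufferedBatches::Unsorted(Vec::new())
    }

    /// Buffer a new input batch. Only valid while the buffer is unsorted.
    fn push(&mut self, batch: Batch) -> Result<(), String> {
        match self {
            BufferedBatches::Unsorted(batches) => {
                batches.push(batch);
                Ok(())
            }
            BufferedBatches::Sorted(_) => {
                Err("cannot push unsorted input into a sorted buffer".to_string())
            }
        }
    }

    /// Consume the buffer and return it in the sorted state.
    /// The sort here is a toy: concatenate all rows and sort them.
    fn into_sorted(self) -> Self {
        match self {
            BufferedBatches::Unsorted(batches) => {
                let mut rows: Vec<i32> = batches.into_iter().flat_map(|b| b.0).collect();
                rows.sort();
                BufferedBatches::Sorted(vec![Batch(rows)])
            }
            // Already sorted: nothing to do.
            sorted => sorted,
        }
    }

    fn is_sorted(&self) -> bool {
        matches!(self, BufferedBatches::Sorted(_))
    }
}
```

With a shape like this, code that expects sorted data can match on `Sorted` instead of relying on call order, and a misplaced `push` surfaces as an error rather than silently mixing unsorted input into a sorted run.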

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement