[Improvement] [AQE] Introduce new partition data distribution of global order #312

@zuston

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

#293 introduced a new partition data distribution abstraction. For AQE skew optimization, it also introduced the LOCAL ORDER mode to make data filtering more effective and to speed up reads.

But I think it may not be effective, especially when the shuffle server's memory is small enough to cause fragmentation problems.

Based on this assumption, I think that for some scenarios a global order over a single partition's data may be a better solution.
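To make the fragmentation concern concrete, here is a minimal sketch, not taken from #293. It assumes that blocks are ordered by task attempt id, that each buffer flush produces one locally sorted segment, and that an AQE skew reader wants only a range of task ids. The names `flush_local_order` and `count_ranges` are hypothetical.

```python
def flush_local_order(task_ids, buffer_capacity):
    """Split arriving blocks into flushed segments, each sorted on its own.

    Assumption: a smaller buffer_capacity models a shuffle server with less
    memory, which flushes more often and so produces more, smaller segments.
    """
    return [
        sorted(task_ids[i:i + buffer_capacity])
        for i in range(0, len(task_ids), buffer_capacity)
    ]


def count_ranges(segments, lo, hi):
    """Count the segments a reader must touch for task ids in [lo, hi)."""
    return sum(1 for seg in segments if any(lo <= t < hi for t in seg))


# Task attempt ids of blocks in arrival order (made-up data).
arrivals = [5, 1, 4, 2, 3, 1, 5, 2, 4, 3]

small_memory = flush_local_order(arrivals, buffer_capacity=2)
large_memory = flush_local_order(arrivals, buffer_capacity=10)
global_order = [sorted(arrivals)]  # the whole partition sorted once

print(count_ranges(small_memory, 1, 3))
print(count_ranges(large_memory, 1, 3))
print(count_ranges(global_order, 1, 3))
```

Under these assumptions, the small-buffer case forces the reader to touch many separate segments for the same task id range, while a globally ordered partition serves it from one contiguous range.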

How should we improve?

Design

  1. Introduce a global order mode into the current data distribution abstraction
  2. Run performance benchmarks comparing local order and global order

Global Order details

The shuffle server only learns that shuffle reading has started when the first reader arrives. That means sorting starts at that point, and other readers will block until the sorting finishes.
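The behavior described above could be sketched roughly as follows. This is an in-memory illustration only, not the proposed shuffle-server implementation: the class `GlobalOrderPartition`, its fields, and the choice of task attempt id as the sort key are all assumptions made for the example.

```python
import bisect
import threading


class GlobalOrderPartition:
    """Hypothetical stand-in for one partition's data on a shuffle server."""

    def __init__(self, blocks):
        # Each block is a (task_attempt_id, payload) tuple in arrival order.
        self._blocks = list(blocks)
        self._keys = []
        self._lock = threading.Lock()
        self._sorted = False
        self.sort_count = 0  # kept only to show that the sort runs once

    def _ensure_sorted(self):
        # The first reader to arrive sorts while holding the lock.
        # Readers arriving meanwhile block here until the sort is done.
        with self._lock:
            if not self._sorted:
                self._blocks.sort(key=lambda block: block[0])
                self._keys = [block[0] for block in self._blocks]
                self._sorted = True
                self.sort_count += 1

    def read(self, start_task_id, end_task_id):
        """Return blocks with task attempt id in [start_task_id, end_task_id)."""
        self._ensure_sorted()
        lo = bisect.bisect_left(self._keys, start_task_id)
        hi = bisect.bisect_left(self._keys, end_task_id)
        return self._blocks[lo:hi]


partition = GlobalOrderPartition([(3, "c"), (1, "a"), (2, "b"), (1, "a2")])
results = {}


def reader(name, lo, hi):
    results[name] = partition.read(lo, hi)


threads = [
    threading.Thread(target=reader, args=(f"reader-{i}", 1, 3)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The sketch also shows the cost of this design: the first read pays the full sort latency, and every concurrent reader waits with it, which is one of the things the proposed benchmark would need to measure.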

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels: enhancement (new feature or request), help wanted (extra attention is needed)
