@gitmodimo gitmodimo commented Aug 21, 2025

Rationale for this change

Provide infrastructure for unified backpressure control.

What changes are included in this PR?

BackpressureController is moved from asof_join and sorted_merge to common backpressure.
BackpressureCombiner with BackpressureCombiner::Source is added to handle multiple pause sources.

Are these changes tested?

A test is added.

Are there any user-facing changes?

A new API is available.

@gitmodimo gitmodimo requested a review from westonpace as a code owner August 21, 2025 11:58
@github-actions

⚠️ GitHub issue #47385 has been automatically assigned in GitHub to PR creator.

@github-actions

⚠️ GitHub issue #47385 has no components, please add labels for components.

@gitmodimo gitmodimo marked this pull request as draft August 21, 2025 19:10
@gitmodimo
Contributor Author

I will simplify the pause logic. Simple but composable pause propagation logic will be easier to reason about.

@gitmodimo gitmodimo marked this pull request as ready for review August 22, 2025 08:14
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 9, 2025
@zanmato1984
Contributor

Some minor suggestions regarding formatting. I am now looking into the details.

gitmodimo and others added 2 commits September 10, 2025 10:25
Co-authored-by: Rossi Sun <zanmato1984@gmail.com>
Co-authored-by: Rossi Sun <zanmato1984@gmail.com>
// 1. Default pause_on_any=true - pause on any source is propagated - OR logic
// 2. pause_on_any=false - pause is propagated only when all sources are paused - AND
// logic
class ARROW_ACERO_EXPORT BackpressureCombiner {
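
The two modes described in the comment above can be illustrated with a small standalone sketch. This is not the code from this PR: `PauseCombiner`, its integer source indices, and `IsPaused()` are hypothetical names chosen for illustration, and the real `BackpressureCombiner` and `BackpressureCombiner::Source` may be structured quite differently. The sketch only assumes what the comment states: with `pause_on_any=true` a pause on any source is propagated (OR), otherwise a pause is propagated only when all sources are paused (AND).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for illustration only; NOT the class added in this PR.
// It records a paused/resumed flag per source and combines the flags.
class PauseCombiner {
 public:
  explicit PauseCombiner(std::size_t num_sources, bool pause_on_any = true)
      : source_paused_(num_sources, false), pause_on_any_(pause_on_any) {}

  // Mark one source as paused or resumed. Repeated calls are idempotent.
  void Pause(std::size_t source) { Set(source, true); }
  void Resume(std::size_t source) { Set(source, false); }

  // Combined state that would be propagated upstream.
  bool IsPaused() const {
    std::lock_guard<std::mutex> lock(mutex_);
    const auto paused = static_cast<std::size_t>(
        std::count(source_paused_.begin(), source_paused_.end(), true));
    if (pause_on_any_) {
      return paused > 0;  // OR logic: any paused source pauses the whole
    }
    // AND logic: every source must be paused (an empty set never pauses)
    return !source_paused_.empty() && paused == source_paused_.size();
  }

 private:
  void Set(std::size_t source, bool paused) {
    std::lock_guard<std::mutex> lock(mutex_);
    source_paused_.at(source) = paused;  // throws std::out_of_range on a bad index
  }

  mutable std::mutex mutex_;
  std::vector<bool> source_paused_;
  const bool pause_on_any_;
};

// Small usage example: two sources combined with AND logic.
inline void ExamplePauseOnAll() {
  PauseCombiner combiner(/*num_sources=*/2, /*pause_on_any=*/false);
  combiner.Pause(0);
  assert(!combiner.IsPaused());  // only one of two sources is paused
  combiner.Pause(1);
  assert(combiner.IsPaused());   // all sources are paused
}
```

Tracking a flag per source, rather than a single counter, keeps repeated `Pause()` calls from the same source from being counted twice. That is a choice made for this sketch, not something taken from the PR.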
Contributor

I think I get the idea of bridging multiple backpressure sources and propagating the backpressure between them in an implicit and automatic manner. But before we introduce this as a public API, I'd like to see a more concrete use case, e.g. a plan tree (possibly a hypothetical one with some new node types you are about to add in the future), to clarify some things that are currently unclear, such as:

  1. Who owns the lifetime of the combiner and the combiner sources?
  2. When and where is the bridge between a combiner source and a particular combiner established?
  3. When and where is a combiner source's Pause() or Resume() triggered?

The current BackpressureCombiner exposes too much implementation detail in its public interface, e.g. we can even see that it uses std::mutex for synchronization. But this is another topic that we can talk about later.

Contributor Author

@gitmodimo gitmodimo Oct 2, 2025

That is reasonable, but I am not sure how to approach this. I was trying to keep the PRs relatively small. To showcase the use of this component I would need Pipes, which enable multi-output nodes. I can either add Pipe to this PR or rebase the Pipe draft so that it uses BackpressureCombiner internally. I also think it would be relatively easy to showcase a filter node with an optional output that provides the filtered-out data. LMK what you think is the best approach.

Contributor

Separation is good; you don't have to combine any PRs. I think you can just sketch the idea in text, if it isn't too much trouble. (I also briefly looked at the Pipe PR and I get the idea.)

Thanks.
