DNM: Handle pipeline breakers through avoiding reuse #873

Open
wants to merge 2 commits into
base: main

@phofl (Collaborator) commented Feb 14, 2024

This is a naive implementation that handles pipeline breakers (#854) by recomputing results instead of reusing them, in order to avoid OOM errors. This PR has a list of known deficiencies:

  • We always go back to the IO nodes; anything that shuffles before we run into a reduction should be fine
  • We should track references on the Expression. That would turn the pipeline breaker modifications into a simple `_simplify_down` step. At the moment we are messing with nodes deep down in the tree, which means that their dependents aren't updated properly. That's the main reason this is currently a separate optimization step
  • Not all reductions are supported
  • I think we have to lower merges into different expressions to make this work properly
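
To make the idea concrete, here is a rough toy sketch of "avoiding reuse". It does not use the dask-expr classes: `Expr`, `branch`, `rebranch`, `avoid_reuse`, `REDUCTIONS` and `distinct_nodes` are all hypothetical names made up for illustration. It assumes that structurally equal nodes are deduplicated (computed once), and that tagging the input of a reduction with a fresh branch id, all the way back to the IO node, is enough to force a separate computation.

```python
import itertools
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical, not the dask-expr Expr class)."""
    op: str
    operands: Tuple["Expr", ...] = ()
    branch: int = 0  # hypothetical tag: equal nodes on different branches differ


# Assumed set of operations that act as pipeline breakers in this toy model.
REDUCTIONS = {"sum", "mean", "count"}


def rebranch(expr, branch):
    """Copy a subtree down to its leaves (the IO nodes) with a new branch tag."""
    operands = tuple(rebranch(o, branch) for o in expr.operands)
    return Expr(expr.op, operands, branch)


def avoid_reuse(expr, ids):
    """Give the input of every reduction its own copy of the upstream chain."""
    operands = tuple(avoid_reuse(o, ids) for o in expr.operands)
    if expr.op in REDUCTIONS:
        branch = next(ids)
        operands = tuple(rebranch(o, branch) for o in operands)
    return Expr(expr.op, operands, expr.branch)


def distinct_nodes(expr, op):
    """Count distinct nodes with the given op; equal nodes are counted once."""
    seen, stack = set(), [expr]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(node.operands)
    return sum(1 for n in seen if n.op == op)


io = Expr("read_parquet")
filtered = Expr("filter", (io,))
total = Expr("sum", (filtered,))          # reduction, i.e. the pipeline breaker
result = Expr("div", (filtered, total))   # `filtered` is reused after the reduction

optimized = avoid_reuse(result, itertools.count(1))

print(distinct_nodes(result, "read_parquet"))     # 1: one read, shared by both consumers
print(distinct_nodes(optimized, "read_parquet"))  # 2: the reduction reads its own copy
```

In the unoptimized tree, `div` can only start once `sum` has finished, so every partition of `filtered` would have to stay in memory until then. After the rewrite the data is read and filtered twice, which trades extra IO for lower peak memory. The sketch also always copies back to the IO node, which mirrors the first deficiency in the list above.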

Just putting this up here for others to potentially mess around with. It is most helpful for query 21 of the TPC-H benchmarks, although that query is super slow without dask/dask#10922.
