Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: recursive cte hangs on joins #9687

Merged
merged 3 commits into from
Mar 20, 2024
Merged

fix: recursive cte hangs on joins #9687

merged 3 commits into from
Mar 20, 2024

Conversation

jonahgao
Copy link
Member

@jonahgao jonahgao commented Mar 19, 2024

Which issue does this PR close?

Closes #9680.

Rationale for this change

CrossJoin uses OnceAsync to load the left table into memory and caches it into the plan for the duration of the entire query.
https://github.com/apache/arrow-datafusion/blob/b0b329ba39403b9e87156d6f9b8c5464dc6d2480/datafusion/physical-plan/src/joins/cross_join.rs#L61-L62

However, if the left table's data is derived from a CTE work table, the cached data may become outdated after each recursive iteration, because the work table may be changed. If we execute this plan again, there will be problems.

Therefore, those states in the plan, like cached data, need to be cleared whenever a new iteration begins.

What changes are included in this PR?

  • Reset the states in the plan before each new iteration begins in recursive CTE.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@jonahgao jonahgao marked this pull request as draft March 19, 2024 07:18
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 19, 2024
@jonahgao jonahgao marked this pull request as ready for review March 19, 2024 07:58
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jonahgao (as always)

I agree the core problem is that the CrossJoinExec has shared state that is passed out to different calls to execute_stream and in general ExecutionPlans don't expect to have execute() called with the same partition number multiple times.

I think some potential downside of the approach in this PR are:

  1. It doesn't work for User defined ExecutionPlan nodes (without a change that they won't discover until someone tries to run a recursive CTE)
  2. It won't work for any new ExecutionPlan nodes introduced we add to DataFusion unless someone thinks to test them with recursive CTEs

Thus, what do you think about the idea of basically creating an entirely new ExecutionPlan via ExecutionPlan::with_new_children?

That would definitely be less efficient but it would be more general and ensure all state was cleared

@jonahgao
Copy link
Member Author

jonahgao commented Mar 20, 2024

Thus, what do you think about the idea of basically creating an entirely new ExecutionPlan via ExecutionPlan::with_new_children?

I've tried using with_new_children before, and I was concerned about its efficiency too. I also worried that it doesn't explicitly promise to clear the states in the plan, so I switched to the current solution. However, I couldn't decide which was better, as they both had their downsides.

Thank you for your advice @alamb , I agree that using ExecutionPlan::with_new_children would be better, it's safer and it does not introduce extra complexity from new API.

Update: Switch to using with_new_children has been finished.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jonahgao

@alamb alamb merged commit b72d25c into apache:main Mar 20, 2024
25 checks passed
@jonahgao jonahgao deleted the cte_hang branch March 21, 2024 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query hangs on collecting stream from recursive CTE
2 participants