Recursive CTEs: Stage 3 - add execution support #8840
Conversation
time,
1 as "val"
FROM
(SELECT DISTINCT "time" FROM "beg_account_balance")
This test fails, but any of the following changes makes it pass:
- removing time or 1 as "val" from the sub_cte projection
- removing the reference to beg_account_balance in the FROM clause (effectively removing the sub-query from the sub_cte)
- removing the FULL JOIN "sub_cte" on 1=1
The test fails with index out of bounds: the len is 0 but the index is 0,
but other incarnations of the error appeared while playing with the repro:
Internal error: PhysicalExpr Column references column 'time' at index 0 (zero-based) but input schema only has 0 columns: [].
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] WITH RECURSIVE "recursive_cte" AS
External error: query failed: DataFusion error: ProjectionPushdown
caused by
Internal error: PhysicalExpr Column references column 'time' at index 0 (zero-based) but input schema only has 0 columns: [].
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] WITH RECURSIVE "recursive_cte" AS (
External error: query failed: DataFusion error: ProjectionPushdown
caused by
Internal error: PhysicalExpr Column references column 'LEAD(beg_account_balance.time,Int64(1)) ORDER BY [beg_account_balance.time ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW' at index 1 (zero-based) but input schema only has 1 columns: ["LEAD(beg_account_balance.time,Int64(1)) ORDER BY [beg_account_balance.time ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"].
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] WITH RECURSIVE "recursive_cte" AS (
Note that the columns referenced in the error messages are different from the repro because the repro at the time that I collected those errors was different.
That sounds like an empty schema is being used somewhere (len is zero). Maybe there is a bug in the output schema calculations or something
Removing projection pushdown fixes the issue. https://github.com/matthewgapp/arrow-datafusion/blob/362bec93736f64aef7327f0ed577a29b75e74796/datafusion/core/src/physical_optimizer/optimizer.rs#L122. Looking into file changes that might be relevant...
After taking a quick pass at the PRs that touched projection_pushdown.rs, I'm still stuck. Here's the log of my investigation. Guess I'll have to buckle up and start debugging 🙃
Tried to revert https://github.com/apache/arrow-datafusion/pull/8662/files but couldn't get the code to compile, so I gave up, which is a shame since that PR looks the most relevant
Reverted #8573 but didn’t fix
Reverted #8454 but didn’t fix
Reverted #8485 but didn’t fix
Didn’t try to revert https://github.com/apache/arrow-datafusion/pull/8327/files because it doesn’t look relevant
Fixed the min repro in this commit ea4cb6e. But it's still breaking in our downstream product, so I'll add the fuller test case and work to fix against that.
I can't seem to reproduce outside of our project just yet. I'm getting the following panic:
The application panicked (crashed).
Message: primitive array
Location: /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-49.0.0/src/cast.rs:751
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⋮ 15 frames hidden ⋮
16: core::panicking::panic_display::hbefe501d317ed1d7
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/panicking.rs:196
17: core::panicking::panic_str::h9863c42b2d46e0f3
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/panicking.rs:171
18: core::option::expect_failed::h4e9eb510dd8145dd
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/option.rs:1980
19: core::option::Option<T>::expect::hec9b720ec03fc4f9
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/option.rs:894
20: arrow_array::cast::AsArray::as_primitive::h493f40e8665dba69
at /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-49.0.0/src/cast.rs:751
749 │ /// Downcast this to a [`PrimitiveArray`] panicking if not possible
750 │ fn as_primitive<T: ArrowPrimitiveType>(&self) -> &PrimitiveArray<T> {
751 > self.as_primitive_opt().expect("primitive array")
752 │ }
753 │
21: <datafusion_physical_plan::aggregates::group_values::primitive::GroupValuesPrimitive<T> as datafusion_physical_plan::aggregates::group_values::GroupValues>::intern::h6edcb0cb053f37df
at /Users/matthewgapp/code/forked/arrow-datafusion/datafusion/physical-plan/src/aggregates/group_values/primitive.rs:116
114 │ groups.clear();
115 │
116 > for v in cols[0].as_primitive::<T>() {
117 │ let group_id = match v {
118 │ None => *self.null_group.get_or_insert_with(|| {
22: datafusion_physical_plan::aggregates::row_hash::GroupedHashAggregateStream::group_aggregate_batch::h941a0f4d623d2cc4
at /Users/matthewgapp/code/forked/arrow-datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs:548
546 │ // calculate the group indices for each input row
547 │ let starting_num_groups = self.group_values.len();
548 > self.group_values
549 │ .intern(group_values, &mut self.current_group_indices)?;
550 │ let group_indices = &self.current_group_indices;
23: <datafusion_physical_plan::aggregates::row_hash::GroupedHashAggregateStream as futures_core::stream::Stream>::poll_next::h778e63c29afa7d81
at /Users/matthewgapp/code/forked/arrow-datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs:438
436 │
437 │ // Do the grouping
438 > extract_ok!(self.group_aggregate_batch(batch));
439 │
440 │ // If we can begin emitting rows, do so,
24: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next::h13c49d63ee92c868
at /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-core-0.3.29/src/stream.rs:120
118 │
119 │ fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
120 > self.get_mut().as_mut().poll_next(cx)
121 │ }
122 │
25: futures_util::stream::stream::StreamExt::poll_next_unpin::h5f167c57e87b8a15
at /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.29/src/stream/stream/mod.rs:1638
1636 │ Self: Unpin,
1637 │ {
1638 > Pin::new(self).poll_next(cx)
1639 │ }
1640 │
26: <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll::h558ec691ab4e7eeb
at /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.29/src/stream/stream/next.rs:32
30 │
31 │ fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
32 > self.stream.poll_next_unpin(cx)
33 │ }
34 │ }
27: datafusion_physical_plan::repartition::RepartitionExec::pull_from_input::{{closure}}::h0d5b828859fa1f18
at /Users/matthewgapp/code/forked/arrow-datafusion/datafusion/physical-plan/src/repartition/mod.rs:702
700 │ // fetch the next batch
701 │ let timer = metrics.fetch_time.timer();
702 > let result = stream.next().await;
703 │ timer.done();
704 │
28: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::h24aa3949553e8510
at /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.0/src/runtime/task/core.rs:328
326 │
327 │ let _guard = TaskIdGuard::enter(self.task_id);
328 > future.poll(&mut cx)
329 │ })
330 │ };
29: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::hed0e8a2b03eb44a5
at /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.0/src/loom/std/unsafe_cell.rs:16
14 │ #[inline(always)]
15 │ pub(crate) fn with_mut<R>(&self, f: impl FnOnce(*mut T) -> R) -> R {
16 > f(self.0.get())
17 │ }
18 │ }
30: tokio::runtime::task::core::Core<T,S>::poll::h6e6c537ecabf42b1
at /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.0/src/runtime/task/core.rs:317
315 │ pub(super) fn poll(&self, mut cx: Context<'_>) -> Poll<T::Output> {
316 │ let res = {
317 > self.stage.stage.with_mut(|ptr| {
318 │ // Safety: The caller ensures mutual exclusion to the field.
319 │ let future = match unsafe { &mut *ptr } {
31: tokio::runtime::task::harness::poll_future::{{closure}}::hdaa68138fe576c41
at /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.0/src/runtime/task/harness.rs:485
483 │ }
484 │ let guard = Guard { core };
485 > let res = guard.core.poll(cx);
486 │ mem::forget(guard);
487 │ res
This issue seems to arise from (or at least be correlated with) the MemoryExec. I think this because of the difference in MemoryExec partition sizes (more on that below), and because when I remove try_swapping_with_memory (https://github.com/matthewgapp/arrow-datafusion/blob/ea4cb6e9c6b8a0bc892ef76578fd7853ddc692ab/datafusion/core/src/physical_optimizer/projection_pushdown.rs#L110), our project no longer blows up.
The difference between the plans that fail (in our project) and succeed (outside of our project) is that the offending MemoryExec has a partition size of 2 (MemoryExec: partitions=1, partition_sizes=[2]) while the successful plan has a partition size of 1 (MemoryExec: partitions=1, partition_sizes=[1]).
I'm not sure how to force datafusion to take on a larger partition size so that I can reproduce outside of our project 🤔
Offending physical plan from logs:
Input physical plan:
AggregateExec: mode=FinalPartitioned, gby=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement], aggr=[]
AggregateExec: mode=Partial, gby=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement], aggr=[]
ProjectionExec: expr=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement]
RecursiveQueryExec: is_distinct=false
ProjectionExec: expr=[40 as emd, 0 as beg, prices_row_num@5 as prices_row_num, prices_row_num_advancement@6 as prices_row_num_advancement]
NestedLoopJoinExec: join_type=Left
NestedLoopJoinExec: join_type=Left
HashJoinExec: mode=Partitioned, join_type=Inner, on=[(prices_row_num@3, prices_row_num@0)]
MemoryExec: partitions=1, partition_sizes=[2]
AggregateExec: mode=FinalPartitioned, gby=[prices_row_num@0 as prices_row_num], aggr=[]
AggregateExec: mode=Partial, gby=[prices_row_num@0 as prices_row_num], aggr=[]
ProjectionExec: expr=[prices_row_num@0 as prices_row_num]
ProjectionExec: expr=[MIN(prices_with_row_num_2.prices_row_num)@0 as prices_row_num]
AggregateExec: mode=Final, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
AggregateExec: mode=Partial, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
MemoryExec: partitions=1, partition_sizes=[2]
ProjectionExec: expr=[prices_row_num@0 as prices_row_num]
ProjectionExec: expr=[MIN(prices_with_row_num_2.prices_row_num)@0 as prices_row_num]
AggregateExec: mode=Final, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
AggregateExec: mode=Partial, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
MemoryExec: partitions=1, partition_sizes=[2]
ProjectionExec: expr=[prices_row_num_advancement@0 as prices_row_num_advancement]
ProjectionExec: expr=[MIN(prices_with_row_num_2.prices_row_num)@0 as prices_row_num_advancement]
AggregateExec: mode=Final, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
AggregateExec: mode=Partial, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
NestedLoopJoinExec: join_type=Inner, filter=prices_row_num@0 > prices_row_num@1
MemoryExec: partitions=1, partition_sizes=[2]
ProjectionExec: expr=[prices_row_num@0 as prices_row_num]
ProjectionExec: expr=[MIN(prices_with_row_num_2.prices_row_num)@0 as prices_row_num]
AggregateExec: mode=Final, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
AggregateExec: mode=Partial, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
MemoryExec: partitions=1, partition_sizes=[2]
ProjectionExec: expr=[emd@0 + 40 as emd, emd@0 as beg, prices_row_num@8 as prices_row_num, prices_row_num_advancement@9 as prices_row_num_advancement]
ProjectionExec: expr=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement, Index@4 as Index, product@5 as product, price@6 as price, prices_row_num@7 as prices_row_num, prices_row_num@9 as prices_row_num, prices_row_num_advancement@10 as prices_row_num_advancement]
HashJoinExec: mode=Partitioned, join_type=Left, on=[(coalesce(prices_with_row_num_2.prices_row_num,recursive_cte.prices_row_num_advancement)@8, prices_row_num@0)]
ProjectionExec: expr=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement, Index@4 as Index, product@5 as product, price@6 as price, prices_row_num@7 as prices_row_num, coalesce(prices_row_num@7, prices_row_num_advancement@3) as coalesce(prices_with_row_num_2.prices_row_num,recursive_cte.prices_row_num_advancement)]
HashJoinExec: mode=Partitioned, join_type=Left, on=[(prices_row_num_advancement@3, prices_row_num@3)]
FilterExec: prices_row_num_advancement@3 IS NOT NULL
WorkTableExec: name=recursive_cte
MemoryExec: partitions=1, partition_sizes=[2]
ProjectionExec: expr=[prices_row_num@0 as prices_row_num, LEAD(prices_with_row_num_2.prices_row_num,Int64(1)) ORDER BY [prices_with_row_num_2.prices_row_num ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@1 as prices_row_num_advancement]
BoundedWindowAggExec: wdw=[LEAD(prices_with_row_num_2.prices_row_num,Int64(1)) ORDER BY [prices_with_row_num_2.prices_row_num ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Ok(Field { name: "LEAD(prices_with_row_num_2.prices_row_num,Int64(1)) ORDER BY [prices_with_row_num_2.prices_row_num ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(UInt64(NULL)), end_bound: CurrentRow }], mode=[Sorted]
AggregateExec: mode=FinalPartitioned, gby=[prices_row_num@0 as prices_row_num], aggr=[]
AggregateExec: mode=Partial, gby=[prices_row_num@0 as prices_row_num], aggr=[]
ProjectionExec: expr=[prices_row_num@3 as prices_row_num]
MemoryExec: partitions=1, partition_sizes=[2]
I was able to increase the partition size to 2 within the memory exec by setting the batch size to 50 on the session config, but I'm still unable to reproduce in my test crate outside of our larger project.
I was able to reproduce, and I think I'm getting to the bottom of this issue, which indeed appears to be a separate issue. It arises when creating an upstream table with the ROW_NUMBER() window function. Some sort of subtle data type mismatch causes downstream execution to blow up. We can work around this by wrapping the upstream ROW_NUMBER() column in a CAST to BIGINT.
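A sketch of that workaround (the table and column names here are illustrative, not from the actual repro):

```sql
-- ROW_NUMBER() produces a UInt64 column; casting to BIGINT gives the
-- downstream recursive plan a consistent Int64 type and avoids the panic.
CREATE TABLE prices_with_row_num AS
SELECT
    CAST(ROW_NUMBER() OVER (ORDER BY "time") AS BIGINT) AS prices_row_num,
    *
FROM prices;
```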
})
.collect()
) -> Option<Vec<usize>> {
if source.is_none() {
Related to fixing the panic that caused a unit test in cte.slt to fail. More info here: #8840 (comment)
I'm afraid that this bug might not have been fixed. Returning None seems to select all the columns of the csv table balance.
DataFusion CLI v34.0.0
❯ CREATE EXTERNAL TABLE balance STORED as CSV WITH HEADER ROW LOCATION '../testing/data/csv/r_cte_balance.csv';
0 rows in set. Query took 0.026 seconds.
❯ set datafusion.optimizer.max_passes=0;
0 rows in set. Query took 0.002 seconds.
❯ select time from balance;
ProjectionPushdown
caused by
Internal error: PhysicalOptimizer rule 'ProjectionPushdown' failed, due to generate a different schema,
original schema: Schema { fields: [
Field { name: "time", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} },
new schema: Schema { fields:
Field { name: "time", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "account_balance", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
The tests are failing because CI can't find the commit that contains the new test files. I pushed the new commit to my branch here, so they should start passing again when they're re-run: apache/arrow-testing#93
FYI @jonahgao, I'll try to find time to review this over the next day or two. Thanks @matthewgapp
Thank you, looking forward to getting this over the finish line!
/// Take the previously written batches from the work table.
/// This will be called by the [`WorkTableExec`] when it is executed.
fn take(&self) -> Vec<RecordBatch> {
The work table might be read multiple times if there are multiple TableScans on it in the recursive term.
WITH RECURSIVE my_cte AS (
SELECT 1 as a
UNION ALL
SELECT my_cte.a+2 as a
FROM my_cte join my_cte c2 using(a)
WHERE my_cte.a<5
)
SELECT a FROM my_cte;
We might need to clone the batches here, or we could choose not to support this, as PostgreSQL doesn't:
postgres=# WITH RECURSIVE my_cte AS (
SELECT 1 as a
UNION ALL
SELECT my_cte.a+2 as a
FROM my_cte join my_cte c2 using(a)
WHERE my_cte.a<5
)
SELECT a FROM my_cte;
ERROR: recursive reference to query "my_cte" must not appear more than once
LINE 5: FROM my_cte join my_cte c2 using(a)
I think not supporting it for now would be fine (rather than paying the clone tax for every query regardless of whether it's referenced multiple times)
Or we could make the WorkTable capable of being read repeatedly; for example, we could use something like Arc<Vec<RecordBatch>>. However, I think it's okay not to support it; we can refine it in subsequent PRs.
Yeah that makes sense. Off the top of your head, is there an easy way to traverse the recursive term of the logical plan to count the number of table references (relations) that match the cte name? I'll avoid building that machinery myself if it already exists.
Btw, it looks like MemoryStream clones record batches anyway: https://github.com/matthewgapp/arrow-datafusion/blob/b11ca8bb7e61d78af831b8667fa551d26e4839c7/datafusion/physical-plan/src/memory.rs#L255. Feels like the poll_next implementation should instead drain the batches.
How about implementing it in assign_work_table as well?
fn assign_work_table(
plan: Arc<dyn ExecutionPlan>,
work_table: Arc<WorkTable>,
) -> Result<Arc<dyn ExecutionPlan>> {
let mut work_table_refs = 0;
plan.transform_down_mut(&mut |plan| {
if let Some(exec) = plan.as_any().downcast_ref::<WorkTableExec>() {
if work_table_refs > 0 {
not_impl_err!(
"Multiple recursive references to the same CTE are not supported"
)
} else {
work_table_refs += 1;
Ok(Transformed::Yes(Arc::new(
exec.with_work_table(work_table.clone()),
)))
}
} else if plan.as_any().is::<RecursiveQueryExec>() {
not_impl_err!("Recursive queries cannot be nested")
} else {
Ok(Transformed::No(plan))
}
})
}
Yes, the cloning in MemoryStream seems to be avoidable.
Implemented here: 2eaa48a (#8840). Thanks @jonahgao!
I'll create a separate PR to remove the unnecessary clones from memory stream.
I plan to review this tomorrow
Thank you!
Except for the changes related to [...]. Thank you! @matthewgapp
I rebased onto main, which has the projection pushdown fix #9005
Thank you @matthewgapp -- I reviewed this PR and it looks really nice (thank you @jonahgao for all the feedback that left it in such good shape)
The only thing I think is needed prior to merge is to remove the arrow-testing pin -- the testing data doesn't really belong in the https://github.com/apache/arrow-testing repo, as that is shared across arrow implementations. Perhaps you could put it somewhere in the main repo, such as in https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/data ?
Otherwise I think this PR is looking great
# setup
statement ok
CREATE EXTERNAL TABLE balance STORED as CSV WITH HEADER ROW LOCATION '../../testing/data/csv/r_cte_balance.csv'
BTW if you wanted to avoid having to check in external files, you could use CREATE TABLE AS VALUES
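For example, a small inline table needs no external file (the values here are made up for illustration; unnamed VALUES columns default to column1, column2, ...):

```sql
CREATE TABLE balance AS VALUES
    (1, 'john', 100),
    (1, 'tim',  200),
    (2, 'john', 300);
```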
Thanks for the tip. I'll leave the files for now, since some of them have a non-trivial number of values.
@@ -84,7 +82,11 @@ impl TableProvider for CteWorkTable {
    _filters: &[Expr],
    _limit: Option<usize>,
) -> Result<Arc<dyn ExecutionPlan>> {
    not_impl_err!("scan not implemented for CteWorkTable yet")
    // TODO: pushdown filters and limits
I recommend tracking this in separate tickets or as tasks on the main recursive CTE ticket; otherwise things like this can be easily forgotten.
Partitioning::UnknownPartitioning(1)
}

// TODO: control these hints and see whether we can
yeah, this would be pretty tricky -- I think the choices you have below are the simple (and correct) ones, and excellent choices for the first implementation 👍
cx: &mut Context<'_>,
) -> Poll<Option<Self::Item>> {
// TODO: we should use this poll to record some metrics!
if let Some(static_stream) = &mut self.static_stream {
this is very nicely written and easy to follow 👍
@alamb thanks so much for the review. I'll move the test files and remove the pin now.
@alamb I removed the pin and updated to reflect your tweaks. Should be good to go :D
Epic work @matthewgapp
And thank you so much for all your help @jonahgao
I can't wait to get this hardened enough to turn on by default ❤️
Filed #9554 to track enabling this feature by default |
Adds execution support to recursive CTEs based on a worktable physical plan that mirrors the previous iteration's results back into the recursive query's execution plan.
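Concretely, on each iteration the recursive term reads the work table populated by the previous iteration, and its output becomes the next iteration's work table. For a query like:

```sql
WITH RECURSIVE nums AS (
    SELECT 1 AS n           -- static term seeds the work table with {1}
    UNION ALL
    SELECT n + 1 FROM nums  -- recursive term reads the previous iteration
    WHERE n < 3
)
SELECT n FROM nums;
-- yields 1, 2, 3; execution stops once an iteration produces no rows
```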
Part of #462