-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: allow custom caching via logical node #18688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
c2deee7
e83038a
d9cd725
b17e5bb
8ae150f
13a8a3b
bc88780
ad7bea8
0af2832
244456f
13adf2a
ac86c20
8ce7aca
aaabd1f
d61021c
a1c04c9
1921919
55b8055
376453b
f95bc67
da949d5
cb0b43c
213c6f6
1c67bec
5d1c999
8a7f42b
5c3dfd9
e94b206
82815dd
747abb7
517d5bc
85ebd75
1c11fcc
73937e3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -61,7 +61,7 @@ use datafusion::prelude::{ | |||||||||||||
| }; | ||||||||||||||
| use datafusion::test_util::{ | ||||||||||||||
| parquet_test_data, populate_csv_partitions, register_aggregate_csv, test_table, | ||||||||||||||
| test_table_with_name, | ||||||||||||||
| test_table_with_cache_factory, test_table_with_name, | ||||||||||||||
| }; | ||||||||||||||
| use datafusion_catalog::TableProvider; | ||||||||||||||
| use datafusion_common::test_util::{batches_to_sort_string, batches_to_string}; | ||||||||||||||
|
|
@@ -2335,6 +2335,29 @@ async fn cache_test() -> Result<()> { | |||||||||||||
| Ok(()) | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| #[tokio::test] | ||||||||||||||
| async fn cache_producer_test() -> Result<()> { | ||||||||||||||
| let df = test_table_with_cache_factory() | ||||||||||||||
| .await? | ||||||||||||||
| .select_columns(&["c2", "c3"])? | ||||||||||||||
| .limit(0, Some(1))? | ||||||||||||||
| .with_column("sum", cast(col("c2") + col("c3"), DataType::Int64))?; | ||||||||||||||
|
|
||||||||||||||
| let cached_df = df.clone().cache().await?; | ||||||||||||||
|
|
||||||||||||||
| assert_snapshot!( | ||||||||||||||
| cached_df.clone().into_optimized_plan().unwrap(), | ||||||||||||||
| @r###" | ||||||||||||||
| CacheNode | ||||||||||||||
| Projection: aggregate_test_100.c2, aggregate_test_100.c3, CAST(CAST(aggregate_test_100.c2 AS Int64) + CAST(aggregate_test_100.c3 AS Int64) AS Int64) AS sum | ||||||||||||||
| Projection: aggregate_test_100.c2, aggregate_test_100.c3 | ||||||||||||||
| Limit: skip=0, fetch=1 | ||||||||||||||
| TableScan: aggregate_test_100, fetch=1 | ||||||||||||||
| "### | ||||||||||||||
| ); | ||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
to test the physical plan too
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I dont think we need to cover physical plan for this case, as it would be up to user to provide it
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It could be used as an example how to do it.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. perhaps we can do it as a follow up in examples
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, once this PR is approved and merged, I can work on writing up an example.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks -- can you please file a ticket to track this work? |
||||||||||||||
| Ok(()) | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| #[tokio::test] | ||||||||||||||
| async fn partition_aware_union() -> Result<()> { | ||||||||||||||
| let left = test_table().await?.select_columns(&["c1", "c2"])?; | ||||||||||||||
|
|
||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many things that might be cached in the context of DataFusion (e.g. parquet metadata, etc)
It would be nice to mention in these comments that the API is used for
DataFrame::cacheto help users find what thtye are looking forThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, updated comment 👇 .