create new datafusion-execution crate, start splitting code out #5432

Merged: @alamb merged 3 commits into apache:main from alamb/split_out_context on Mar 1, 2023

@alamb (Contributor) commented Feb 28, 2023

Which issue does this PR close?

Part of splitting out physical plan: #1754
First part of #5405

Rationale for this change

The datafusion crate is quite large, as it contains all of the physical plan code as well as the physical optimizer.

I would like to split physical_plan and physical_optimizer out into their own crates (to mirror datafusion_expr and datafusion_optimizer).

However, to do so I first need to move the parts of datafusion that they depend on into a new crate.

What changes are included in this PR?

Changes:

  1. Create a new datafusion-execution crate
  2. Extract some (but not all) of datafusion/core/src/execution into that new crate

Items remaining (planned as follow-on PRs):

  1. Figure out how to decouple ListingTableFactory and RuntimeEnv (I have an idea, but it will be easier to review in a follow-on PR)
  2. Extract structures from context.rs (specifically SessionState, and then the things it depends on)

I plan to leave SessionContext in the core datafusion crate, because it ties everything together (it depends on everything).

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

There will be a new crate, but otherwise no user-facing changes.

@alamb added commits "more disk_manager" and "move registry"
@alamb changed the title from "create new datafusion-execution crate, start splitting it out" to "create new datafusion-execution crate, start splitting code out" Feb 28, 2023
```rust
pub mod runtime_env;

// backwards compatibility
pub use datafusion_execution::disk_manager;
```
@alamb (Contributor Author):

I moved three modules in this first PR; runtime_env will take a little more work, as will context, so I plan to move them in smaller follow-on PRs.
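The `pub use datafusion_execution::disk_manager;` line is what keeps existing `datafusion::execution::disk_manager` imports compiling after the module moves. A minimal single-file sketch of that mechanism, assuming both crates are modeled as inline modules and using a hypothetical, simplified `DiskManager` (not DataFusion's real type), shows that the old and new paths name the same type:

```rust
// Stand-in for the new `datafusion-execution` crate (an inline module here,
// so the sketch fits in one file).
mod datafusion_execution {
    pub mod disk_manager {
        /// Hypothetical, simplified stand-in for illustration only.
        #[derive(Debug, Default, PartialEq)]
        pub struct DiskManager {
            pub spill_dirs: Vec<String>,
        }
    }
}

// Stand-in for the core `datafusion` crate after the move.
mod datafusion {
    pub mod execution {
        // backwards compatibility: the old path re-exports the moved module
        pub use crate::datafusion_execution::disk_manager;
    }
}

use crate::datafusion::execution::disk_manager::DiskManager as OldPath;
use crate::datafusion_execution::disk_manager::DiskManager as NewPath;

/// Takes the type via the old path and returns it via the new one; this only
/// type-checks because both paths refer to the same struct.
fn old_to_new(dm: OldPath) -> NewPath {
    dm
}

fn main() {
    let dm = old_to_new(OldPath {
        spill_dirs: vec!["/tmp".to_string()],
    });
    println!("{} spill dir(s)", dm.spill_dirs.len());
}
```

Because `pub use` creates an alias rather than a copy, downstream code can migrate to the new crate path at its own pace.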

```diff
@@ -310,6 +310,7 @@ dot -Tsvg dev/release/crate-deps.dot > dev/release/crate-deps.svg
 (cd datafusion/row && cargo publish)
 (cd datafusion/physical-expr && cargo publish)
 (cd datafusion/optimizer && cargo publish)
+(cd datafusion/execution && cargo publish)
```
@alamb (Contributor Author):

I think this is the biggest change (a new crate to publish). cc @andygrove
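A new crate in the release list has to be published after the workspace crates it depends on, because `cargo publish` resolves dependencies from crates.io rather than from local paths. A minimal sketch of deriving such an order with Kahn's topological sort follows; the dependency edges in `main` are a simplified assumption for illustration, not the exact graph in the repository:

```rust
use std::collections::{BTreeMap, VecDeque};

/// Orders crates so that each one appears after every crate it depends on
/// (Kahn's topological sort). Returns `None` if the graph has a cycle.
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Number of not-yet-published dependencies for each crate.
    let mut remaining: BTreeMap<&'a str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> crates that depend on it.
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (&krate, ds) in deps {
        *remaining.entry(krate).or_insert(0) += ds.len();
        for &d in ds {
            remaining.entry(d).or_insert(0);
            dependents.entry(d).or_default().push(krate);
        }
    }

    // Crates with no dependencies can be published first.
    let mut ready: VecDeque<&'a str> = remaining
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&k, _)| k)
        .collect();
    let mut order = Vec::new();
    while let Some(krate) = ready.pop_front() {
        order.push(krate);
        // Publishing `krate` satisfies one dependency of each dependent.
        for &dependent in dependents.get(krate).into_iter().flatten() {
            let n = remaining.get_mut(dependent).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(dependent);
            }
        }
    }
    // Any crate never released from `remaining` is part of a cycle.
    (order.len() == remaining.len()).then_some(order)
}

fn main() {
    // Simplified, assumed edges for illustration only.
    let mut deps: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    deps.insert("datafusion-common", vec![]);
    deps.insert("datafusion-expr", vec!["datafusion-common"]);
    deps.insert("datafusion-execution", vec!["datafusion-common", "datafusion-expr"]);
    deps.insert(
        "datafusion",
        vec!["datafusion-common", "datafusion-expr", "datafusion-execution"],
    );
    let order = publish_order(&deps).expect("dependency cycle");
    // prints: datafusion-common -> datafusion-expr -> datafusion-execution -> datafusion
    println!("{}", order.join(" -> "));
}
```

Under these assumed edges, datafusion-execution lands after datafusion-expr and before datafusion, which is the constraint the release script's ordering has to respect.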

@alamb marked this pull request as draft February 28, 2023 22:32
@github-actions bot added the core (Core DataFusion crate) label Feb 28, 2023
@alamb marked this pull request as ready for review February 28, 2023 22:47
@jackwener (Member) left a comment:

I like this idea of splitting things up; it makes great sense to me.
Thanks @alamb

@alamb merged commit 17b2f11 into apache:main Mar 1, 2023

@ursabot commented Mar 1, 2023

Benchmark runs are scheduled for baseline = f9f40bf and contender = 17b2f11. 17b2f11 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2
Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm
Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x
Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb deleted the alamb/split_out_context branch March 2, 2023 19:14
Labels: core (Core DataFusion crate)
Projects: none
4 participants