Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Move depcheck out of datafusion crate (200 less crates to compile) #9865

Merged
merged 2 commits into from
Mar 30, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Mar 29, 2024

Which issue does this PR close?

Related to #9278, follow on to #9292

Rationale for this change

I want DataFusion to compile quickly and be a nice development experience, so the faster the compilation / fewer dependencies the better in general.

I have been wondering why compiling datafusion takes so long and requires compiling crates like gix (which is an implementation of git) #9844 (review) got me inspired to look into this:

...
   Compiling gix-attributes v0.20.1
   Compiling gix-credentials v0.22.0
   Compiling gix-ignore v0.9.1
...

It turns out the dependencies were needed by the depcheck test, added in #9292. There is no need to compile 200 extra crates everytime someone runs cargo test, I think it is enough if this tool runs as part of CI.

What changes are included in this PR?

move the depcheck test into its own binary (not part of the main workspace) and only run that as part of CI

Are these changes tested?

There is a new CI job and I manually tested that it catches issues

Manual Testing Details

Negative Test

I locally introduced a circular dependency:

diff --git a/datafusion/functions/Cargo.toml b/datafusion/functions/Cargo.toml
index 3ae306101..cde0742fc 100644
--- a/datafusion/functions/Cargo.toml
+++ b/datafusion/functions/Cargo.toml
@@ -83,6 +83,7 @@ uuid = { version = "1.7", features = ["v4"], optional = true }

 [dev-dependencies]
 criterion = "0.5"
+datafusion = { workspace = true }
 rand = { workspace = true }
 rstest = { workspace = true }
 tokio = { workspace = true, features = ["macros", "rt", "sync"] }

The test now fails

 andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion2$ (cd dev/depcheck && cargo run)
    Compiling depcheck v0.0.0 (/Users/andrewlamb/Software/arrow-datafusion2/dev/depcheck)
     Finished dev [unoptimized + debuginfo] target(s) in 0.49s
      Running `target/debug/depcheck`
 Checking for circular dependencies in /Users/andrewlamb/Software/arrow-datafusion2/Cargo.toml
 thread 'main' panicked at src/main.rs:84:9:
 circular dependency detected from datafusion to self via one of {"datafusion-functions", "datafusion-execution", "datafusion-common-runtime", "datafusion-expr", "datafusion-common"}
 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

When I remove the local dependency, the test passes:

andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion2$ (cd dev/depcheck && cargo run)
   Compiling depcheck v0.0.0 (/Users/andrewlamb/Software/arrow-datafusion2/dev/depcheck)
    Finished dev [unoptimized + debuginfo] target(s) in 0.50s
     Running `target/debug/depcheck`
Checking for circular dependencies in /Users/andrewlamb/Software/arrow-datafusion2/Cargo.toml
No circular dependencies found

Are there any user-facing changes?

Faster compilation times (200 less crates to compile!)

Previously, running cargo test required compiling 838 crates!

With this change we are down to 647 crates (still a crazy number)

It also seems to make the CI jobs significantly faster:

  1. before this PR: 11m4s
  2. after this PR: 915s

@github-actions github-actions bot added the core Core DataFusion crate label Mar 29, 2024
@alamb alamb added the development-process Related to development process of DataFusion label Mar 29, 2024
@alamb alamb force-pushed the alamb/move_dep_check branch from 935c53f to f61a9c2 Compare March 29, 2024 14:44
@github-actions github-actions bot removed the development-process Related to development process of DataFusion label Mar 29, 2024
@alamb alamb changed the title Minor: Move depcheck out of datafusion crate (200 less crates for compilation) Minor: Move depcheck out of datafusion crate (200 less crates to compile) Mar 29, 2024
@alamb alamb added the development-process Related to development process of DataFusion label Mar 29, 2024
@alamb alamb marked this pull request as ready for review March 29, 2024 15:14
@alamb
Copy link
Contributor Author

alamb commented Mar 29, 2024

@andygrove I wonder if you have any thoughts / reactions to running the circular dependency check as its own CI job?

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb that looks like addressed to circular dependencies, but how it helps to improve the compile time?

@comphead
Copy link
Contributor

windows machine still slow as hell, and it is damned slow there because of compilation latency as well

@@ -130,7 +130,6 @@ zstd = { version = "0.13", optional = true, default-features = false }
[dev-dependencies]
async-trait = { workspace = true }
bigdecimal = { workspace = true }
cargo = "0.78.1"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By removing this line, cargo test now requires 200 fewer crates to compile

@alamb
Copy link
Contributor Author

alamb commented Mar 30, 2024

Thanks @alamb that looks like addressed to circular dependencies, but how it helps to improve the compile time?

Sorry -- I should have made it clear, this line is what helps https://github.com/apache/arrow-datafusion/pull/9865/files#r1545339002

Note that the circular dependency check was already present, I just moved it out of the main workspace because the dependency requirements were massive

- name: Check dependencies
run: |
cd dev/depcheck
cargo run
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this check was previously covered by cargo test -p datafusion

/// checking the dependency graph.
///
/// See https://github.com/apache/arrow-datafusion/issues/9278 for more details
fn main() -> CargoResult<()> {
Copy link
Contributor Author

@alamb alamb Mar 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note this check is not new, it just now runs in a separate binary which requires many fewer crates in the main workspace

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@github-actions github-actions bot removed the development-process Related to development process of DataFusion label Mar 30, 2024
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @alamb, macos timing is good but we probably need to investigate why compile time is so slow on windows machines

@alamb
Copy link
Contributor Author

alamb commented Mar 30, 2024

lgtm thanks @alamb, macos timing is good but we probably need to investigate why compile time is so slow on windows machines

I agree the windows test takes too long.

On the other hand I think this PR actually improves the situation:

@alamb alamb merged commit 2159d82 into apache:main Mar 30, 2024
25 checks passed
@alamb
Copy link
Contributor Author

alamb commented Mar 30, 2024

Merging this in to improve the Dev experience. Thanks @comphead for the review

@alamb alamb deleted the alamb/move_dep_check branch March 30, 2024 20:17
Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Apr 1, 2024
…ile) (apache#9865)

* Minor: Move depcheck out of main datafusion test

* Update dev/depcheck/README.md

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

---------

Co-authored-by: comphead <comphead@users.noreply.github.com>
@andygrove
Copy link
Member

Thanks for catching this @alamb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants