Building project takes a *long* time (esp compilation time for datafusion core crate) #13814


Open
Tracked by #13813
alamb opened this issue Dec 17, 2024 · 32 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Dec 17, 2024

Is your feature request related to a problem or challenge?

Compiling the datafusion crate currently takes 40 seconds on my machine, far longer than any other crate

This slows down CI builds as well as my own local development workflow

For example, running

# start from a clean checkout
rm -rf target
cargo build --timings

Generates a chart as follows (attached here): carg-timings.zip

Screenshot 2024-12-17 at 10 23 44 AM

Describe the solution you'd like

I would like to speed up compilation somehow -- likely by decreasing the time required for datafusion-core

Describe alternatives you've considered

I think the first thing would be to figure out, if possible, what is taking up so much time when building the core crate

I suspect it has to do with listing table / some of the various file format support, but I don't have data to justify that

Additional context

No response

@Omega359
Contributor

Install sccache and tell Rust to use it (the alias below also enables the nightly parallel compiler front end):

export RUSTC_WRAPPER=sccache
alias cargo="RUSTFLAGS='-Z threads=8' cargo +nightly"

With the above two options I see a full build on my machine taking 100.8s (1m 40.8s).

https://corrode.dev/blog/tips-for-faster-rust-compile-times/

@alamb
Contributor Author

alamb commented Dec 17, 2024

With the above two options I see a full build on my machine taking 100.8s (1m 40.8s).

This makes sense -- thank you @Omega359

However, I think it will not help CI, where such caching isn't available (and since the intermediate Rust files are so large, trying to use cargo cache wasn't effective in the past when we tried)

I would love to figure out how to break the datafusion core crate into smaller pieces / crates that can be compiled in parallel

@Omega359
Contributor

FWIW I tried to gather a bit more info with cargo-llvm-lines:

cargo llvm-lines -p datafusion --lib | head -20

Lines                  Copies               Function name
  -----                  ------               -------------
  1220519                36986                (TOTAL)
    44221 (3.6%,  3.6%)    423 (1.1%,  1.1%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    25426 (2.1%,  5.7%)    265 (0.7%,  1.9%)  alloc::vec::Vec<T,A>::extend_desugared
    22436 (1.8%,  7.5%)    142 (0.4%,  2.2%)  <arrow_array::array::primitive_array::PrimitiveArray<T> as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    21263 (1.7%,  9.3%)     45 (0.1%,  2.4%)  <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::size_hint
    16910 (1.4%, 10.7%)    151 (0.4%,  2.8%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
    15949 (1.3%, 12.0%)    172 (0.5%,  3.2%)  alloc::vec::Vec<T,A>::extend_trusted
    12204 (1.0%, 13.0%)    116 (0.3%,  3.6%)  <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
    10585 (0.9%, 13.8%)     88 (0.2%,  3.8%)  <alloc::vec::into_iter::IntoIter<T,A> as core::iter::traits::iterator::Iterator>::try_fold
    10438 (0.9%, 14.7%)    295 (0.8%,  4.6%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
     9702 (0.8%, 15.5%)     53 (0.1%,  4.7%)  alloc::vec::in_place_collect::from_iter_in_place
     8645 (0.7%, 16.2%)     13 (0.0%,  4.8%)  std::io::default_read_to_end
     8256 (0.7%, 16.9%)    116 (0.3%,  5.1%)  core::iter::adapters::try_process
     7939 (0.7%, 17.5%)    356 (1.0%,  6.0%)  alloc::boxed::Box<T>::new
     7673 (0.6%, 18.2%)    236 (0.6%,  6.7%)  core::iter::adapters::map::map_fold::{{closure}}
     7375 (0.6%, 18.8%)      1 (0.0%,  6.7%)  datafusion::physical_planner::DefaultPhysicalPlanner::map_logical_node_to_physical::{{closure}}
     7352 (0.6%, 19.4%)    241 (0.7%,  7.3%)  tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
     7317 (0.6%, 20.0%)    102 (0.3%,  7.6%)  core::iter::traits::iterator::Iterator::try_fold
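For context on reading this output (a generic illustration, not DataFusion-specific): the "Copies" column counts monomorphizations, i.e. each distinct concrete type a generic function is used with stamps out another copy of its LLVM IR, which is why `Vec::from_iter` variants dominate the list.

```rust
// Each concrete element type instantiates its own copy of the generic
// collection machinery (monomorphization); those copies are what the
// "Copies" column of cargo-llvm-lines is counting.
fn main() {
    // One instantiation of the Vec-from-iterator machinery for i32 ...
    let ints: Vec<i32> = (0..4).collect();
    // ... and an entirely separate instantiation for String.
    let words: Vec<String> = ["a", "b"].iter().map(|s| s.to_string()).collect();
    println!("{} ints, {} words", ints.len(), words.len()); // prints "4 ints, 2 words"
}
```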

@findepi
Member

findepi commented Dec 18, 2024

I would love to figure out how to break the datafusion core crate into smaller pieces / crates that can be compiled in parallel

yes! and move around sqlparser dependency when doing so :)

@tustvold
Contributor

tustvold commented Dec 19, 2024

FWIW the only reliable mechanism I've found to measure this is to comment out modules and measure the impact on compilation time. Llvm-lines and cargo-bloat can be informative, but not all lines are created equal. This is especially true for code with complex lifetimes and/or async.

Anything making extensive use of macros is likely a good place to start

@alamb
Contributor Author

alamb commented Mar 10, 2025

BTW I did some profiling on this today, and it seems a non-trivial amount of time is spent type checking / resolving functions somehow...

Image

@comphead
Contributor

might be compiler related?

@tustvold
Contributor

tustvold commented Mar 10, 2025

IIRC that relates to type checking expressions, and therefore this would suggest the compiler is spending a lot of time resolving generics. At least historically non-boxed async has been a good source of this, as async "desugars" to quite complex generic code, and often has quite complex Send / lifetime constraints.

Edit: rust-lang/rust#87012 is the related issue I was thinking of
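The desugaring described above can be seen in a small standalone sketch (hypothetical code, not from DataFusion): an `async fn` compiles to a generic state machine, and a `Send` bound on that future (as `tokio::spawn` imposes) is an obligation the compiler must prove over everything the future holds across `.await` points. The tiny `block_on` here is only so the example runs on the standard library alone.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// An `async fn` desugars into a function returning an opaque state machine
// implementing `Future`. Whether that machine is `Send` depends on every
// value held across an `.await`, which is what the compiler has to prove.
async fn step(x: u64) -> u64 {
    x + 1
}

// A bound like this is what triggers the `Send` obligation for the whole
// desugared future type.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

// Minimal executor so the example needs no async runtime.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    // Compiles only because the state machine for `step` captures
    // nothing that is not `Send`.
    let answer = block_on(require_send(step(41)));
    println!("answer = {answer}"); // prints "answer = 42"
}
```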

@comphead
Contributor

Thanks @tustvold, that was a nice attempt to solve it 3 years ago; hopefully someone is planning to close it out one day

@alamb
Contributor Author

alamb commented Mar 11, 2025

One thing that is strange is that we have moved most of the code out of datafusion-core now.

I am still hoping we can get https://github.com/apache/datafusion/tree/main/datafusion/core/src/datasource/physical_plan out of the core somehow (I know @AdamGS is thinking about that)

However, I don't think that is the problem either, as all the formats are feature gated. When I build with no default features:

cargo build --timings -p datafusion --no-default-features

I get

Image

And then adding parquet (by far the largest / most complex), it goes up by only 1 second:

cargo build --timings -p datafusion --no-default-features --features=parquet

Image

@alamb
Contributor Author

alamb commented Mar 11, 2025

I also tried removing all the doctests from lib.rs and that didn't help (I didn't expect it to but I wanted to try)

diff --git a/datafusion/core/src/lib.rs b/datafusion/core/src/lib.rs
index b4d5f9740..9e1f9795e 100644
--- a/datafusion/core/src/lib.rs
+++ b/datafusion/core/src/lib.rs
@@ -824,282 +824,3 @@ pub mod test;

 mod schema_equivalence;
 pub mod test_util;
-
-#[cfg(doctest)]
-doc_comment::doctest!("../../../README.md", readme_example_test);
-
-// Instructions for Documentation Examples

@alamb
Contributor Author

alamb commented Mar 11, 2025

I also tried cutting out (ablating) the contents of datafusion/core/src/physical_planner.rs, but that didn't seem to make any difference either

@logan-keede
Contributor

Using the following command on releases 42.0.0, 45.0.0, and current main:

cargo build -p datafusion-cli --timings -j 10

[42.0.0]
Image
Image
[45.0.0]
Image
Image
[46.0.0]
Image
Image

There is some improvement in the overall build time (201 sec -> 190 sec -> 183 sec), and for the datafusion crate specifically (65 -> 65 -> 49 sec). Some progress!

Also, this article and the issue mentioned above by @tustvold (rust-lang/rust#87012) look promising. The root cause also seems to be similar between evaluate_obligation[1] and async_trait, though I am not sure if we can apply the same tricks in our use case. Is it technically possible[2] and worth a try?

[1]: I think evaluate_obligation and process_obligation are the same; correct me if I am wrong.
[2]: I am guessing it should be possible, but I'm not too sure.

@alamb
Contributor Author

alamb commented Mar 15, 2025

There is some improvement in the overall build time (201 sec -> 190 sec -> 183 sec), and for the datafusion crate specifically (65 -> 65 -> 49 sec). Some progress!

Pulling stuff out of core helped! Thank you!

I was wondering recently if the problem could be related to all the re-exports (pub use ...) we do in DataFusion. Maybe we could see if reducing them helps 🤔

I think before we spend a lot of time engineering some solution, we should try something quick and dirty to prove it will help. If it does, then we can figure out how to make it work reasonably / get it in chunks

@findepi
Member

findepi commented Mar 20, 2025

I was wondering recently if the problem could be related to all the re-exports (pub use ...) we do in DataFusion. Maybe we could see if reducing them helps 🤔

It might be a good idea for reasons other than compilation time.
For example, my IDE asking me which ScalarValue to import (datafusion::common::ScalarValue or datafusion::scalar::ScalarValue) isn't helpful at all. I would strongly prefer a public entity to have only one public path.

@comphead
Contributor

The new Rust compiler has notable performance changes: https://nnethercote.github.io/2025/03/19/how-to-speed-up-the-rust-compiler-in-march-2025.html

looking fwd

@alamb
Contributor Author

alamb commented Mar 31, 2025

I ran some profiling recently, and it seems like catalog is also taking pretty long to compile

Image

(this is before #15459)

@alamb
Contributor Author

alamb commented Mar 31, 2025

I wonder if, since catalog is a smaller crate, we could more easily figure out what is taking so long to compile...

@logan-keede
Contributor

I did some profiling.

RUSTC_BOOTSTRAP=1 cargo rustc -p datafusion-catalog -- -Z self-profile -Z self-profile-events=default,args

Image

Image

The culprits are some asynchronous functions (MemTable::{scan, insert_into}, StreamTable::{scan, insert_into}, StreamTableFactory::create, StreamWrite::write_all, ViewTable::scan): each of them takes 1-2 sec (5-10% of total time) in the analysis phase alone.

There are a couple hundred evaluate_obligation calls under each type_check, with the pattern {...{some_datafusion_object} as std::marker::Sync/Send> ...}, which means the compiler is trying to check whether some_datafusion_object fulfills the trait bound of being Send/Sync.

This does not happen with other asynchronous methods, e.g. ListingSchemaProvider::refresh, so maybe we can check the difference between them and see what is going wrong with the functions mentioned above.

@alamb
Contributor Author

alamb commented Apr 1, 2025

A break in the case! 🕵

@comphead
Contributor

comphead commented Apr 1, 2025


Thanks @logan-keede, btw is it a release build? Can you also check with LTO and non-LTO; sometimes link-time optimization takes time as well.

UPD: What rustc version was used?

@logan-keede
Contributor

logan-keede commented Apr 1, 2025

Thanks @logan-keede, btw is it a release build? Can you also check with LTO and non-LTO; sometimes link-time optimization takes time as well.

UPD: What rustc version was used?

It is a debug build; I will try to check it with LTO and non-LTO sometime soon.

rustc version: 1.84.1

@comphead
Contributor

comphead commented Apr 1, 2025

rustc version: 1.84.1

1.86 introduced some compilation improvements https://releases.rs/docs/1.86.0/#internal-changes

@alamb
Contributor Author

alamb commented Apr 1, 2025

It seems like 1.86 won't be released for 3 more days: https://releases.rs/docs/1.86.0/

It would be cool to update and try it

@jayzhan211
Contributor

I did some profiling.

RUSTC_BOOTSTRAP=1 cargo rustc -p datafusion-catalog -- -Z self-profile -Z self-profile-events=default,args
Image

Image

The culprits are some asynchronous functions (MemTable::{scan, insert_into}, StreamTable::{scan, insert_into}, StreamTableFactory::create, StreamWrite::write_all, ViewTable::scan): each of them takes 1-2 sec (5-10% of total time) in the analysis phase alone.

There are a couple hundred evaluate_obligation calls under each type_check, with the pattern {...{some_datafusion_object} as std::marker::Sync/Send> ...}, which means the compiler is trying to check whether some_datafusion_object fulfills the trait bound of being Send/Sync.

This does not happen with other asynchronous methods, e.g. ListingSchemaProvider::refresh, so maybe we can check the difference between them and see what is going wrong with the functions mentioned above.

ListingSchemaProvider::refresh is an inherent async method on the struct, while scan and insert_into are async trait methods.
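That distinction can be sketched by hand (hypothetical types, loosely modeled on the DataFusion traits; `TableProviderLike` and `Mem` are made-up names): `#[async_trait]` rewrites a trait method into one returning `Pin<Box<dyn Future + Send>>`, so the `Send` obligation must be discharged against the whole body of every implementation, while an inherent `async fn` carries no such bound until a caller demands it.

```rust
use std::future::Future;
use std::pin::Pin;

// What `#[async_trait]` expands a trait method into: the `+ Send` in the
// return type is an obligation the compiler must prove for the whole body
// of every implementation -- plausibly the source of the
// `evaluate_obligation` calls seen in the self-profile.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

trait TableProviderLike {
    fn scan(&self) -> BoxFuture<'_, usize>;
}

struct Mem(Vec<u8>);

impl TableProviderLike for Mem {
    fn scan(&self) -> BoxFuture<'_, usize> {
        Box::pin(async move { self.0.len() })
    }
}

impl Mem {
    // An inherent `async fn` returns an unboxed future with no `Send`
    // bound of its own; the obligation only appears at call sites that
    // require it (e.g. handing the future to a work-stealing executor).
    async fn refresh(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    let m = Mem(vec![1, 2, 3]);
    // Futures are lazy: constructing them does no work yet.
    let _boxed = m.scan();    // Send proven at definition time
    let _plain = m.refresh(); // no Send obligation until demanded
    println!("constructed boxed and unboxed futures");
}
```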

@jayzhan211
Contributor

jayzhan211 commented Apr 3, 2025

rust-lang/rust#87012 (comment)

This thread mentioned a similar issue to what we have.
What we can do is find a way to avoid async methods altogether, or remove the Send bound

#13814 (comment) has the conclusion already 😆

@logan-keede
Contributor

rust-lang/rust#87012 (comment)

This thread mentioned a similar issue to what we have. What we can do is find a way to avoid async methods altogether, or remove the Send bound

#13814 (comment) has the conclusion already 😆

still good to get a heads up

@jayzhan211
Contributor

ExecutionPlan's execution was changed to sync in #2307; it returns SendableRecordBatchStream instead.

Is it possible to change scan and insert_into to sync and return a Stream?
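A sketch of that direction (hypothetical names, not the real DataFusion API): the trait method becomes synchronous and hands back a stream object, so there is no desugared future for the compiler to prove `Send` for, only the returned trait object carries the bound. A boxed `Iterator` stands in for `SendableRecordBatchStream` here, since `Stream` is not in the standard library.

```rust
// Hypothetical sketch of a sync `scan` in the spirit of #2307. The method
// itself is not async, so the only `Send` obligation is on the returned
// trait object, not on a compiler-generated async state machine.
// `Box<dyn Iterator>` stands in for `SendableRecordBatchStream`.
trait TableScan {
    fn scan(&self) -> Box<dyn Iterator<Item = u64> + Send>;
}

struct Numbers(Vec<u64>);

impl TableScan for Numbers {
    fn scan(&self) -> Box<dyn Iterator<Item = u64> + Send> {
        // Clone so the returned "stream" owns its data and can outlive &self.
        Box::new(self.0.clone().into_iter())
    }
}

fn main() {
    let table = Numbers(vec![1, 2, 3]);
    let total: u64 = table.scan().sum();
    println!("total = {total}"); // prints "total = 6"
}
```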

@logan-keede
Contributor

Thanks @logan-keede, btw is it a release build? Can you also check with LTO and non-LTO; sometimes link-time optimization takes time as well.

UPD: What rustc version was used?

Changing to a release build, LTO or non-LTO, did not make much of a difference from the debug build, except for the extra linking step that seems to take some time.

rustc 1.86.0 + release build
Image

Changing to rustc 1.86 did reduce the build time from 18 sec -> 14 sec, and then to 12 sec when I made insert_into sync (only at the catalog level, so not mergeable)

rustc 1.86.0 + debug build
Image

insert_into sync + rustc 1.86.0 + debug build
Image

@logan-keede
Contributor

FYI @logan-keede -- maybe we can see how much this helps the build speed

Originally posted by @alamb in #15625 (comment)

It actually does not seem to have much impact on CI.
After some digging, or rather building with different versions of rustc, I observed that build times for 1.85.1 and 1.86.0 are similar, but both are better than 1.84.1 (the version I was using during the initial profiling).
For reference, datafusion-catalog takes ~50 sec with rustc >= 1.85, but > 60 sec with rustc 1.84.1, on my laptop.

TL;DR: the improvements I observed were actually introduced in 1.85, and we were already benefiting from them. Sorry for the false positive.

@comphead
Contributor

comphead commented Apr 17, 2025

Interesting material came up this week:
https://www.feldera.com/blog/cutting-down-rust-compile-times-from-30-to-2-minutes-with-one-thousand-crates
The nature of this project is similar to DataFusion.

UPD: they propose something we already started, namely splitting large modules into smaller ones, which allows compilation to be parallelized

@alamb
Contributor Author

alamb commented Apr 17, 2025

Interesting material came up this week: https://www.feldera.com/blog/cutting-down-rust-compile-times-from-30-to-2-minutes-with-one-thousand-crates The nature of this project is similar to DataFusion.

UPD: they propose something we already started, namely splitting large modules into smaller ones, which allows compilation to be parallelized

Interesting! Though that is via codegen (they are code-generating Rust, so they can make thousands of modules)

I also found it fascinating that they use DataFusion too!

https://github.com/feldera/feldera/blob/7550dc4a024f41020d98233885705d733d14a7e8/Cargo.toml#L73

🐶 😎
