Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression between cc 1.0.90 and 1.0.91, regression is also present in 1.0.96 #1048

Closed
AlexanderSchuetz97 opened this issue May 3, 2024 · 11 comments

Comments

@AlexanderSchuetz97
Copy link

AlexanderSchuetz97 commented May 3, 2024

Hello, I am noticing a massive decrease in performance of the CC dependency when updating to version 1.0.91.

When used with mingw to build a windows shared library the decrease in performance is about factor 2. It used to take 30 seconds to build my project now it takes 60 seconds.

To build a linux shared library I use cargo zigbuild. The difference is about factor 7 with zigbuild,
It used to take about 50 seconds, now it takes well above 7 minutes. Nothing else has changed in my project other than the cc version.
Between each test I ran cargo clean to ensure that it does a clean build. I think something is up here.

My project features a large percentage of C code. (About 80% is C code, the rest is rust)
I dont have any C++ in my project.

I would not have reported this issue here if it was just zigbuild that was affected, even the normal mingw toolchain is affected.

Does anyone have any idea on significant changes betwen those versions that could affect performance?
The only observable difference in terms of log output appears to be that warnings about gcc version mismatch that used to be there are now missing when building the windows library. (Which is positive I guess... but not worth the horrible performance)

These are my dependencies and cargo configuration:

[package]
name = "XXXXXXXXXXXX"
version = "0.1.0"
edition = "2021"

[lib]
name = "XXXXXXXXXXXXXXXXX"
crate-type = ["cdylib"]

[dependencies]
jni = "0.21.1"
half = "2.4.0"
bytemuck = "1.15.0"
region = "3.0.1"
lazy_static = "1.4"

[build-dependencies]
cc = "1.0.90"

[profile.release]
strip = true
opt-level = 3
codegen-units = 1
lto = "fat"

This is my build.rs, the only thing configured here is the cc crate

use cc;

fn main() {
    let src = &[
        //lots of stuff Stuff goes here
    ];
    cc::Build::new()
        .files(src)
        .include("stuff goes here")
        .define("__ANSI__", None)
        .flag_if_supported("-Wno-constant-conversion")
        .flag_if_supported("-Wno-unused-const-variable")
        .flag_if_supported("-Wno-deprecated-declarations")
        .flag_if_supported("-Wno-comment")
        .flag_if_supported("-Wno-unused-value")
        .flag_if_supported("-Wno-unused-function")
        .flag_if_supported("-Wno-unknown-pragmas")
        .flag_if_supported("-Wno-extra-tokens")
        .flag_if_supported("-Wno-missing-field-initializers")
        .flag_if_supported("-Wno-shift-negative-value")
        .flag_if_supported("-Wno-dangling-else")
        .flag_if_supported("-Wno-sign-compare")
        .flag_if_supported("-Wno-strict-aliasing")
        .flag_if_supported("-Wno-implicit-fallthrough")
        .flag_if_supported("-Wno-old-style-declaration")
        .flag_if_supported("-Wno-endif-labels")
        .flag_if_supported("-Wno-parentheses")
        .flag_if_supported("-Wno-misleading-indentation")
        .flag_if_supported("-Wno-unused-but-set-variable")
        .opt_level(3)
        .compile("mylib");
}

As mentioned the only change needed to tank into oblivion the compilation speed is to update the cc crate.
This entire build is done on windows 11.

@AlexanderSchuetz97 AlexanderSchuetz97 changed the title Performance regression between cc 1.0.90 and 1.0.91, regression is also present in 1.9.96 Performance regression between cc 1.0.90 and 1.0.91, regression is also present in 1.0.96 May 3, 2024
@dpaoliello
Copy link
Contributor

Given the number of calls to flag_if_supported, it would guess that be62f4a is the culprit.

@dpaoliello
Copy link
Contributor

@AlexanderSchuetz97 can you please try https://github.com/dpaoliello/cc-rs/tree/perf?

[patch.crates-io]
cc = { git = "https://github.com/dpaoliello/cc-rs.git", branch = "perf" }

@NobodyXu
Copy link
Collaborator

NobodyXu commented May 3, 2024

@dpaoliello Thanks for the quick response!

Though I'd have to say that adding a new parameter to a public function would break many crates.

I'd suggest to

  • create a new function is_flag_supported_inner which takes a compiler_path
  • use Build::get_base_compiler() directly to retrieve the compiler in is_flag_supported, which might improve performance

@dpaoliello
Copy link
Contributor

Ah, didn't realize that is_flag_supported was exported - I've applied your suggestions (I can also create a PR if you think it's worthwhile to make this change anyway)

@NobodyXu
Copy link
Collaborator

NobodyXu commented May 3, 2024

Yeah I think it's worth a PR regardless of if it fixes the regression.

@AlexanderSchuetz97
Copy link
Author

AlexanderSchuetz97 commented May 3, 2024

I will only be able to access this project on monday. I will perform the perf suggestion then and report my findings.

@dpaoliello
Copy link
Contributor

Done #1050

@AlexanderSchuetz97
Copy link
Author

Hello again, this fix appears to significantly improve performance.
It is still 5-10% slower than using version 1.0.90, however the time to compile is acceptable for me now.
When can I expect v 1.0.97 (Which will presumably contain the fix from dpoliello?)

@NobodyXu
Copy link
Collaborator

NobodyXu commented May 6, 2024

I was waiting for one last nit to be resolved, but I can edit the PR directly and merge it.

Would cut a release soon after merging it.

@NobodyXu
Copy link
Collaborator

NobodyXu commented May 6, 2024

It also looks like we need a performance benchmark suite in cc

@AlexanderSchuetz97
Copy link
Author

It also looks like we need a performance benchmark suite in cc

This would probably be a good idea.

Would cut a release soon after merging it.

Thank you.

Since the PR is merged I will close this issue. You should track the benchmark implementation in issue #1065

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants