Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc 1.34.0-nightly (da6ab956e 2019-01-27) and later make encoding_rs with packed_simd go from one .s to 31 .rcgu.s files #58023

Closed
hsivonen opened this issue Jan 31, 2019 · 17 comments

Comments

@hsivonen
Copy link
Member

Steps to reproduce

  1. Clone https://github.com/hsivonen/encoding_rs
  2. cd encoding_rs
  3. git checkout simd
  4. rustup default rustup default 1.32.0
  5. rustup target add armv7-unknown-linux-gnueabihf
  6. RUSTC_BOOTSTRAP=1 RUSTFLAGS='-C target_feature=+neon,+thumb-mode,+thumb2' cargo rustc --target armv7-unknown-linux-gnueabihf --features simd-accel --release -- --emit asm
  7. find target | grep -c '\.s$'
  8. rm -rf target
  9. git checkout packed_simd
  10. rustup default nightly
  11. rustup target add thumbv7neon-unknown-linux-gnueabihf
  12. cargo rustc --target thumbv7neon-unknown-linux-gnueabihf --features simd-accel --release -- --emit asm
  13. find target | grep -c '\.s$'

Actual results

In the simd + Rust 1.32 case, there is one .s file. In the packed_simd + Rust 1.34 case, encoding_rs is split across 31 .s files. These are all .rcgu.s files. Examining these files suggests lesser inlining within encoding_rs, although code from packed_simd and core::arch appears to have gotten inlined.

Expected result

Expected one .s file with the same level of inlining in the packed_simd + Rust 1.34 case as with the simd + Rust 1.32 case.

Additional info

When building an actual binary from a different top-level crate (encoding_bench), the packed_simd + Rust 1.34 case regresses performance relative to the simd + Rust 1.32 on Exynos 5.

@hsivonen
Copy link
Member Author

RUSTFLAGS='-C codegen-units=1' cargo rustc --target thumbv7neon-unknown-linux-gnueabihf --features simd-accel --release -- --emit asm does not change things.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jan 31, 2019

@hsivonen do you have a cargo/.config or custom profiles when building the simd crate that might be fixing the number of codegen-units or whether incremental compilation is enabled/disabled when using the simd crate ?

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

do you have a cargo/.config or custom profiles

I don't have a cargo directory under encoding_rs. I don't have .config or config under ~/.cargo. I'm not aware of having custom profiles.

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

In the case of x86_64-unknown-linux-gnu, the same target exists on both 1.32 and 1.34, so I was able to try packed_simd with both. This is not a simd vs. packed_simd issue: packed_simd with Rust 1.32 produces one .s targeting x86_64 and 31 .rcgu.s files with 1.34.

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

One .s with 1.33.0-beta.4. Must be a very recent change.

@hsivonen hsivonen changed the title Switch from simd with Rust 1.32 to packed_simd with Rust 1.34 makes encoding_rs go from one .s to 31 .rcgu.s files rustc 1.34.0-nightly (da6ab956e 2019-01-27) and later make encoding_rs with packed_simd go from one .s to 31 .rcgu.s files Feb 1, 2019
@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

First nightly with 31 .s files: rustc 1.34.0-nightly (da6ab95 2019-01-27)
Last nightly with one .s file: rustc 1.33.0-nightly (20c2cba 2019-01-26)

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

The most suspicious commit in that range seems to be the new cargo version. Maybe the compiler didn't change but the way cargo invokes it did.

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

Well, in that range cargo turned on incremental compilation by default.

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

Link to the PR for turning on incremental compilation.

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

Indeed, incremental = false under [profile.release] in Cargo.toml returns the behavior to a single .s.

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

Adding incremental = false under [profile.bench] in the Cargo.toml did not fix the regression relative to the simd crate on the Thumb2 on ARMv7 CPU case. How should I verify that incremental = false is taking effect in a cargo bench scenario?

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

rustc 1.34.0-nightly (da6ab95 2019-01-27) vs. rustc 1.33.0-nightly (20c2cba 2019-01-26) does not show a benchmarking difference on x86_64 (tested on a Haswell desktop i7).

@hsivonen
Copy link
Member Author

hsivonen commented Feb 1, 2019

Since in the incremental = true world --emit asm no longer shows the true output (i.e. ThinLTO) isn't done, I guess I should use an external disassembler to examine if post-ThinLTO inlining is still bad in the Thumb2 case. (I don't know at what stage of compliation Thumb trampolines are generated in the ThinLTO case, so I don't know if Thumb trampolines could interfere with ThinLTO inlining in a manner that's irrelevant to e.g. x86_64.)

@gnzlbg
Copy link
Contributor

gnzlbg commented Feb 1, 2019 via email

@hsivonen
Copy link
Member Author

hsivonen commented Feb 5, 2019

You need to set incremental = false for the bench profile.

I tried again with

[profile.bench]
incremental = false

in both the Cargo.toml for encoding_bench and the Cargo.toml for encoding_rs. For the previous benchmark run, I put it in the Cargo.toml for encoding_bench only. Having it also in the Cargo.toml of encoding_rs made no difference, so most likely the perf regression in comparison to the simd crate has nothing to do with incremental compilation.

I intend to disassemble the bench binaries next.

@hsivonen
Copy link
Member Author

hsivonen commented Feb 5, 2019

It seems that incremental compilation isn't the cause of the regression. The instructions generated for horizontal reductions are.

@hsivonen hsivonen closed this as completed Feb 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants