-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize field ordering by grouping m*2^n-sized fields with equivalently aligned ones #102750
Conversation
r? @wesleywiser (rust-highfive has picked a reviewer for you, use r? to override) |
a8647dd
to
1d58865
Compare
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 2657f87590b84095f88c24ee525ec073963f9789 with merge baf56fd405dcb58b0e0470dd22880f87fb4dfe00... |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
2657f87
to
36e2946
Compare
@bors try |
⌛ Trying commit 36e29466a9342424ae2ffeea8a31c0a2f85c94c9 with merge 1db897b92c4dbe666c676afce1afc7278d6de594... |
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
Queued 1db897b92c4dbe666c676afce1afc7278d6de594 with parent 0ca3565, future comparison URL. |
Finished benchmarking commit (1db897b92c4dbe666c676afce1afc7278d6de594): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
36e2946
to
bd4f801
Compare
Perf results lean positive. Added tests. @rustbot ready |
The post-merge perf result was less clearly positive than the earlier one, though overall it looks roughly neutral, albeit with lots of changes in both directions. @rustbot label: +perf-regression-triaged |
Marking this as relnotes, since people are noticing the layout changes in their structs. The relnotes should remind people that the layout of |
Fixes an incorrect assumption that was broken in [Rust PR 102750](rust-lang/rust#102750)
Fixes an incorrect assumption that was broken in [Rust PR 102750](rust-lang/rust#102750)
Pkgsrc changes: * Adjust patches and cargo checksums to new versions, but also one strange "mips" conditional. Upstream changes: Version 1.67.0 (2023-01-26) ========================== Language -------- - [Make `Sized` predicates coinductive, allowing cycles.] (rust-lang/rust#100386) - [`#[must_use]` annotations on `async fn` also affect the `Future::Output`.] (rust-lang/rust#100633) - [Elaborate supertrait obligations when deducing closure signatures.] (rust-lang/rust#101834) - [Invalid literals are no longer an error under `cfg(FALSE)`.] (rust-lang/rust#102944) - [Unreserve braced enum variants in value namespace.] (rust-lang/rust#103578) Compiler -------- - [Enable varargs support for calling conventions other than `C` or `cdecl`.] (rust-lang/rust#97971) - [Add new MIR constant propagation based on dataflow analysis.] (rust-lang/rust#101168) - [Optimize field ordering by grouping m\*2^n-sized fields with equivalently aligned ones.] (rust-lang/rust#102750) - [Stabilize native library modifier `verbatim`.] (rust-lang/rust#104360) Added and removed targets: - [Add a tier 3 target for PowerPC on AIX] (rust-lang/rust#102293), `powerpc64-ibm-aix`. - [Add a tier 3 target for the Sony PlayStation 1] (rust-lang/rust#102689), `mipsel-sony-psx`. - [Add tier 3 `no_std` targets for the QNX Neutrino RTOS] (rust-lang/rust#102701), `aarch64-unknown-nto-qnx710` and `x86_64-pc-nto-qnx710`. - [Remove tier 3 `linuxkernel` targets] (rust-lang/rust#104015) (not used by the actual kernel). Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Merge `crossbeam-channel` into `std::sync::mpsc`.] (rust-lang/rust#93563) - [Fix inconsistent rounding of 0.5 when formatted to 0 decimal places.] (rust-lang/rust#102935) - [Derive `Eq` and `Hash` for `ControlFlow`.] (rust-lang/rust#103084) - [Don't build `compiler_builtins` with `-C panic=abort`.] (rust-lang/rust#103786) Stabilized APIs --------------- - [`{integer}::checked_ilog`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog) - [`{integer}::checked_ilog2`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog2) - [`{integer}::checked_ilog10`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog10) - [`{integer}::ilog`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog) - [`{integer}::ilog2`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog2) - [`{integer}::ilog10`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog10) - [`NonZeroU*::ilog2`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#method.ilog2) - [`NonZeroU*::ilog10`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#method.ilog10) - [`NonZero*::BITS`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#associatedconstant.BITS) These APIs are now stable in const contexts: - [`char::from_u32`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.from_u32) - [`char::from_digit`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.from_digit) - [`char::to_digit`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.to_digit) - [`core::char::from_u32`] (https://doc.rust-lang.org/stable/core/char/fn.from_u32.html) - [`core::char::from_digit`] (https://doc.rust-lang.org/stable/core/char/fn.from_digit.html) Compatibility Notes ------------------- - [The layout of `repr(Rust)` types now groups m\*2^n-sized fields with equivalently aligned ones.] (rust-lang/rust#102750) This is intended to be an optimization, but it is also known to increase type sizes in a few cases for the placement of enum tags. As a reminder, the layout of `repr(Rust)` types is an implementation detail, subject to change. - [0.5 now rounds to 0 when formatted to 0 decimal places.] (rust-lang/rust#102935) This makes it consistent with the rest of floating point formatting that rounds ties toward even digits. - [Chains of `&&` and `||` will now drop temporaries from their sub-expressions in evaluation order, left-to-right.] (rust-lang/rust#103293) Previously, it was "twisted" such that the _first_ expression dropped its temporaries _last_, after all of the other expressions dropped in order. - [Underscore suffixes on string literals are now a hard error.] (rust-lang/rust#103914) This has been a future-compatibility warning since 1.20.0. - [Stop passing `-export-dynamic` to `wasm-ld`.] (rust-lang/rust#105405) - [`main` is now mangled as `__main_void` on `wasm32-wasi`.] (rust-lang/rust#105468) - [Cargo now emits an error if there are multiple registries in the configuration with the same index URL.] (rust-lang/cargo#10592) Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Rewrite LLVM's archive writer in Rust.] (rust-lang/rust#97485)
Pkgsrc changes: * Adjust patches (add & remove) and cargo checksums to new versions. * It's conceivable that the workaround for LLVM based NetBSD works even less in this version (ref. PKGSRC_HAVE_LIBCPP not having a corresponding patch anymore). Upstream changes: Version 1.68.2 (2023-03-28) =========================== - [Update the GitHub RSA host key bundled within Cargo] (rust-lang/cargo#11883). The key was [rotated by GitHub] (https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/) on 2023-03-24 after the old one leaked. - [Mark the old GitHub RSA host key as revoked] (rust-lang/cargo#11889). This will prevent Cargo from accepting the leaked key even when trusted by the system. - [Add support for `@revoked` and a better error message for `@cert-authority` in Cargo's SSH host key verification] (rust-lang/cargo#11635) Version 1.68.1 (2023-03-23) =========================== - [Fix miscompilation in produced Windows MSVC artifacts] (rust-lang/rust#109094) This was introduced by enabling ThinLTO for the distributed rustc which led to miscompilations in the resulting binary. Currently this is believed to be limited to the -Zdylib-lto flag used for rustc compilation, rather than a general bug in ThinLTO, so only rustc artifacts should be affected. - [Fix --enable-local-rust builds] (rust-lang/rust#109111) - [Treat `$prefix-clang` as `clang` in linker detection code] (rust-lang/rust#109156) - [Fix panic in compiler code] (rust-lang/rust#108162) Version 1.68.0 (2023-03-09) =========================== Language -------- - [Stabilize default_alloc_error_handler] (rust-lang/rust#102318) This allows usage of `alloc` on stable without requiring the definition of a handler for allocation failure. Defining custom handlers is still unstable. - [Stabilize `efiapi` calling convention.] (rust-lang/rust#105795) - [Remove implicit promotion for types with drop glue] (rust-lang/rust#105085) Compiler -------- - [Change `bindings_with_variant_name` to deny-by-default] (rust-lang/rust#104154) - [Allow .. to be parsed as let initializer] (rust-lang/rust#105701) - [Add `armv7-sony-vita-newlibeabihf` as a tier 3 target] (rust-lang/rust#105712) - [Always check alignment during compile-time const evaluation] (rust-lang/rust#104616) - [Disable "split dwarf inlining" by default.] (rust-lang/rust#106709) - [Add vendor to Fuchsia's target triple] (rust-lang/rust#106429) - [Enable sanitizers for s390x-linux] (rust-lang/rust#107127) Libraries --------- - [Loosen the bound on the Debug implementation of Weak.] (rust-lang/rust#90291) - [Make `std::task::Context` !Send and !Sync] (rust-lang/rust#95985) - [PhantomData layout guarantees] (rust-lang/rust#104081) - [Don't derive Debug for `OnceWith` & `RepeatWith`] (rust-lang/rust#104163) - [Implement DerefMut for PathBuf] (rust-lang/rust#105018) - [Add O(1) `Vec -> VecDeque` conversion guarantee] (rust-lang/rust#105128) - [Leak amplification for peek_mut() to ensure BinaryHeap's invariant is always met] (rust-lang/rust#105851) Stabilized APIs --------------- - [`{core,std}::pin::pin!`] (https://doc.rust-lang.org/stable/std/pin/macro.pin.html) - [`impl From<bool> for {f32,f64}`] (https://doc.rust-lang.org/stable/std/primitive.f32.html#impl-From%3Cbool%3E-for-f32) - [`std::path::MAIN_SEPARATOR_STR`] (https://doc.rust-lang.org/stable/std/path/constant.MAIN_SEPARATOR_STR.html) - [`impl DerefMut for PathBuf`] (https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html#impl-DerefMut-for-PathBuf) These APIs are now stable in const contexts: - [`VecDeque::new`] (https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html#method.new) Cargo ----- - [Stabilize sparse registry support for crates.io] (rust-lang/cargo#11224) - [`cargo build --verbose` tells you more about why it recompiles.] (rust-lang/cargo#11407) - [Show progress of crates.io index update even `net.git-fetch-with-cli` option enabled] (rust-lang/cargo#11579) Misc ---- Compatibility Notes ------------------- - [Add `SEMICOLON_IN_EXPRESSIONS_FROM_MACROS` to future-incompat report] (rust-lang/rust#103418) - [Only specify `--target` by default for `-Zgcc-ld=lld` on wasm] (rust-lang/rust#101792) - [Bump `IMPLIED_BOUNDS_ENTAILMENT` to Deny + ReportNow] (rust-lang/rust#106465) - [`std::task::Context` no longer implements Send and Sync] (rust-lang/rust#95985) nternal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Encode spans relative to the enclosing item] (rust-lang/rust#84762) - [Don't normalize in AstConv] (rust-lang/rust#101947) - [Find the right lower bound region in the scenario of partial order relations] (rust-lang/rust#104765) - [Fix impl block in const expr] (rust-lang/rust#104889) - [Check ADT fields for copy implementations considering regions] (rust-lang/rust#105102) - [rustdoc: simplify JS search routine by not messing with lev distance] (rust-lang/rust#105796) - [Enable ThinLTO for rustc on `x86_64-pc-windows-msvc`] (rust-lang/rust#103591) - [Enable ThinLTO for rustc on `x86_64-apple-darwin`] (rust-lang/rust#103647) Version 1.67.0 (2023-01-26) ========================== Language -------- - [Make `Sized` predicates coinductive, allowing cycles.] (rust-lang/rust#100386) - [`#[must_use]` annotations on `async fn` also affect the `Future::Output`.] (rust-lang/rust#100633) - [Elaborate supertrait obligations when deducing closure signatures.] (rust-lang/rust#101834) - [Invalid literals are no longer an error under `cfg(FALSE)`.] (rust-lang/rust#102944) - [Unreserve braced enum variants in value namespace.] (rust-lang/rust#103578) Compiler -------- - [Enable varargs support for calling conventions other than `C` or `cdecl`.] (rust-lang/rust#97971) - [Add new MIR constant propagation based on dataflow analysis.] (rust-lang/rust#101168) - [Optimize field ordering by grouping m\*2^n-sized fields with equivalently aligned ones.] (rust-lang/rust#102750) - [Stabilize native library modifier `verbatim`.] (rust-lang/rust#104360) Added and removed targets: - [Add a tier 3 target for PowerPC on AIX] (rust-lang/rust#102293), `powerpc64-ibm-aix`. - [Add a tier 3 target for the Sony PlayStation 1] (rust-lang/rust#102689), `mipsel-sony-psx`. - [Add tier 3 `no_std` targets for the QNX Neutrino RTOS] (rust-lang/rust#102701), `aarch64-unknown-nto-qnx710` and `x86_64-pc-nto-qnx710`. - [Remove tier 3 `linuxkernel` targets] (rust-lang/rust#104015) (not used by the actual kernel). Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Merge `crossbeam-channel` into `std::sync::mpsc`.] (rust-lang/rust#93563) - [Fix inconsistent rounding of 0.5 when formatted to 0 decimal places.] (rust-lang/rust#102935) - [Derive `Eq` and `Hash` for `ControlFlow`.] (rust-lang/rust#103084) - [Don't build `compiler_builtins` with `-C panic=abort`.] (rust-lang/rust#103786) Stabilized APIs --------------- - [`{integer}::checked_ilog`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog) - [`{integer}::checked_ilog2`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog2) - [`{integer}::checked_ilog10`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.checked_ilog10) - [`{integer}::ilog`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog) - [`{integer}::ilog2`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog2) - [`{integer}::ilog10`] (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.ilog10) - [`NonZeroU*::ilog2`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#method.ilog2) - [`NonZeroU*::ilog10`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#method.ilog10) - [`NonZero*::BITS`] (https://doc.rust-lang.org/stable/std/num/struct.NonZeroU32.html#associatedconstant.BITS) These APIs are now stable in const contexts: - [`char::from_u32`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.from_u32) - [`char::from_digit`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.from_digit) - [`char::to_digit`] (https://doc.rust-lang.org/stable/std/primitive.char.html#method.to_digit) - [`core::char::from_u32`] (https://doc.rust-lang.org/stable/core/char/fn.from_u32.html) - [`core::char::from_digit`] (https://doc.rust-lang.org/stable/core/char/fn.from_digit.html) Compatibility Notes ------------------- - [The layout of `repr(Rust)` types now groups m\*2^n-sized fields with equivalently aligned ones.] (rust-lang/rust#102750) This is intended to be an optimization, but it is also known to increase type sizes in a few cases for the placement of enum tags. As a reminder, the layout of `repr(Rust)` types is an implementation detail, subject to change. - [0.5 now rounds to 0 when formatted to 0 decimal places.] (rust-lang/rust#102935) This makes it consistent with the rest of floating point formatting that rounds ties toward even digits. - [Chains of `&&` and `||` will now drop temporaries from their sub-expressions in evaluation order, left-to-right.] (rust-lang/rust#103293) Previously, it was "twisted" such that the _first_ expression dropped its temporaries _last_, after all of the other expressions dropped in order. - [Underscore suffixes on string literals are now a hard error.] (rust-lang/rust#103914) This has been a future-compatibility warning since 1.20.0. - [Stop passing `-export-dynamic` to `wasm-ld`.] (rust-lang/rust#105405) - [`main` is now mangled as `__main_void` on `wasm32-wasi`.] (rust-lang/rust#105468) - [Cargo now emits an error if there are multiple registries in the configuration with the same index URL.] (rust-lang/cargo#10592) Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Rewrite LLVM's archive writer in Rust.] (rust-lang/rust#97485)
After rust-lang/rust#102750 , Rust reorders fields in repr(Rust) structures based on size/alignment. Applying such reordering to the Dsp struct substantially hurts performance. Set repr(C) on the Dsp struct to recover that performance. See https://internals.rust-lang.org/t/unexpected-3-x-performance-regression-starting-with-rust-version-1-67/18724 for the test case and investigations that led to this.
After rust-lang/rust#102750 , Rust reorders fields in repr(Rust) structures based on size/alignment. Applying such reordering to the Dsp struct substantially hurts performance. Set repr(C) on the Dsp struct to recover that performance. See https://internals.rust-lang.org/t/unexpected-3-x-performance-regression-starting-with-rust-version-1-67/18724 for the test case and investigations that led to this.
Newer toolchains cause stack overflow errors in some tests. That may be related to a memory alignment change in Rust v1.67 (rust-lang/rust#102750)
Newer toolchains cause stack overflow errors in some tests. That may be related to a memory alignment change in Rust v1.67 (rust-lang/rust#102750)
After rust-lang/rust#102750 , Rust reorders fields in repr(Rust) structures based on size/alignment. Applying such reordering to the Dsp struct substantially hurts performance. Set repr(C) on the Dsp struct to recover that performance. See https://internals.rust-lang.org/t/unexpected-3-x-performance-regression-starting-with-rust-version-1-67/18724 for the test case and investigations that led to this.
prints
I.e. the
u8
in the middle causes the array to sit at an odd offset, which might prevent optimizations, especially on architectures where unaligned loads are costly.Note that this will make field ordering niche-dependent, i.e. a
Bar<T>
withT=char
andT=u32
may result in different field order, this may break some code that makes invalid assumptions aboutrepr(Rust)
types.