Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. #112157

erikdesjardins · 2023-06-01T02:39:03Z

Same as #111551, which I accidentally closed :/

This resurrects PR #103830, which has sat idle for a while.

Beyond #103830, this also:

fixes byval alignment for types containing vectors on Darwin (see tests/codegen/align-byval-vector.rs)
fixes byval alignment for overaligned types on x86 Windows (see tests/codegen/align-byval.rs)
fixes ABI for types with 128bit requested alignment on ARM64 Linux (see tests/codegen/aarch64-struct-align-128.rs)

@pcwalton's original PR description is reproduced below:

Commit 88e4d2c from five years ago removed
support for alignment on indirectly-passed arguments because of problems with
the i686-pc-windows-msvc target. Unfortunately, the memcpy optimizations I
recently added to LLVM 16 depend on this to forward memcpys. This commit
attempts to fix the problems with byval parameters on that target and now
correctly adds the align attribute.

The problem is summarized in this comment by @eddyb. Briefly, 32-bit x86 has
special alignment rules for byval parameters: for the most part, their
alignment is forced to 4. This is not well-documented anywhere but in the Clang
source. I looked at the logic in Clang TargetInfo.cpp and tried to replicate
it here. The relevant methods in that file are
X86_32ABIInfo::getIndirectResult() and
X86_32ABIInfo::getTypeStackAlignInBytes(). The align parameter attribute
for byval parameters in LLVM must match the platform ABI, or miscompilations
will occur. Note that this doesn't use the approach suggested by eddyb, because
I felt it was overkill to store the alignment in on_stack when special
handling is really only needed for 32-bit x86.

As a side effect, this should fix #80127, because it will make the align
parameter attribute for byval parameters match the platform ABI on LLVM
x86-64.

nikic · 2023-06-10T15:31:07Z

@bors r+ rollup=never

bors · 2023-06-10T15:31:09Z

📌 Commit ebb4f7ebf3e04be6992c57c48f31b850201e2dc7 has been approved by nikic

It is now in the queue for this repository.

nikic · 2023-06-10T15:37:59Z

Nominating this as an FYI: This introduces a semantic difference between a struct with explicit alignment #[align(8)] and a struct that just happens to be 8 byte aligned. This appears to be an ABI requirement for 32-bit MSVC, so I don't see how we could avoid it, but seems like something that could use wider attention.

nikic · 2023-06-10T17:09:43Z

@bors r-

compiler/rustc_target/src/abi/call/x86.rs

erikdesjardins · 2023-06-11T03:12:54Z

@rustbot review

tests/codegen/align-byval.rs

rustbot · 2023-06-11T18:00:56Z

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

tests/codegen/align-byval.rs

nikic · 2023-06-12T12:57:05Z

@bors r+ rollup=never

bors · 2023-06-12T12:57:07Z

📌 Commit 12112cf8d269d903e8bdafda730b9983cb4d489e has been approved by nikic

It is now in the queue for this repository.

rust-log-analyzer · 2023-07-15T07:53:43Z

The job armhf-gnu failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

curl: (22) The requested URL returned error: 404 

error: failed to download llvm from ci

    help: old builds get deleted after a certain time
    help: if trying to compile an old commit of rustc, disable `download-ci-llvm` in config.toml:
    [llvm]
    download-ci-llvm = false
    
Build completed unsuccessfully in 0:00:00

nikic · 2023-07-15T07:57:03Z

@bors treeclosed=100 Try build got merged into master, everything is broken.

nikic · 2023-07-15T07:57:55Z

@bors retry

jyn514 · 2023-07-15T13:51:59Z

@bors treeclosed-

master branch was force pushed to ad96323

bors · 2023-07-15T15:39:57Z

⌛ Testing commit 2daacf5 with merge 7a17f57...

bors · 2023-07-15T17:29:12Z

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 7a17f57 to master...

rust-timer · 2023-07-15T18:45:08Z

Finished benchmarking commit (7a17f57): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.7%	[1.5%, 2.0%]	6
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.8%	[-2.9%, -2.8%]	6
All ❌✅ (primary)	1.7%	[1.5%, 2.0%]	6

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.1%	[0.7%, 3.2%]	3
Improvements ✅ (primary)	-1.5%	[-1.5%, -1.5%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.5%	[-1.5%, -1.5%]	1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.2%	[-0.2%, -0.2%]	1
All ❌✅ (primary)	-	-	0

Bootstrap: 657.725s -> 657.257s (-0.07%)

pnkfelix · 2023-07-19T02:09:48Z

Visiting for perf-triage.

The six primary regressions were all to variants of diesel (all of check/debug/opt) for variious full and incr-full scenarios. It isn't noise, there seems to be a clear cliff that starts with this PR when looking at the graph starting from 2023-0702.

I'm not marking as triaged yet, but I was tempted to do so, because this PR is a prerequiste for unlocking various memcpy optimizations added by pcwalton to LLVM 16.

erikdesjardins · 2023-07-21T04:32:31Z

cachegrind diff of diesel-1.4.8-check-full is all in the trait system:

--------------------------------------------------------------------------------
Ir          
--------------------------------------------------------------------------------
395,904,127  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir             file:function
--------------------------------------------------------------------------------
  315,004,916  ???:<rustc_infer::traits::Obligation<rustc_middle::ty::Predicate> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::fold_with::<rustc_infer::infer::resolve::OpportunisticVarResolver>
 -182,350,919  ???:<rustc_middle::ty::generic_args::ArgFolder as rustc_type_ir::fold::TypeFolder<rustc_middle::ty::context::TyCtxt>>::fold_ty
 -103,793,312  ???:<alloc::vec::Vec<rustc_middle::ty::Clause> as alloc::vec::spec_extend::SpecExtend<rustc_middle::ty::Clause, core::iter::adapters::map::Map<core::slice::iter::Iter<(rustc_middle::ty::Clause, rustc_span::span_encoding::Span)>, <rustc_middle::ty::generics::GenericPredicates>::instantiate_into::{closure
   79,589,914  ???:<rustc_middle::ty::Ty as rustc_type_ir::fold::TypeSuperFoldable<rustc_middle::ty::context::TyCtxt>>::try_super_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
   74,310,808  ???:<&rustc_middle::ty::list::List<rustc_middle::ty::generic_args::GenericArg> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
   71,613,406  ???:<rustc_middle::ty::generics::GenericPredicates>::instantiate_into
   51,270,977  ???:<alloc::vec::Vec<rustc_middle::ty::Clause> as alloc::vec::spec_extend::SpecExtend<rustc_middle::ty::Clause, core::iter::adapters::filter::Filter<core::iter::adapters::map::Map<core::iter::adapters::enumerate::Enumerate<core::slice::iter::Iter<(rustc_middle::ty::Clause, rustc_span::span_encoding::Span)>>, <rustc_infer::traits::util::Elaborator<rustc_middle::ty::Clause>>::elaborate::{closure
   41,690,794  ???:<(rustc_middle::ty::Predicate, rustc_middle::ty::ParamEnv) as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_infer::infer::ShallowResolver>
  -34,018,300  ???:<rustc_middle::ty::Clause>::subst_supertrait
  -29,206,801  ???:<rustc_infer::traits::util::Elaborator<rustc_middle::ty::Clause> as core::iter::traits::iterator::Iterator>::next
   27,141,479  ???:rustc_middle::ty::util::fold_list::<rustc_infer::infer::canonical::canonicalizer::Canonicalizer, rustc_middle::ty::Clause, <&rustc_middle::ty::list::List<rustc_middle::ty::Clause> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with<rustc_infer::infer::canonical::canonicalizer::Canonicalizer>::{closure
   23,138,213  ???:<rustc_trait_selection::traits::select::SelectionContext>::candidate_from_obligation_no_cache
   18,700,656  ???:<rustc_middle::ty::generic_args::GenericArg as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
   14,608,136  ???:<rustc_infer::infer::InferCtxt>::commit_if_ok::<rustc_trait_selection::traits::project::ProjectAndUnifyResult, rustc_infer::traits::project::MismatchedProjectionTypes, rustc_trait_selection::traits::project::poly_project_and_unify_type::{closure
  -14,571,455  ???:<rustc_hir_typeck::fn_ctxt::FnCtxt>::check_argument_types
  -14,436,037  ???:rustc_trait_selection::traits::project::poly_project_and_unify_type
   13,002,031  ???:<rustc_hir_typeck::fn_ctxt::FnCtxt>::check_call
...

I suspect this change allowed a memcpy to be optimized out somewhere, changing the size of a function enough to perturb inlining, and this happened to be a bad move.

(I was actually expecting this to be caused by the cost of adding additional attributes, but that seems to be insignificant here.)

Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. Same as rust-lang#111551, which I [accidentally closed](rust-lang#111551 (comment)) :/ --- This resurrects PR rust-lang#103830, which has sat idle for a while. Beyond rust-lang#103830, this also: - fixes byval alignment for types containing vectors on Darwin (see `tests/codegen/align-byval-vector.rs`) - fixes byval alignment for overaligned types on x86 Windows (see `tests/codegen/align-byval.rs`) - fixes ABI for types with 128bit requested alignment on ARM64 Linux (see `tests/codegen/aarch64-struct-align-128.rs`) r? `@nikic` --- `@pcwalton's` original PR description is reproduced below: Commit 88e4d2c from five years ago removed support for alignment on indirectly-passed arguments because of problems with the `i686-pc-windows-msvc` target. Unfortunately, the `memcpy` optimizations I recently added to LLVM 16 depend on this to forward `memcpy`s. This commit attempts to fix the problems with `byval` parameters on that target and now correctly adds the `align` attribute. The problem is summarized in [this comment] by `@eddyb.` Briefly, 32-bit x86 has special alignment rules for `byval` parameters: for the most part, their alignment is forced to 4. This is not well-documented anywhere but in the Clang source. I looked at the logic in Clang `TargetInfo.cpp` and tried to replicate it here. The relevant methods in that file are `X86_32ABIInfo::getIndirectResult()` and `X86_32ABIInfo::getTypeStackAlignInBytes()`. The `align` parameter attribute for `byval` parameters in LLVM must match the platform ABI, or miscompilations will occur. Note that this doesn't use the approach suggested by eddyb, because I felt it was overkill to store the alignment in `on_stack` when special handling is really only needed for 32-bit x86. As a side effect, this should fix rust-lang#80127, because it will make the `align` parameter attribute for `byval` parameters match the platform ABI on LLVM x86-64. [this comment]: rust-lang#80822 (comment)

Stop using LLVM struct types for alloca, byval, sret, and many GEPs This is an extension of rust-lang#98615, extending the removal from field offsets to most places that it's feasible right now. (It might make sense to split this PR up, but I want to test perf with everything.) For `alloca`, `byval`, and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's layout algorithm. Particularly for `alloca`, it is likely that a future LLVM will change to a representation where you only specify the size. For GEPs, upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which is the same thing we do here. \*: Since we always explicitly specify the alignment. For `byval`, this wasn't the case until rust-lang#112157. †: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue here. r? `@ghost`

Stop using LLVM struct types for byval/sret For `byval`, and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. \*: The alignment would also matter if we didn't explicitly specify it. From what I can tell, we always specified the alignment for `sret`; for `byval`, we didn't until rust-lang#112157. †: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue. Split out from rust-lang#121577. r? `@nikic`

Stop using LLVM struct types for byval/sret For `byval` and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. \*: The alignment would matter, if we didn't explicitly specify it. From what I can tell, we always specified the alignment for `sret`; for `byval`, we didn't until rust-lang#112157. †: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue. Split out from rust-lang#121577. r? `@nikic`

rustbot assigned nikic Jun 1, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 1, 2023

jyn514 mentioned this pull request Jun 10, 2023

MSVC on x86-32 Windows fails to align variables to their required alignment #112480

Open

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 10, 2023

nikic added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Jun 10, 2023

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 10, 2023

nikic reviewed Jun 10, 2023

View reviewed changes

compiler/rustc_target/src/abi/call/x86.rs Show resolved Hide resolved

erikdesjardins force-pushed the align branch 2 times, most recently from f38be1c to f6a4828 Compare June 11, 2023 03:12

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 11, 2023

This comment has been minimized.

Sign in to view

erikdesjardins force-pushed the align branch from f6a4828 to a160475 Compare June 11, 2023 04:11

nikic reviewed Jun 11, 2023

View reviewed changes

tests/codegen/align-byval.rs Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

nikic reviewed Jun 11, 2023

View reviewed changes

tests/codegen/align-byval.rs Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

erikdesjardins force-pushed the align branch from dd11122 to 12112cf Compare June 11, 2023 21:26

bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 12, 2023

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 15, 2023

This was referenced Jul 15, 2023

Remove unneeded handling for ExternalLocation::Unknown in rustdoc render context #113697

Merged

rustdoc: use src consistently over source in CSS/JS #113709

Merged

bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 15, 2023

bors merged commit 7a17f57 into rust-lang:master Jul 15, 2023

rustbot added this to the 1.73.0 milestone Jul 15, 2023

rustbot added the perf-regression Performance regression. label Jul 15, 2023

erikdesjardins deleted the align branch July 17, 2023 06:49

khei4 mentioned this pull request Jul 22, 2023

Missed optimization opportunity when trivially moving tuple of slices #107436

Closed

matthiaskrgr mentioned this pull request Feb 22, 2024

LLVM ERROR Attribute 'align' exceed the max size 2^14 #121444

Closed

erikdesjardins mentioned this pull request Feb 25, 2024

Stop using LLVM struct types for alloca, byval, sret, and many GEPs #121577

Closed

erikdesjardins mentioned this pull request Mar 6, 2024

Stop using LLVM struct types for byval/sret #122050

Merged

erikdesjardins mentioned this pull request Mar 8, 2024

byval argument of C ABI function passed to Rust ABI function has insufficient alignment #122211

Closed

erikdesjardins mentioned this pull request Mar 13, 2024

Bump windows-bindgen to 0.55.0 #122390

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. #112157

Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. #112157

erikdesjardins commented Jun 1, 2023 •

edited

Loading

nikic commented Jun 10, 2023

bors commented Jun 10, 2023

nikic commented Jun 10, 2023

nikic commented Jun 10, 2023

erikdesjardins commented Jun 11, 2023

This comment has been minimized.

This comment has been minimized.

rustbot commented Jun 11, 2023

This comment has been minimized.

nikic commented Jun 12, 2023

bors commented Jun 12, 2023

rust-log-analyzer commented Jul 15, 2023

nikic commented Jul 15, 2023

nikic commented Jul 15, 2023

jyn514 commented Jul 15, 2023

bors commented Jul 15, 2023

bors commented Jul 15, 2023

rust-timer commented Jul 15, 2023

pnkfelix commented Jul 19, 2023 •

edited

Loading

erikdesjardins commented Jul 21, 2023

Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. #112157

Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. #112157

Conversation

erikdesjardins commented Jun 1, 2023 • edited Loading

nikic commented Jun 10, 2023

bors commented Jun 10, 2023

nikic commented Jun 10, 2023

nikic commented Jun 10, 2023

erikdesjardins commented Jun 11, 2023

This comment has been minimized.

This comment has been minimized.

rustbot commented Jun 11, 2023

This comment has been minimized.

nikic commented Jun 12, 2023

bors commented Jun 12, 2023

rust-log-analyzer commented Jul 15, 2023

nikic commented Jul 15, 2023

nikic commented Jul 15, 2023

jyn514 commented Jul 15, 2023

bors commented Jul 15, 2023

bors commented Jul 15, 2023

rust-timer commented Jul 15, 2023

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

Binary size

pnkfelix commented Jul 19, 2023 • edited Loading

erikdesjardins commented Jul 21, 2023

erikdesjardins commented Jun 1, 2023 •

edited

Loading

pnkfelix commented Jul 19, 2023 •

edited

Loading