Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process. #112157

Merged
merged 26 commits into from
Jul 15, 2023

Conversation

erikdesjardins
Copy link
Contributor

@erikdesjardins erikdesjardins commented Jun 1, 2023

Same as #111551, which I accidentally closed :/


This resurrects PR #103830, which has sat idle for a while.

Beyond #103830, this also:

  • fixes byval alignment for types containing vectors on Darwin (see tests/codegen/align-byval-vector.rs)
  • fixes byval alignment for overaligned types on x86 Windows (see tests/codegen/align-byval.rs)
  • fixes ABI for types with 128bit requested alignment on ARM64 Linux (see tests/codegen/aarch64-struct-align-128.rs)

r? @nikic


@pcwalton's original PR description is reproduced below:

Commit 88e4d2c from five years ago removed
support for alignment on indirectly-passed arguments because of problems with
the i686-pc-windows-msvc target. Unfortunately, the memcpy optimizations I
recently added to LLVM 16 depend on this to forward memcpys. This commit
attempts to fix the problems with byval parameters on that target and now
correctly adds the align attribute.

The problem is summarized in this comment by @eddyb. Briefly, 32-bit x86 has
special alignment rules for byval parameters: for the most part, their
alignment is forced to 4. This is not well-documented anywhere but in the Clang
source. I looked at the logic in Clang TargetInfo.cpp and tried to replicate
it here. The relevant methods in that file are
X86_32ABIInfo::getIndirectResult() and
X86_32ABIInfo::getTypeStackAlignInBytes(). The align parameter attribute
for byval parameters in LLVM must match the platform ABI, or miscompilations
will occur. Note that this doesn't use the approach suggested by eddyb, because
I felt it was overkill to store the alignment in on_stack when special
handling is really only needed for 32-bit x86.

As a side effect, this should fix #80127, because it will make the align
parameter attribute for byval parameters match the platform ABI on LLVM
x86-64.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 1, 2023
@nikic
Copy link
Contributor

nikic commented Jun 10, 2023

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Jun 10, 2023

📌 Commit ebb4f7ebf3e04be6992c57c48f31b850201e2dc7 has been approved by nikic

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 10, 2023
@nikic nikic added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Jun 10, 2023
@nikic
Copy link
Contributor

nikic commented Jun 10, 2023

Nominating this as an FYI: This introduces a semantic difference between a struct with explicit alignment #[align(8)] and a struct that just happens to be 8 byte aligned. This appears to be an ABI requirement for 32-bit MSVC, so I don't see how we could avoid it, but seems like something that could use wider attention.

@nikic
Copy link
Contributor

nikic commented Jun 10, 2023

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 10, 2023
@erikdesjardins
Copy link
Contributor Author

@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 11, 2023
@rust-log-analyzer

This comment has been minimized.

tests/codegen/align-byval.rs Outdated Show resolved Hide resolved
@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Jun 11, 2023

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

tests/codegen/align-byval.rs Outdated Show resolved Hide resolved
@rust-log-analyzer

This comment has been minimized.

@nikic
Copy link
Contributor

nikic commented Jun 12, 2023

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Jun 12, 2023

📌 Commit 12112cf8d269d903e8bdafda730b9983cb4d489e has been approved by nikic

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 12, 2023
@rust-log-analyzer
Copy link
Collaborator

The job armhf-gnu failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
curl: (22) The requested URL returned error: 404 

error: failed to download llvm from ci

    help: old builds get deleted after a certain time
    help: if trying to compile an old commit of rustc, disable `download-ci-llvm` in config.toml:
    [llvm]
    download-ci-llvm = false
    
Build completed unsuccessfully in 0:00:00

@nikic
Copy link
Contributor

nikic commented Jul 15, 2023

@bors treeclosed=100 Try build got merged into master, everything is broken.

@nikic
Copy link
Contributor

nikic commented Jul 15, 2023

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 15, 2023
@jyn514
Copy link
Member

jyn514 commented Jul 15, 2023

@bors treeclosed-

master branch was force pushed to ad96323

@bors
Copy link
Contributor

bors commented Jul 15, 2023

⌛ Testing commit 2daacf5 with merge 7a17f57...

@bors
Copy link
Contributor

bors commented Jul 15, 2023

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 7a17f57 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 15, 2023
@bors bors merged commit 7a17f57 into rust-lang:master Jul 15, 2023
@rustbot rustbot added this to the 1.73.0 milestone Jul 15, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (7a17f57): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.7% [1.5%, 2.0%] 6
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.8% [-2.9%, -2.8%] 6
All ❌✅ (primary) 1.7% [1.5%, 2.0%] 6

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.1% [0.7%, 3.2%] 3
Improvements ✅
(primary)
-1.5% [-1.5%, -1.5%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.5% [-1.5%, -1.5%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 1
All ❌✅ (primary) - - 0

Bootstrap: 657.725s -> 657.257s (-0.07%)

@rustbot rustbot added the perf-regression Performance regression. label Jul 15, 2023
@erikdesjardins erikdesjardins deleted the align branch July 17, 2023 06:49
@pnkfelix
Copy link
Member

pnkfelix commented Jul 19, 2023

Visiting for perf-triage.

The six primary regressions were all to variants of diesel (all of check/debug/opt) for variious full and incr-full scenarios. It isn't noise, there seems to be a clear cliff that starts with this PR when looking at the graph starting from 2023-0702.

I'm not marking as triaged yet, but I was tempted to do so, because this PR is a prerequiste for unlocking various memcpy optimizations added by pcwalton to LLVM 16.

@erikdesjardins
Copy link
Contributor Author

cachegrind diff of diesel-1.4.8-check-full is all in the trait system:

--------------------------------------------------------------------------------
Ir          
--------------------------------------------------------------------------------
395,904,127  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir             file:function
--------------------------------------------------------------------------------
  315,004,916  ???:<rustc_infer::traits::Obligation<rustc_middle::ty::Predicate> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::fold_with::<rustc_infer::infer::resolve::OpportunisticVarResolver>
 -182,350,919  ???:<rustc_middle::ty::generic_args::ArgFolder as rustc_type_ir::fold::TypeFolder<rustc_middle::ty::context::TyCtxt>>::fold_ty
 -103,793,312  ???:<alloc::vec::Vec<rustc_middle::ty::Clause> as alloc::vec::spec_extend::SpecExtend<rustc_middle::ty::Clause, core::iter::adapters::map::Map<core::slice::iter::Iter<(rustc_middle::ty::Clause, rustc_span::span_encoding::Span)>, <rustc_middle::ty::generics::GenericPredicates>::instantiate_into::{closure
   79,589,914  ???:<rustc_middle::ty::Ty as rustc_type_ir::fold::TypeSuperFoldable<rustc_middle::ty::context::TyCtxt>>::try_super_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
   74,310,808  ???:<&rustc_middle::ty::list::List<rustc_middle::ty::generic_args::GenericArg> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
   71,613,406  ???:<rustc_middle::ty::generics::GenericPredicates>::instantiate_into
   51,270,977  ???:<alloc::vec::Vec<rustc_middle::ty::Clause> as alloc::vec::spec_extend::SpecExtend<rustc_middle::ty::Clause, core::iter::adapters::filter::Filter<core::iter::adapters::map::Map<core::iter::adapters::enumerate::Enumerate<core::slice::iter::Iter<(rustc_middle::ty::Clause, rustc_span::span_encoding::Span)>>, <rustc_infer::traits::util::Elaborator<rustc_middle::ty::Clause>>::elaborate::{closure
   41,690,794  ???:<(rustc_middle::ty::Predicate, rustc_middle::ty::ParamEnv) as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_infer::infer::ShallowResolver>
  -34,018,300  ???:<rustc_middle::ty::Clause>::subst_supertrait
  -29,206,801  ???:<rustc_infer::traits::util::Elaborator<rustc_middle::ty::Clause> as core::iter::traits::iterator::Iterator>::next
   27,141,479  ???:rustc_middle::ty::util::fold_list::<rustc_infer::infer::canonical::canonicalizer::Canonicalizer, rustc_middle::ty::Clause, <&rustc_middle::ty::list::List<rustc_middle::ty::Clause> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with<rustc_infer::infer::canonical::canonicalizer::Canonicalizer>::{closure
   23,138,213  ???:<rustc_trait_selection::traits::select::SelectionContext>::candidate_from_obligation_no_cache
   18,700,656  ???:<rustc_middle::ty::generic_args::GenericArg as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_middle::ty::generic_args::ArgFolder>
   14,608,136  ???:<rustc_infer::infer::InferCtxt>::commit_if_ok::<rustc_trait_selection::traits::project::ProjectAndUnifyResult, rustc_infer::traits::project::MismatchedProjectionTypes, rustc_trait_selection::traits::project::poly_project_and_unify_type::{closure
  -14,571,455  ???:<rustc_hir_typeck::fn_ctxt::FnCtxt>::check_argument_types
  -14,436,037  ???:rustc_trait_selection::traits::project::poly_project_and_unify_type
   13,002,031  ???:<rustc_hir_typeck::fn_ctxt::FnCtxt>::check_call
...

I suspect this change allowed a memcpy to be optimized out somewhere, changing the size of a function enough to perturb inlining, and this happened to be a bad move.

(I was actually expecting this to be caused by the cost of adding additional attributes, but that seems to be insignificant here.)

bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Jul 22, 2023
Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process.

Same as rust-lang#111551, which I [accidentally closed](rust-lang#111551 (comment)) :/

---

This resurrects PR rust-lang#103830, which has sat idle for a while.

Beyond rust-lang#103830, this also:
- fixes byval alignment for types containing vectors on Darwin (see `tests/codegen/align-byval-vector.rs`)
- fixes byval alignment for overaligned types on x86 Windows (see `tests/codegen/align-byval.rs`)
- fixes ABI for types with 128bit requested alignment on ARM64 Linux (see `tests/codegen/aarch64-struct-align-128.rs`)

r? `@nikic`

---

`@pcwalton's` original PR description is reproduced below:

Commit 88e4d2c from five years ago removed
support for alignment on indirectly-passed arguments because of problems with
the `i686-pc-windows-msvc` target. Unfortunately, the `memcpy` optimizations I
recently added to LLVM 16 depend on this to forward `memcpy`s. This commit
attempts to fix the problems with `byval` parameters on that target and now
correctly adds the `align` attribute.

The problem is summarized in [this comment] by `@eddyb.` Briefly, 32-bit x86 has
special alignment rules for `byval` parameters: for the most part, their
alignment is forced to 4. This is not well-documented anywhere but in the Clang
source. I looked at the logic in Clang `TargetInfo.cpp` and tried to replicate
it here. The relevant methods in that file are
`X86_32ABIInfo::getIndirectResult()` and
`X86_32ABIInfo::getTypeStackAlignInBytes()`. The `align` parameter attribute
for `byval` parameters in LLVM must match the platform ABI, or miscompilations
will occur. Note that this doesn't use the approach suggested by eddyb, because
I felt it was overkill to store the alignment in `on_stack` when special
handling is really only needed for 32-bit x86.

As a side effect, this should fix rust-lang#80127, because it will make the `align`
parameter attribute for `byval` parameters match the platform ABI on LLVM
x86-64.

[this comment]: rust-lang#80822 (comment)
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 25, 2024
Stop using LLVM struct types for alloca, byval, sret, and many GEPs

This is an extension of rust-lang#98615, extending the removal from field offsets to most places that it's feasible right now. (It might make sense to split this PR up, but I want to test perf with everything.)

For `alloca`, `byval`, and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's layout algorithm. Particularly for `alloca`, it is likely that a future LLVM will change to a representation where you only specify the size.

For GEPs, upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which is the same thing we do here.

\*: Since we always explicitly specify the alignment. For `byval`, this wasn't the case until rust-lang#112157.

†: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue here.

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 6, 2024
Stop using LLVM struct types for byval/sret

For `byval`, and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout.

\*: The alignment would also matter if we didn't explicitly specify it. From what I can tell, we always specified the alignment for `sret`; for `byval`, we didn't until rust-lang#112157.

†: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue.

Split out from rust-lang#121577.

r? `@nikic`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 10, 2024
Stop using LLVM struct types for byval/sret

For `byval` and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout.

\*: The alignment would matter, if we didn't explicitly specify it. From what I can tell, we always specified the alignment for `sret`; for `byval`, we didn't until rust-lang#112157.

†: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue.

Split out from rust-lang#121577.

r? `@nikic`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 10, 2024
Stop using LLVM struct types for byval/sret

For `byval` and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout.

\*: The alignment would matter, if we didn't explicitly specify it. From what I can tell, we always specified the alignment for `sret`; for `byval`, we didn't until rust-lang#112157.

†: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue.

Split out from rust-lang#121577.

r? `@nikic`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 11, 2024
Stop using LLVM struct types for byval/sret

For `byval` and `sret`, the type has no semantic meaning, only the size matters\*†. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout.

\*: The alignment would matter, if we didn't explicitly specify it. From what I can tell, we always specified the alignment for `sret`; for `byval`, we didn't until rust-lang#112157.

†: For `byval`, the hidden copy may be impacted by padding in the LLVM struct type, i.e. padding bytes may not be copied. (I'm not sure if this is done today, but I think it would be legal.) But we manually pad our LLVM struct types specifically to avoid there ever being LLVM-visible padding, so that shouldn't be an issue.

Split out from rust-lang#121577.

r? `@nikic`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rustc has wrong signature for C function with 16-byte aligned stack argument in x86_64 Linux
10 participants