Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return f32 and f64 in XMM0 instead of FP0 on i686 #115919

Closed
wants to merge 2 commits into from

Conversation

GuentherVIII
Copy link

@GuentherVIII GuentherVIII commented Sep 17, 2023

i686 already uses SSE to do calculations with f32 and f64, but the C calling convention uses the x87 stack to return values. The Rust calling convention does not need to do this, and LLVM makes it easy to use XMM0 instead, which saves move instructions and fixes problems with NaN values.

See #115567, #91399 (comment) and this LLVM comment.

@rustbot
Copy link
Collaborator

rustbot commented Sep 17, 2023

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @thomcc (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 17, 2023
Comment on lines 329 to 331
// Ignore test on x87 floating point, these platforms do not guarantee NaN
// payloads are preserved and flush denormals to zero, failing the tests.
#[cfg(not(target_arch = "x86"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tests still should be ignored on pre-SSE x86 targets. We have a bunch of i586 tier2 targets and others might be running tests for them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Running the i586 tests also required changes to rustc_ty_utils which I'm not happy with. I could not figure out how to properly detect SSE2 support - target.features is empty on i586-unknown-linux-gnu at that point.

@GuentherVIII
Copy link
Author

@rustbot label +A-abi +A-floating-point +O-x86 -T-libs

@rustbot rustbot added A-abi Area: Concerning the application binary interface (ABI). A-floating-point Area: Floating point numbers and arithmetic O-x86 and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 27, 2023
@chorman0773
Copy link
Contributor

chorman0773 commented Oct 3, 2023

Can I ask that we not make floating-point ABI even more complicated on x86?
Rustc already has ABI issues with floats and -x87, adding +sse to the list of features replied upon for abi is just going to make that problem worse.

Not to mention my headache trying to catalog all of the different x86 float ABIs.

@chorman0773
Copy link
Contributor

@rust-lang/opsem and probably @rust-lang/lang definitely needs to look at this for the reason in #116344.

When compiling for i586, local #[target_feature] attributes can likely affect whether an f32 or f64 return value is passed in an SSE or x87 register according to the code from this PR from what I can tell. The setting for the callee is not respected for the caller, at least in the existing cases affecting by this. This further constrains what can be promised in #115476.

@workingjubilee
Copy link
Member

Can I ask that we not make floating-point ABI even more complicated on x86?
Rustc already has ABI issues with floats and -x87, adding +sse to the list of features replied upon for abi is just going to make that problem worse.

Not to mention my headache trying to catalog all of the different x86 float ABIs.

Given that we are also discussing making SSE/SSE2 mandatory for i686 targets with SSE2 as part of their baseline features, I think we should forge ahead, because this actually makes it more like x86-64's ABI.

@@ -371,6 +371,8 @@ fn fn_abi_new_uncached<'tcx>(
let target = &cx.tcx.sess.target;
let target_env_gnu_like = matches!(&target.env[..], "gnu" | "musl" | "uclibc");
let win_x64_gnu = target.os == "windows" && target.arch == "x86_64" && target.env == "gnu";
let x86_sse2 =
target.arch == "x86" && (target.features.contains("sse2") || target.cpu == "pentium4");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect you'd want to query the sse2 target feature rather than checking for the pentium4 cpu, since e.g. penryn also supports sse2 but isn't the pentium4. https://rust.godbolt.org/z/j7oPE6Wrf

also, what if there's a soft-float target for an OS kernel where sse2 would be disabled even if the cpu is pentium4 due to the kernel not saving SSE registers?

Copy link
Member

@workingjubilee workingjubilee Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuentherVIII If you handle the aforementioned edge-cases appropriately, I believe we can get this approved. We want to only do this if the target is a 32-bit x86 CPU that has "baseline" features including SSE2, and preferably only with the function also satisfying the appropriate target feature (which we wish to make "always" but for now we can afford to check twice for correctness). This obviously includes Pentium 4 CPUs with no other target feature settings, but it won't include any that, say, are softfloat by default.

Copy link
Member

@programmerjake programmerjake Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, in other words, only if on x86-32 and SSE2 is implied by the target structs/json ignoring any -C target-features or -C target-cpu or other overrides.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does rust currently ask LLVM for the target features implied by the target cpu? I could only find the code that creates the list for #[cfg(target_feature)], which takes command-line options into account.

I've changed the implementation to check that list instead of target.features and target.cpu. This has the advantage that the standard library can activate the NAN-returning test precisely when it passes, and deactivating SSE with -C target-features still works.

There's also a second commit introducing yet another target option to make this configurable in the target specification and enabling it in all "i686" targets except i686_unknown_uefi, which disables SSE. The standard

I do not think we should make the ABI dependant on #[target_feature] - unlike -C target-feature, the attribute cannot disable SSE. If someone disables SSE with -C target-feature and re-enables it for select functions with #[target_feature], this shouldn't result in incompatible function ABIs with the same function type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue you'll run into with -C target-feature=-sse is that functions in your binary won't match the ABI of other code (such as std) with SSE2 enabled in the default target features

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's talk about forbidding going below the baseline. If you want less-than-baseline features you need a different target.

https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Float.20ABI.20hell/near/394473707

@RalfJung
Copy link
Member

RalfJung commented Oct 3, 2023

Rustc already has ABI issues with floats and -x87, adding +sse to the list of features replied upon for abi is just going to make that problem worse.

I think it just means we want to have both of these in the set of baseline features.

But it means we should maybe block this PR on having the concept of "baseline features" rolled out so people get at least a warning when they disable a feature that should not be disabled?

@thomcc
Copy link
Member

thomcc commented Oct 4, 2023

r? compiler

@rustbot rustbot assigned b-naber and unassigned thomcc Oct 4, 2023
…tion

i686 already uses SSE2 to do calculations with f32 and f64, but the C
calling convention uses the x87 stack to return values. The Rust calling
convention does not need to do this, and LLVM makes it easy to use XMM0
instead, which saves move instructions and fixes problems with NaN values.
@rustbot
Copy link
Collaborator

rustbot commented Oct 8, 2023

These commits modify compiler targets.
(See the Target Tier Policy.)

@rust-log-analyzer

This comment has been minimized.

Activating SSE2 with -C target-feature doesn't change the Rust ABI anymore,
while deactivating it still does.
@b-naber
Copy link
Contributor

b-naber commented Oct 13, 2023

r? compiler

@rustbot rustbot assigned petrochenkov and unassigned b-naber Oct 13, 2023
@petrochenkov
Copy link
Contributor

This should probably be blocked on #116584.
@rustbot blocked

@rustbot rustbot added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 13, 2023
@Noratrieb Noratrieb added O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023
@bors
Copy link
Contributor

bors commented Nov 8, 2023

☔ The latest upstream changes (presumably #117716) made this pull request unmergeable. Please resolve the merge conflicts.

@petrochenkov
Copy link
Contributor

Closing because #116584 was closed.

@RalfJung
Copy link
Member

RalfJung commented Jul 16, 2024

This got superseded by #123351 which avoids the ABI issue by returning things in-memory.

@programmerjake
Copy link
Member

This got superseded by #123351 which avoids the ABI issue by returning things in-memory.

returning stuff in xmm0 is still better since it avoids extra load/store, so I think we still want this PR (or similar) but it's not as critical since the corrupting-float-returns bug is now fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-abi Area: Concerning the application binary interface (ABI). A-floating-point Area: Floating point numbers and arithmetic O-x86_32 Target: x86 processors, 32 bit (like i686-*) S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.