Implement "consume" loads on PowerPC #901

Closed

taylordotfish

Like ARM, PowerPC is a weakly ordered architecture where "acquire" loads are more expensive than "consume" loads, which require no special instructions.
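
For readers unfamiliar with the pattern, here is a minimal sketch of what a "consume" load usually looks like in Rust. It is not the diff from this PR: `load_consume_sketch` is a hypothetical name, and the sketch assumes a consume load is emulated as a relaxed load followed by `compiler_fence(Ordering::Acquire)`, relying on the hardware to order address-dependent accesses. On all other targets it falls back to a plain acquire load.

```rust
use std::sync::atomic::{compiler_fence, AtomicUsize, Ordering};

/// Hypothetical helper, for illustration only.
/// Assumption: on the listed weakly ordered targets, a relaxed load plus a
/// compiler-only fence is used in place of a full acquire load.
pub fn load_consume_sketch(a: &AtomicUsize) -> usize {
    if cfg!(any(
        target_arch = "arm",
        target_arch = "aarch64",
        target_arch = "powerpc",
        target_arch = "powerpc64",
    )) {
        // Relaxed load: no ordering requested from the hardware.
        let value = a.load(Ordering::Relaxed);
        // Intended to stop the compiler from moving later accesses
        // above the load.
        compiler_fence(Ordering::Acquire);
        value
    } else {
        // Fallback: an ordinary acquire load.
        a.load(Ordering::Acquire)
    }
}
```

Whether this is actually cheaper than an acquire load depends on how the compiler lowers `compiler_fence` for the target.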

@taiki-e
Member

taiki-e commented Aug 29, 2022

Thanks for the PR! Unfortunately, the resulting code is less efficient than an acquire load, which is lowered to isync plus a branch, because LLVM emits lwsync for compiler_fence on PowerPC: https://godbolt.org/z/YG5M5foE4

EDIT: Depending on the CPU, lwsync and isync + branching may be equally efficient, but that is an LLVM issue about how acquire loads are lowered.
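
To reproduce this kind of comparison, a pair of functions along the following lines can be compiled for a PowerPC target and their assembly inspected. This is only a sketch: the function names are hypothetical and the contents of the linked Compiler Explorer snippet are not reproduced here. `#[inline(never)]` is used so that each function stays a separate symbol in the output.

```rust
use std::sync::atomic::{compiler_fence, AtomicU32, Ordering};

/// Plain acquire load, the baseline for the comparison.
#[inline(never)]
pub fn load_acquire(a: &AtomicU32) -> u32 {
    a.load(Ordering::Acquire)
}

/// Relaxed load followed by a compiler fence, the candidate "consume" load.
#[inline(never)]
pub fn load_relaxed_then_fence(a: &AtomicU32) -> u32 {
    let value = a.load(Ordering::Relaxed);
    compiler_fence(Ordering::Acquire);
    value
}
```

Both functions return the same value; the interesting difference is in which synchronization instructions, if any, the backend emits for each one.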

@taylordotfish
Author

@taiki-e Huh… but why is compiler_fence issuing any kind of *sync at all? Based on the documentation, I would've expected it not to generate any special instructions:

compiler_fence does not emit any machine code, but restricts the kinds of memory re-ordering the compiler is allowed to do.

But empirically, that's clearly not the case…

@taiki-e
Member

taiki-e commented Aug 30, 2022

compiler_fence's documentation is incorrect; see rust-lang/rust#62256 for more.
