Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal machine code mapping results or options #117551

Open
davidzeng0 opened this issue Nov 3, 2023 · 4 comments
Open

Suboptimal machine code mapping results or options #117551

davidzeng0 opened this issue Nov 3, 2023 · 4 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@davidzeng0
Copy link
Contributor

https://godbolt.org/z/GjrE4GjhK

for all 3 maybe_with_data functions, the move of certain values on the stack is unnecessary.

This seems like a fundamental issue and probably won't be fixed soon, but it should make
all existing code run slightly faster

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 3, 2023
@clubby789 clubby789 added I-slow Issue: Problems and improvements with respect to performance of generated code. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 3, 2023
@saethlin
Copy link
Member

saethlin commented Nov 4, 2023

The behavior here is specific to array lengths that are not a power of 2. I think this is a consequence of #115236, the code looks quite well-optimized if the array length is 4 or 8.

@davidzeng0
Copy link
Contributor Author

davidzeng0 commented Nov 4, 2023

I'd say that's still a problem, because any moves to stack shouldn't happen in the first place

I have the following structs

pub struct Packet {
	buffer: BufferRef, /* 3 usizes */
	data: *mut u8,
	size: usize,

	pub time_base: Rational, /* u32, u32 */
	pub duration: u64,
	pub timestamp: i64,
	pub track_index: u32,
	pub flags: BitFlags<PacketFlag>,
	pub zero_padding: usize
}

pub struct Element {
	pub id: ElementId, /* u64 */
	pub size: u64,
	pub offset: u64,
	pub end: u64
}

and multiple non-inline and inline calls to read(&mut self) -> Result<Option<Packet / Element>>

I mainly use maybe_with_data's match statement rather than as_mut().map() (which doesn't work either) to reduce the 1 additional indentation in my code

Even though the function call is inlined, I see multiple mov reg, [rsp + 0x300] and mov [rsp + 0x110], reg instructions copying the data between different places on the stack, at my match statement succeeding

Especially when my code does around 100 nanos per packet, those extra vectorized and nonvectorized moves add significant overhead that I shouldn't get

I'm only covering Option and Result in this issue, but it should also apply to enums and any function that invites rustc to move a value, even when it's unnecessary. It seems to be related to #99504

@davidzeng0
Copy link
Contributor Author

davidzeng0 commented Nov 4, 2023

Here is clang, with similar code to an Option in C++ https://godbolt.org/z/aPv9nPbn6. There is no mention of rsp at all, until you uncomment the inline never attribute in which case all optimizations go out the window. Gcc produces similar code to clang

@saethlin
Copy link
Member

saethlin commented Nov 4, 2023

Yes it's still a problem, but I would say it is a problem with arrays, not with map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants