Slow code generated for memcpy of a `zext i1` #30349

llvmbot · 2016-11-13T01:37:13Z


Bugzilla Link	31001
Version	3.9
OS	Linux
Reporter	LLVM Bugzilla Contributor
CC	@majnemer,@efriedma-quic,@hfinkel,@jrmuizel,@rotateright

Extended Description

A memcpy with a small constant length is lowered to a (fast) sequence of load and store instructions. A memcpy with a non-constant length is lowered to a call to the memcpy function, which is slow for short copies.

For example, a memcpy of a zext i1 is equivalent to a conditional load and store of a single byte, but the generated IR (and ASM) contains a call to memcpy:

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i32, i1) #2

define i8 @test_load_store(i1 %cond, i8* %buf) {
  %_result = alloca i8, align 8
  %_len = zext i1 %cond to i64
  store i8 0, i8* %_result
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull %_result, i8* nonnull %buf, i64 %_len, i32 1, i1 false)
  %_ret = load i8, i8* %_result
  ret i8 %_ret
}

This causes slowness in Rust's Cursor::read, which we discovered in PR rust-lang/rust#37573.
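
For readers more comfortable with C than IR, the example can be sketched roughly as follows. This is an illustrative translation, not code from the report: `test_load_store` mirrors the IR function, and `test_load_store_branchy` is a hypothetical hand-written version of the "conditional load and store of a single byte" described above. Whether a given compiler version still emits a memcpy call for the first form is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the IR: the length is a zero-extended boolean (0 or 1), so it is
   not a compile-time constant at the memcpy call site. */
uint8_t test_load_store(bool cond, const uint8_t *buf) {
    assert(buf != NULL);                 /* corresponds to `nonnull` in the IR */
    uint8_t result = 0;
    memcpy(&result, buf, (size_t)cond);  /* copies either 0 or 1 bytes */
    return result;
}

/* Hypothetical hand-written equivalent: a conditional single-byte load. */
uint8_t test_load_store_branchy(bool cond, const uint8_t *buf) {
    assert(buf != NULL);
    uint8_t result = 0;
    if (cond)
        result = buf[0];
    return result;
}
```

Both functions return `buf[0]` when `cond` is true and 0 otherwise; the second form only makes the single-byte access explicit.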


efriedma-quic · 2016-12-02T02:35:18Z

The general transformation here is turning "memcpy(a, b, cond ? c : d);" into "if (cond) memcpy(a, b, c); else memcpy(a, b, d);". The hard part is figuring out when it's profitable; this transform just bloats the code unless it simplifies somehow. Ways it can simplify:

If c or d is zero, one of the memcpys goes away.
If c or d is constant, alias analysis becomes more accurate (this is more papering over a weakness than an actual benefit, though).
If c or d is constant, we can potentially hoist loads across one or both paths.
If c or d is a small constant, we can inline one or both memcpys.
If a or b is an alloca, and c and d are constant, we can potentially unblock SROA.
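
As a rough source-level illustration of the transformation and two of the simplifications listed above (a sketch only; the function names and the constants 4 and 0 are made up for the example, and the real transform would operate on IR rather than C):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Original form: the length is a select between two values. */
void copy_select(void *a, const void *b, bool cond, size_t c, size_t d) {
    memcpy(a, b, cond ? c : d);
}

/* After the split: each branch sees its own length, so each memcpy can be
   simplified independently once c or d is known. */
void copy_split(void *a, const void *b, bool cond, size_t c, size_t d) {
    if (cond)
        memcpy(a, b, c);
    else
        memcpy(a, b, d);
}

/* Hypothetical specialization for c == 4, d == 0: the zero-length branch
   goes away, and the remaining copy has a small constant length, which
   compilers can typically expand inline. */
void copy_split_4_or_0(void *a, const void *b, bool cond) {
    if (cond)
        memcpy(a, b, 4);
}
```

The split form is only a win when at least one branch simplifies like this; with two unknown lengths it is the same work plus a branch and a second call site, which is the code bloat concern raised above.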

Maybe we could perform this transform in memcpyopt? Or we could try to do something very late, in CodeGenPrepare, just to allow inlining the memcpy. It's hard to gauge what's appropriate because this sort of code is very rare in C and C++, as far as I know.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

scottmcm mentioned this issue May 3, 2023

Write on &mut [u8] and Cursor<&mut [u8]> doesn't optimize very well. rust-lang/rust#44099

Open
