Slow code generated for memcpy of a zext i1 #30349

Open
llvmbot opened this issue Nov 13, 2016 · 1 comment
Labels
bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Member

llvmbot commented Nov 13, 2016

Bugzilla Link: 31001
Version: 3.9
OS: Linux
Reporter: LLVM Bugzilla Contributor
CC: @majnemer, @efriedma-quic, @hfinkel, @jrmuizel, @rotateright

Extended Description

A memcpy with a small constant length is lowered to a (fast) inline sequence of load and store instructions. A memcpy with a non-constant length is lowered to a call to the memcpy function, which is slow for short copies.

For example, a memcpy of a zext i1 is equivalent to a conditional load and store of a single byte, but the generated IR (and ASM) contains a call to memcpy:

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i32, i1)

define i8 @test_load_store(i1 %cond, i8* %buf) {
  %_result = alloca i8, align 8
  %_len = zext i1 %cond to i64
  store i8 0, i8* %_result
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull %_result, i8* nonnull %buf, i64 %_len, i32 1, i1 false)
  %_ret = load i8, i8* %_result
  ret i8 %_ret
}

This causes slowness in Rust's Cursor::read, which we discovered in PR rust-lang/rust#37573.
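For readers more comfortable with C than IR, the same pattern can be sketched roughly as follows. This is only an illustration: the function names read_byte_memcpy and read_byte_branch are hypothetical and do not come from the report or from Rust's standard library. The first function mirrors the IR above, the second is the conditional single-byte load and store it is equivalent to.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the IR above: the copy length is a zero-extended boolean (0 or 1),
   so the compiler sees a memcpy with a non-constant length. */
unsigned char read_byte_memcpy(int cond, const unsigned char *buf) {
    unsigned char result = 0;
    size_t len = (size_t)(cond != 0);   /* zext i1 -> i64 */
    memcpy(&result, buf, len);
    return result;
}

/* Hand-written equivalent: a conditional load and store of one byte. */
unsigned char read_byte_branch(int cond, const unsigned char *buf) {
    unsigned char result = 0;
    if (cond) {
        result = buf[0];
    }
    return result;
}
```

Both functions return the same value for every input; the question raised here is only whether the compiler produces the second form from the first.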

@efriedma-quic
Collaborator

The general transformation here is turning "memcpy(a, b, cond ? c : d);" into "if (cond) memcpy(a, b, c); else memcpy(a, b, d);". The hard part is figuring out when it's profitable; this transform just bloats the code unless it simplifies somehow. Ways it can simplify:

  • If c or d is zero, one of the memcpys goes away.
  • If c or d is constant, alias analysis becomes more accurate (though this papers over a weakness in alias analysis more than it provides an actual benefit).
  • If c or d is constant, we can potentially hoist loads across one or both paths.
  • If c or d is a small constant, we can inline one or both memcpys.
  • If a or b is an alloca, and c and d are constant, we can potentially unblock SROA.
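At the source level, the transformation and one case where it simplifies could look like the sketch below. The function names are hypothetical, and the constants 4 and 0 are chosen purely for illustration; this is not code from LLVM or from any existing pass.

```c
#include <stddef.h>
#include <string.h>

/* Original form: a single memcpy whose length is selected by cond. */
void copy_select(void *a, const void *b, int cond, size_t c, size_t d) {
    memcpy(a, b, cond ? c : d);
}

/* Split form: same behavior, but each call site now has one length.
   With arbitrary c and d this only duplicates the call. */
void copy_split(void *a, const void *b, int cond, size_t c, size_t d) {
    if (cond) {
        memcpy(a, b, c);
    } else {
        memcpy(a, b, d);
    }
}

/* Split form specialized to c = 4, d = 0 (illustrative values):
   the zero-length copy disappears, and the remaining copy has a small
   constant length that a compiler can expand into loads and stores. */
void copy_split_4_or_0(void *a, const void *b, int cond) {
    if (cond) {
        memcpy(a, b, 4);
    }
}
```

In the general copy_split case nothing is gained, which is the profitability concern; the specialized case is where the split pays for itself.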

Maybe we could perform this transform in memcpyopt? Or we could try to do something very late, in CodeGenPrepare, just to allow inlining the memcpy. It's hard to gauge what's appropriate because this sort of code is very rare in C and C++, as far as I know.
