Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ABIRewrite system for lvalues #1528

Merged
merged 5 commits into from
May 29, 2016

Conversation

kinke
Copy link
Member

@kinke kinke commented May 28, 2016

Allow ABIRewrites to return the D parameter's LL lvalue directly.
Most rewrites store to memory anyway, so let the D parameter point directly to that memory instead of a dedicated alloca bitcopy.

@@ -39,16 +39,17 @@ class FunctionType;
struct ABIRewrite {
virtual ~ABIRewrite() = default;

/// get a rewritten value back to its original form
virtual llvm::Value *get(Type *dty, llvm::Value *v) = 0;
/// Transforms the D argument to a suited LL argument.
Copy link
Member

@dnadlinger dnadlinger May 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"suitable"? I'm not a native speaker either, but "suited" as an adjective conjures weird images of LLVM values wearing suits in my head.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh yes of course ;)

LLValue *mem = DtoAlloca(returntype);
irFty.getRet(returntype, retllval, mem);
retllval = mem;
retllval = irFty.getRetLVal(returntype, retllval);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your new interface, it is still necessary to have both the getRetLVal branch and the "manual" storeReturnValueOnStack branch below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's still a mess to sort out. As is all the repainting going on for return values and parameters...

@kinke
Copy link
Member Author

kinke commented May 28, 2016

Passes the full debug test suite on my Win64 box.

return new DVarValue(type, getIrValue(vd));
}
if (llvm::isa<llvm::Argument>(getIrValue(vd))) {
return new DImValue(type, getIrValue(vd));
Copy link
Member Author

@kinke kinke May 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took quite some time to get my attention. A delegate rewritten by ExplicitByvalRewrite (to an lvalue, a LL pointer to the delegate struct) would be bound to the parameter as DImValue, and it wouldn't get dereferenced properly later on. That was because the pointer is now the actual argument itself instead of a private alloca (so llvm::isa<llvm::Argument>() returns true).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as I recently discovered too (cf. all the return value instruction domination issues), there are a lot of places like this that confuse IR-level semantics with whatever confused concepts of lvalues/rvalues we have on the DValue layer.

Allow ABIRewrites to return the D parameter's LL value directly.
Most rewrites store to memory anyway, so let the D parameter point
directly to that memory instead of a dedicated alloca bitcopy.
@kinke
Copy link
Member Author

kinke commented May 28, 2016

It would be nice if someone could verify that the ARM ABIs still work too (@smolt, @redstar, @joakim-noah).

@smolt
Copy link
Member

smolt commented May 29, 2016

I am currently useless. All my stuff is in boxes as I am moving.

@joakim-noah
Copy link
Contributor

Applied this PR to master and cross-compiled for Android/ARM, all the same standard library tests pass.

@@ -168,7 +168,7 @@ void DtoPaddedStruct(Type *dty, LLValue *v, LLValue *lval) {
// Nested structs are the only members that can contain padding
DtoPaddedStruct(fields[i]->type, fieldval, fieldptr);
} else {
DtoStore(fieldval, fieldptr);
DtoStoreZextI8(fieldval, fieldptr);
Copy link
Member Author

@kinke kinke May 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug (unable to store i1 bits) appears to have been the reason why the return value of intrinsics was never rewritten by the ABI (intrinsics ABI in this case) in gen/tocall.cpp.
The return value of intrinsics returning a LL struct (e.g., llvm.sadd.with.overflow) therefore didn't undergo the usual ABI transformation process; in this case, the LL struct was simply dumped to memory instead of going through DtoPaddedStruct() via the RemoveStructPadding rewrite which is otherwise applied to all LL structs passed to (and now also returned by) intrinsics.

@kinke
Copy link
Member Author

kinke commented May 29, 2016

Thanks @joakim-noah for the ARM tests.

@kinke
Copy link
Member Author

kinke commented May 29, 2016

This is the effect on the unoptimized IR for Win64 (ExplicitByvalRewrite):

struct Struct { long a, b; }

Struct foo(Struct s) { return s; }

void main()
{
    auto s = Struct(1, 2);
    foo(s);
}
// old:
define void @_D5byval3fooFS5byval6StructZS5byval6Struct(%byval.Struct* noalias sret align 8 %.sret_arg, %byval.Struct* noalias nocapture align 16 %s_arg) #0 comdat {
  %s = alloca %byval.Struct, align 8              ; [#uses = 2, size/byte = 16]
  %1 = bitcast %byval.Struct* %s to i8*           ; [#uses = 1]
  %2 = bitcast %byval.Struct* %s_arg to i8*       ; [#uses = 1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %2, i64 16, i32 1, i1 false)
  %3 = bitcast %byval.Struct* %.sret_arg to i8*   ; [#uses = 1]
  %4 = bitcast %byval.Struct* %s to i8*           ; [#uses = 1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %4, i64 16, i32 1, i1 false)
  ret void
}

// new:
define void @_D5byval3fooFS5byval6StructZS5byval6Struct(%byval.Struct* noalias sret align 8 %.sret_arg, %byval.Struct* noalias nocapture align 16 %s) #0 comdat {
  %1 = bitcast %byval.Struct* %.sret_arg to i8*   ; [#uses = 1]
  %2 = bitcast %byval.Struct* %s to i8*           ; [#uses = 1]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %2, i64 16, i32 1, i1 false)
  ret void
}

// common:
define i32 @_Dmain({ i64, { i64, i8* }* } %unnamed) #0 comdat {
  %s = alloca %byval.Struct, align 8              ; [#uses = 2, size/byte = 16]
  %.structliteral = alloca %byval.Struct, align 8 ; [#uses = 3, size/byte = 16]
  %.rettmp = alloca %byval.Struct, align 8        ; [#uses = 1, size/byte = 16]
  %.ExplicitByvalRewrite_dump = alloca %byval.Struct, align 16 ; [#uses = 2, size/byte = 16]
  // initialize .structliteral with (1,2)
  // memcpy .structliteral to .s
  // memcpy .s to .ExplicitByvalRewrite_dump
  call void @_D5byval3fooFS5byval6StructZS5byval6Struct(%byval.Struct* noalias sret align 8 %.rettmp, %byval.Struct* noalias nocapture align 16 %.ExplicitByvalRewrite_dump) #0
  ret i32 0
}

This useless additional copy by the callee has been bugging me for quite a while.

@smolt
Copy link
Member

smolt commented May 29, 2016

For ARM, test-suite is where ABI better is tested. @joakim-noah did you run test-suite also?

@joakim-noah
Copy link
Contributor

No, I'll try that next.

@dnadlinger
Copy link
Member

dnadlinger commented May 29, 2016

I'd rather amend/revert this change afterwards if it happens to break ARM. (Merge/CI speed is currently one of our biggest bottle necks, unfortunately.)

@dnadlinger dnadlinger merged commit 67c0d97 into ldc-developers:master May 29, 2016
@joakim-noah
Copy link
Contributor

joakim-noah commented May 29, 2016

Built and ran dmd testsuite natively on Android/ARM with this PR, identical results.

@dnadlinger
Copy link
Member

dnadlinger commented May 29, 2016

Nice to hear, thanks!

@smolt
Copy link
Member

smolt commented May 30, 2016

One more plus for ARM. Raspberry Pi 3 and master (with this merged PR) passes all tests, except lit-tests don't run correctly due to my setup. I have a space in a USB HD label and that dorks up the lit command lines.

@JohanEngelen
Copy link
Member

except lit-tests don't run correctly due to my setup. I have a space in a USB HD label and that dorks up the lit command lines.

Can you share some details? Sounds like a bug in the way I set up lit.

@smolt
Copy link
Member

smolt commented May 31, 2016

Sure, I'll paste in the error when I get home today.

@smolt
Copy link
Member

smolt commented Jun 1, 2016

@JohanEngelen lit-test issue #1536 (which I am sure github emailed you about too)

@kinke kinke deleted the abi_cleanup branch August 24, 2017 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants