-
-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize ABIRewrite system for lvalues #1528
Conversation
@@ -39,16 +39,17 @@ class FunctionType; | |||
struct ABIRewrite { | |||
virtual ~ABIRewrite() = default; | |||
|
|||
/// get a rewritten value back to its original form | |||
virtual llvm::Value *get(Type *dty, llvm::Value *v) = 0; | |||
/// Transforms the D argument to a suited LL argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"suitable"? I'm not a native speaker either, but "suited" as an adjective conjures weird images of LLVM values wearing suits in my head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh yes of course ;)
LLValue *mem = DtoAlloca(returntype); | ||
irFty.getRet(returntype, retllval, mem); | ||
retllval = mem; | ||
retllval = irFty.getRetLVal(returntype, retllval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your new interface, it is still necessary to have both the getRetLVal
branch and the "manual" storeReturnValueOnStack
branch below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's still a mess to sort out. As is all the repainting going on for return values and parameters...
Passes the full debug test suite on my Win64 box. |
return new DVarValue(type, getIrValue(vd)); | ||
} | ||
if (llvm::isa<llvm::Argument>(getIrValue(vd))) { | ||
return new DImValue(type, getIrValue(vd)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This took quite some time to get my attention. A delegate rewritten by ExplicitByvalRewrite (to an lvalue, a LL pointer to the delegate struct) would be bound to the parameter as DImValue
, and it wouldn't get dereferenced properly later on. That was because the pointer is now the actual argument itself instead of a private alloca (so llvm::isa<llvm::Argument>()
returns true).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as I recently discovered too (cf. all the return value instruction domination issues), there are a lot of places like this that confuse IR-level semantics with whatever confused concepts of lvalues/rvalues we have on the DValue layer.
Allow ABIRewrites to return the D parameter's LL value directly. Most rewrites store to memory anyway, so let the D parameter point directly to that memory instead of a dedicated alloca bitcopy.
It would be nice if someone could verify that the ARM ABIs still work too (@smolt, @redstar, @joakim-noah). |
I am currently useless. All my stuff is in boxes as I am moving. |
Applied this PR to master and cross-compiled for Android/ARM, all the same standard library tests pass. |
@@ -168,7 +168,7 @@ void DtoPaddedStruct(Type *dty, LLValue *v, LLValue *lval) { | |||
// Nested structs are the only members that can contain padding | |||
DtoPaddedStruct(fields[i]->type, fieldval, fieldptr); | |||
} else { | |||
DtoStore(fieldval, fieldptr); | |||
DtoStoreZextI8(fieldval, fieldptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug (unable to store i1
bits) appears to have been the reason why the return value of intrinsics was never rewritten by the ABI (intrinsics ABI in this case) in gen/tocall.cpp
.
The return value of intrinsics returning a LL struct (e.g., llvm.sadd.with.overflow
) therefore didn't undergo the usual ABI transformation process; in this case, the LL struct was simply dumped to memory instead of going through DtoPaddedStruct()
via the RemoveStructPadding
rewrite which is otherwise applied to all LL structs passed to (and now also returned by) intrinsics.
Thanks @joakim-noah for the ARM tests. |
This is the effect on the unoptimized IR for Win64 ( struct Struct { long a, b; }
Struct foo(Struct s) { return s; }
void main()
{
auto s = Struct(1, 2);
foo(s);
} // old:
define void @_D5byval3fooFS5byval6StructZS5byval6Struct(%byval.Struct* noalias sret align 8 %.sret_arg, %byval.Struct* noalias nocapture align 16 %s_arg) #0 comdat {
%s = alloca %byval.Struct, align 8 ; [#uses = 2, size/byte = 16]
%1 = bitcast %byval.Struct* %s to i8* ; [#uses = 1]
%2 = bitcast %byval.Struct* %s_arg to i8* ; [#uses = 1]
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %2, i64 16, i32 1, i1 false)
%3 = bitcast %byval.Struct* %.sret_arg to i8* ; [#uses = 1]
%4 = bitcast %byval.Struct* %s to i8* ; [#uses = 1]
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %4, i64 16, i32 1, i1 false)
ret void
}
// new:
define void @_D5byval3fooFS5byval6StructZS5byval6Struct(%byval.Struct* noalias sret align 8 %.sret_arg, %byval.Struct* noalias nocapture align 16 %s) #0 comdat {
%1 = bitcast %byval.Struct* %.sret_arg to i8* ; [#uses = 1]
%2 = bitcast %byval.Struct* %s to i8* ; [#uses = 1]
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %2, i64 16, i32 1, i1 false)
ret void
}
// common:
define i32 @_Dmain({ i64, { i64, i8* }* } %unnamed) #0 comdat {
%s = alloca %byval.Struct, align 8 ; [#uses = 2, size/byte = 16]
%.structliteral = alloca %byval.Struct, align 8 ; [#uses = 3, size/byte = 16]
%.rettmp = alloca %byval.Struct, align 8 ; [#uses = 1, size/byte = 16]
%.ExplicitByvalRewrite_dump = alloca %byval.Struct, align 16 ; [#uses = 2, size/byte = 16]
// initialize .structliteral with (1,2)
// memcpy .structliteral to .s
// memcpy .s to .ExplicitByvalRewrite_dump
call void @_D5byval3fooFS5byval6StructZS5byval6Struct(%byval.Struct* noalias sret align 8 %.rettmp, %byval.Struct* noalias nocapture align 16 %.ExplicitByvalRewrite_dump) #0
ret i32 0
} This useless additional copy by the callee has been bugging me for quite a while. |
For ARM, test-suite is where ABI better is tested. @joakim-noah did you run test-suite also? |
No, I'll try that next. |
I'd rather amend/revert this change afterwards if it happens to break ARM. (Merge/CI speed is currently one of our biggest bottle necks, unfortunately.) |
Built and ran dmd testsuite natively on Android/ARM with this PR, identical results. |
Nice to hear, thanks! |
One more plus for ARM. Raspberry Pi 3 and master (with this merged PR) passes all tests, except lit-tests don't run correctly due to my setup. I have a space in a USB HD label and that dorks up the lit command lines. |
Can you share some details? Sounds like a bug in the way I set up lit. |
Sure, I'll paste in the error when I get home today. |
@JohanEngelen lit-test issue #1536 (which I am sure github emailed you about too) |
Allow ABIRewrites to return the D parameter's LL lvalue directly.
Most rewrites store to memory anyway, so let the D parameter point directly to that memory instead of a dedicated alloca bitcopy.