Skip to content

std.math.trigonometry: fix tan asm x86 implementation for ldc compiler#8199

Merged
RazvanN7 merged 1 commit intodlang:masterfrom
ljmf00:fix-asm-tan-impl
Aug 24, 2021
Merged

std.math.trigonometry: fix tan asm x86 implementation for ldc compiler#8199
RazvanN7 merged 1 commit intodlang:masterfrom
ljmf00:fix-asm-tan-impl

Conversation

@ljmf00
Copy link
Member

@ljmf00 ljmf00 commented Aug 19, 2021

Somehow, separating asm blocks in small pieces produces additional
movs and other instructions on LDC that breaks the tanAsm logic and
therefore returns wrong values.

Signed-off-by: Luís Ferreira contact@lsferreira.net


CC: @kinke

I'm pretty sure this is not being tested on LDC. I would love to see some coverage reporting for LDC builds. A comparison of DMD and LDC coverage is desired to seek some problems on LDC code generation.

Here's the problem isolated: https://godbolt.org/z/bEhGa1d9P

Here is the asm diff of this change for more context:

--- t1  2021-08-19 19:21:19.595248484 +0100
+++ t2  2021-08-19 19:20:58.704874438 +0100
@@ -5,8 +5,9 @@
         mov     rbp, rsp
         fld     tbyte ptr [rbp + 16]
         fstp    tbyte ptr [rbp - 16]
+        fld     dword ptr [rip + .LCPI0_0]
+        fstp    tbyte ptr [rbp - 32]
         fld     tbyte ptr [rbp - 16]
-        mov     dword ptr [rbp - 20], 0
         fxam
         wait
         fnstsw  ax
@@ -32,41 +33,12 @@
         test    ah, 4
         je      .Lpure nothrow @nogc @trusted real example.tanAsm(real)_Lret
         fstp    st(0)
-
-        jmp     .Lpure nothrow @nogc @trusted real example.tanAsm(real)__llvm_asm_end1
+        fld     tbyte ptr [rbp - 32]
 .Lpure nothrow @nogc @trusted real example.tanAsm(real)_Clear1:
-        mov     dword ptr [rbp - 20], 1
-
-        jmp     .Lpure nothrow @nogc @trusted real example.tanAsm(real)__llvm_asm_end1
+        fstp    st(0)
 .Lpure nothrow @nogc @trusted real example.tanAsm(real)_Lret:
-        mov     dword ptr [rbp - 20], 2
-
-        jmp     .Lpure nothrow @nogc @trusted real example.tanAsm(real)__llvm_asm_end1
-.Lpure nothrow @nogc @trusted real example.tanAsm(real)__llvm_asm_end1:
-
-        mov     eax, dword ptr [rbp - 20]
-        mov     ecx, eax
-        sub     ecx, 1
-        mov     dword ptr [rbp - 24], eax
-        je      .LBB0_1
-        jmp     .LBB0_6
-.LBB0_6:
-        mov     eax, dword ptr [rbp - 24]
-        sub     eax, 2
-        je      .LBB0_2
-        jmp     .LBB0_3
-.LBB0_1:
-        jmp     .LBB0_4
-.LBB0_2:
-        jmp     .LBB0_5
-.LBB0_3:
-        fld     dword ptr [rip + .LCPI0_0]
-        pop     rbp
-        ret
-.LBB0_4:
+        fld     st(0)
         fstp    st(0)
-.LBB0_5:
-        fldz
         pop     rbp
         ret
 
@@ -180,7 +152,7 @@
 .LBB1_8:
         lea     rsi, [rip + .L.str.4]
         mov     edi, 14
-        mov     edx, 120
+        mov     edx, 117
         call    _d_assert@PLT
 
 @safe void std.stdio.writeln!(immutable(char)[], real).writeln(immutable(char)[], real):

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @ljmf00! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8199"

@ljmf00
Copy link
Member Author

ljmf00 commented Aug 19, 2021

@kinke meanwhile I found out that declaring rnan as an enum makes the compiler abort abnormally with the following stack trace.

 #0 0x00007fb59b59b793 (/usr/lib/libLLVM-12.so+0xb49793)
 #1 0x00007fb59b598e96 (/usr/lib/libLLVM-12.so+0xb46e96)
 #2 0x00007fb59aa3d870 __restore_rt (/usr/lib/libpthread.so.0+0x13870)
 #3 0x0000562b67d80b47 getIrValue(VarDeclaration*) (/usr/bin/ldc2+0x74bb47)
 #4 0x0000562b67d02509 makeVarDValue(Type*, VarDeclaration*, llvm::Value*) (/usr/bin/ldc2+0x6cd509)
 #5 0x0000562b67d029fd DtoSymbolAddress(Loc const&, Type*, Declaration*) (/usr/bin/ldc2+0x6cd9fd)
 #6 0x0000562b67d5b11a (/usr/bin/ldc2+0x72611a)
 #7 0x0000562b67d51448 toElem(Expression*) (/usr/bin/ldc2+0x71c448)
 #8 0x0000562b67d5b2d0 (/usr/bin/ldc2+0x7262d0)
 #9 0x0000562b67d51448 toElem(Expression*) (/usr/bin/ldc2+0x71c448)
#10 0x0000562b67cd1ab8 AsmStatement_toIR(InlineAsmStatement*, IRState*) (/usr/bin/ldc2+0x69cab8)
#11 0x0000562b67d39dfb Statement_toIR(Statement*, IRState*) (/usr/bin/ldc2+0x704dfb)
#12 0x0000562b67cd33a3 CompoundAsmStatement_toIR(CompoundAsmStatement*, IRState*) (/usr/bin/ldc2+0x69e3a3)
#13 0x0000562b67d3a4e9 (/usr/bin/ldc2+0x7054e9)
#14 0x0000562b67d3a4e9 (/usr/bin/ldc2+0x7054e9)
#15 0x0000562b67d39dfb Statement_toIR(Statement*, IRState*) (/usr/bin/ldc2+0x704dfb)
#16 0x0000562b67cf2584 DtoDefineFunction(FuncDeclaration*, bool) (/usr/bin/ldc2+0x6bd584)
#17 0x0000562b67dc0de0 (/usr/bin/ldc2+0x78bde0)
#18 0x0000562b67dc0de0 (/usr/bin/ldc2+0x78bde0)
#19 0x0000562b67dc0d32 Declaration_codegen(Dsymbol*) (/usr/bin/ldc2+0x78bd32)
#20 0x0000562b67d09772 codegenModule(IRState*, Module*) (/usr/bin/ldc2+0x6d4772)
#21 0x0000562b67dd2f98 ldc::CodeGenerator::emit(Module*) (/usr/bin/ldc2+0x79df98)
#22 0x0000562b67da74d9 codegenModules(Array<Module*>&) (/usr/bin/ldc2+0x7724d9)
#23 0x0000562b67b75d18 mars_mainBody(Param&, Array<char const*>&, Array<char const*>&) (/usr/bin/ldc2+0x540d18)
#24 0x0000562b67da5a02 cppmain() (/usr/bin/ldc2+0x770a02)
#25 0x0000562b67f3be28 _D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv (/usr/bin/ldc2+0x906e28)
#26 0x0000562b67f3bc45 _d_run_main2 (/usr/bin/ldc2+0x906c45)
#27 0x0000562b67f3ba9e _d_run_main (/usr/bin/ldc2+0x906a9e)
#28 0x0000562b67a7784e main (/usr/bin/ldc2+0x44284e)
#29 0x00007fb59a50eb25 __libc_start_main (/usr/lib/libc.so.6+0x27b25)
#30 0x0000562b67a7958e _start (/usr/bin/ldc2+0x44458e)

@ljmf00 ljmf00 marked this pull request as ready for review August 19, 2021 19:46
@ljmf00 ljmf00 requested a review from ibuclaw as a code owner August 19, 2021 19:46
@thewilsonator thewilsonator requested a review from kinke August 19, 2021 23:12
Copy link
Member

@maxhaton maxhaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a test case, i.e. What wrong value? Would also be good to make sure that there is coverage for exactly this implementation rather than just the top level tan

@ljmf00
Copy link
Member Author

ljmf00 commented Aug 20, 2021

Needs a test case, i.e. What wrong value? Would also be good to make sure that there is coverage for exactly this implementation rather than just the top level tan

This behaviour is already being tested and the assembly version is being covered, but unfortunately, coverage reporting doesn't work with asm blocks. The unittests fail when the proper environment is set.

AFAIK phobos doesn't run LDC test cases at all so the problem needs to be fixed on the LDC test suite, that's why I tagged @kinke to be aware of this problem.

@maxhaton
Copy link
Member

Needs a test case, i.e. What wrong value? Would also be good to make sure that there is coverage for exactly this implementation rather than just the top level tan

This behaviour is already being tested and the assembly version is being covered, but unfortunately, coverage reporting doesn't work with asm blocks. The unittests fail when the proper environment is set.

AFAIK phobos doesn't run LDC test cases at all so the problem needs to be fixed on the LDC test suite, that's why I tagged @kinke to be aware of this problem.

OK, but what does "fail" mean, does the moon fall out of the sky or does it return wrong values? Nan values?

This would also benefit from a comment in the code noting the change

@ljmf00
Copy link
Member Author

ljmf00 commented Aug 20, 2021

OK, but what does "fail" mean, does the moon fall out of the sky or does it return wrong values? Nan values?

It returns wrong values on LDC. More precisely, 0 when not fed with infinity.

The point is: this is already being tested. I don't see the point of creating more test cases. The failing point here is not compiling the tests with x87 floating point precision on the LDC test suite. The DMD and official Phobos test suite are fine and cover this but not the LDC ones.

This would also benefit from a comment in the code noting the change

Can you suggest something? I wrote a brief description on the commit body in case someone does a git blame.

@kinke
Copy link
Contributor

kinke commented Aug 20, 2021

I'm pretty sure this is not being tested on LDC.

You're right, what is being tested is LDC's fork of Phobos, which already includes a workaround for this: https://github.com/ldc-developers/phobos/blob/e5258ae22258cf839790d267a281c1237453b3c0/std/math/trigonometry.d#L214-L315

@ljmf00
Copy link
Member Author

ljmf00 commented Aug 20, 2021

You're right, what is being tested is LDC's fork of Phobos, which already includes a workaround for this: ldc-developers/phobos@e5258ae/std/math/trigonometry.d#L214-L315

Why not upstream these changes?

@kinke
Copy link
Contributor

kinke commented Aug 20, 2021

This is just one of a bunch of std.math adaptations for LDC, see e.g. ldc-developers@b3efcea. This one is a particularly ugly workaround for LDC limitations wrt. DMD-style inline asm; others don't make much too much sense upstream IMO, just uglifying Phobos with LDC specifics (and further if the GDC-specifica would be upstreamed too).

@ljmf00
Copy link
Member Author

ljmf00 commented Aug 20, 2021

This is just one of a bunch of std.math adaptations for LDC, see e.g. ldc-developers@b3efcea. This one is a particularly ugly workaround for LDC limitations wrt. DMD-style inline asm; others don't make much too much sense upstream IMO, just uglifying Phobos with LDC specifics (and further if the GDC-specifica would be upstreamed too).

There are some things I already fixed here in the upstream, like relying on the runtime function when possible. fabs, log, log10, ... are seen as builtins by the compiler and fallback to the runtime implementation. Other stuff is nasty workarounds that impact the overall coverage of the code.

I'm not totally sure, but those patches seem to lack code review because there are better options to go with. E.g. ldc-developers@b3efcea#diff-b7811717fbf1cc1b4465a9a51fdd2bf5e1ee6cdfe51f3d77deb3943ce0f941daR3394-R3396 this decreases binary compatibility and does not actually follow Phobos. It may compile fine, but not portable when switching binaries at runtime. Another stuff that clearly isn't good is disabling unittests.

These are the type of things that makes LDC adoption more breakable in a production-ready environment, and we all know that everyone uses LDC for release code.

I don't see the problem of including some LDC and GDC specifics in the standard library if they are sane changes. Maintaining three different repositories are not productive in a long run.

@kinke
Copy link
Contributor

kinke commented Aug 20, 2021

I'm not totally sure, but those patches seem to lack code review because there are better options to go with. E.g. ldc-developers/phobos@b3efcea#diff-b7811717fbf1cc1b4465a9a51fdd2bf5e1ee6cdfe51f3d77deb3943ce0f941daR3394-R3396 this decreases binary compatibility and does not actually follow Phobos.

Decreases binary compatibility in what way? The signature is identical, are you talking about the pragma(inline) to prevent the costly Phobos trampoline? Unlike DMD, the other compilers have to deal with real likely not being X87 (incl. Windows targets and all non-x86 targets). There are other DMD beauties like https://github.com/ldc-developers/phobos/blob/b3efcea59dfa11d752f5df6ba2d6871748dbefc7/std/math/exponential.d#L3123, implying less precision than most other x87 implementations.

Feel free to try to improve the existing implementations; but note that testing them is a PITA for non-LDC devs like you, as our Phobos fork doesn't have CI, and needs to be tested by opening an LDC PR and advancing the Phobos submodule accordingly.

@ljmf00
Copy link
Member Author

ljmf00 commented Aug 20, 2021

Decreases binary compatibility in what way? The signature is identical, (...)

https://run.dlang.io/is/CCcCT2

I just realized that those overloads are not part of the official standard library. This should be fixed. This is pretty much enforcing people using real precision when inferred. And this got optimized because either it is a compiler builtin or this code is inlined and end up being optimized.

People that do auto lf = log2(f); think that lf is a float, but it isn't. To enforce this when using type inference the users need to do their own functions.

https://godbolt.org/z/T1h3Y9oqc

(...) are you talking about the pragma(inline) to prevent the costly Phobos trampoline? Unlike DMD, the other compilers have to deal with real likely not being X87 (incl. Windows targets and all non-x86 targets). There are other DMD beauties like ldc-developers/phobos@b3efcea/std/math/exponential.d#L3123, implying less precision than most other x87 implementations.

Feel free to try to improve the existing implementations; but note that testing them is a PITA for non-LDC devs like you, as our Phobos fork doesn't have CI, and needs to be tested by opening an LDC PR and advancing the Phobos submodule accordingly.

Can we do better? I would love to see more integration with other compilers in the official Phobos test suite, at least for LDC. Most people use LDC for releases and I've seen code breakage for LDC only when compiling with release mode. I think this will not only increase stability but also make D more production-ready, although I understand that the official team doesn't want to mix those implementations, which is unfortunate.

@kinke
Copy link
Contributor

kinke commented Aug 20, 2021

I just realized that those overloads are not part of the official standard library. This should be fixed.

Ah that's what you meant - yes, this is exactly why they are commented out, just waiting for upstream to finally get them. There are multiple issues about this, e.g., https://issues.dlang.org/show_bug.cgi?id=18559 and https://issues.dlang.org/show_bug.cgi?id=19346. I've had a go at a bunch of them a while back (especially those for which there's no LLVM intrinsic), but there sadly hasn't been much movement on that front since then: #6272

@ljmf00
Copy link
Member Author

ljmf00 commented Aug 20, 2021

I just realized that those overloads are not part of the official standard library. This should be fixed.

Ah that's what you meant - yes, this is exactly why they are commented out, just waiting for upstream to finally get them. There are multiple issues about this, e.g., issues.dlang.org/show_bug.cgi?id=18559 and issues.dlang.org/show_bug.cgi?id=19346. I've had a go at a bunch of them a while back (especially those for which there's no LLVM intrinsic), but there sadly hasn't been much movement on that front since then: #6272

I can write some overrides for these situations then, but this may introduce a breaking change if there are people relying on the previous overloads.

@ljmf00
Copy link
Member Author

ljmf00 commented Aug 20, 2021

Anyway, is this good to go? @maxhaton If yes, can you elaborate on the comment suggestion?

return real.nan;

Clear1: asm pure nothrow @nogc{
fld rnan ; // return rnan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this supposed to work? You're pushing rnan, then popping it immediately again via fstp in Clear1 before returning, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups, forgot to jump to Lret label. Nice catch, even though, it doesn't fail. Fixed

@kinke
Copy link
Contributor

kinke commented Aug 20, 2021

I can write some overrides for these situations then,

That'd be great.

but this may introduce a breaking change if there are people relying on the previous overloads.

Breaking changes for a new compiler/druntime/Phobos version wouldn't be something new, and good ones like these are warmly welcome, especially if the old behavior can be restored via an explicit cast to real.

@ljmf00 ljmf00 force-pushed the fix-asm-tan-impl branch 2 times, most recently from 260d94a to 8c944cf Compare August 21, 2021 16:24
@ljmf00
Copy link
Member Author

ljmf00 commented Aug 21, 2021

I can write some overrides for these situations then,

That'd be great.

but this may introduce a breaking change if there are people relying on the previous overloads.

Breaking changes for a new compiler/druntime/Phobos version wouldn't be something new, and good ones like these are warmly welcome, especially if the old behavior can be restored via an explicit cast to real.

Sure, I'll add this on a separate PR as it is unrelated.

@ljmf00 ljmf00 requested a review from maxhaton August 21, 2021 16:26
Somehow, separating asm blocks in small pieces produces additional
`mov`s and other instructions on LDC that breaks the tanAsm logic and
therefore returns wrong values.

Signed-off-by: Luís Ferreira <contact@lsferreira.net>
@ljmf00
Copy link
Member Author

ljmf00 commented Aug 21, 2021

@maxhaton I added a comment describing why I added the additional immutable variable. Is this good to go now? Let me know if anything is still missing.

@ljmf00 ljmf00 requested a review from kinke August 21, 2021 16:31
Copy link
Member

@maxhaton maxhaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but if any one has objections please speak up

@maxhaton maxhaton added the Merge:72h no objection -> merge The PR will be merged if there are no objections raised. label Aug 21, 2021
@RazvanN7 RazvanN7 removed the Merge:72h no objection -> merge The PR will be merged if there are no objections raised. label Aug 24, 2021
@RazvanN7 RazvanN7 merged commit 27f37ac into dlang:master Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants